STinC++ rewrites the programs in Software Tools in Pascal using C++
Suppose that you need to print a text file containing horizontal tab characters on a device that cannot interpret tabs. As a first approximation, you might be content with fixed tab stops every four columns … A tab character is thus replaced by from one to four spaces.
Software Tools shows its age properly for the first time here. Back in 1981 this was a genuine problem, and even 10 or more years later it was entirely possible to send a printer into spasms if you weren’t careful. These days of course, printers behave perfectly all the time and never give anyone any bother at all.
While printing might be a solved problem, the question of tabs or spaces in source code remains an eternal problem. The motivation might have changed, but the tool remains relevant.
This final program in Software Tools' first chapter also turns the complexity screw another notch or two. I’m not going to reproduce the whole of the code, but here’s the core of the now familiar read/process/output loop
col := 1;
while (getc(c) <> ENDFILE) do
    if (c = TAB) then
        repeat
            putc(BLANK);
            col := col + 1
        until (tabpos(col, tabstops))
    else if (c = NEWLINE) then begin
        putc(NEWLINE);
        col := 1
    end
    else begin
        putc(c);
        col := col + 1
    end
end;
We’ve got all the small parts Kernighan and Plauger have built up over the previous four programs - reading and outputting characters, counting characters, special processing on new lines. The new element is that repeat/until loop, which converts each tab by outputting the requisite number of spaces.
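For comparison, here is a rough sketch of what a direct, line-by-line transliteration of that loop into C++ might look like. It is not code from the book or from stiX: is_tab_stop is a hypothetical stand-in for tabpos, and it assumes tab stops every four columns with columns counted from 1, so the stops fall on columns 1, 5, 9 and so on.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Assumption: fixed tab stops every 4 columns, columns counted from 1.
constexpr std::size_t tab_width = 4;

// Hypothetical stand-in for the Pascal tabpos(col, tabstops).
bool is_tab_stop(std::size_t col) {
    return col % tab_width == 1;
}

// Transliteration of the read/process/output loop, writing directly to out.
void detab_loop(std::istream& in, std::ostream& out) {
    std::size_t col = 1;
    char c;
    while (in.get(c)) {
        if (c == '\t') {
            // repeat ... until: always emits at least one blank
            do {
                out.put(' ');
                ++col;
            } while (!is_tab_stop(col));
        } else if (c == '\n') {
            out.put('\n');
            col = 1;
        } else {
            out.put(c);
            ++col;
        }
    }
}

// Convenience wrapper for trying the loop on an in-memory string.
std::string detab_loop_string(const std::string& text) {
    std::istringstream in(text);
    std::ostringstream out;
    detab_loop(in, out);
    return out.str();
}
```

With these assumptions, a tab after a single character expands to three spaces, and a tab at the start of a line expands to four.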
In my C++ programs, I’ve avoided direct output in the body of the code. I’ve used functions that return a value, or written out through iterators. Here, after reading a character I might need to write one or more characters in response. Or, put in a more C++y way, we need to convert a sequence of characters into a sequence of strings. Rummaging once again in the algorithm header leads us to std::transform, which applies a given function to a sequence and stores the result in another sequence. In other languages you might know this as map or apply. JavaScript’s Array.prototype.map, for example, returns a new array filled with the results of calling the provided function on each element of the calling array. The start of my C++ implementation is then
std::transform(
    std::istreambuf_iterator<char>(in),
    std::istreambuf_iterator<char>(),
    std::ostream_iterator<std::string>(out),
    ... something ...
);
The … something … could be a function, a lambda, or a function object. Because the transformation I’m doing here needs a little bit of state - the action performed depends not just on the current character but also where we are in the line - I used a little function object.
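To see why the state matters, it may help to look at a deliberately naive sketch first. The naive_detab function below is hypothetical and not part of stiX: it plugs a stateless lambda into the same std::transform shape and expands every tab to a fixed four spaces, regardless of column.

```cpp
#include <algorithm>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical helper: a stateless character-to-string mapping.
// Every tab becomes exactly four spaces, ignoring the current column,
// so text does not line up on tab stops.
std::string naive_detab(const std::string& text) {
    std::istringstream in(text);
    std::ostringstream out;
    std::transform(
        std::istreambuf_iterator<char>(in),
        std::istreambuf_iterator<char>(),
        std::ostream_iterator<std::string>(out),
        [](char c) {
            return c == '\t' ? std::string(4, ' ') : std::string(1, c);
        }
    );
    return out.str();
}
```

Given an a, a tab, then a b, this produces four spaces between the letters, where tab stops every four columns call for three. Getting that right means remembering how far along the line we are.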
detab.cpp
#include "detab.h"
#include "../../lib/chars.h"
#include "../../lib/tab_stops.h"
#include <algorithm>
#include <iostream>
#include <iterator>
#include <string>
struct detabber {
    size_t position_;

    std::string operator()(char c) { // (3)
        if (stiX::istab(c)) {
            const auto spaces = stiX::distance_to_next_tab_stop(position_); // (4)
            position_ += spaces;
            return std::string(spaces, ' ');
        }

        if (stiX::isnewline(c)) // (5)
            position_ = 0;
        else
            ++position_;

        return std::string(1, c); // (6)
    }
};

namespace stiX {
    void detab(std::istream &in, std::ostream &out) {
        std::transform(
            std::istreambuf_iterator<char>(in),
            std::istreambuf_iterator<char>(),
            std::ostream_iterator<std::string>(out), // (1)
            detabber() // (2)
        );
    }
}
The implementation is straightforward, and not hugely dissimilar to the Pascal original.
(1) I’m reading the input as a stream of characters but outputting strings. The std::istreambuf_iterator<char> provides that unformatted input, while std::ostream_iterator<std::string> gives the appropriate output sink. I can’t recall ever having this in C++ before, but I certainly have in other languages which are more relaxed about types.
(2) Create a detabber object that will perform the transformation. Because detabber is a simple object there’s no need for me to write a constructor: creating it with detabber() value-initialises it, which sets position_ to 0, and that is perfectly fine. std::transform takes the transformer argument by value, so you might expect this newly constructed detabber object to be copied as it’s passed in. However in situations like this, where we’re creating an object that has no lifetime beyond this one use, modern compilers are more than clever enough to omit that copy entirely.
(3) operator() is the function call operator, and is the defining feature of a C++ function object. Note that the parameter type char matches the underlying input iterator type, while the return type std::string matches the underlying output iterator type. For each character on the input, std::transform will call operator() with that character, expecting a std::string in return.
(4) If the provided character c is a tab, calculate where the next tab stop is and return a string of spaces to pad to it.
(5) If c is not a tab but is a newline, reset position_, otherwise just bump along to the next position.
(6) Return c as a string. Creating a std::string for every character we see may not be ideal from a performance point of view (even though modern compilers are not only clever enough but actually required to omit copies here). However, I confess to being fairly relaxed on that front. Converting tabs to spaces is unlikely to ever be a time critical activity, and I’d rather think about getting it correct first and worry about performance if I have to. I may be atypical, but performance in time or space has rarely been a problem I’ve had to deal with.
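The helpers from chars.h and tab_stops.h aren’t shown above. As a rough guide to what detabber relies on, here is a minimal sketch of how they might look. These are hypothetical stand-ins in a made-up sketch namespace, not the real stiX headers, and they assume a fixed tab width of four and a zero-based position, as position_ is in detabber.

```cpp
#include <cstddef>

namespace sketch {  // hypothetical stand-ins, not the real stiX library
    // Assumption: fixed tab stops every four columns.
    constexpr std::size_t tab_size = 4;

    bool istab(char c) { return c == '\t'; }
    bool isnewline(char c) { return c == '\n'; }

    // position is zero-based. The result is always between 1 and
    // tab_size, so a tab is replaced by from one to four spaces.
    std::size_t distance_to_next_tab_stop(std::size_t position) {
        return tab_size - (position % tab_size);
    }
}
```

At position 0 the next stop is four columns away, at position 3 it is one column away, and at position 4 we are sitting on a stop, so a tab there moves a full four columns on to the next one.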