STinC++ rewrites the programs in Software Tools in Pascal using C++
Software Tools In Pascal’s first chapter races you through five programs, each a little bit more complex than the last, giving you a nice little handful of tools and a tour of the Pascal language. With that extended preamble out of the way, chapter two focuses on filters, programs which read from standard input and write to standard output, making some useful change to the stream of data on the way through. Filters are unremarkable today, so widely has the Unix philosophy become the default model of computing. At the time Kernighan and Plauger were writing it did warrant a brief explanation and this remarkably understated bibliographic note.
The idea of pipelines and filters - a uniform mechanism for connecting the input of one program to the output of another - was conceived by MD McIlroy … Pipelines have an especially clean implementation in the UNIX operating system developer at Bell Labs by KL Thompson and DM Richie. Here the processes in the pipeline run concurrently, with sufficient interlocks to ensure that the receiving process never reads information that isn’t there, and that the sending process never creates too much information before the receiver has read some. The vertical bar is the UNIX syntax for a pipeline and is due to Thompson.
When I first read this, I didn’t really register the enormity of what was being described in this little footnote. Now, many years later, I realise that sentence about concurrent processes must have been pretty jolting for people raised on batch processing and job control languages.
Anyway, to the program. Having replaced tabs with spaces, our authors to take no side in that eternal debate and present entab
, a program to replace tabs with spaces. They give some reason for wanting to do this, again relating to formatting for a particular type of printer that seems incredibly arcane in these days of wireless printer/scanner combos (like the miraculous the Canon MG5750 which genuinely required absolutely no configuration at all to work with all the various computers scattered round the JezUK High Performance Computing Lab).
Kernighan and Plauger foreshadow the further complexity of the upcoming tasks while providing specific advice for implementing entab
The trick to getting most filters right is to find an orderly way of recognizing the components of the input stream, so that the order can be reflected in the flow of control of the program rather than in a collection of switches and flags
… If we think of the input to entab as a repetition of the pattern: zero or more blanks, followed by a non-blank character, then this determines the control structure of the program.
I described the operation of detab
in terms of a sequences, and Kerninghan and Plauger describe entab
in a similar way here. As with detab
, I’m using std::transform
at the core of my program, plugging in a function object. Writing entab
was actually a little more subtle than I’d expected, although with the benefit of hindsight if I’d put together a better set of initial test cases I’d have found it easier.
entab.cpp
class entabber {
public:
std::string operator()(char c) {
if (stiX::isspace(c)) { (2)
++spaces_;
if (stiX::is_tab_stop(spaces_)) {
position_ = spaces_;
return tab; (3)
}
return empty; (4)
}
std::string output = (spaces_ != position_)
? std::string(spaces_ - position_, ' ')
: empty; (5)
output += c;
if (stiX::isnewline(c)) (6)
position_ = 0;
else
++position_;
spaces_ = position_; (7)
return output;
}
private:
size_t position_;
size_t spaces_;
std::string const tab = "\t";
std::string const empty;
};
namespace stiX {
void entab(std::istream &in, std::ostream &out) {
filter(in, out, entabber()); (1)
}
}
-
filter
is a new utility function that wraps up the whole std::tranform
/std::istreambuf_iterator<char>(in)
/std::ostream_iterator<std::string>(out)
mouthful into something a little easier on the eye, and gives it a nice descriptive name.
As with detab
, we need to track some state as we go, so I need function object. entabber
has two member variables - position_
which tracks where we are in the current line, and spaces_
which runs ahead of position_
as new spaces arrive. Initially, both these variables are 0.
-
So here’s our function object, considering each character in the sequence in turn. First question - is it a space? If it is, bump spaces_ forward one.
-
Have we now accumulated enough spaces to arrive at a tab stop? If so, advance position_
and return a tab.
-
We didn’t hit a tab stop, so we’re somewhere in the middle. Return nothing.
-
We’ll reach here only when we’ve got something other than a space. If spaces_
has advanced ahead of position_
we’ve got some spaces buffered up we need to print first, so pop those into our output
, before adding the character itself.
-
At the end of the line? Reset our position_
to the start of the row.
-
Whatever we’ve done up to now, we’ve got nothing buffered up so reset spaces_
to the current position.
I chose to advance spaces_
ahead of the current position, rather then simply keeping count of the buffered spaces. It makes little difference either way - at some point in the code there will be an ugly point where you have to do a bit of arithmetic involving position_
and spaces_
.
Those with particularly sharp memories will notice I’ve declared entabber
as a class
, when previously for detabber
I used a struct
. This little video from Jonathan Boccara was instrumental in changing my mind.