Jez Higgins

Freelance software generalist
software created
extended or repaired


Older posts are available in the archive or through tags.

Feed

Follow me on Twitter
My code on GitHub

Contact
About

Monday 07 October 2019 The Forest Road Reader, No 2.31 : STinC++ - entab

Software Tools In Pascal’s first chapter races you through five programs, each a little bit more complex than the last, giving you a nice little handful of tools and a tour of the Pascal language. With that extended preamble out of the way, chapter two focuses on filters, programs which read from standard input and write to standard output, making some useful change to the stream of data on the way through. Filters are unremarkable today, so widely has the Unix philosophy become the default model of computing. At the time Kernighan and Plauger were writing it did warrant a brief explanation and this remarkably understated bibliographic note.

The idea of pipelines and filters - a uniform mechanism for connecting the input of one program to the output of another - was conceived by MD McIlroy …​ Pipelines have an especially clean implementation in the UNIX operating system developer at Bell Labs by KL Thompson and DM Richie. Here the processes in the pipeline run concurrently, with sufficient interlocks to ensure that the receiving process never reads information that isn’t there, and that the sending process never creates too much information before the receiver has read some. The vertical bar is the UNIX syntax for a pipeline and is due to Thompson.

When I first read this, I didn’t really register the enormity of what was being described in this little footnote. Now, many years later, I realise that sentence about concurrent processes must have been pretty jolting for people raised on batch processing and job control languages.

Anyway, to the program. Having replaced tabs with spaces, our authors to take no side in that eternal debate and present entab, a program to replace tabs with spaces. They give some reason for wanting to do this, again relating to formatting for a particular type of printer that seems incredibly arcane in these days of wireless printer/scanner combos (like the miraculous the Canon MG5750 which required absolutely no configuration at all to work with all the various computers scattered round the JezUK High Performance Computing Lab).

Kernighan and Plauger foreshadow the further complexity of the upcoming tasks while providing specific advice for implementing entab

The trick to getting most filters right is to find an orderly way of recognizing the components of the input stream, so that the order can be reflected in the flow of control of the program rather than in a collection of switches and flags …​ If we think of the input to entab as a repetition of the pattern: zero or more blanks, followed by a non-blank character, then this determines the control structure of the program.

I described the operation of detab in terms of a sequences, and Kerninghan and Plauger describe entab in a similar way here. As with detab, I’m using std::transform at the core of my program, plugging in a function object. Writing entab was actually a little more subtle than I’d expected, although with the benefit of hindsight if I’d put together a better set of initial test cases I’d have found it easier.

entab.cpp
class entabber {
public:
    std::string operator()(char c) {
        if (stiX::isspace(c)) { (2)
            ++spaces_;

            if (stiX::is_tab_stop(spaces_)) {
                position_ = spaces_;
                return tab; (3)
            }

            return empty; (4)
        }

        std::string output = (spaces_ != position_)
                ? std::string(spaces_ - position_, ' ')
                : empty; (5)
        output += c;

        if (stiX::isnewline(c)) (6)
            position_ = 0;
        else
            ++position_;
        spaces_ = position_; (7)

        return output;
    }
private:
    size_t position_;
    size_t spaces_;
    std::string const tab = "\t";
    std::string const empty;
};

namespace stiX {
    void entab(std::istream &in, std::ostream &out) {
        filter(in, out, entabber());  (1)
    }
}
  1. filter is a new utility function that wraps up the whole std::tranform/std::istreambuf_iterator<char>(in)/std::ostream_iterator<std::string>(out) mouthful into something a little easier on the eye, and gives it a nice descriptive name. As with detab, we need to track some state as we go, so I need function object. entabber has two member variables - position_ which tracks where we are in the current line, and spaces_ which runs ahead of position_ as new spaces arrive. Initially, both these variables are 0.

  2. So here’s our function object, considering each character in the sequence in turn. First question - is it a space? If it is, bump spaces_ forward one.

  3. Have we now accumulated enough spaces to arrive at a tab stop? If so, advance position_ and return a tab.

  4. We didn’t hit a tab stop, so we’re somewhere in the middle. Return nothing.

  5. We’ll reach here only when we’ve got something other than a space. If spaces_ has advanced ahead of position_ we’ve got some spaces buffered up we need to print first, so pop those into our output, before adding the character itself.

  6. At the end of the line? Reset our position_ to the start of the row.

  7. Whatever we’ve done up to now, we’ve got nothing buffered up so reset spaces_ to the current position.

I chose to advance spaces_ ahead of the current position, rather then simply keeping count of the buffered spaces. It makes little difference either way - at some point in the code there will be an ugly point where you have to do a bit of arithmetic involving position_ and spaces_.

Those with long memories will notice I’ve declared entabber as a class, when previously for detabber I used a struct. This little video from Jonathan Boccara was instrumental in changing my mind.

Source code

Source code for this program, indeed the whole project, is available in the stiX GitHub repository. entab is program 1 of Chapter 2.

Even older than me?

If you are even longer in the tooth than me and were working on the kind of chunky machines that loom silently in the background of Software Tools, I’d love to hear about that. You can email me or get in touch on Twitter.

Endnotes

This whole endeavour relies Software Tools in Pascal and Software Tools, both by Brian W Kernighan and PJ Plauger. I love these books and commend them to you. They’re both still in print, but new copies are, frankly, just ridiculously expensive. Happily, here are plenty of second-hand copies floating round, or you can borrow them from the The Internet Archive using the links above.

For this project I’ve been using JetBrain’s CLion, which I liked enough to buy a license. CLion uses CMake to build projects. My previous flirtations with CMake, admittedly many years ago, weren’t a huge success. Not so this time - it’s easy to use and works a treat.

The test harness I’m using is Catch. I’ve been aware of Catch pretty much since it was first released, but this is my first time really using it. I like it and will use it again.


Tagged code, and software-tools-in-c++


Jez Higgins

Freelance software generalist
software created
extended or repaired

Older posts are available in the archive or through tags.

Feed

Follow me on Twitter
My code on GitHub

Contact
About