Jez Higgins

Freelance software generalist
software created
extended or repaired


Older posts are available in the archive or through tags.

Feed

Follow me on Twitter
My code on GitHub

Contact
About

Thursday 24 October 2019 The Forest Road Reader, No 2.33 : STinC++ - overstrike

Printing looms large in Brian Kernighan’s professional life, so it’s no surprise that our next software tool, overstrike, is, like entab, a print preprocessor.

You can overstrike characters on a typewriter by backspacing over what is already typed. This is how you underline words, for one thing; … If you send your output to a line printer, however, the result may be a hash, because a typical printer doesn’t know what to do with backspace characters.

Many printers do, however, provide for overstriking entire lines. [The convention] is to provide an extra carriage control character at the beginning of each line: a blank means "space before printing," and a plus sign means "do not space before printing," i.e. overstrike what has gone before.

The overstrike filter looks for backspaces in the incoming text and generates a series of lines with carriage control to reproduce the effect of the backspaces. Given, say,

Let's underline "Hello World!^H^H^H^H^H^H^H^H^H^H^H^H____________"

the filter will produce

 Let's underline "Hello World!
+                 ____________"
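To see what those carriage control characters ask of the printer, here is a small sketch that groups carriage-controlled text into physical rows, each holding the passes struck on it. The names print_rows and PrintedRow are made up for illustration, and the sketch assumes only the two control characters described above (a blank and a plus sign).

```cpp
#include <sstream>
#include <string>
#include <vector>

// Each physical row on the paper is the list of passes struck on it.
using PrintedRow = std::vector<std::string>;

// Group carriage-controlled text into physical rows (illustrative sketch).
std::vector<PrintedRow> print_rows(const std::string& text) {
    std::vector<PrintedRow> rows;
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        if (line.empty()) {
            // No control character at all: treat it as a blank row.
            rows.push_back(PrintedRow{std::string()});
            continue;
        }
        const char control = line[0];
        const std::string body = line.substr(1);
        if (control == '+' && !rows.empty())
            rows.back().push_back(body);    // overstrike: same row, another pass
        else
            rows.push_back(PrintedRow{body}); // space before printing: new row
    }
    return rows;
}
```

Fed the two lines above, this gives a single physical row with two passes: the text, then the underscores.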

As Kernighan and Plauger note, this is not the only way to do it, but it is one of the least complicated. They continue:

It is often better to get on with something that does most of the job well enough, then improve and add things as they prove to be worthwhile.

Did someone just say MVP?

A line printer prints a whole line of output at a time, which is why they’re not much interested in backspace characters and why overstrike generates multiple lines instead. Line printers were high-end bits of kit and pretty quick - some could bash out over 1000 lines a minute, which makes that little inkjet in the corner look positively pedestrian. If you’re of a similar vintage to me, the existence of line printers is why your DOS printer port was called LPT1.

For younger readers, ^H is how a backspace character would sometimes display on a terminal that didn’t know any better. ^ means the Ctrl key while H is, well, H, and hitting Ctrl and H generated ASCII code 08, aka Backspace. ASCII was the cool kids' character encoding scheme of choice back in 1981. If you use some variety of Unix-derived operating system, there’s a high likelihood Ctrl+H will work as Backspace in your terminal window. Go on, try it.
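The relationship between the H key and code 08 is plain bit arithmetic: in ASCII, a control character and its caret-notation letter differ only in bit 0x40. A minimal sketch, with the function names control_code and caret_notation invented for this illustration:

```cpp
#include <cassert>
#include <string>

// Control code produced by Ctrl plus an upper-case letter key:
// flip bit 0x40 of the letter's ASCII code ('H' is 0x48, so Ctrl+H is 0x08).
constexpr char control_code(char key) {
    return static_cast<char>(key ^ 0x40);
}

// Caret notation for a control character, e.g. backspace (8) becomes "^H".
std::string caret_notation(char c) {
    assert(c >= 0 && c < 32);  // this sketch only handles codes 0 to 31
    return std::string("^") + static_cast<char>(c ^ 0x40);
}
```

So control_code('H') is the backspace character, and caret_notation('\b') gives back "^H".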

The implementation

Because a single character on the input might produce no output, one character, or many characters of output, for the overstrike implementation I used the same skeleton as detab and entab - using std::transform to read char from the input stream and write std::string to the output.
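For readers who haven't seen the earlier posts, a skeleton of that shape might look something like the following. This is a guess at the general idea rather than the project's actual code: filter_sketch and run_filter are names invented here, and the real stiX filter function may well differ.

```cpp
#include <algorithm>
#include <iostream>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical stand-in for the shared filter function.
// Reads one char at a time and writes whatever string the transformer returns,
// which may be empty, a single character, or several characters.
template <typename Transformer>
void filter_sketch(std::istream& in, std::ostream& out, Transformer transformer) {
    std::transform(
        std::istreambuf_iterator<char>(in),      // every char, whitespace included
        std::istreambuf_iterator<char>(),        // end-of-stream sentinel
        std::ostream_iterator<std::string>(out), // writes each returned string
        transformer
    );
}

// Convenience wrapper for trying a transformer on a string (illustrative only).
template <typename Transformer>
std::string run_filter(const std::string& input, Transformer transformer) {
    std::istringstream in(input);
    std::ostringstream out;
    filter_sketch(in, out, transformer);
    return out.str();
}
```

A transformer that returns an empty string for 'b' and doubles everything else turns "abc" into "aacc", which covers the none, one, or many cases in miniature.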

overstrike.cpp
#include <cstddef>
#include <iostream>
#include <string>
// stiX::isbackspace, stiX::isnewline and filter are helpers from elsewhere in the stiX project.

class overstriker {
public:
    std::string operator()(char c) {
        if (stiX::isbackspace(c)) { // (1)
            ++backspaced_;
            return empty;
        }

        std::string output;

        if (backspaced_) {
            position_ = (backspaced_ < position_) ? position_ - backspaced_ : 0; // (2)
            output += noskip; // (3)
            output += std::string(position_, ' '); // (4)
        } else if (position_ == 0) // (5)
            output += skip;

        output += c; // (6)

        if (stiX::isnewline(c)) // (7)
            position_ = 0;
        else
            ++position_;
        backspaced_ = 0;

        return output; // (8)
    }
private:
    std::string const empty;
    std::string const skip = " ";
    std::string const noskip = "\n+";

    // Explicit initialisers, so the counters start at zero however the object is created.
    std::size_t position_ = 0;
    std::size_t backspaced_ = 0;
};

namespace stiX {
    void overstrike(std::istream &in, std::ostream &out) {
        filter(in, out, overstriker());
    }
}
  1. Is that a backspace? If so, make a note, chuck it away, and we’re done for now.

  2. Wind back the position_ counter for as many backspaces as there were. For well-formed input, the conditional part of this expression guarding against excessive backspacing shouldn’t be necessary but Kernighan and Plauger quietly implement Postel’s Law (as it wasn’t yet known) throughout the book.

  3. We’ve been backspacing, so prefix the line with the + control character.

  4. Pad out the line with the appropriate number of spaces. Obviously this only works if the printer is monospaced, but that was absolutely the case when Software Tools In Pascal was written.

  5. If we haven’t been backspacing, are we perhaps at the start of a normal row?

  6. Having set up the output with whatever prefix (possibly none) we need, actually output the character we started with.

  7. This is familiar from detab and entab. If we’re at the end of the current line, then reset the counter.

  8. Boom! We’re done. Phew.
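To try the eight steps without the rest of the project, the same logic can be restated as a stand-alone function over a string. This is a sketch rather than the code from the repository: overstrike_text is a made-up name, and plain comparisons against '\b' and '\n' stand in for stiX::isbackspace and stiX::isnewline.

```cpp
#include <cstddef>
#include <string>

// Stand-alone restatement of the steps above (illustrative only).
std::string overstrike_text(const std::string& input) {
    std::string output;
    std::size_t position = 0;
    std::size_t backspaced = 0;

    for (char c : input) {
        if (c == '\b') {   // step 1: note the backspace, emit nothing
            ++backspaced;
            continue;
        }
        if (backspaced) {
            // step 2: wind back, clamping at the start of the line
            position = (backspaced < position) ? position - backspaced : 0;
            output += "\n+";                      // step 3: overstrike control
            output += std::string(position, ' '); // step 4: pad to the column
        } else if (position == 0) {
            output += ' ';                        // step 5: normal row control
        }
        output += c;                               // step 6: the character itself
        position = (c == '\n') ? 0 : position + 1; // step 7: reset or advance
        backspaced = 0;
    }
    return output;                                 // step 8
}
```

With two bouts of backspacing, the input "abc", backspace, "_", backspace, backspace, "-" comes out as three lines: " abc", then "+  _", then "+ -". Over-enthusiastic backspacing is clamped, so "ab" followed by four backspaces and "X" gives " ab" and then "+X".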

This was quite a fun little program to do, although it was sometimes tricky to visualise the output - especially in the presence of more than one bout of backspacing. It TDDed out very nicely though. I’m aware that I’m in a bit of a filter groove at the moment, and am well prepared to think about the kind of little state machines that are implicit in this sort of processing. I do think, though, that what I’ve come up with is a tidy little solution. If you think otherwise (or even if you don’t), I’d be delighted to hear from you. Feel free to email me or get in touch on Twitter.

Source code

Source code for this program, indeed the whole project, is available in the stiX GitHub repository. overstrike is program 2 of Chapter 2.

Wait, I arrived late! What’s going on?

Software Tools In Pascal by Brian Kernighan and PJ Plauger is a book that I love. I’m working through the book, reimplementing their tools in C++.


Tagged code, and software-tools-in-c++

