Let's underline "Hello World!^H^H^H^H^H^H^H^H^H^H^H^H____________"
Freelance software grandad
software created
extended or repaired
Follow me on Mastodon
Applications, Libraries, Code
Talks & Presentations
STinC++ rewrites the programs in Software Tools in Pascal using C++
Printing looms large in Brian Kernighan’s professional life and so it’s no surprise our next software tool, overstrike, is, like entab, a print preprocessor.
You can overstrike characters on a typewriter by backspacing over what is already typed. This is how you underline words, for one thing; … If you send your output to a line printer, however, the result may be a hash, because a typical printer doesn’t know what to do with backspace characters.
Many printers do, however, provide for overstriking entire lines. [The convention] is to provide an extra carriage control character at the beginning of each line: a blank means "space before printing," and a plus sign means "do not space before printing," i.e. overstrike what has gone before.
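To make the convention concrete, here is a rough sketch of what a printer does with those control characters. The function name apply_carriage_control is my own invention for illustration, and since a screen can't really overstrike, this simulation simply lets the later character win wherever two non-blank characters land in the same column.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not from the book or from stiX: turns lines that start
// with a carriage control character into the rows that end up on the page.
std::vector<std::string> apply_carriage_control(std::vector<std::string> const& lines) {
  std::vector<std::string> page;
  for (auto const& line : lines) {
    if (line.empty()) {          // no control character at all: treat as a blank row
      page.emplace_back();
      continue;
    }
    std::string const text = line.substr(1);   // drop the control character
    if (line[0] == '+' && !page.empty()) {
      // '+' means "do not space before printing": print over the previous row
      std::string& row = page.back();
      if (row.size() < text.size())
        row.resize(text.size(), ' ');
      for (std::size_t i = 0; i < text.size(); ++i)
        if (text[i] != ' ')
          row[i] = text[i];      // simulation only: the later ink replaces the earlier
    } else {
      // anything else is treated like a blank: move to a fresh row, then print
      page.push_back(text);
    }
  }
  return page;
}
```

Feeding it the two lines " abc" and "+ __" gives a single row, "a__", where a real printer would show the b and c with underscores struck through the same columns.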
The overstrike filter looks for backspaces in the incoming text and generates a series of lines with carriage control to reproduce the effect of the backspaces. Given, say,
Let's underline "Hello World!^H^H^H^H^H^H^H^H^H^H^H^H____________"
the filter will produce
 Let's underline "Hello World!
+                 ____________"
As Kernighan and Plauger note, this is not the only way to do it, but it is one of the least complicated. They continue
It is often better to get on with something that does most of the job well enough, then improve and add things as they prove to be worthwhile.
Did someone just say MVP?
A line printer prints a whole line of output at a time, which is why they’re not much interested in backspace characters and why overstrike is looking to generate multiple lines instead. Line printers were high-end bits of kit and pretty quick - some could bash out over 1000 lines a minute, which makes that little inkjet in the corner look positively pedestrian. If you’re of a similar vintage to me, the existence of line printers is why your DOS printer port was called LPT1.
For younger readers, ^H is how a backspace character would sometimes display on a terminal that didn’t know any better. ^ means the Ctrl key while H is, well, H, and hitting Ctrl and H generated ASCII code 08, aka Backspace. ASCII was the cool kids' character encoding scheme of choice back in 1981. If you use some variety of Unix-derived operating system, there’s a high likelihood Ctrl+H will work as Backspace in your terminal window. Go on, try it.
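The Ctrl arithmetic is easy to check for yourself. In ASCII, holding Ctrl clears the top three bits of the letter's code, so H (0x48) becomes 0x08. The ctrl function below is a name I've made up for this sketch, and it assumes an ASCII execution character set.

```cpp
#include <cassert>

// Hypothetical helper: the control code produced by Ctrl plus an ASCII letter
// is the letter's code with the top three bits cleared (masked with 0x1F).
constexpr char ctrl(char letter) {
  return static_cast<char>(letter & 0x1F);
}

static_assert(ctrl('H') == '\b', "Ctrl+H is backspace");
static_assert(ctrl('H') == 8, "backspace is ASCII code 8");
static_assert(ctrl('I') == '\t', "Ctrl+I is tab");
static_assert(ctrl('J') == '\n', "Ctrl+J is line feed");
```

The same masking explains why Ctrl+I behaves like Tab and Ctrl+J like Enter in many terminals.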
Because a single character on the input might produce no output, one character, or many characters of output, for the overstrike implementation I used the same skeleton as detab and entab - using std::transform to read char from the input stream and write std::string to the output.
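For anyone who hasn't seen that skeleton, here is a minimal sketch of the shape I mean. It is an illustration under assumptions rather than the stiX code itself: filter_sketch, toy_convert and run_toy are names invented for this example. One caveat worth knowing is that std::transform doesn't formally promise to call the converter in order, although with single-pass stream iterators there isn't really any other way to do it.

```cpp
#include <algorithm>
#include <cassert>
#include <istream>
#include <iterator>
#include <ostream>
#include <sstream>
#include <string>

// Sketch of a char-in, string-out filter: each input character is handed to
// convert, and whatever string comes back (possibly empty) is written out.
template <typename Converter>
void filter_sketch(std::istream& in, std::ostream& out, Converter convert) {
  std::transform(
      std::istreambuf_iterator<char>(in),
      std::istreambuf_iterator<char>(),
      std::ostream_iterator<std::string>(out),
      convert);
}

// Toy converter showing the three cases: nothing, one character, many.
std::string toy_convert(char c) {
  if (c == '\r') return std::string();        // no output
  if (c == '\t') return std::string(4, ' ');  // many characters
  return std::string(1, c);                   // exactly one character
}

// Convenience wrapper so the sketch can be exercised on plain strings.
std::string run_toy(std::string const& input) {
  std::istringstream in(input);
  std::ostringstream out;
  filter_sketch(in, out, toy_convert);
  return out.str();
}
```

Calling run_toy("a\tb\r\n") drops the carriage return and expands the tab, giving "a    b\n".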
class overstriker {
public:
  std::string operator()(char c) {
    if (stiX::isbackspace(c)) {                                             // (1)
      ++backspaced_;
      return empty;
    }

    std::string output;
    if (backspaced_) {
      position_ = (backspaced_ < position_) ? position_ - backspaced_ : 0;  // (2)
      output += noskip;                                                     // (3)
      output += std::string(position_, ' ');                                // (4)
    } else if (position_ == 0)                                              // (5)
      output += skip;

    output += c;                                                            // (6)

    if (stiX::isnewline(c))                                                 // (7)
      position_ = 0;
    else
      ++position_;
    backspaced_ = 0;

    return output;                                                          // (8)
  }

private:
  std::string const empty;
  std::string const skip = " ";
  std::string const noskip = "\n+";
  // start at column 0 with no pending backspaces
  size_t position_ = 0;
  size_t backspaced_ = 0;
};

namespace stiX {
  void overstrike(std::istream &in, std::ostream &out) {
    filter(in, out, overstriker());
  }
}
(1) Is that a backspace? If so, make a note, chuck it away, and we’re done for now.
(2) Wind back the position_ counter for as many backspaces as there were. For well-formed input, the conditional part of this expression guarding against excessive backspacing shouldn’t be necessary but Kernighan and Plauger quietly implement Postel’s Law (as it wasn’t yet known) throughout the book.
(3) We’ve been backspacing, so prefix the line with the + control character.
(4) Pad out the line with the appropriate number of spaces. Obviously this only works if the printer is monospaced, but that was absolutely the case when Software Tools In Pascal was written.
(5) If we haven’t been backspacing, are we perhaps at the start of a normal row?
(6) Having set up the output with whatever prefix (possibly none) we need, actually output the character we started with.
(7) This is familiar from detab and entab. If we’re at the end of the current line, then reset the counter.
(8) Boom! We’re done. Phew.
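If you'd like to poke at the behaviour without pulling down the whole project, here is a stand-alone adaptation of the steps above. It's a sketch with a few substitutions of my own: plain comparisons against '\b' and '\n' stand in for the stiX helpers, a simple loop stands in for the filter, and overstriker_sketch and overstrike_sketch are names made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Stand-alone adaptation of the overstriker, with no stiX dependencies.
class overstriker_sketch {
public:
  std::string operator()(char c) {
    if (c == '\b') {               // remember the backspace, emit nothing
      ++backspaced_;
      return std::string();
    }

    std::string output;
    if (backspaced_) {
      // wind back, clamping at column 0 if there were too many backspaces
      position_ = (backspaced_ < position_) ? position_ - backspaced_ : 0;
      output += "\n+";                         // new overstrike line
      output += std::string(position_, ' ');   // pad out to the column
    } else if (position_ == 0) {
      output += ' ';                           // ordinary line: "space before printing"
    }

    output += c;

    if (c == '\n')
      position_ = 0;
    else
      ++position_;
    backspaced_ = 0;

    return output;
  }

private:
  std::size_t position_ = 0;
  std::size_t backspaced_ = 0;
};

// Runs the converter over a whole string, one character at a time.
std::string overstrike_sketch(std::string const& input) {
  overstriker_sketch convert;
  std::string result;
  for (char c : input)
    result += convert(c);
  return result;
}
```

With this, overstrike_sketch("Hello\b\b\b\b\b_____\n") gives " Hello\n+_____\n", two bouts of backspacing such as "ab\bc\bd\n" give " ab\n+ c\n+ d\n", and over-enthusiastic backspacing such as "a\b\b\bX\n" is clamped to the start of the line, giving " a\n+X\n".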
This was quite a fun little program to do, although the output was sometimes tricky to visualise - especially in the presence of more than one bout of backspacing. It TDDed out very nicely though. I’m aware that I’m in a bit of a filter groove at the moment, and am well prepared to think about the kind of little state machines that are implicit in this sort of processing. I do think, though, that what I’ve come up with is a tidy little solution. If you think otherwise (or even if you don’t), I’d be delighted to hear from you. Feel free to email me or get in touch on Twitter.
Source code for this program, indeed the whole project, is available in the stiX GitHub repository. overstrike is program 2 of Chapter 2.
Software Tools In Pascal by Brian Kernighan and PJ Plauger is a book that I love. I’m working through the book, reimplementing their tools in C++.