auto lines = std::vector<std::string> { };
Freelance software grandad
software created
extended or repaired
Follow me on Mastodon
Applications, Libraries, Code
Talks & Presentations
STinC++ rewrites the programs in Software Tools in Pascal using C++
After a couple of warm-ups (from which, if we’re honest, I’ve probably cooled back down from), the first proper program in Chapter 4 of Software Tools In Pascal is sort
, a program that sorts a text file line by line into increasing lexicographic order. Kernighan and Plauger present this as a filter even though, they point out, it isn’t really a filter because it can’t output anything until its input is complete. (That little note, along with phrases like increasing lexicographic order, is the kind of precise language that just fills me with reading pleasure.)
The first iteration of sort
assumes the data fits in memory. We’ll read the whole of the input in one pass, sort it, then spit it out again.
Pascal’s lack of a proper string type has hampered Kernighan and Plauger throughout, and here it presents a real issue. Using fixed sized character arrays as a substitute string type just isn’t going to cut it. For short strings they’ll waste memory, which is a scarce resource, and for longer strings present a further problem they would need to resolve somehow.
For them, then, their first task is come up with a suitable data representation. They choose to read the input into a single large array, then keep a second array of indexes pointing the start of each line.
Here in the future, I can just write:
auto lines = std::vector<std::string> { };
Having already written a Shell sort earlier in the chapter, Kernighan and Plauger now have to specialise it for their new data structure. It needs to be extended to extract each line under consideration, then further to compare the lines, and finally how to swap lines. This touches almost every part of the sort routine. They do have one small win - swapping two lines only involves switching the indexes rather than fiddling inside the large array.
I need to specialise my sort too …
std::sort(lines.begin(), lines.end());
Having considered their data structure and sort, Kernighan and Plauger need to write procedures to read standard in and populate their data structures, and also to print it back out again.
Let’s have a go at that too…
// read
while (std::cin)
lines.emplace_back(stiX::getline(std::cin));
// write
std::copy(
lines.begin(),
lines.end(),
std::ostream_iterator(std::cout, "\n");
);
stiX::getline
is a tiny convenience wrapper around std::getline
I’ve went back and forth on emplace_back
and push_back
here. I went with emplace_back
seems to better express intention, but I’d be hard pressed to properly explain why.
Arranging the snippets above in the proper order gives
std::vector<std::string> read_lines();
void write_lines(std::vector<std::string> const &lines);
int main() {
auto lines = read_lines();
std::sort(lines.begin(), lines.end());
write_lines(lines);
} // main
std::vector<std::string> read_lines() {
auto lines = std::vector<std::string> { };
while (std::cin)
lines.emplace_back(stiX::getline(std::cin));
return lines;
} // read_lines
void write_lines(std::vector<std::string> const &lines) {
std::copy(
lines.begin(),
lines.end(),
std::ostream_iterator<std::string>(std::cout, "\n")
);
} // write_lines
If we ignore function declarations and pretty printing, this entire program is five lines of code.
The main body of Kernighan and Plauger’s code is equally concise
begin
if (gtext(linpos, nlines, linbuf, STDIN) = ENDFILE) then begin
shell(linpos, nlines, linbuf);
ptext(linpos, nlines, linbuf, STDOUT)
end
else
error('sort: input too big to sort')
end;
Unfortunately, they had to supply all the code behind the gtext
, quick
, and ptext
calls too, while I leaned on standard library components. Expanded out, their full program is somewhat heftier.
{ Copyright (C) 1981 by Bell Laboratories, Inc., and Whitesmiths Ltd. }
{ sort -- sort text lines in memory }
procedure inmemsort;
const
MAXCHARS = 10000; { maximum # of text characters }
MAXLINES = 300; { maximum # of lines }
type
charbuf = array [1..MAXCHARS] of character;
charpos = 1..MAXCHARS;
posbuf = array [1..MAXLINES] of charpos;
pos = 0..MAXLINES;
var
linebuf : charbuf;
linepos : posbuf;
nlines : pos;
{ gtext -- get text lines into linebuf }
function gtext (var linepos : posbuf; var nlines : pos;
var linebuf : charbuf; infile : filedesc) : boolean;
var
i, len, nextpos : integer;
temp : string;
done : boolean;
begin
nlines := 0;
nextpos := 1;
repeat
done := (getline(temp, infile, MAXSTR) = false);
if (not done) then begin
nlines := nlines + 1;
linepos[nlines] := nextpos;
len := length(temp);
for i := 1 to len do
linebuf[nextpos+i-1] := temp[i];
linebuf[nextpos+len] := ENDSTR;
nextpos := nextpos + len + 1 { 1 for ENDSTR }
end
until (done) or (nextpos >= MAXCHARS-MAXSTR)
or (nlines >= MAXLINES);
gtext := done
end;
{ shell -- ascending Shell sort for lines }
procedure shell (var linepos : posbuf; nlines : integer;
var linebuf : charbuf);
var
gap, i, j, jg : integer;
{ cmp -- compare linebuf[i] with linebuf[j] }
function cmp (i, j : charpos; var linebuf : charbuf)
: integer;
begin
while (linebuf[i] = linebuf[j])
and (linebuf[i] <> ENDSTR) do begin
i := i + 1;
j := j + 1
end;
if (linebuf[i] = linebuf[j]) then
cmp := 0
else if (linebuf[i] = ENDSTR) then { 1st is shorter }
cmp := -1
else if (linebuf[j] = ENDSTR) then { 2nd is shorter }
cmp := +1
else if (linebuf[i] < linebuf[j]) then
cmp := -1
else
cmp := +1
end;
{ exchange -- exchange linebuf[lp1] with linebuf[lp2] }
procedure exchange (var lp1, lp2 : charpos);
var
temp : charpos;
begin
temp := lp1;
lp1 := lp2;
lp2 := temp
end;
begin
gap := nlines div 2;
while (gap > 0) do begin
for i := gap+1 to nlines do begin
j := i - gap;
while (j > 0) do begin
jg := j + gap;
if (cmp(linepos[j],linepos[jg],linebuf)<=0) then
j := 0 { force loop termination }
else
exchange(linepos[j], linepos[jg]);
j := j - gap
end
end;
gap := gap div 2
end
end;
{ ptext -- output text lines from linebuf }
procedure ptext (var linepos : posbuf; nlines : integer;
var linebuf : charbuf; outfile : filedesc);
var
i, j : integer;
begin
for i := 1 to nlines do begin
j := linepos[i];
while (linebuf[j] <> ENDSTR) do begin
putcf(linebuf[j], outfile);
j := j + 1
end
end
end;
begin
if (gtext(linepos, nlines, linebuf, STDIN)) then begin
shell(linepos, nlines, linebuf);
ptext(linepos, nlines, linebuf, STDOUT)
end
else
error('sort: input too big to sort')
end;
At around 100 lines it’s not a massive program, but it’s still an undertaking. I’d be feeling pretty pleased with myself to have written something of this size and be confident in it inside an intense day.
What I actually did was write my program directly into the text of this article. I then cut and pasted it across to my IDE, and it turned out I was correct pretty much first go. The only change I needed to make was to add the newline separator into the ostream_iterator
.
Taking pieces you have, which you know to be correct and efficient, and plugging them together is almost the platonic ideal of programming. Join me now in sending good thoughts to the language designers, compiler writers, library implementors, and all those involved in providing the fantastic programming tools we have at our fingertips. Thanks friends!
This whole endeavour relies on Software Tools in Pascal and Software Tools, both by Brian W Kernighan and PJ Plauger. I love these books and commend them to you. They are both still in print, but new copies are, frankly, just ridiculously expensive. Happily, there are plenty of second-hand copies floating round, or you can borrow them from The Internet Archive using the links above.
For this project I’ve been using JetBrain’s CLion, which I liked enough to buy and renew a license for, because it’s gone a year already.
The test harness I’m using is Catch. I’ve been aware of Catch pretty much since it was first released, but this project is the first time really getting in and using it. I like it, and it’ll be my first choice in the future.
Freelance software grandad
software created
extended or repaired
Follow me on Mastodon
Applications, Libraries, Code
Talks & Presentations