Jez Higgins

Freelance software generalist
software created
extended or repaired


Older posts are available in the archive or through tags.

Feed

Follow me on Twitter
My code on GitHub

Contact
About

Tuesday 20 August 2019 The Forest Road Reader, No 2.25 : STinC++ - linecount

So once you can count characters, you might also want to count the number of lines in some input. Instead of counting every character on our input, we just count the number of newline characters we see.

linecount.pas
procedure linecount;
var
    nl : integer;
    c : character;
begin
    nl := 0;
    while (getc(c) <> ENDFILE) do
        if (c = NEWLINE) then
            nl := nl + 1;
    putdec(nl, 1);
    putc(NEWLINE);
end;

It’s at this point that Software Tools in Pascal does rather show its age. In their discussion of the code, Kernighan and Plauger say

The idea that text information is just a string of characters, with arbitrary length lines delimited by explicit NEWLINE characters, seems obvious when you think about how a typewriter or a terminal works. But for all its obviousness, it’s still an uncommon concept in many computing systems, where text must often be forced into either fixed length chunks reminiscent of cards or "records" with inconvenient properties.

There follows a paragraph or two about the implementation of their getc and putc primitives, and how they might be implemented on systems which have fixed length records, differing character sets, disk formats, or whatever. The point of these two functions is to insulate their code from the vagaries of the underlying system. Even if the implementations of getc and putc are trivial on a particular system, it is still worth doing.

But whatever the source or sink, we will stick with our interface and program in terms of typewriter-like text …​ Having a uniform representation for text solves much of the problem of keeping tools uniform.

This kind of insulation is part of what the standard library provides for us. I’ve built this code on three different operating systems, each with a three different file system, using four different compilers, and it’s all just worked. Kernighan and Plauger were not so blessed, and had to build their own low-level library as they went.

Of course our library doesn’t just have low-level stuff. It has mid-level stuff too, like the std::copy and std::distance functions we used previously, and like the std::count we’re using today.

linecount.cpp
#include "linecount.h"

#include <algorithm>
#include <iostream>
#include <iterator>

namespace stiX {
    size_t linecount(std::istream &in) {
        return std::count(
                std::istreambuf_iterator<char>(in),
                std::istreambuf_iterator<char>(),
                '\n'
        );
    }
}

New-line character

The use of \n as the line ending character is so entirely normal that we don’t usually think about it. While writing this, I did go on a hunt through the latest C++ draft standard for it, just to be certain this wasn’t some bit of folk wisdom. There are lots of references to new-line as an abstract concept, but two particular mentions of \n that stood out.

2.14.3 Character Literals

Note 3 describes how certain nongraphic characters and other potentially awkward characters like " can be represented in code with an escape sequence. First one listed? Our friend \n

new-line

NL(LF)

\n

In other words, \n always represents the new-line character on your system.

The other reference is in the description of std::endl.

27.7.3.8 Standard basic_ostream manipulators
namespace std {
  template <class charT, class traits>
    basic_ostream<charT,traits>& endl(basic_ostream<charT,traits>& os);
}

Effects: Calls os.put(os.widen(’\n’)), then os.flush().
Returns: os.

In other words, if you want to output a new-line then stuff a \n down your ostream. Looking to the wider tradition, we know to avoid std::endl in favour of plain old \n.

Source code

Source code for this program, indeed the whole project, is available in the stiX GitHub repository. linecount is program 3 of Chapter 1.

Library References

Endnotes

This whole endeavour relies Software Tools in Pascal and Software Tools, both by Brian W Kernighan and PJ Plauger. I love these books and commend them to you. They’re both still in print, but there are plenty of second-hand editions floating round.

For this project I’ve been trying out JetBrain’s CLion, which so far has been pretty great. CLion uses CMake to build projects. My previous flirtations with CMake, admittedly many years ago, weren’t a huge success. Not so this time - it’s easy to use and works a treat.

The test harness I’m using is Catch. I’ve been aware of Catch pretty much since it was first released, but this is my first time really using it. I like it and will use it again.


Tagged code, and software-tools-in-c++

Thursday 15 August 2019 The Forest Road Reader, No 2.24 : STinC++ - charcount

After devoting six pages to bootstrapping copy, Kernighan and Plauger give their next program, charcount, six sentences of setup. Of those six, three of them are an exhortation to just hang on and trust them

If you can’t think offhand why anyone would want to merely count the characters in something, don’t worry. This is a book on tool building, remember, and tools work best in combination. Applications will occur soon enough.

charcount.pas
procedure charcount;
var
    nc : integer;
    c : character;
begin
    nc := 0;
    while (getc(c) <> ENDFILE) do
        nc := nc + 1
    putdec(nc, 1);
    putc(NEWLINE);
end;

In their copy program, they established we could read a character at a time from the standard input. For charcount, it’s straightforward to modify the loop to keep a count of those characters.

Hmm. I was feeling smug about not having a loop in my implementation at all. Perhaps I’ll have to revisit the whole business?

I chose to treat the standard input as a sequence, and consequently was able to lean entirely on a function provided by C++'s standard library. Copying a sequence from one place to another is merely one of a number of operations the standard library provides. Another is std::distance(InputIt first, InputIt last), a function which returns the number of hops you need to advance from first to last.

In this case, passing in our istreambuf_iterators it gives us the count of characters available on standard input. Again, the standard library provides exactly what’s needed.

charcount.cpp
#include "charcount.h"

#include <algorithm>
#include <iostream>
#include <iterator>

namespace stiX {
    size_t charcount(std::istream &in) {
        return std::distance(
                std::istreambuf_iterator<char>(in),
                std::istreambuf_iterator<char>()
        );
    }
}

While distance maybe isn’t the best name, once you’re familiar with it this implementation is about as minimal and clear as it can be. Again, we’ve avoided an explicit loop, nor have we had to come up with any variable names. Even in a program as brief as charcount Kernighan and Plauger’s version has a comprehension overhead. Their variable c is, essentially, redundant. It’s only there so they can call the getc primitive. The variable we really need to be paying attention to is nc. However, because Pascal requires variables to declared ahead of the procedure body, all variables initially appear to be equally important. It’s a small thing here, but the larger the procedure the bigger a barrier it can be.

Rather than have the charcount function write directly to the output, as Kernighan and Plauger did, I chose to return the count from the function and deal with the output higher up. In turn, this makes our test and executable drivers extremely plain.

test driver
void verifyCharCount(std::string input)
{
    std::istringstream is(input);

    auto count = stiX::charcount(is);

    REQUIRE(count == input.size());
}
main
int main() {
    std::cout << stiX::charcount(std::cin) << std::endl;
}

Source code

Source code for this program, indeed the whole project, is available in the stiX GitHub repository. charcount is program 2 of Chapter 1.

Library References

Endnotes

This whole endeavour relies Software Tools in Pascal and Software Tools, both by Brian W Kernighan and PJ Plauger. I love these books and commend them to you. They’re both still in print, but there are plenty of second-hand editions floating round.

For this project I’ve been trying out JetBrain’s CLion, which so far has been pretty great. CLion uses CMake to build projects. My previous flirtations with CMake, admittedly many years ago, weren’t a huge success. Not so this time - it’s easy to use and works a treat.

The test harness I’m using is Catch. I’ve been aware of Catch pretty much since it was first released, but this is my first time really using it. I like it and will use it again.


Tagged code, and software-tools-in-c++

Sunday 11 August 2019 The Forest Road Reader, No 2.23 : Do

Wrote a do …​ while loop this week. First time for everything, I guess

The JezUK 2019/2020 UK Tour

To my utter surprise, plans for this winter’s tour have already been thrust upon me.

  • The Very Slow Time Machine gets a reprise at Worcester Source on November 27.

  • I’ll be giving a new talk at nor(DEV):con over in Norwich next February. Not sure what on yet - possibly JavaScript, maybe (if I weaken) an introduction to C++. nor(DEV):con, while typographically stylised, is a lovely little conference. Great price too.

Please Enjoy

  • Hamish and Dougal - You’ll Have Had Your Tea? Graeme Garden and Barry Cryer are the eponymous double-entendre spouting Scotsmen in this thoroughly silly sit-com.

  • 105 STL Algorithms in Less Than an Hour - terrific talk by Jonathan Boccara at last year’s CppCon.

  • The Judge John Hodgman Podcast. John Hodgman presides over his fake internet court settling disputes in witty fashion, while always delivering a very satisfying, humane, verdict.

  • I’ve probably said this before but Richard Stark’s Parker novels are the business.

  • My chum Tom took me along to see KISS on their farewell tour. We are neither of us are especially KISS fans and neither of us had any particular expectations, but it turned out to be a genuinely amazing evening out. It was the campest show I ever seen and I recommend it to you if, like us, you can blag your way in for free.


Tagged on-tour

Friday 09 August 2019 The Forest Road Reader, No 2.22 : STinC++ - copy

Software Tools in Pascal starts off pretty gently. The first program Kernighan and Plauger present is copy, which copies its input to its output. The seemingly trivial nature of this task allows them in introduce how they write and present their code - without getting into detail they mention how they cope with different Pascal implementations on different platforms, the general structure of they adopt, and so on. In particular they introduce primitives - functions that interface to the outside world. The two primitives they introduce first are getc and putc which, respectively, read a character from and write a character to somewhere,

an interactive terminal or some secondary storage device like a disk.

These primitives, of which they build up a fair library over the course of the book, are how they make their Pascal portable across platforms. These days, we take that kind of thing for granted as part our language standard library.

After this preamble they present their program, or at least part of it

copy.pas
procedure copy;
var
    c : character;
begin
    while (getc(c) <> ENDFILE) do
        putc(c)
end;

They explain

First, and most obvious to people who have used Pascal before, is that this is not a complete program - it is just a procedure. So it needs some surrounding context before it can actually do anything for us. We intend to present all of our programs this way …​ so we can better focus on the essential ideas.

They do then present the whole program with the procedure in context, and talk it through. They conclude this initial section with a reason why this isn’t as trivial a program as it might appear.

When you encounter a new language, a new operating environment, or just a new way of doing business on a computer, the first hurdle to clear is learning how to run a program. You must master, perhaps: logging on to the computer, creating files with the editor, running the compiler and/or linker, modifying files with the editor, and invoking the program you’ve finally built! With all these potential problem areas, the last thing you need is a complex program to contribute troubles of its own.

The primitives getc and putc come, of course, more or less directly from C’s getc/getchar and putc/putchar. It would be tempting to cast this program directly into C++ by substituting begin and end for a pair of curly brackets and having done with it.

void copy() {
    int c;
    while ((c = getchar()) != EOF)
        putchar(c);
}

Tempting, but lazy.

Does this actually work? Well, by inspection it looks like it should, but because getchar and putchar are tied to our standard input and output (as we call our interactive terminal or some secondary storage device these days) we’d have to run the program to actually test it. Running a whole program to test it is tedious at best, even for a tiny program like this. Perhaps we can raise the level of abstraction a bit, and make it a little easier to test.

void copy(std::istream& in, std::ostream& out) {
    int c;
    while ((c = in.get()) != EOF)
        out.put(c);
}

Now we can copy from any istream (which could be standard input, a file, something in memory, almost anything) to any ostream ((which could be the console, another file, somewhere in memory, almost anything). Nice. We can throw a little test wrapper round that, poke some known inputs through it, check the right thing comes out.

This is where I actually started. I wrote the test first, using the splendid Catch test framework.

test driver
void verifyCopyString(std::string input) {
    std::istringstream is(input);
    std::ostringstream os;

    stiX::copy(is, os);

    REQUIRE( os.str() == input );
}

/* ...
  some strings
... */

TEST_CASE( "Chapter 1 - copy" ) {
    verifyCopyString(empty);
    verifyCopyString(zero_length);
    verifyCopyString(very_short);
    verifyCopyString(longer);
    verifyCopyString(longer_with_line_breaks);
}

I wrote the least amount of code I could to get this to compile.

copy.cpp skeleton
namespace stiX {
    void copy(std::istream& in, std::ostream& out) {
    }
}

I then ran the tests, which naturally nearly all failed but now I had something to work with.

The function signature - void copy(std::istream& in, std::ostream& os) - is pretty perfect, but what to fill it with? Is a while loop really still the state of the art here?

It is not.

The C++ Standard Library provides a function copy(InputIt first, InputIt last, OutputIt d_first) which copies the elements in the range, defined by [first, last), to another range beginning at d_first.

copy is a generic function. It takes a pair of iterators delimiting some input range and copies what it find to an output iterator. Classically, we always used to describe iterators as pointer-like, ie they pointed to something, you could advance them, and you could compare them. Equally classically, we generally thought about iterators as operating over some known and bounded region - a block of memory, or a container of some sort like a vector or a list. We don’t usually think of IO in these terms - a file is a file, writing to the console sends things to the screen, that kind of thing. However, if we start to think more broadly about iterators as moving over a sequence, and consider our input as source of characters and our output as somewhere to put a sequence of characters, it becomes quite natural to want to iterate over console input and output.

So, if I can connect my istream and ostream up to copy, it’ll do the work for me? Perfect!

Happily, C++'s istream and ostream can provide the iterators we’re after. In fact, there are quite a number to choose from. In this case, I want to pull raw characters from the input and poke them straight down the output, so I want istreambuf_iterator<char> and ostreambuf_iterator<char>.

copy.cpp
#include "copy.h"

#include <algorithm>
#include <iostream>
#include <iterator>

namespace stiX {
    void copy(std::istream &in, std::ostream &out) {
        std::copy(
            std::istreambuf_iterator<char>(in),
            std::istreambuf_iterator<char>(),
            std::ostreambuf_iterator<char>(out)
        );
    }
}

I like this a lot. There’s no loop, no comparison, no worrying about special end-of-input sentinel values. When you read it there’s not even the slightest mental gymnastics involved in understanding it, because there’s nothing to comprehend. It does exactly what it says - copy this here to that there.

IOStream Iterators

Underneath each C++ istream or ostream is a streambuf as its source of input or output target. The streambuf does all the work regarding the actual I/O and the stream is only concerned with formatting and transformation or conversion from characters to other types such as strings.

An istream_iterator takes a template argument that says what the unadorned character sequence from the streambuf should be formatted as. An istream_iterator<int>, for instance, will interpret the whitespace-delimited incoming text as series of int values.

An istreambuf_iterator, however, is only concerned with raw characters and reads directly from the associated streambuf of the istream that it gets passed.

Generally, if we’re interested in the raw characters we want an istreambuf_iterator. If we’re after formatted input of any kind, we need an istream_iterator.

As for istream, so for ostream. For unformatted output we want an ostreambuf_iterator, while ostream_iterator provide formatted output.

Source code

Source code for this program is on Github, with the test harness and build files elsewhere in the same repository.

Endnotes

Obviously, I’m leaning hugely on Software Tools in Pascal and Software Tools, both by Brian W Kernighan and PJ Plauger. I love these books, and commend them to you. They’re both still in print, although pretty pricy new. Second hand will serve you just as well - the words are still the same.

For this project I’ve been trying out JetBrain’s CLion, which so far has been pretty great. CLion uses CMake to build projects. My previous flirtations with CMake, admittedly many years ago, weren’t a huge success. Not so this time - it’s easy to use and works a treat.

The test harness I’m using is Catch. I’ve been aware of Catch pretty much since it was first released, but this is my first time really using it. I like it and will use it again.


Tagged code, and software-tools-in-c++

Monday 05 August 2019 The Forest Road Reader, No 2.21 : Software Tools In C++

Software Tools In Pascal

One of my very favourite books about software is Kernighan and Plauger’s Software Tools in Pascal. The authors set their stall out in the very first line of the Preface

This book teaches how to write good programs that make good tools, by presenting a comprehensive set, each of which provides lessons in design and implementation.

They continue

The principles of good programming are presented not as abstract sermons but as concrete lessons in the context of working programs. For example, there is no chapter on "efficiency". Instead, throughout the book there are observations on efficiency as it relates to the particular program being developed. Similarly there is no chapter on "top-down design," nor on "structured programming," nor on "testing and debugging." Instead, all of these disciplines are employed as appropriate in every program shown.

I first found Software Tools in Pascal a bit over 20 years ago, hidden in amongst old VAX manuals in the library of the company where I worked at the time. Unusually for a technical book, Software Tools in Pascal has a terrific narrative. It starts with a tiny task – copy everything from the console input to the console output – and presents the correspondingly tiny program. Step by step, program by program, you arrive at the end of the book with an ex-like line editor, a roff-style print formatter, and a macro processor. En route, you take in filtering, file archiving, sorting, and regular expressions. Each incremental step seems so logical and the code presented is so clear, that you just want to keep reading. Chunks of code in a book can be rather tedious, but Kernighan and Plauger’s is a joy. The lessons imparted on simplicity, clarity, efficiency, on tools and the Unix philosophy, in common sense, how each decision effects the finished program – well, they are at the core of what we do, and how we should think about programming.

Your mind can only be blown once, but I re-read bits of this book often to remind myself of the feeling it gave me.

I’ve long toyed with the idea of working through Software Tools in Pascal reimplementing the programs therein in another language. The invitation is, after all, right there on the cover.

Good Programming is not learned from generalities, but by seeing how significant programs can be made clean, easy to read, easy to maintain and modify, human-engineered, efficient, and reliable, by the application of common sense and good programming practices. Careful study and imitation of good programs leads to better writing.

Software Tools in Pascal is itself a revised edition of an earlier book, called simply Software Tools. The programs in Software Tools were presented in RatFor, a modified version of Fortran. Ouroborus-like, one of the tools presented in Software Tools is the RatFor to Fortran preprocessor. While Software Tools in Pascal presents the same tools, the implementations are new Pascal programs, not straight translations of the existing code.

The programs here are not just transliterations into Pascal, however. Almost every program has been improved in some way. Pascal lets us do some things much better than is possible in Fortran. Recursion in particular is a boon. Quicksort and regular expression closure are much simpler when done recursively instead of with a stack or linked list; expression evaluation has been added to the macro processor.

And yet other languages allow us to do some things much better even than that! Languages now provide much larger and more capable standard libraries that came with early-80s vintage Pascal, for instance. Much richer data types, either built in or via libraries, are available, and so on.

Partly because I’m working in C++ at the moment (albeit the C++ of the past) and partly because C++ is what I regard as my programming first language (even though it’s not even close to the first progamming language I learned), I’m thinking about C++. While I’ve done a reasonable amount of C++ work over the past few years, at the same time C++ itself has undergone some fairly major changes and so I’m not bang-up-to-date and I’ll have an excuse to do a bit of catching up. Of course, C++ and Pascal share a broad family lineage and maybe that’ll help and maybe it won’t. There are so many ways to slice this, but whichever, I think it’ll be interesting and fun to write some Software Tools in C++.


Tagged code, and software-tools-in-c++
Older posts are available in the archive or through tags.


Jez Higgins

Freelance software generalist
software created
extended or repaired

Older posts are available in the archive or through tags.

Feed

Follow me on Twitter
My code on GitHub

Contact
About