Jez Higgins

Freelance software generalist
software created
extended or repaired


Older posts are available in the archive or through tags.

Feed

Follow me on Twitter
My code on GitHub

Contact
About

Wednesday 21 August 2019 The Forest Road Reader, No 2.26 : STinC++ - wordcount

After counting characters and then lines, Kernighan and Plauger take us next to counting words.

wordcount.pas
procedure wordcount;
var
    nw : integer;
    c : character;
    inword: boolean;
begin
    nw := 0;
    inword := false;
    while (getc(c) <> ENDFILE) do
        if (c = BLANK) or (c = NEWLINE) or (c = TAB) then
            inword := false
        else if (not inword) then begin
            inword := true;
            nw := nw + 1
        end;
    putdec(nw, 1);
    putc(NEWLINE);
end;

You can see the little path Kernighan and Plauger are on quite clearly now. Start with a simple loop, then extend that loop with a simple counter. Next add a simple conditional, and now extend that conditional into a little state machine. Each little step has added something extra, changing the functionality of each program to provide a new and useful result.
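For comparison, here is a rough sketch of how that little state machine might be transcribed directly into C++. The names count_words_by_state and count_words_in_text are made up for illustration and are not part of the stiX project, and std::istream::get stands in for getc.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical name: a literal transcription of the Pascal loop.
std::size_t count_words_by_state(std::istream& in) {
    std::size_t nw = 0;
    bool inword = false;
    char c;
    while (in.get(c)) {                      // stops at end of input
        if (c == ' ' || c == '\n' || c == '\t') {
            inword = false;                  // a delimiter ends the current word
        } else if (!inword) {
            inword = true;                   // first character of a new word
            ++nw;
        }
    }
    return nw;
}

// Hypothetical helper so the loop can be tried on a plain string.
std::size_t count_words_in_text(const std::string& text) {
    std::istringstream in(text);
    return count_words_by_state(in);
}
```

Called as count_words_in_text("one two\tthree\nfour\n"), this should give 4, and an empty or all-whitespace string should give 0.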

My path has been to look through the list of functions provided in the algorithm header and pick the one that does what’s needed for this task.

Kernighan and Plauger have a straightforward, although entirely reasonable, definition of a word - the maximal sequence of characters not containing a blank, a tab, or a newline. The additional complexity of Kernighan and Plauger’s wordcount over linecount is to keep track of whitespace delimiters between the words. They are, in effect, splitting up the sequence of characters into a sequence of words.

Previously, I blithely said that C++'s istream and ostream provide a number of different iterators. To get down and deal with the raw character stream we use istreambuf_iterator<char>. For formatted input, that is anything that needs a bit of work to process those raw characters in some way, we want some sort of istream_iterator.
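A small sketch may make the raw versus formatted distinction concrete. Both functions below are hypothetical helpers written for this illustration, reading from an in-memory std::istringstream rather than a real file.

```cpp
#include <cstddef>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical helper: counts every character, whitespace included,
// by walking the underlying stream buffer.
std::size_t raw_char_count(const std::string& text) {
    std::istringstream in(text);
    return static_cast<std::size_t>(std::distance(
        std::istreambuf_iterator<char>(in),
        std::istreambuf_iterator<char>()));
}

// Hypothetical helper: counts characters as read through operator>>,
// which skips whitespace by default.
std::size_t formatted_char_count(const std::string& text) {
    std::istringstream in(text);
    return static_cast<std::size_t>(std::distance(
        std::istream_iterator<char>(in),
        std::istream_iterator<char>()));
}
```

For the input "a b\n" the raw count is 4, while the formatted count is 2, because the blank and the newline are skipped on the way through.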

Gathering characters up into whitespace delimited words qualifies as a bit of processing work. It’s the kind of thing people do all the time, and consequently is precisely what the istream_iterator<std::string> provides.
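To see the words that iterator hands back, here is a minimal sketch that collects them into a vector. The function name split_words is an assumption for illustration only. One caveat: operator>> for std::string treats anything the stream's locale considers whitespace as a delimiter, which in the default locale is slightly broader than Kernighan and Plauger's blank, tab, and newline.

```cpp
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: gathers the whitespace-delimited words of text.
std::vector<std::string> split_words(const std::string& text) {
    std::istringstream in(text);
    // The vector's iterator-range constructor pulls words until the
    // stream is exhausted; the default-constructed iterator marks the end.
    return std::vector<std::string>(
        std::istream_iterator<std::string>(in),
        std::istream_iterator<std::string>());
}
```

Leading, trailing, and repeated whitespace all disappear, so "  the quick\tbrown\nfox " should come back as the four words the, quick, brown, and fox.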

I rather loosely described std::distance(InputIt first, InputIt last) as counting the hops between first and last. More formally, it returns the number of iterator increments needed to go from first to last. Each increment of an istream_iterator<std::string> reads the next word from the stream, so plugging that into std::distance counts the number of words in the input.

wordcount.cpp
#include "wordcount.h"

#include <algorithm>
#include <iostream>
#include <iterator>  // std::distance, std::istream_iterator
#include <string>    // std::string and its operator>>

namespace stiX {
    size_t wordcount(std::istream& in) {
        return std::distance(
                std::istream_iterator<std::string>(in),
                std::istream_iterator<std::string>()
        );
    }
}
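As a quick way of poking at the edge cases, the same std::distance expression can be wrapped around an in-memory stream. This is only a sketch: count_words_in is a hypothetical helper, not something from the stiX repository.

```cpp
#include <cstddef>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical helper: applies the std::distance technique to a string.
std::size_t count_words_in(const std::string& text) {
    std::istringstream in(text);
    // std::distance returns a signed difference_type, hence the cast.
    return static_cast<std::size_t>(std::distance(
        std::istream_iterator<std::string>(in),
        std::istream_iterator<std::string>()));
}
```

Punctuation is simply part of whichever word it touches, so "hello, world!" counts as 2, and empty or whitespace-only input counts as 0.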

Source code

Source code for this program, indeed the whole project, is available in the stiX GitHub repository. wordcount is program 4 of Chapter 1.

Endnotes

This whole endeavour relies on Software Tools in Pascal and Software Tools, both by Brian W Kernighan and PJ Plauger. I love these books and commend them to you. They’re both still in print, but there are plenty of second-hand editions floating round.

For this project I’ve been trying out JetBrains' CLion, which is pretty great. CLion uses CMake to build projects. My previous flirtations with CMake, admittedly many years ago, weren’t a huge success. Not so this time - it’s easy to use and works a treat.

The test harness I’m using is Catch. I’ve been aware of Catch pretty much since it was first released, but this is my first time really using it. I like it and will use it again.


Tagged code, and software-tools-in-c++

