Jez Higgins

Freelance software generalist
software created
extended or repaired


Older posts are available in the archive or through tags.

Feed

Follow me on Twitter
My code on GitHub

Contact
About

Tuesday 20 August 2019 The Forest Road Reader, No 2.25 : STinC++ - linecount

So once you can count characters, you might also want to count the number of lines in some input. Instead of counting every character in our input, I just count the number of newline characters I see.

linecount.pas
procedure linecount;
var
    nl : integer;
    c : character;
begin
    nl := 0;
    while (getc(c) <> ENDFILE) do
        if (c = NEWLINE) then
            nl := nl + 1;
    putdec(nl, 1);
    putc(NEWLINE);
end;
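For comparison, here's a sketch of that Pascal loop transliterated more or less directly into C++. It assumes std::istream::get can stand in for the book's getc, and it returns the count rather than printing it with putdec. The namespace and function names are my own, purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

namespace sketch {
    // Counts newline characters, reading the input one character at a time.
    std::size_t linecount_loop(std::istream &in) {
        std::size_t nl = 0;
        char c;
        // in.get(c) yields the stream, which tests false once the input
        // runs out: this plays the part of the comparison with ENDFILE.
        while (in.get(c))
            if (c == '\n')
                ++nl;
        return nl;
    }

    // Hypothetical helper for trying the loop out on a string.
    std::size_t count_lines_in(const std::string &text) {
        std::istringstream in(text);
        return linecount_loop(in);
    }
}
```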

It’s at this point that Software Tools in Pascal does rather show its age. In their discussion of the code, Kernighan and Plauger say

The idea that text information is just a string of characters, with arbitrary length lines delimited by explicit NEWLINE characters, seems obvious when you think about how a typewriter or a terminal works. But for all its obviousness, it’s still an uncommon concept in many computing systems, where text must often be forced into either fixed length chunks reminiscent of cards or "records" with inconvenient properties.

There follows a paragraph or two about the implementation of their getc and putc primitives, and how they might be implemented on systems which have fixed length records, differing character sets, disk formats, or whatever. The point of these two functions is to insulate their code from the vagaries of the underlying system. Even if the implementations of getc and putc are trivial on a particular system, it is still worth doing.

But whatever the source or sink, we will stick with our interface and program in terms of typewriter-like text … Having a uniform representation for text solves much of the problem of keeping tools uniform.

This kind of insulation is part of what the standard library provides for us. I’ve built this code on three different operating systems, each with a different file system, using four different compilers, and it’s all just worked. Kernighan and Plauger were not so blessed, and had to build their own low-level library as they went.
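To get a feel for how thin that layer becomes when the standard library does the heavy lifting, here's a sketch of book-style primitives built on top of iostreams. The names get_char, put_char, ENDFILE and NEWLINE are hypothetical stand-ins for the book's getc, putc and constants (I've avoided the names getc and putc themselves, since the C library already uses them), and the value chosen for ENDFILE is an assumption.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

namespace sketch {
    // Assumption: any value that can never be a character will do.
    constexpr int ENDFILE = -1;
    constexpr char NEWLINE = '\n';

    // Stands in for the book's getc: reads one character into c and
    // returns it, or returns ENDFILE once the input is exhausted.
    int get_char(std::istream &in, char &c) {
        if (!in.get(c))
            return ENDFILE;
        // Go via unsigned char so no real character can collide with ENDFILE.
        return static_cast<unsigned char>(c);
    }

    // Stands in for the book's putc: writes one character.
    void put_char(std::ostream &out, char c) {
        out.put(c);
    }

    // A tiny tool written purely in terms of the primitives.
    void copy(std::istream &in, std::ostream &out) {
        char c;
        while (get_char(in, c) != ENDFILE)
            put_char(out, c);
    }
}
```

Everything system-specific lives behind get_char and put_char, so a tool like copy never needs to know where its characters come from or go to.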

Of course our library doesn’t just have low-level stuff. It has mid-level stuff too, like the std::copy and std::distance functions I used previously, and like the std::count I’m using today.

linecount.cpp
#include "linecount.h"

#include <algorithm>
#include <iostream>
#include <iterator>

namespace stiX {
    size_t linecount(std::istream &in) {
        return std::count(
                std::istreambuf_iterator<char>(in),
                std::istreambuf_iterator<char>(),
                '\n'
        );
    }
}
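Because linecount takes any std::istream, it's easy to drive from a std::istringstream rather than a file. Here's a minimal sketch: the function body is repeated so the example compiles on its own, and linecount_of is a hypothetical helper, not part of the stiX code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Self-contained copy of stiX::linecount so this example compiles alone.
namespace stiX {
    std::size_t linecount(std::istream &in) {
        return std::count(
                std::istreambuf_iterator<char>(in),
                std::istreambuf_iterator<char>(),
                '\n'
        );
    }
}

// Hypothetical helper: count the lines in a string.
std::size_t linecount_of(const std::string &text) {
    std::istringstream in(text);
    return stiX::linecount(in);
}
```

Like the Pascal original, this counts newline characters rather than lines as such, so a final line with no trailing newline isn't counted: "one\ntwo\nthree" gives 2, not 3.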

New-line character

The use of \n as the line ending character is so entirely normal that we don’t usually think about it. While writing this, I did go on a hunt through the latest C++ draft standard for it, just to be certain this wasn’t some bit of folk wisdom. There are lots of references to new-line as an abstract concept, but two particular mentions of \n stood out.

2.14.3 Character Literals

Note 3 describes how certain nongraphic characters, and other potentially awkward characters like ", can be represented in code with an escape sequence. First one listed? Our friend \n:

new-line    NL(LF)    \n

In other words, \n always represents the new-line character on your system.
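A small sketch of what that means in practice, assuming we're looking at text already held in memory: in C++ a character literal has type char, so '\n' is always a single char. Text with Windows-style \r\n endings, if it reaches us untranslated, still holds exactly one '\n' per line, so counting them still gives the line count. (Streams opened in text mode on Windows translate \r\n to \n on input anyway.) The function name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// A character literal has type char in C++, so '\n' is always one char,
// whatever a platform uses to mark line ends on disk.
static_assert(sizeof('\n') == 1, "'\\n' is a single char");

// Counts the '\n' characters in some text already held in memory.
std::size_t newlines_in(const std::string &text) {
    return static_cast<std::size_t>(std::count(text.begin(), text.end(), '\n'));
}
```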

The other reference is in the description of std::endl.

27.7.3.8 Standard basic_ostream manipulators
namespace std {
  template <class charT, class traits>
    basic_ostream<charT,traits>& endl(basic_ostream<charT,traits>& os);
}

Effects: Calls os.put(os.widen('\n')), then os.flush().
Returns: os.

In other words, if you want to output a new-line, stuff a \n down your ostream. Looking to the wider tradition, we know to avoid std::endl in favour of plain old \n, because std::endl also flushes the stream, which we rarely need and which can slow output down.
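To make the difference concrete, here's a sketch of two ways of ending a line. The function names are hypothetical; the point is that both write exactly the same characters, and std::endl simply adds the flush described above.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Ends the line with '\n': the stream is left to flush when it chooses.
void write_line_plain(std::ostream &os, const std::string &s) {
    os << s << '\n';
}

// Ends the line with std::endl: writes os.widen('\n'), then forces a flush.
void write_line_endl(std::ostream &os, const std::string &s) {
    os << s << std::endl;
}
```

On a string stream the flush is harmless, but on a file it can mean pushing data out to the underlying device once per line instead of once per buffer.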

Source code

Source code for this program, indeed the whole project, is available in the stiX GitHub repository. linecount is program 3 of Chapter 1.


Endnotes

This whole endeavour relies on Software Tools in Pascal and Software Tools, both by Brian W Kernighan and PJ Plauger. I love these books and commend them to you. They’re both still in print, but there are plenty of second-hand editions floating round.

For this project I’ve been trying out JetBrains’ CLion, which so far has been pretty great. CLion uses CMake to build projects. My previous flirtations with CMake, admittedly many years ago, weren’t a huge success. Not so this time: it’s easy to use and works a treat.

The test harness I’m using is Catch. I’ve been aware of Catch pretty much since it was first released, but this is my first time really using it. I like it and will use it again.


Tagged code, and software-tools-in-c++

