Jez Higgins

Freelance software generalist
software created
extended or repaired


Older posts are available in the archive or through tags.

Feed

Follow me on Twitter
My code on GitHub

Contact
About

Friday 09 August 2019 The Forest Road Reader, No 2.22 : STinC++ - copy

Software Tools in Pascal starts off pretty gently. The first program Kernighan and Plauger present is copy, which copies its input to its output. The seemingly trivial nature of this task allows them to introduce how they write and present their code - without getting into detail they mention how they cope with different Pascal implementations on different platforms, the general structure they adopt, and so on. In particular they introduce primitives - functions that interface to the outside world. The two primitives they introduce first are getc and putc which, respectively, read a character from and write a character to somewhere,

an interactive terminal or some secondary storage device like a disk.

These primitives, of which they build up a fair library over the course of the book, are how they make their Pascal portable across platforms. These days, we take that kind of thing for granted as part of our language’s standard library.

After this preamble they present their program, or at least part of it:

copy.pas
procedure copy;
var
    c : character;
begin
    while (getc(c) <> ENDFILE) do
        putc(c)
end;

They explain

First, and most obvious to people who have used Pascal before, is that this is not a complete program - it is just a procedure. So it needs some surrounding context before it can actually do anything for us. We intend to present all of our programs this way …​ so we can better focus on the essential ideas.

They do then present the whole program with the procedure in context, and talk it through. They conclude this initial section with a reason why this isn’t as trivial a program as it might appear.

When you encounter a new language, a new operating environment, or just a new way of doing business on a computer, the first hurdle to clear is learning how to run a program. You must master, perhaps: logging on to the computer, creating files with the editor, running the compiler and/or linker, modifying files with the editor, and invoking the program you’ve finally built! With all these potential problem areas, the last thing you need is a complex program to contribute troubles of its own.

The primitives getc and putc come, of course, more or less directly from C’s getc/getchar and putc/putchar. It would be tempting to cast this program directly into C++ by swapping begin and end for a pair of curly brackets and having done with it.

void copy() {
    int c;
    while ((c = getchar()) != EOF)
        putchar(c);
}

Tempting, but lazy.

Does this actually work? Well, by inspection it looks like it should, but because getchar and putchar are tied to our standard input and output (as we call our interactive terminal or some secondary storage device these days) we’d have to run the program to actually test it. Running a whole program to test it is tedious at best, even for a tiny program like this. Perhaps I can raise the level of abstraction a bit, and make it a little easier to test.

void copy(std::istream& in, std::ostream& out) {
    int c;
    while ((c = in.get()) != EOF)
        out.put(c);
}

Now I can copy from any istream (which could be standard input, a file, something in memory, almost anything) to any ostream (which could be the console, another file, somewhere in memory, almost anything). Nice. We can throw a little test wrapper round that, poke some known inputs through it, and check the right thing comes out.

This is where I actually started. I wrote the test first, using the splendid Catch test framework.

test driver
void verifyCopyString(std::string input) {
    std::istringstream is(input);
    std::ostringstream os;

    stiX::copy(is, os);

    REQUIRE( os.str() == input );
}

/* ...
  some strings
... */

TEST_CASE( "Chapter 1 - copy" ) {
    verifyCopyString(empty);
    verifyCopyString(zero_length);
    verifyCopyString(very_short);
    verifyCopyString(longer);
    verifyCopyString(longer_with_line_breaks);
}

I wrote the least amount of code I could to get this to compile.

copy.cpp skeleton
namespace stiX {
    void copy(std::istream& in, std::ostream& out) {
    }
}

I then ran the tests, which naturally nearly all failed, but now I had something to work with.

The function signature - void copy(std::istream& in, std::ostream& out) - is pretty perfect, but what to fill it with? Is a while loop really still the state of the art here?

It is not.

The C++ Standard Library provides a function copy(InputIt first, InputIt last, OutputIt d_first) which copies the elements in the range, defined by [first, last), to another range beginning at d_first.

copy is a generic function. It takes a pair of iterators delimiting some input range and copies what it finds to an output iterator. Classically, we always used to describe iterators as pointer-like, i.e. they pointed to something, we could advance them, and we could compare them. Equally classically, we generally thought about iterators as operating over some known and bounded region - a block of memory, or a container of some sort like a vector or a list. We don’t usually think of IO in these terms - a file is a file, writing to the console sends things to the screen, that kind of thing. However, if we start to think more broadly about iterators as moving over a sequence, and consider our input as a source of characters and our output as somewhere to put a sequence of characters, it becomes quite natural to want to iterate over console input and output.
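To make the classical, bounded-region view concrete, here is a minimal sketch of std::copy working over a raw array (where plain pointers act as the iterators) and between two container types. The helper names copy_from_array and copy_to_list are mine, invented purely for illustration; they are not part of the book or of the stiX code.

```cpp
#include <algorithm>
#include <iterator>
#include <list>
#include <vector>

// Hypothetical helper: copy from a block of memory.
// Plain pointers are the iterators here - the "pointer-like" classic case.
std::vector<char> copy_from_array(const char* first, const char* last) {
    std::vector<char> destination;
    std::copy(first, last, std::back_inserter(destination));
    return destination;
}

// Hypothetical helper: the same algorithm, unchanged, copying
// from one kind of container into a different kind.
std::list<char> copy_to_list(const std::vector<char>& source) {
    std::list<char> destination;
    std::copy(source.begin(), source.end(), std::back_inserter(destination));
    return destination;
}
```

In both cases std::copy neither knows nor cares what sits behind the iterators. It only advances, dereferences and compares them, which is what leaves the door open to sequences that aren't containers at all.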

So, if I can connect my istream and ostream up to copy, it’ll do the work for me? Perfect!

Happily, C++'s istream and ostream can provide the iterators I’m after. In fact, there are quite a number to choose from. In this case, I want to pull raw characters from the input and poke them straight down the output, so I want istreambuf_iterator<char> and ostreambuf_iterator<char>.

copy.cpp
#include "copy.h"

#include <algorithm>
#include <iostream>
#include <iterator>

namespace stiX {
    void copy(std::istream &in, std::ostream &out) {
        std::copy(
            std::istreambuf_iterator<char>(in),
            std::istreambuf_iterator<char>(),
            std::ostreambuf_iterator<char>(out)
        );
    }
}

I like this a lot. There’s no loop, no comparison, no worrying about special end-of-input sentinel values. When you read it there’s not even the slightest mental gymnastics involved in understanding it, because there’s nothing to comprehend. It does exactly what it says - copy this here to that there.

IOStream Iterators

Underneath each C++ istream or ostream is a streambuf as its source of input or output target. The streambuf does all the work regarding the actual I/O and the stream is only concerned with formatting and transformation or conversion from characters to other types such as strings.

An istream_iterator takes a template argument that says what the unadorned character sequence from the streambuf should be formatted as. An istream_iterator<int>, for instance, will interpret the whitespace-delimited incoming text as a series of int values.

An istreambuf_iterator, however, is only concerned with raw characters and reads directly from the associated streambuf of the istream that it gets passed.

Generally, if we’re interested in the raw characters we want an istreambuf_iterator. If we’re after formatted input of any kind, we need an istream_iterator.

As for istream, so for ostream. For unformatted output we want an ostreambuf_iterator, while ostream_iterator provides formatted output.
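The difference between the two iterator families is easiest to see side by side. What follows is only a sketch: the four helper functions are hypothetical names of my own, while the iterator types themselves are the standard ones described above. Note in particular that an istream_iterator<char> does formatted extraction, so with a stream's default settings it skips whitespace, which is why it would be the wrong choice for copy.

```cpp
#include <algorithm>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: formatted input.
// Whitespace-delimited text is parsed as a series of int values.
std::vector<int> read_ints(std::istream& in) {
    std::istream_iterator<int> first(in), last;
    return std::vector<int>(first, last);
}

// Hypothetical helper: formatted input of single characters.
// Still formatted, so whitespace is skipped by default.
std::string read_formatted_chars(std::istream& in) {
    std::istream_iterator<char> first(in), last;
    return std::string(first, last);
}

// Hypothetical helper: unformatted input.
// Every character comes straight from the streambuf, whitespace included.
std::string read_raw(std::istream& in) {
    std::istreambuf_iterator<char> first(in), last;
    return std::string(first, last);
}

// Hypothetical helper: formatted output.
// Each int is written with operator<<, followed by the delimiter.
std::string write_ints(const std::vector<int>& values) {
    std::ostringstream out;
    std::copy(values.begin(), values.end(),
              std::ostream_iterator<int>(out, ","));
    return out.str();
}
```

Given the input "1 2\n3", read_ints yields the three values 1, 2 and 3, read_formatted_chars yields "123" with the whitespace gone, and read_raw hands back the text exactly as it arrived. Only the last behaviour is any use for a faithful copy.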

Source code

Source code for this program is on GitHub, with the test harness and build files elsewhere in the same repository.

Endnotes

Obviously, I’m leaning hugely on Software Tools in Pascal and Software Tools, both by Brian W Kernighan and PJ Plauger. I love these books, and commend them to you. They’re both still in print, although pretty pricy new. Second hand will serve you just as well - the words are still the same.

For this project I’ve been trying out JetBrains’ CLion, which so far has been pretty great. CLion uses CMake to build projects. My previous flirtations with CMake, admittedly many years ago, weren’t a huge success. Not so this time - it’s easy to use and works a treat.

The test harness I’m using is Catch. I’ve been aware of Catch pretty much since it was first released, but this is my first time really using it. I like it and will use it again.


Tagged code, and software-tools-in-c++

