Jez Higgins

Freelance software generalist
software created
extended or repaired



Wednesday 18 September 2019 The Forest Road Reader, No 2.29 : A Short Intermission

I’m in a gap, hopefully a brief one, between gigs at the moment. The work I’ve been doing for the past few months, improving the accessibility of a Windows desktop application for people with visual impairments, has been pretty interesting - I’ll certainly never work on a user interface in the same way again - and it was a good team to be a part of.

As I look forward to the next few months and whatever work comes along, there’s also the JezUK 2019/20 Winter Tour to think about. My thoughts naturally turned not to whatever the hell I was going to talk about, but my bloody speaker biography.

Speaker bios are, I’ve always found, ridiculously hard to write. I’ve long loathed the X started programming Y years ago template, and I’m instantly suspicious of anyone who describes themself as an expert (curiously, even if I know this to be true). In recent years, I’ve toyed with off-track things like Jez Higgins was 2017 Player of the Season for Kings Heath Hockey Club Mens IIIs and Jez is …​ so dedicated to the pursuit of software craftsmanship he once cycled to the conference from Birmingham. Both these things are true, and I like the way they have the kind of minor success anyone could achieve, but they really only work in a context where people already have an idea of who I am.

As the job wound down, I took advantage of our Slack channel to crowdsource (groupsource maybe, there were only six of us) my speaker bio with my colleagues. We came up with

Jez Higgins is a freelance software grandad. He mucks in with programming, lends a hand with build & deployment processes, provides a leg up with TDD practices, keeps an eye on the young 'uns so they don’t fall down the old mine shaft, that kind of thing.

which I’m pretty pleased with. I was also given a more corporate version

Jez Higgins is a long-standing freelance software professional. He gets his hands dirty with programming on a daily basis, has no problem improving build and deployment systems, and was doing TDD long before the term was on everyone’s lips. He is a mentor for the less experienced, trying to prevent them from falling down traditional software crevasses. He has seen the good and bad of software development throughout his career and keeps a pragmatic view on getting things done.

We finished the job on Friday. It’s only Wednesday and I miss my team already.

The JezUK 2019/2020 UK Tour

  • In the event of misfortune at Agile on the Beach On Tour - Birmingham Tech Week on October 9th, I’m the emergency stand-by speaker. Ideally, nothing goes wrong, I get a good day out, and we all go home happy.

  • The Very Slow Time Machine gets a reprise at Worcester Source on November 27.

  • At nor(DEV):con in Norwich next February, I’ll be serving up A Mouthful of C++, a short, probably quite intense, C++ taster workshop for people who already program in another language. This is something of a new thing, and I’ll be fretting about it for a month and won’t stop until it’s done. nor(DEV):con, while typographically stylised, is a lovely little conference. Great price too.


Tagged on-tour

Thursday 05 September 2019 The Forest Road Reader, No 2.28 : STinC++ - Chapter One Wrap Up

I’ve reached the end of chapter 1 and have written five programs:

  • copy - copy input to output

  • charcount - count the number of characters in the input

  • linecount - count the number of lines in the input

  • wordcount - count the number of words in the input

  • detab - copy input to output, replacing tabs with the appropriate number of spaces

For Kernighan and Plauger, each program has been progressively more complex than the last - a simple loop, then a variable, a conditional, then multiple conditions, and several types of loop. They have, through the course of their five programs, managed to show off all the fundamental features of Pascal. In contrast, I have done no such thing in my efforts with C++.

Thanks to the functions provided in the C++ standard library, I have little increase in code complexity for the first four programs. I might have had to get hold of some slightly bigger concepts - iterators, input and output as sequences you can iterate over - but once those are familiar, my wordcount is no more complicated than where I started with copy.
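To make that concrete, here is a minimal sketch of what copy and charcount look like in this style, assuming they rest on std::copy and std::distance over stream iterators. The names copy_sketch, charcount_sketch and check_copy_and_charcount are illustrative only, not the actual stiX code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iostream>
#include <iterator>
#include <sstream>
#include <string>

// Illustrative only: copy input to output, one raw character at a time.
void copy_sketch(std::istream& in, std::ostream& out) {
    std::copy(
        std::istreambuf_iterator<char>(in),
        std::istreambuf_iterator<char>(),
        std::ostreambuf_iterator<char>(out)
    );
}

// Illustrative only: count characters by counting iterator increments.
std::size_t charcount_sketch(std::istream& in) {
    return static_cast<std::size_t>(std::distance(
        std::istreambuf_iterator<char>(in),
        std::istreambuf_iterator<char>()
    ));
}

// Usage: both functions work on any istream, here an in-memory one.
void check_copy_and_charcount() {
    std::istringstream source("hello\nworld\n");
    std::ostringstream sink;
    copy_sketch(source, sink);
    assert(sink.str() == "hello\nworld\n");

    std::istringstream counted("hello\n");
    assert(charcount_sketch(counted) == 6); // five letters plus the newline
}
```

Neither function contains an explicit loop or a counter variable. The only thing that changes between them is the algorithm being called.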

Things did get slightly hairier for detab but not particularly so. There’s a little bit to grapple with around function objects, but nothing too strenuous. That I still haven’t written an explicit loop in C++ helps with understanding the code by eliminating what is essentially boilerplate.

It’s been fun! I’ve learned some new things already - about the tooling I’m using, and also about the C++ library itself. The idea of using std::transform to massage a sequence of characters into a sequence of strings is, in retrospect, obvious but I hadn’t thought of doing anything like that before.

While I’ve hardly mentioned it, all the code has been written in test-first fashion. At this early stage in my progress through the book, I’m pretty much only dealing with a single function, the boundary cases are obvious, plus, in most cases, I’ve barely written any code. Nonetheless, I’d feel a bit icky just diving straight in and, even with code this straightforward, I have found and prevented bugs that might otherwise have eluded me.

On to chapter two!


Tagged code, and software-tools-in-c++

Tuesday 03 September 2019 The Forest Road Reader, No 2.27 : STinC++ - detab

Suppose that you need to print a text file containing horizontal tab characters on a device that cannot interpret tabs. As a first approximation, you might be content with fixed tab stops every four columns …​ A tab character is thus replaced by from one to four spaces.

Software Tools shows its age properly for the first time here. Back in 1981 this was a genuine problem, and even 10 or more years later it was entirely possible to send a printer into spasms if you weren’t careful. These days of course, printers behave perfectly all the time and never give anyone any bother at all.

While printing might be a solved problem, the question of tabs or spaces in source code remains an eternal problem. The motivation might have changed, but the tool remains relevant.

This final program in Software Tools' first chapter also turns the complexity screw another notch or two. I’m not going to reproduce the whole of the code, but here’s the core of the now familiar read/process/output loop

col := 1;
while (getc(c) <> ENDFILE) do
  if (c = TAB) then
    repeat
      putc(BLANK);
      col := col + 1
    until (tabpos(col, tabstops))
  else if (c = NEWLINE) then begin
    putc(NEWLINE);
    col := 1
  end
  else begin
    putc(c);
    col := col + 1
  end;

We’ve got all the small parts Kernighan and Plauger have built up over the previous four programs - reading and outputting characters, counting characters, special processing on new lines. The new element is that repeat/until loop, which converts each tab by outputting the requisite number of spaces.

In my C++ programs, I’ve avoided direct output in the body of the code. I’ve used functions that return a value, or written out through iterators. Here, after reading a character I might need to write one or more characters in response. Or, put in a more C++y way, we need to convert a sequence of characters into a sequence of strings. Rummaging once again in the algorithm header leads us to std::transform, which applies a given function to a sequence and stores the result in another sequence. In other languages you might know this as map or apply. JavaScript’s Array.prototype.map, for example, returns a new array filled with the results of calling the provided function on each element of the calling array. The start of my C++ implementation is then

    std::transform(
        std::istreambuf_iterator<char>(in),
        std::istreambuf_iterator<char>(),
        std::ostream_iterator<std::string>(out),
        ... something ...
    );

The …​ something …​ could be a function, a lambda, or a function object. Because the transformation I’m doing here needs a little bit of state - the action performed depends not just on the current character but also where we are in the line - I used a little function object.
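For comparison, here is a minimal sketch of the lambda alternative, with the state held in an init-capture (which needs C++14 or later). It assumes fixed tab stops every four columns, as in the Kernighan and Plauger quote. The names detab_with_lambda, tab_width and check_detab_with_lambda are made up for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iostream>
#include <iterator>
#include <sstream>
#include <string>

// Assumption for this sketch: a tab stop every 4 columns.
constexpr std::size_t tab_width = 4;

void detab_with_lambda(std::istream& in, std::ostream& out) {
    std::transform(
        std::istreambuf_iterator<char>(in),
        std::istreambuf_iterator<char>(),
        std::ostream_iterator<std::string>(out),
        // 'mutable' lets the lambda update its own captured position
        [position = std::size_t{0}](char c) mutable {
            if (c == '\t') {
                const auto spaces = tab_width - (position % tab_width);
                position += spaces;
                return std::string(spaces, ' ');
            }
            if (c == '\n')
                position = 0;
            else
                ++position;
            return std::string(1, c);
        }
    );
}

// Usage: a tab after two characters pads to column four.
void check_detab_with_lambda() {
    std::istringstream in("ab\tc");
    std::ostringstream out;
    detab_with_lambda(in, out);
    assert(out.str() == "ab  c");
}
```

The lambda keeps everything in one place, but the state and the logic are anonymous. A named function object gives the same behaviour something to be called, which is easier to read once there is more than a line or two of logic.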

detab.cpp
#include "detab.h"
#include "../../lib/chars.h"
#include "../../lib/tab_stops.h"

#include <algorithm>
#include <iostream>
#include <iterator>
#include <string>

struct detabber {
    size_t position_;
    std::string operator()(char c) {  // (3)
        if (stiX::istab(c)) {
            const auto spaces = stiX::distance_to_next_tab_stop(position_); // (4)
            position_ += spaces;
            return std::string(spaces, ' ');
        }

        if (stiX::isnewline(c)) // (5)
            position_ = 0;
        else
            ++position_;

        return std::string(1, c); // (6)
    }
};

namespace stiX {
    void detab(std::istream &in, std::ostream &out) {
        std::transform(
            std::istreambuf_iterator<char>(in),
            std::istreambuf_iterator<char>(),
            std::ostream_iterator<std::string>(out), // (1)
            detabber() // (2)
        );
    }
}

The implementation is straightforward, and not hugely dissimilar to the Pascal original.

  1. I’m reading the input as a stream of characters but outputting strings. The std::istreambuf_iterator<char> provides that unformatted input, while std::ostream_iterator<std::string> gives the appropriate output sink. I can’t recall ever having done this in C++ before, but I certainly have in other languages which are more relaxed about types.

  2. Create a detabber object that will perform the transformation. Because detabber is a simple object there’s no need for me to write a constructor: creating it with detabber() value-initialises it, which sets position_ to 0, and that is perfectly fine. std::transform takes the transformer argument by value, so you might expect this newly constructed detabber object to be copied as it’s passed in. However in situations like this, where we’re creating an object that has no lifetime beyond this one use, modern compilers are more than clever enough to omit that copy entirely.

  3. operator() is the function call operator, and is the defining feature of a C++ function object. Note that the parameter type char matches the underlying input iterator type, while the return type std::string matches the underlying output iterator type. For each character on the input, std::transform will call operator() with that character, expecting a std::string in return.

  4. If the provided character c is a tab, calculate where the next tab stop is and return a string of spaces to pad to it.

  5. If c is not a tab but is a newline, reset position_, otherwise just bump along to the next position.

  6. Return c as a string. Creating a std::string for every character we see may not be ideal from a performance point of view (even though modern compilers are not only clever enough but actually required to omit copies here). However, I confess to being fairly relaxed on that front. Converting tabs to spaces is unlikely to ever be a time critical activity, and I’d rather think about getting it correct first and worry about performance if I have to. I may be atypical, but performance in time or space has rarely been a problem I’ve had to deal with.
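The listing leans on helpers from chars.h and tab_stops.h that aren’t shown here. To exercise the walkthrough above, the sketch below swaps in stand-ins for them. The sketch namespace, the stand-in bodies, the four-column tab stops, and detab_string are all assumptions made for illustration. The real stiX helpers may well differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iostream>
#include <iterator>
#include <sstream>
#include <string>

namespace sketch {
    // Stand-ins for the unseen library helpers (assumed behaviour).
    constexpr std::size_t tab_size = 4; // assumption: a stop every 4 columns
    inline bool istab(char c) { return c == '\t'; }
    inline bool isnewline(char c) { return c == '\n'; }
    inline std::size_t distance_to_next_tab_stop(std::size_t position) {
        return tab_size - (position % tab_size);
    }

    struct detabber {
        std::size_t position_;
        std::string operator()(char c) {
            if (istab(c)) {
                const auto spaces = distance_to_next_tab_stop(position_);
                position_ += spaces;
                return std::string(spaces, ' ');
            }
            if (isnewline(c))
                position_ = 0;
            else
                ++position_;
            return std::string(1, c);
        }
    };

    void detab(std::istream& in, std::ostream& out) {
        std::transform(
            std::istreambuf_iterator<char>(in),
            std::istreambuf_iterator<char>(),
            std::ostream_iterator<std::string>(out),
            detabber() // value-initialised, so position_ starts at 0
        );
    }

    // Convenience wrapper for trying things out on string literals.
    std::string detab_string(const std::string& input) {
        std::istringstream in(input);
        std::ostringstream out;
        detab(in, out);
        return out.str();
    }

    void check_detab() {
        assert(detab_string("abc\td") == "abc d");        // one space needed
        assert(detab_string("a\tb\n\tc") == "a   b\n    c"); // newline resets
    }
}
```

With these stand-ins a tab is always replaced by between one and four spaces, and the column count starts again after each newline.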

Source code

Source code for this program, indeed the whole project, is available in the stiX GitHub repository. detab is program 5 of Chapter 1.

Endnotes

This whole endeavour relies on Software Tools in Pascal and Software Tools, both by Brian W Kernighan and PJ Plauger. I love these books and commend them to you. They’re both still in print, but there are plenty of second-hand editions floating round.

For this project I’ve been trying out JetBrains’ CLion, which is pretty great. CLion uses CMake to build projects. My previous flirtations with CMake, admittedly many years ago, weren’t a huge success. Not so this time - it’s easy to use and works a treat.

The test harness I’m using is Catch. I’ve been aware of Catch pretty much since it was first released, but this is my first time really using it. I like it and will use it again.


Tagged code, and software-tools-in-c++

Wednesday 21 August 2019 The Forest Road Reader, No 2.26 : STinC++ - wordcount

After counting characters and then lines, Kernighan and Plauger take us next to counting words.

wordcount.pas
procedure wordcount;
var
    nw : integer;
    c : character;
    inword: boolean;
begin
    nw := 0;
    inword := false;
    while (getc(c) <> ENDFILE) do
        if (c = BLANK) or (c = NEWLINE) or (c = TAB) then
            inword := false
        else if (not inword) then begin
            inword := true;
            nw := nw + 1
        end;
    putdec(nw, 1);
    putc(NEWLINE);
end;

You can see the little path Kernighan and Plauger are on quite clearly now. Start with a simple loop, then extend that loop with a simple counter. Next add a simple conditional, and now extend that conditional into a little state machine. Each little step has added something extra, changing the functionality of each program to provide a new and useful result.

My path has been to look through the list of functions provided in the algorithm header and pick the one that does what’s needed for this task.

Kernighan and Plauger have a straightforward, although entirely reasonable, definition of a word - the maximal sequence of characters not containing a blank, a tab, or a newline. The additional complexity of Kernighan and Plauger’s wordcount over linecount is to keep track of whitespace delimiters between the words. They are, in effect, splitting up the sequence of characters into a sequence of words.

Previously, I blithely said that C++'s istream and ostream provide a number of different iterators. To get down and deal with the raw character stream we use istreambuf_iterator<char>. For formatted input, that is anything that needs a bit of work to process those raw characters in some way, we want some sort of istream_iterator.

Gathering characters up into whitespace delimited words qualifies as a bit of processing work. It’s the kind of thing people do all the time, and consequently is precisely what the istream_iterator<std::string> provides.
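The raw versus formatted distinction shows up even at the level of single characters. As a small sketch (raw_chars, formatted_chars and check_raw_versus_formatted are hypothetical names for this illustration), reading the same input through each kind of iterator gives different results, because formatted extraction with operator>> skips whitespace by default.

```cpp
#include <cassert>
#include <iterator>
#include <sstream>
#include <string>

// Every character, exactly as it sits in the stream buffer.
std::string raw_chars(const std::string& text) {
    std::istringstream in(text);
    return std::string(
        std::istreambuf_iterator<char>(in),
        std::istreambuf_iterator<char>()
    );
}

// Characters as read by operator>>, which skips leading whitespace.
std::string formatted_chars(const std::string& text) {
    std::istringstream in(text);
    return std::string(
        std::istream_iterator<char>(in),
        std::istream_iterator<char>()
    );
}

void check_raw_versus_formatted() {
    assert(raw_chars("a b\n") == "a b\n");   // space and newline preserved
    assert(formatted_chars("a b\n") == "ab"); // whitespace skipped
}
```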

I rather loosely described std::distance(InputIt first, InputIt last) as counting the hops between first and last. More formally, it returns the number of iterator increments needed to go from first to last. Incrementing an istream_iterator<std::string> returns the next word, so plugging that into std::distance counts the number of words in the input.

wordcount.cpp
#include "wordcount.h"

#include <algorithm>
#include <iostream>
#include <iterator>
#include <string>

namespace stiX {
    size_t wordcount(std::istream& in) {
        return std::distance(
                std::istream_iterator<std::string>(in),
                std::istream_iterator<std::string>()
        );
    }
}
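To see what the iterator is actually handing over, this sketch gathers the words into a vector as well as counting them. The helpers words_in, wordcount_of and check_words are illustrative wrappers around an in-memory stream rather than part of stiX.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Collect each whitespace-delimited word the iterator yields.
std::vector<std::string> words_in(const std::string& text) {
    std::istringstream in(text);
    return std::vector<std::string>(
        std::istream_iterator<std::string>(in),
        std::istream_iterator<std::string>()
    );
}

// Same iterators, but only the number of increments is kept.
std::size_t wordcount_of(const std::string& text) {
    std::istringstream in(text);
    return static_cast<std::size_t>(std::distance(
        std::istream_iterator<std::string>(in),
        std::istream_iterator<std::string>()
    ));
}

void check_words() {
    const std::string text = "  the quick\tbrown\n\nfox ";
    const std::vector<std::string> expected{"the", "quick", "brown", "fox"};
    assert(words_in(text) == expected); // runs of whitespace collapse
    assert(wordcount_of(text) == 4);
    assert(wordcount_of("   \n\t") == 0); // whitespace only, no words
}
```

Blanks, tabs and newlines all act as delimiters, and runs of them count once, which matches the Kernighan and Plauger definition of a word.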

Source code

Source code for this program, indeed the whole project, is available in the stiX GitHub repository. wordcount is program 4 of Chapter 1.

Endnotes

This whole endeavour relies on Software Tools in Pascal and Software Tools, both by Brian W Kernighan and PJ Plauger. I love these books and commend them to you. They’re both still in print, but there are plenty of second-hand editions floating round.

For this project I’ve been trying out JetBrains’ CLion, which is pretty great. CLion uses CMake to build projects. My previous flirtations with CMake, admittedly many years ago, weren’t a huge success. Not so this time - it’s easy to use and works a treat.

The test harness I’m using is Catch. I’ve been aware of Catch pretty much since it was first released, but this is my first time really using it. I like it and will use it again.


Tagged code, and software-tools-in-c++

Tuesday 20 August 2019 The Forest Road Reader, No 2.25 : STinC++ - linecount

So once you can count characters, you might also want to count the number of lines in some input. Instead of counting every character on our input, I just count the number of newline characters I see.

linecount.pas
procedure linecount;
var
    nl : integer;
    c : character;
begin
    nl := 0;
    while (getc(c) <> ENDFILE) do
        if (c = NEWLINE) then
            nl := nl + 1;
    putdec(nl, 1);
    putc(NEWLINE);
end;

It’s at this point that Software Tools in Pascal does rather show its age. In their discussion of the code, Kernighan and Plauger say

The idea that text information is just a string of characters, with arbitrary length lines delimited by explicit NEWLINE characters, seems obvious when you think about how a typewriter or a terminal works. But for all its obviousness, it’s still an uncommon concept in many computing systems, where text must often be forced into either fixed length chunks reminiscent of cards or "records" with inconvenient properties.

There follows a paragraph or two about the implementation of their getc and putc primitives, and how they might be implemented on systems which have fixed length records, differing character sets, disk formats, or whatever. The point of these two functions is to insulate their code from the vagaries of the underlying system. Even if the implementations of getc and putc are trivial on a particular system, it is still worth doing.

But whatever the source or sink, we will stick with our interface and program in terms of typewriter-like text …​ Having a uniform representation for text solves much of the problem of keeping tools uniform.

This kind of insulation is part of what the standard library provides for us. I’ve built this code on three different operating systems, each with a different file system, using four different compilers, and it’s all just worked. Kernighan and Plauger were not so blessed, and had to build their own low-level library as they went.

Of course our library doesn’t just have low-level stuff. It has mid-level stuff too, like the std::copy and std::distance functions I used previously, and like the std::count I’m using today.

linecount.cpp
#include "linecount.h"

#include <algorithm>
#include <iostream>
#include <iterator>

namespace stiX {
    size_t linecount(std::istream &in) {
        return std::count(
                std::istreambuf_iterator<char>(in),
                std::istreambuf_iterator<char>(),
                '\n'
        );
    }
}
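A quick usage sketch, using a hypothetical linecount_of wrapper over an in-memory stream, shows one consequence of counting newline characters: a final line with no terminating newline is not counted. The Pascal original behaves the same way.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <sstream>
#include <string>

// Illustrative wrapper: count '\n' characters in a string.
std::size_t linecount_of(const std::string& text) {
    std::istringstream in(text);
    return static_cast<std::size_t>(std::count(
        std::istreambuf_iterator<char>(in),
        std::istreambuf_iterator<char>(),
        '\n'
    ));
}

void check_linecount() {
    assert(linecount_of("one\ntwo\nthree\n") == 3);
    assert(linecount_of("one\ntwo\nthree") == 2); // last line unterminated
    assert(linecount_of("") == 0);
}
```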

New-line character

The use of \n as the line ending character is so entirely normal that we don’t usually think about it. While writing this, I did go on a hunt through the latest C++ draft standard for it, just to be certain this wasn’t some bit of folk wisdom. There are lots of references to new-line as an abstract concept, but two particular mentions of \n stood out.

2.14.3 Character Literals

Note 3 describes how certain nongraphic characters and other potentially awkward characters like " can be represented in code with an escape sequence. First one listed? Our friend \n

new-line    NL(LF)    \n

In other words, \n always represents the new-line character on your system.

The other reference is in the description of std::endl.

27.7.3.8 Standard basic_ostream manipulators
namespace std {
  template <class charT, class traits>
    basic_ostream<charT,traits>& endl(basic_ostream<charT,traits>& os);
}

Effects: Calls os.put(os.widen('\n')), then os.flush().
Returns: os.

In other words, if you want to output a new-line then stuff a \n down your ostream. Looking to the wider tradition, we know to avoid std::endl in favour of plain old \n.
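The difference between the two is the flush, and it can be made visible with a small sketch. counting_buf is a hypothetical helper, a std::stringbuf that counts how often it is asked to sync, and the written struct and the function names are likewise invented for this illustration.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: a string buffer that counts flush requests.
class counting_buf : public std::stringbuf {
public:
    int flushes = 0;
protected:
    int sync() override {
        ++flushes; // os.flush() arrives here via pubsync()
        return 0;  // 0 signals success
    }
};

struct written {
    std::string text;
    int flushes;
};

written with_newline_char() {
    counting_buf buf;
    std::ostream out(&buf);
    out << "one" << '\n' << "two" << '\n';
    return written{buf.str(), buf.flushes};
}

written with_endl() {
    counting_buf buf;
    std::ostream out(&buf);
    out << "one" << std::endl << "two" << std::endl;
    return written{buf.str(), buf.flushes};
}

void check_newline_versus_endl() {
    assert(with_newline_char().text == with_endl().text); // same characters
    assert(with_newline_char().flushes == 0);
    assert(with_endl().flushes == 2); // one flush per std::endl
}
```

Both versions put exactly the same characters into the stream. The std::endl version also asks the buffer to flush each time, which on a real file or terminal is work you usually don’t need.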

Source code

Source code for this program, indeed the whole project, is available in the stiX GitHub repository. linecount is program 3 of Chapter 1.

Endnotes

This whole endeavour relies on Software Tools in Pascal and Software Tools, both by Brian W Kernighan and PJ Plauger. I love these books and commend them to you. They’re both still in print, but there are plenty of second-hand editions floating round.

For this project I’ve been trying out JetBrains’ CLion, which so far has been pretty great. CLion uses CMake to build projects. My previous flirtations with CMake, admittedly many years ago, weren’t a huge success. Not so this time - it’s easy to use and works a treat.

The test harness I’m using is Catch. I’ve been aware of Catch pretty much since it was first released, but this is my first time really using it. I like it and will use it again.


Tagged code, and software-tools-in-c++