Jez Higgins

Freelance software generalist
software created
extended or repaired


Older posts are available in the archive or through tags.

Feed

Follow me on Twitter
My code on GitHub

Contact
About

Tuesday 03 September 2019 The Forest Road Reader, No 2.27 : STinC++ - detab

Suppose that you need to print a text file containing horizontal tab characters on a device that cannot interpret tabs. As a first approximation, you might be content with fixed tab stops every four columns …​ A tab character is thus replaced by from one to four spaces.

Software Tools shows its age properly for the first time here. Back in 1981 this was a genuine problem, and even 10 or more years later it was entirely possible to send a printer into spasms if you weren’t careful. These days of course, printers behave perfectly all the time and never give anyone any bother at all.

While printing might be a solved problem, the question of tabs or spaces in source code remains an eternal problem. The motivation might have changed, but the tool remains relevant.

This final program in Software Tools' first chapter also turns the complexity screw another notch or two. I’m not going to reproduce the whole of the code, but here’s the core of the now familiar read/process/output loop:

col := 1;
while (getc(c) <> ENDFILE) do
  if (c = TAB) then
    repeat
      putc(BLANK);
      col := col + 1
    until (tabpos(col, tabstops))
  else if (c = NEWLINE) then begin
    putc(NEWLINE);
    col := 1
  end
  else begin
    putc(c);
    col := col + 1
  end
end;  { closes the enclosing procedure, which isn't shown }

We’ve got all the small parts Kernighan and Plauger have built up over the previous four programs - reading and outputting characters, counting characters, special processing on new lines. The new element is that repeat/until loop, which converts each tab by outputting the requisite number of spaces.

In my C++ programs, I’ve avoided direct output in the body of the code. I’ve used functions that return a value, or written out through iterators. Here, after reading a character I might need to write one or more characters in response. Or, put in a more C++y way, we need to convert a sequence of characters into a sequence of strings. Rummaging once again in the algorithm header leads us to std::transform, which applies a given function to each element of a sequence and stores the results in another sequence. In other languages you might know this as map or apply. JavaScript’s Array.prototype.map, for example, returns a new array filled with the results of calling the provided function on each element of the calling array. The start of my C++ implementation is then

    std::transform(
        std::istreambuf_iterator<char>(in),
        std::istreambuf_iterator<char>(),
        std::ostream_iterator<std::string>(out),
        ... something ...
    );

The …​ something …​ could be a function, a lambda, or a function object. Because the transformation I’m doing here needs a little bit of state - the action performed depends not just on the current character but also where we are in the line - I used a little function object.

detab.cpp
#include "detab.h"
#include "../../lib/chars.h"
#include "../../lib/tab_stops.h"

#include <algorithm>
#include <iostream>
#include <iterator>
#include <string>

struct detabber {
    size_t position_;
    std::string operator()(char c) {  // (3)
        if (stiX::istab(c)) {
            const auto spaces = stiX::distance_to_next_tab_stop(position_);  // (4)
            position_ += spaces;
            return std::string(spaces, ' ');
        }

        if (stiX::isnewline(c))  // (5)
            position_ = 0;
        else
            ++position_;

        return std::string(1, c);  // (6)
    }
};

namespace stiX {
    void detab(std::istream &in, std::ostream &out) {
        std::transform(
            std::istreambuf_iterator<char>(in),
            std::istreambuf_iterator<char>(),
            std::ostream_iterator<std::string>(out),  // (1)
            detabber()  // (2)
        );
    }
}

The implementation is straightforward, and not hugely dissimilar to the Pascal original.

  1. I’m reading the input as a stream of characters but outputting strings. The std::istreambuf_iterator<char> provides that unformatted input, while std::ostream_iterator<std::string> gives the appropriate output sink. I can’t recall ever having done this in C++ before, but I certainly have in other languages, which are more relaxed about types.

  2. Create a detabber object that will perform the transformation. Because detabber is a simple object there’s no need for me to write a constructor: the expression detabber() value-initialises the object, which sets position_ to 0, and that is perfectly fine. (A plain declaration such as detabber d; would leave position_ uninitialised, so the parentheses matter.) std::transform takes the transformer argument by value, so you might expect this newly constructed detabber object to be copied as it’s passed in. However, in situations like this, where we’re creating an object that has no lifetime beyond this one use, modern compilers are more than clever enough to omit that copy entirely.

  3. operator() is the function call operator, and is the defining feature of a C++ function object. Note that the parameter type char matches the underlying input iterator type, while the return type std::string matches the underlying output iterator type. For each character on the input, std::transform will call operator() with that character, expecting a std::string in return.

  4. If the provided character c is a tab, calculate where the next tab stop is and return a string of spaces to pad to it.

  5. If c is not a tab but is a newline reset position_, otherwise just bump along to the next position.

  6. Return c as a string. Creating a std::string for every character we see may not be ideal from a performance point of view (even though modern compilers are not only clever enough but actually required to omit copies here). However, I confess to being fairly relaxed on that front. Converting tabs to spaces is unlikely ever to be a time-critical activity, and I’d rather think about getting it correct first and worry about performance if I have to. I may be atypical, but performance in time or space has rarely been a problem I’ve had to deal with.

Source code

Source code for this program, indeed the whole project, is available in the stiX GitHub repository. detab is program 5 of Chapter 1.

Endnotes

This whole endeavour relies on Software Tools in Pascal and Software Tools, both by Brian W Kernighan and PJ Plauger. I love these books and commend them to you. They’re both still in print, but there are plenty of second-hand editions floating round.

For this project I’ve been trying out JetBrains' CLion, which is pretty great. CLion uses CMake to build projects. My previous flirtations with CMake, admittedly many years ago, weren’t a huge success. Not so this time - it’s easy to use and works a treat.

The test harness I’m using is Catch. I’ve been aware of Catch pretty much since it was first released, but this is my first time really using it. I like it and will use it again.


Tagged code, and software-tools-in-c++

