Jez Higgins

Freelance software grandad
software created
extended or repaired


Follow me on Twitter
Applications, Libraries, Code
Talks & Presentations

Hire me
Contact

Older posts are available in the archive or through tags.

Feed

Monday 14 September 2020 The Forest Road Reader, No 2.55 : STinC++ - bubble sort

STinC++ rewrites the programs in Software Tools in Pascal using C++

Here we are in Chapter 4, which is all about sorting. Rather than getting straight into usable tools, Kernighan and Plauger divert slightly from their established course. The sorting chapter isn’t, they say, about sorting algorithms per se, but about packing up a sorting algorithm so that people can use it. A well put together program will be useful and hence used, but “if a sorting program is so poorly packaged that people feel compelled to write their own …​ the quality of its sorting algorithm is surely irrelevant”. I suspect the bit about writing your own sort program might be a reflection of the culture at Bell Labs in the 70s, but the point about the importance of human interface factors - even if your user interface is a command line on the console - is well made.

Before getting into the meat of their program, they do take a page or two to recap a couple of sorting algorithms, the first being bubble sort, and the second being shell sort. They simply present the code, and then briefly describe it. I’m choosing to use these as an excuse to take a little diversion too, and to use them as the basis for a couple of exercises.

Bubble Sort

My first little exercise is writing a bubble sort with an extra C++y twist - a bubble sort implementation with the same interface as std::sort, the standard sort. Just as an aide memoire, std::sort looks like this:

std::sort [1]
template<class RandomIt>
void sort(RandomIt first, RandomIt last);

template<class RandomIt, class Compare>
void sort(RandomIt first, RandomIt last, Compare comp);

std::sort is a generic function that takes a pair of iterators that delimit the sequence to be sorted. There’s an overload, which take a comparison function, should you wish to sort the sequence in the manner of your choosing.

That’s what I want my function to look like on the outside, but what goes on the inside?

How Bubble Sort Works

Here’s some random dice! We want to sort them so the lowest value is on the left, going up to the largest on the right.

a dice showing 4 a dice showing 6 a dice showing 2 a dice showing 1

Pass One

We’ll start by taking the first two dice and comparing them. We want the smaller value on the left.

a dice showing 4 a dice showing 6 a dice showing 2 a dice showing 1

Here, that’s already the case. There’s nothing to do, so we take the next pair.

a dice showing 4 a dice showing 6 a dice showing 2 a dice showing 1

These two are in the wrong order, so we swap them and move to the next pair.

a dice showing 4 a dice showing 2 a dice showing 6 a dice showing 1

This pair is also in the wrong order, so we swap those as well.

a dice showing 4 a dice showing 2 a dice showing 1 a dice showing 6

We’ve now completed one pass, and we can see the largest value - the six - has moved, been bubbled, the right and is in its proper place. Now we start again from the beginning.

Pass Two

a dice showing 4 a dice showing 2 a dice showing 1 a dice showing 6

Two is less than four, so we swap.

a dice showing 2 a dice showing 4 a dice showing 1 a dice showing 6

One is also smaller than four, so we swap this pair too.

a dice showing 2 a dice showing 1 a dice showing 4 a dice showing 6

We don’t need to compare the last pair, because we know we moved the six to the right place in the first pass. We’ve now completed the second pass, and bubbled the second largest value into its place. We go round again.

Pass Three

a dice showing 2 a dice showing 1 a dice showing 4 a dice showing 6

This time we only need to compare the first two dice. We swap them, and our sort is complete.

a dice showing 1 a dice showing 2 a dice showing 4 a dice showing 6

Sorted

Sorting four dice took three passes. The first pass had three comparisons, the second two, and the third has one comparision. That’s a pretty expensive sort, and you can see how the number of comparisions races away with larger datasets.

Live and In Colour

Rather than write an extended essay on how I wrote half a dozen lines of sorting code, I thought it might be fun - or at least a little different - to capture what I did, along with my confused looking face, in a short video.

stiX: Bubble Sort, implemented in C++ with a standard style interface
  • Bonus - see if you can spot where I was interrupted part way though

  • Bonus bonus - can you identify any of the books on the shelf behind me?

Video Postscript

While I was feeling pretty pleased with myself at the end of the video, I did do a little more work on the code afterwards.

bubble_sort.cpp
template<class Iterator, class Comparator = std::less<>>
void bubble_sort(Iterator begin, Iterator end, Comparator comparator = std::less<>()) {
  if (begin == end) return; (1)

  for (auto boundary = std::prev(end); boundary != begin; boundary = std::prev(boundary)) { (2)
    for (auto cur = begin; cur != boundary; cur = std::next(cur)) {
      auto next = std::next(cur); (3)
      if (comparator(*next, *cur)) {
        std::iter_swap(next, cur);
      } // if ...
    } // for ...
  } // for ...
}
  1. Oops! I hadn’t checked what happened with an empty sequence. It wasn’t good.

  2. Instead of calling operator-- and operator++ on the iterators, I switched instead to using std::prev and std::next. I dabbled for a while with std::advance but calling a function named advance to move an iterator backwards just seemed strange.

  3. Went back and forth trying to eliminate next, but couldn’t. Briefly considered

for (auto cur = begin, next = std::next(cur); cur != boundary; cur = std::next(cur), next = std::next(next))

but that really doesn’t help at all. It just looks so weird and uncomfortable.

There’s an optimisation we could make to detect if the sequence is sorted before needing all the passes, but it didn’t add to the fun, and it only occurred to me later.

Pretty sure this is the first time I’ve written a bubble sort, although I’m obviously in that demographic that has watched a bubble sort in action many, many times.

Reference

In everyday code, where you know the types you’re working with, let’s say you’ve got a std::vector<Something>, then you’re probably just going to call the begin and end member functions, and the iterator’s operator++. And you know, I think that’s absolutely fine. For library code though, when we can’t be as certain or when we don’t want to place unnecessary constraints on people using our code, then these little utility functions are just what we need.

Like the container and iterator utilities above, iter_swap and swap use the magic of template deduction, specialisation, and what-not to do the right thing for the types you’ve called the function with. Let’s say we’ve plugged a vector of vectors into the bubble sort. Using the classic two variable swap, each assignment is going to copy the whole vector, which is probably pretty bad in both time and space. Using iter_swap is, somewhere underneath it all, going to find the std::vector specialisation of std::swap, which is going to call the swap member function on one of those vectors, and that’s going to do it all in am exception-safe heartbeat. Chances are the compiler will inline it all too, so there won’t even be a function call overhead. If we invoke stiX::bubble_sort with a C style array of integers, the magic of std::begin and friends will generate a loop optimise for raw memory access.

Endnotes

This whole endeavour relies on Software Tools in Pascal and Software Tools, both by Brian W Kernighan and PJ Plauger. I love these books and commend them to you. They are both still in print, but new copies are, frankly, just ridiculously expensive. Happily, there are plenty of second-hand copies floating round, or you can borrow them from The Internet Archive using the links above.

For this project I’ve been using JetBrain’s CLion, which I liked enough to buy and renew, because it’s a year already, a license.

The test harness I’m using is Catch. I’ve been aware of Catch pretty much since it was first released, but this project is the first time really getting in and using it. I like it, and it’ll be my first choice in the future.

I recorded the video using OBS, with a cheap noname USB mic and an old, although perfectly servicable, Logitech C525 webcam perched precariously on the top of one of my monitors.


1. Actually this isn’t quite the full picture (and is becoming even less so as I type) but it’s near enough for this exercise.

Tagged code, and software-tools-in-c++

Friday 28 August 2020 The Forest Road Reader, No 2.53 : STinC++ - Chapter Three Wrap Up

STinC++ rewrites the programs in Software Tools in Pascal using C++

  • compare - are these two files the same?

  • include - simple file inclusion

  • concat - file concatentation

  • print - paginating file printer

  • makecopy - file copy

  • archive parts 1, 2, 3, & 4 - file archiving utility

I’m not sure that a book written today would spend so much time on file i/o and file system manipulation. While it’s aims are rather different, Stroustrup’s A Tour of C++, one of the few "beginner’s books" I have my shelf, has a section on file i/o which runs to a total of 17 lines. Several times over the past six months (hey, crazy times) I’ve had to remind myself that, for some proportion of Software Tools in Pascal's original audience and perhaps a larger proportion of the audience for its predecessor Software Tools, programs couldn’t rove around the filesystem, that files couldn’t contain arbitrary text, nor could programs copy or move files around in the filesystem. Given that we all been able to do all these things, pretty much regardless of a programming language choice, for the last thirty years or more, most of the programs in this chapter feel a little inconsequential. This is particularly the case for makecopy which is barely more than a wrapper around a library function. The final program, archive, does feel more substantial though. I enjoyed working on it, and it’s really quite a satisfying piece of code.

I found out this morning that the source code for Software Tools in Pascal is on Brian Kernighan’s old Princeton University website in the archive format (well, almost - there’s some extra stuff after the first entry, and the length of MAN/archive.m is wrong). I used my version to extract everything and now I feel kind of validated.

On to chapter four, sorting.

Endnotes

This whole endeavour relies on Software Tools in Pascal and Software Tools, both by Brian W Kernighan and PJ Plauger. I love these books and commend them to you. They are both still in print, but new copies are, frankly, just ridiculously expensive. Happily, there are plenty of second-hand copies floating round, or you can borrow them from The Internet Archive using the links above.

For this project I’ve been using JetBrain’s CLion, which I liked enough to buy and renew, because it’s a year already, a license.

The test harness I’m using is Catch. I’ve been aware of Catch pretty much since it was first released, but this project is the first time really getting in and using it. I like it, and it’ll be my first choice in the future.


Tagged code, and software-tools-in-c++

Monday 24 August 2020 The Forest Road Reader, No 2.52 : STinC++ - archive update, extract

STinC++ rewrites the programs in Software Tools in Pascal using C++

Last time out, back at the start of August, I left myself with three jobs to do on archive - factor out the common loop in the table, delete, and print commands, implement the update command, and finally do the extract command.

Factoring out the loop

I realised I had three functions that all looked pretty similar.

void an_archive_operation(
  ...
) {
  archive_in.peek();

  while (archive_in && !archive_in.eof()) {
    auto header_line = getline(archive_in);
    auto header = parse_header(header_line);

    ... about three lines of code doing the actual thing ...

    archive_in.peek();
  }
}

The loop itself, reading each entry in the archive before we decide what to do with it, isn’t that complex but it’s not entirely trivial and that archive_in.peek() is a little bit clever-clogsy. We’ve written it three times already and with two operations yet to go, we could end up doing it twice more. Let’s go the other way, do the loop once and just drop the interesting part of each operation into the middle.

table.cpp, archive_file.cpp
void table_archive(
  std::istream& archive_in,
  std::ostream& out
) { (1)
  read_archive( (2)
    archive_in,
    [&out]( (3)
      std::istream& archive_in,
      archive_file const& header
    ) {
      out << header.name << '\t' << header.filesize << '\n';

      skip_entry(archive_in, header);
    }
  );
} // table_archive

typedef std::function<
  void(
    std::istream&,
    archive_file const&
  )
> ArchiveReader; (5)

void read_archive(
  std::istream& archive_in,
  ArchiveReader reader
) {
  archive_in.peek();

  while(archive_in && !archive_in.eof()) {
    auto header_line = getline(archive_in);
    auto header = parse_header(header_line);

    reader(archive_in, header); (4)

    archive_in.peek();
  } // while ...
} // read_archive
  1. table_archive prints the archive’s table of contents to the standard output.

  2. read_archive implements our loop. Here at the call site, we need to hand it the work we need to do.

  3. I’ve chosen to wrap that work up in a lambda expression.

  4. Within read_archive we invoke the lambda using the normal function call syntax.

  5. This rather funky-looking typedef is the modern-day equivalent of a function pointer declaration. std::function is a general-purpose polymorphic function wrapper - it can wrap functions, function objects, lambda expressions, pointers to member functions, and more. You name it, std::function can wrap it.

Often in C++, you have the choice of doing something at compile time or at runtime. read_archive could have been written, almost identically, as a template function. Here, I made the runtime choice using std::function. For this application it doesn’t make any kind of noticeable difference - I’m trading an almost unmeasurably slower application against, perhaps, a triflingly longer build time.[1]

Update

The archive program’s update command replaces named members and adds new files onto the end of an existing archive. Kernighan and Plauger treat creating an archive and updating an archive as the same operation. Creation for them is an update where there’s nothing to replace. They also implement updating in place. If you’re updating, say, the second file in an archive, the new version will be also be the second file in the archive.

For me, creation and updating are not the same. Updating is an operation on an archive that already exists. It doesn’t seem ok to create one if it isn’t there. The order of files within the archive doesn’t seem important either. As a user, we can’t say give me the first and third file in the archive, we can only refer to the contents by name. To update an archive, we can remove any existing versions from the archive, then add the new versions to the end of archive. And hey, there’s code to that already.

This piece of code is at a slightly higher level than that previously shown, so it’s got a bit of the file manipulation detail.

update.cpp
void update(std::string const& archive, std::vector<std::string> const& files) {
  auto working = working_file(); (1)

  {
    auto archive_in = std::ifstream(archive);
    auto archive_out = std::ofstream(working);
    delete_from_archive(archive_in, files, archive_out);

    auto input_files = gather_input_files(files); (2)
    append_archive(input_files, archive_out);
  } (3)

  fs::rename(working, archive); (4)
} // update

Beautiful. Really pleased with this.

  1. C++ filesystem library has fs::temp_directory_path() but nothing to generate a temporary filename. working_file() simply creates a random filename, and appends to the temporary directory path.

  2. gather_input_files collects the named input files, ensures they exist and grabs their size. If a file is missing, we can deal with it here rather than later on when we’re down in the details.[2]

  3. These operations are in their own little block so that the archive_in and archive_out streams are destroyed, closing their underlying files before we move the modified working copy over the original.

  4. In C++ as in Unix, renaming a file is the same as moving it. Even if a file with the new name already exists, it’s overwritten.

Extract

That just left the extract operation. I started with a test

test_extract
  auto archive_in = std::istringstream(archive);

  auto filename = std::string();
  auto out = std::ostringstream();

  auto mock_writer = [&filename, &out](std::string const& f) -> std::ostream& {
    filename = f;
    return out;
  };

  stiX::extract_files(archive_in, to_extract, mock_writer);

  REQUIRE(out.str() == expected);

Just as I did with create, I’m not letting extract_files interact directly with the file system. Instead, I’m passing in a function that’ll do whatever needs to be done file-wise and hands back a stream. For the tests, I’m not going to let it go anywhere near a real filesystem.

Test in place, a quick cut/paste/edit of the print operation

extract.hpp
template<typename FileWriteOpener>
void extract_files(
  std::istream& archive_in,
  std::vector<std::string> const& files,
  FileWriteOpener file_opener
) {
  read_archive(
    archive_in,
    [&files, &file_write_opener](
      std::istream& archive_in,
      archive_file const& header
    ) {
      if (of_interest(files, header)) {
        auto& out = file_opener(header.name);
        copy_contents(archive_in, header, out);
      }
      else
        skip_entry(archive_in, header);
    }
  );
} // extract_archive

Boom - tests pass. Just need to hook up to the real file creation function, and job done right?

hooking up extract
std::ofstream file_creator(
  std::string const& filename
);

void extract(
  std::string const& archive,
  std::vector<std::string> const& files
) {
  auto archive_in = std::ifstream(archive);
  extract_files(archive_in, files, file_creator);
} // extract

and …​ oh

/home/jez/work/jezuk/stiX/c++/chapter_3/6_archive/./extract.hpp:23:17: error: cannot bind non-const lvalue reference of type ‘std::basic_ofstream<char>&’ to an rvalue of type ‘std::basic_ofstream<char>’
   23 |           auto& out = file_opener(header.name);
      |                 ^~~

Ok, so instead of auto& out = …​ it should be auto out = …​.

and …​ oh, again

/home/jez/work/jezuk/stiX/c++/chapter_3/6_archive/test/../extract.hpp:23:16: error: use of deleted function ‘std::basic_ostream<_CharT, _Traits>::basic_ostream(const std::basic_ostream<_CharT, _Traits>&) [with _CharT = char; _Traits = std::char_traits<char>]’
   23 |           auto out = file_opener(header.name);
      |                ^~~

Damned with a cannot bind non-const lvalue reference of type ‘std::basic_ofstream<char>&’ to an rvalue of type ‘std::basic_ofstream<char>’ if I do, damned with a use of deleted function ‘std::basic_ostream<_CharT, _Traits>::basic_ostream(const std::basic_ostream<_CharT, _Traits>&) if I don’t. This time the error has come while compiling the tests. It took me a little while to work this out. extract_files is a template function, and I’m instantiating it with two functions that have similar but ever so slightly different signatures.

close, but no cigar
std::ofstream file_creator(
  std::string const& filename
); // the real function

[&filename, &out](std::string const& f) -> std::ostream&
// the test mock, which is more or less
std::ostream& test_mock(std::string const& f);

One returns a stream, the other a stream reference. The rules for auto type deduction are the same as for template type deduction, which basically means the reference gets tossed away. If I force a reference, auto& out …​ it can’t bind to the object returned by file_creator. If we leave it as auto out …​, it’ll try to copy the ostream reference returned by test_mock, which fails because ostream is an abstract class and so can’t be copied.

What we need here is for out to take on the actual return type of the function parameter, and to do that we just change the rules!

decltype(auto) out = file_writer(header.name);

Now we’ll pick up the actual return type of the function, and everything clicks into place.

But wait, there’s more …​

Printing and extracting are almost the same. The extract implementation is just a light edit of print. That can’t stand - it’s a big fat duplication that can be factored away.

Bang, and the duplication is gone
template<typename FileWriteOpener>
void extract_files_to_sink(
  std::istream& archive_in,
  std::vector<std::string> const& files,
  FileWriteOpener file_writer
) {
  read_archive(
    archive_in,
    [&files, &file_writer](
      std::istream& archive_in,
      archive_file const& header
    ) {
      if (of_interest(files, header)) {
        decltype(auto) out = file_writer(header.name);
        copy_contents(archive_in, header, out);
      }
      else
        skip_entry(archive_in, header);
    }
  );
} // extract_files_to_sink

std::ostream& send_to_stdout(std::string const&) {
  return std::cout;
} // send_to_stdout

void print_files(
  std::istream& archive_in,
  std::vector<std::string> const& files
) {
  extract_files_to_sink(
    archive_in,
    files,
    send_to_stdout
  );
} // print_files

std::ofstream file_creator(
  std::string const& filename
) {
  ...
} // file_creator

void extract_files(
  std::istream& archive_in,
  std::vector<std::string> const& files
) {
  extract_files_to_sink(
    archive_in,
    files,
    file_write_opener
  );
} // extract_files

And that, at last, is that

Six weeks (crikey) of start-stop (mainly stop) work from when I started, archive is done, and so is chapter three. Blimey.

My first encounter with a file archiving program was in the summer of 1989 during which I had the use of a friend’s Amstrad CPC 6128. In between writing programs to calculate 4 colour Mandelbrot Sets (I’d set them running in the morning when I went out to work, returning several hours later to see what I’d find), I spent the time poking around his collection of floppy disks. Some of those disks had ARC files on them, and when I discovered those files had files inside them, it just blew my mind. Writing a file archiver of my own, albeit with a strong steer, is really quite satisfying.

Source Code

Source code for this program, indeed the whole project, is available in the stiX GitHub repository. archive is the sixth program of Chapter 3.

Endnotes

This whole endeavour relies on Software Tools in Pascal and Software Tools, both by Brian W Kernighan and PJ Plauger. I love these books and commend them to you. They’re both still in print, but new copies are, frankly, just ridiculously expensive. Happily, there are plenty of second-hand copies floating round, or you can borrow them from The Internet Archive using the links above.

For this project I’ve been using JetBrain’s CLion, which I liked enough to buy and renew a license.

The test harness I’m using is Catch. I’ve been aware of Catch pretty much since it was first released, but this project is the first time really getting in and using it. I like it, and it’ll be my first choice in the future.


1. No, I didn’t even try to measure either of those things.
2. Of course, it’s possible that a file might be deleted in the time between to the call to gather_input_files and when we try to open it for reading, but I don’t feel that’s a case it’s reasonable to cover here. Or in most cases, come to that.

Tagged code, and software-tools-in-c++
Older posts are available in the archive or through tags.


Jez Higgins

Freelance software grandad
software created
extended or repaired

Follow me on Twitter
Applications, Libraries, Code
Talks & Presentations

Hire me
Contact

Older posts are available in the archive or through tags.

Feed