Jez Higgins

Freelance software grandad
software created
extended or repaired


Follow me on Twitter
Applications, Libraries, Code
Talks & Presentations

Hire me
Contact

Older posts are available in the archive or through tags.

Feed

Saturday 14 November 2020 The Forest Road Reader, No 2.57 : Culture, innit

Here are some things I have enjoyed listening to or watching or reading over the past little while. If you enjoy the same kind of things I do, then you’ll probably enjoy them too.

Listening

The Last Post - Daily news podcast from an alternate dimension.

The Sky Machine - When cloud formation specialist David Forrester and oceanographer Susan Leistrammer meet at a conference, they trigger events that plunge them into a world they never imagined …​ Exciting story, beautifully made.

Blue Hearts - Bob Mould's energetic spiky new album.

Seeker! The Ken Campbell Podcast - I really only came to appreciate Ken Campbell and his work after his death. This podcast features a number of the solo shows he performed later in his career. Plugging them directly into your ears is a transformative experience.

Thirteen Minutes To The Moon - Astronauts are amazing people.

Reading

Gideon the Ninth - Gloomy gothic lesbian witch fun in space.

Beasts of Burden: Wise Dogs and Eldritch Men - The dogs and cats of Burden Hill do their best to protect their neighbourhood from the supernatural events that threaten. Beasts of Burden might be the best horror comic published in the last, I don’t know, ages anyway. It’s terrific.

The Raven Tower - Nicely chewy story about a big stone.

The Spy Who Came In From The Cold - An absolute belter of a spy thriller.

2000AD - It calls itself the Galaxy’s Greatest Comic! and, you know what, that’s the absolute truth.

Watching

Picard - Thoroughly satisfying Star Trek revival.

Alias - J.J.Abrams' breathless double-crossing triple-crossing conspiracy-espionage action-adventure series is just as much ludicrous fun now as it was twenty years ago.

AEW Dynamite - I loves me some prowrestling, and AEW is delivering the wrestling I want to see right now. I’ve long been a fan of Eddie Kingston and the deep emotion he brings, and it’s been fantastic to see him take his place on a big stage with such confidence over the last few weeks.

Richard Osman’s House of Games - sometimes you just want something harmless, and this light, friendly, quiz-cum-gameshow is entirely harmless.


Tagged comics, and books

Sunday 08 November 2020 The Forest Road Reader, No 2.56 : STinC++ - shell sort

STinC++ rewrites the programs in Software Tools in Pascal using C++

I described last time that I was using the opening of chapter four as excuse to do something a little different. I’m going to do the same different thing again. (Although I don’t know if that means it’s still different, or is now the same? Language is so confusing.)

Shell Sort

As before, I’m going to implement a sort algorithm and put a nice C++ standard library face on it. You know, like this

std::sort
template<class RandomIt>
void sort(RandomIt first, RandomIt last);

template<class RandomIt, class Compare>
void sort(RandomIt first, RandomIt last, Compare comp);

Last time was bubble sort, this time we’ll look at Shell sort. Bubble sort is relatively easy to implement. As we run passes through the data the biggest values get carried - bubbled up - through to the end. We just keep doing that and everything drops out. It’s straightforward to understand and to visualise.

Shell sort is a little more involved.

How Shell Sort Works

The key idea in Shell sort, named for Donald Shell who described it in his PhD thesis in 1959, is to compare elements which are far apart first. Elements will be rapidly moved nearer to their destination. Subsequent comparisons with their nearer neighbours will then move them into place relatively quickly. We still make a number of passes though the data, but there are significantly fewer comparisons and fewer swaps than in a bubble sort or a simple insertion sort.

In each pass, we split the sequence into a number of sublists, each made of elements n apart. We sort each these sublists, using a simple insertion sort. We then reduce n and repeat the process, until n is 1 when the final pass will produce the fully sorted sequence.

To usefully visualise a Shell sort we need a few more elements than last time, so let’s put aside the dice and deal out some cards.

Three of Spades Ten of Spades Nine of Spades Queen of Spades Eight of Spades Four of Spades Ace of Spades Six of Spades Five of Spades King of Spades Two of Spades Seven of Spades Jack of Spades

Pass One

Let’s start with sublists formed of elements 6 apart. The first sublist will be the Three, Ace, and Jack.

Three of Spades, selected Ten of Spades Nine of Spades Queen of Spades Eight of Spades Four of Spades Ace of Spades, selected Six of Spades Five of Spades King of Spades Two of Spades Seven of Spades Jack of Spades, selected

We sort just those cards, giving us

Ace of Spades, swapped Ten of Spades Nine of Spades Queen of Spades Eight of Spades Four of Spades Three of Spades, swapped Six of Spades Five of Spades King of Spades Two of Spades Seven of Spades Jack of Spades

We move on to our next sublist, which is the Ten and the Six

Ace of Spades Ten of Spades, selected Nine of Spades Queen of Spades Eight of Spades Four of Spades Three of Spades Six of Spades, selected Five of Spades King of Spades Two of Spades Seven of Spades Jack of Spades

We sort those two and move on again, taking us to the Nine and Five

Ace of Spades Six of Spades, moved Nine of Spades, selected Queen of Spades Eight of Spades Four of Spades Three of Spades Ten of Spades, moved Five of Spades, selected King of Spades Two of Spades Seven of Spades Jack of Spades

We continue in this way until we’ve considered all the cards

Ace of Spades Six of Spades Five of Spades, moved Queen of Spades, selected Eight of Spades Four of Spades Three of Spades Ten of Spades Nine of Spades, moved King of Spades, selected Two of Spades Seven of Spades Jack of Spades

Ace of Spades Six of Spades Five of Spades Queen of Spades Eight of Spades, selected Four of Spades Three of Spades Ten of Spades Nine of Spades King of Spades Two of Spades, selected Seven of Spades Jack of Spades

Ace of Spades Six of Spades Five of Spades Queen of Spades Two of Spades, moved Four of Spades, selected Three of Spades Ten of Spades Nine of Spades King of Spades Eight of Spades, moved Seven of Spades, selected Jack of Spades

Ace of Spades Six of Spades Five of Spades Queen of Spades Two of Spades Four of Spades Three of Spades Ten of Spades Nine of Spades King of Spades Eight of Spades Seven of Spades Jack of Spades

That’s the end of our first pass. It’s not immediately obvious that the cards are more sorted, although the lower value cards are mainly to the left, and the higher value cards mainly to the right.

Second Pass

For our second pass, we’ll reduce the gap between cards to 3. We can see there will be fewer sublists to consider than in the first pass, but each sublist will be longer. However, hopefully we won’t have to make too many swaps because our first pass has partly sorted the cards already.

Ace of Spades, selected Six of Spades Five of Spades Queen of Spades, selected Two of Spades Four of Spades Three of Spades, selected Ten of Spades Nine of Spades King of Spades, selected Eight of Spades Seven of Spades Jack of Spades, selected

Ace of Spades Six of Spades, selected Five of Spades Three of Spades, moved Two of Spades, selected Four of Spades Jack of Spades, moved Ten of Spades, selected Nine of Spades Queen of Spades, moved Eight of Spades, selected Seven of Spades King of Spades, moved

Ace of Spades Two of Spades, moved Five of Spades, selected Three of Spades Six of Spades, moved Four of Spades, selected Jack of Spades Eight of Spades, moved Nine of Spades, selected Queen of Spades Ten of Spades, moved Seven of Spades, selected King of Spades

Ace of Spades Two of Spades Four of Spades, moved Three of Spades Six of Spades Five of Spades, moved Jack of Spades Eight of Spades Seven of Spades, moved Queen of Spades Ten of Spades Nine of Spades, moved King of Spades

That completes the second pass, and it’s looking pretty good.

Ace of Spades Two of Spades Four of Spades Three of Spades Six of Spades Five of Spades Jack of Spades Eight of Spades Seven of Spades Queen of Spades Ten of Spades Nine of Spades King of Spades

Third Pass

For the third pass, we’ll reduce gap once more. We’ll take it to 1, so this will be our final pass.

Some of the cards - Ace, Two, Eight, and King - are already in place, and several more - Three & Four, Five & Six, and Ten - are only one position away, so we’ll only have to do a few swaps to move each card to its final place. I’ve animated this final insertion sort, highlighting compares and swaps.

A final pass through the cards

Weird, huh? You can see the cards moving along into their final positions, but the sudden emergence of order always seems remarkable to me.

Ace of Spades Two of Spades Three of Spades Four of Spades Five of Spades Six of Spades Seven of Spades Eight of Spades Nine of Spades Ten of Spades Jack of Spades Queen of Spades King of Spades

And that’s that. The sequence is sorted.

Sorting thirteen cards with a bubble sort would need 78 comparisons. We’ve done rather better than here. The actual number of compare and swaps depends on the gap we use in each pass, the length of the sequence, and how sorted the sequence is initially. On average, for this gap progression, as originally described by Shell, the worst case complexity is O(N2) which is no better than a bubble sort. However, a bubble sort’s average complexity is also O(N2), while Shell sort’s average complexity is considerably better. The gap sequence can make quite a difference, with Robert Sedgewick, who knows a thing or two about sorting algorithms, reducing the worst case complexity to O(N⁴⁄₃) with two different gap sequences. There are, presumably, still PhDs to be had from studying Shell sort.

Even though Shell sort is only a small step up in complexity from bubble sort, I still had to go through the process by hand several times before it clicked. Frankly, all sort algorithms more involved than a bubble sort or insertion sort seem like magic.

Ok, let’s code

I was going to record another video, but never quite managed to line everything up. Probably a good thing. I ended up writing the code in three little episodes of about a two and half hours in total, so it would probably have dragged a bit.

For the bubble sort, I focussed my tests on shaping the interface to the function. This time, I took that as a given.

template<class Iterator, class Comparator = std::less<>>
void shell_sort(Iterator begin, Iterator end, Compare comparator = Comparator()) {
}

Well, that was easy. But what goes in the function body?

Sort with a Sort

Performing a Shell sort involves repeated applying insertion sort to different parts of the sequence. The insertion sorted seemed, therefore, like the place to start. Before long, I came up with this little eyesore

Simple insertion sort
  template<class Iterator, class Comparator = std::less<>>
  void insertion_sort(Iterator begin, Iterator end, Comparator comparator = Comparator()) {
    for (auto cur = begin; cur != std::prev(end); cur = std::next(cur)) {
      auto ref = cur;
      auto next = std::next(ref);
      while ((next != begin) && comparator(*next, *ref)) {
        auto prev = std::prev(ref);
        std::iter_swap(ref, next);
        next = ref;
        ref = prev;
      }
    }
  } // insertion_sort

This is a simple insertion sort - scan through the sequence, comparing pairs of adjacent items. When a pair need to be swapped, walk back down the sequence to see if we need to do any more swaps.

As shown above though, the Shell sort we need to consider items which are some distance apart. I did consider building a list of iterators to those items, but it was easier to pass the gap we wanted in as a parameter.

Insertion sort with gap
  template<class Iterator, class Comparator = std::less<>>
  void insertion_sort(Iterator begin, Iterator end, size_t gap, Comparator comparator = Comparator()) {
    auto boundary = std::distance(begin, end) - gap;
    auto cur = begin;
    for (auto i = 0; i < boundary; i += gap) {
      auto ref = cur;
      auto next = std::next(ref, gap);
      while ((next != begin) && comparator(*next, *ref)) {
        auto prev = std::prev(ref, gap);
        std::iter_swap(ref, next);
        next = ref;
        ref = prev;
      }
      cur = std::next(cur, gap);
    }
  } // insertion_sort

There were a couple of intermediate steps, but the evolution is pretty obvious. Applying this insertion sort, in the way described above, sorted the sequence in the correct way. Hurrah, I guess? Two problems - it’s ugly as hell, and it steps squarely into undefined behaviour. It might have sorted the sequence, but the presence of undefined behaviour means I can’t claim that it actually works. The naughty line of code is

    auto prev = std::prev(ref); // possibly oopsy

The iterators next and ref point to the two values we’re looking at. If they need to be swapped, then we start checking back through the sequence to see if we need to do further swaps. Consider:

  [ 99, 1, ...]
    ^   ^
    !   \------ next
    +---------- ref
    !
    \---------- begin
  while ((next != begin) && comparator(*next, *ref)) { (1)
    std::iter_swap(ref, next); (2)
    next = ref; (3)
    ref = std::prev(ref); (4)
  }
  1. next is not equal to begin and the two values should be swapped

  2. swap the values

  3. repoint next to point to the item pointed at by ref

  4. move ref to point the prior item in the sequence, but there isn’t an item to move to because ref is already at the start of the sequence

In this case, ref would be moved beyond the beginning of the sequence. That’s undefined behaviour and, as the old timers have it, the compiler writers are free to halt and catch fire. In practice, simply decrementing off the beginning is unlikely to be bad, but let’s not even entertain the possibility.

It took me a little while to think my way around this, but eventually I got to this. It’s longer, but I believe both correct and easier to read.

Defined behaviour
  template<class Iterator, class Comparator = std::less<>>
  void insertion_sort(Iterator begin, Iterator end, size_t gap, Comparator comparator = Comparator()) {
    auto length = std::distance(begin, end);
    if (length <= gap)
      return;

    auto steps = length - gap;
    auto current = begin;
    for (auto i = 0; i < steps; i += gap) {
      auto next = std::next(current, gap);

      if (comparator(*next, *current)) {
        std::iter_swap(current, next);
        back_propagate(current, begin, gap, comparator);
      }

      current = next;
    }
  } // insertion_sort

There, all the unpleasantness is hidden away in the back_propagate function. Oh, what does that look like?

back_propagate
  template<class Iterator, class Comparator>
  void back_propagate(Iterator current, Iterator const begin, size_t gap, Comparator comparator) {
    if (current == begin)
      return;

    do {
      auto prev = std::prev(current, gap);
      if (comparator(*current, *prev))
        std::iter_swap(prev, current);
      current = prev;
    } while (current != begin);
  } // back_propagate

The back_propagate function is only concerned with checking to see how far back down the sequence we need to go. As soon as our current position hits the start of the sequence it bails out, so we only ever move back when it’s safe to do so. The earlier code was trying to do two similar but subtly different things, and that’s why it was broken.

At last, Shell sort!

Watching me go back and forth on this would have comprised about 95% of any video I’d made. With a sound insertion sort implementation, putting together the Shell sort took about three minutes.

  template<class Iterator, class Comparator = std::less<>>
  void shell_sort(Iterator begin, Iterator end, Comparator comparator = Comparator()) {
    auto length = std::distance(begin, end);
    for (auto gap = length/2; gap != 0; gap /= 2) {
      auto from = begin;
      for (auto offset = 0; offset != gap; ++offset, std::advance(from, 1))
        insertion_sort(from, end, gap, comparator);
    }
  } // shell_sort

  template<class Container, class Comparator = std::less<>>
  void shell_sort(Container&& sample, Comparator comparator = Comparator()) {
    shell_sort(std::begin(sample), std::end(sample), comparator);
  } // shell_sort

This is the 'classic' Shell sort, with the gap progression of ᴺ⁄₂, ᴺ⁄₄, …​, 1 first proposed by Shell. For other progressions, you’d need some different formula for gap or possibly even a look up table. I did a quick search but couldn’t find any online examples that did anything more complex than this. Now this code is part of that collection of basic Shell sorts too.

And that’s that

Hey friends, this feels like it’s been a long one. Thanks for sticking with me. Please do get in touch by email or on Twitter if you feel so moved.

Post script

How do you know your sort implementation worked when its implementation uses another sort implementation that will non-optimally sort the input even if you got the outer sort wrong? is an interest question, and one I had to deal with here. In the end I did it by inspection. I put together a known test case - the hand of cards shown above - and stepped through the code in the debugger to verify each pass. If I was feeling particularly motivated, I could have put together a special test container type to help verify things. Perhaps it would record each change or count the number of assignments or something like that. Algorithmic complexity is one of those areas it’s hard to test without specialised tooling.

Endnotes

This whole endeavour relies on Software Tools in Pascal and Software Tools, both by Brian W Kernighan and PJ Plauger. I love these books and commend them to you. They are both still in print, but new copies are, frankly, just ridiculously expensive. Happily, there are plenty of second-hand copies floating round, or you can borrow them from The Internet Archive using the links above.

For this project I’ve been using JetBrain’s CLion, which I liked enough to buy and renew a license for, because it’s gone a year already.

The test harness I’m using is Catch. I’ve been aware of Catch pretty much since it was first released, but this project is the first time really getting in and using it. I like it, and it’ll be my first choice in the future.


Tagged code, and software-tools-in-c++

Monday 14 September 2020 The Forest Road Reader, No 2.55 : STinC++ - bubble sort

STinC++ rewrites the programs in Software Tools in Pascal using C++

Here we are in Chapter 4, which is all about sorting. Rather than getting straight into usable tools, Kernighan and Plauger divert slightly from their established course. The sorting chapter isn’t, they say, about sorting algorithms per se, but about packing up a sorting algorithm so that people can use it. A well put together program will be useful and hence used, but “if a sorting program is so poorly packaged that people feel compelled to write their own …​ the quality of its sorting algorithm is surely irrelevant”. I suspect the bit about writing your own sort program might be a reflection of the culture at Bell Labs in the 70s, but the point about the importance of human interface factors - even if your user interface is a command line on the console - is well made.

Before getting into the meat of their program, they do take a page or two to recap a couple of sorting algorithms, the first being bubble sort, and the second being shell sort. They simply present the code, and then briefly describe it. I’m choosing to use these as an excuse to take a little diversion too, and to use them as the basis for a couple of exercises.

Bubble Sort

My first little exercise is writing a bubble sort with an extra C++y twist - a bubble sort implementation with the same interface as std::sort, the standard sort. Just as an aide memoire, std::sort looks like this:

std::sort [1]
template<class RandomIt>
void sort(RandomIt first, RandomIt last);

template<class RandomIt, class Compare>
void sort(RandomIt first, RandomIt last, Compare comp);

std::sort is a generic function that takes a pair of iterators that delimit the sequence to be sorted. There’s an overload, which take a comparison function, should you wish to sort the sequence in the manner of your choosing.

That’s what I want my function to look like on the outside, but what goes on the inside?

How Bubble Sort Works

Here’s some random dice! We want to sort them so the lowest value is on the left, going up to the largest on the right.

a dice showing 4 a dice showing 6 a dice showing 2 a dice showing 1

Pass One

We’ll start by taking the first two dice and comparing them. We want the smaller value on the left.

a dice showing 4 a dice showing 6 a dice showing 2 a dice showing 1

Here, that’s already the case. There’s nothing to do, so we take the next pair.

a dice showing 4 a dice showing 6 a dice showing 2 a dice showing 1

These two are in the wrong order, so we swap them and move to the next pair.

a dice showing 4 a dice showing 2 a dice showing 6 a dice showing 1

This pair is also in the wrong order, so we swap those as well.

a dice showing 4 a dice showing 2 a dice showing 1 a dice showing 6

We’ve now completed one pass, and we can see the largest value - the six - has moved, been bubbled, the right and is in its proper place. Now we start again from the beginning.

Pass Two

a dice showing 4 a dice showing 2 a dice showing 1 a dice showing 6

Two is less than four, so we swap.

a dice showing 2 a dice showing 4 a dice showing 1 a dice showing 6

One is also smaller than four, so we swap this pair too.

a dice showing 2 a dice showing 1 a dice showing 4 a dice showing 6

We don’t need to compare the last pair, because we know we moved the six to the right place in the first pass. We’ve now completed the second pass, and bubbled the second largest value into its place. We go round again.

Pass Three

a dice showing 2 a dice showing 1 a dice showing 4 a dice showing 6

This time we only need to compare the first two dice. We swap them, and our sort is complete.

a dice showing 1 a dice showing 2 a dice showing 4 a dice showing 6

Sorted

Sorting four dice took three passes. The first pass had three comparisons, the second two, and the third has one comparision. That’s a pretty expensive sort, and you can see how the number of comparisions races away with larger datasets.

Live and In Colour

Rather than write an extended essay on how I wrote half a dozen lines of sorting code, I thought it might be fun - or at least a little different - to capture what I did, along with my confused looking face, in a short video.

stiX: Bubble Sort, implemented in C++ with a standard style interface
  • Bonus - see if you can spot where I was interrupted part way though

  • Bonus bonus - can you identify any of the books on the shelf behind me?

Video Postscript

While I was feeling pretty pleased with myself at the end of the video, I did do a little more work on the code afterwards.

bubble_sort.cpp
template<class Iterator, class Comparator = std::less<>>
void bubble_sort(Iterator begin, Iterator end, Comparator comparator = std::less<>()) {
  if (begin == end) return; (1)

  for (auto boundary = std::prev(end); boundary != begin; boundary = std::prev(boundary)) { (2)
    for (auto cur = begin; cur != boundary; cur = std::next(cur)) {
      auto next = std::next(cur); (3)
      if (comparator(*next, *cur)) {
        std::iter_swap(next, cur);
      } // if ...
    } // for ...
  } // for ...
}
  1. Oops! I hadn’t checked what happened with an empty sequence. It wasn’t good.

  2. Instead of calling operator-- and operator++ on the iterators, I switched instead to using std::prev and std::next. I dabbled for a while with std::advance but calling a function named advance to move an iterator backwards just seemed strange.

  3. Went back and forth trying to eliminate next, but couldn’t. Briefly considered

for (auto cur = begin, next = std::next(cur); cur != boundary; cur = std::next(cur), next = std::next(next))

but that really doesn’t help at all. It just looks so weird and uncomfortable.

There’s an optimisation we could make to detect if the sequence is sorted before needing all the passes, but it didn’t add to the fun, and it only occurred to me later.

Pretty sure this is the first time I’ve written a bubble sort, although I’m obviously in that demographic that has watched a bubble sort in action many, many times.

Reference

In everyday code, where you know the types you’re working with, let’s say you’ve got a std::vector<Something>, then you’re probably just going to call the begin and end member functions, and the iterator’s operator++. And you know, I think that’s absolutely fine. For library code though, when we can’t be as certain or when we don’t want to place unnecessary constraints on people using our code, then these little utility functions are just what we need.

Like the container and iterator utilities above, iter_swap and swap use the magic of template deduction, specialisation, and what-not to do the right thing for the types you’ve called the function with. Let’s say we’ve plugged a vector of vectors into the bubble sort. Using the classic two variable swap, each assignment is going to copy the whole vector, which is probably pretty bad in both time and space. Using iter_swap is, somewhere underneath it all, going to find the std::vector specialisation of std::swap, which is going to call the swap member function on one of those vectors, and that’s going to do it all in am exception-safe heartbeat. Chances are the compiler will inline it all too, so there won’t even be a function call overhead. If we invoke stiX::bubble_sort with a C style array of integers, the magic of std::begin and friends will generate a loop optimise for raw memory access.

Endnotes

This whole endeavour relies on Software Tools in Pascal and Software Tools, both by Brian W Kernighan and PJ Plauger. I love these books and commend them to you. They are both still in print, but new copies are, frankly, just ridiculously expensive. Happily, there are plenty of second-hand copies floating round, or you can borrow them from The Internet Archive using the links above.

For this project I’ve been using JetBrain’s CLion, which I liked enough to buy and renew, because it’s a year already, a license.

The test harness I’m using is Catch. I’ve been aware of Catch pretty much since it was first released, but this project is the first time really getting in and using it. I like it, and it’ll be my first choice in the future.

I recorded the video using OBS, with a cheap noname USB mic and an old, although perfectly servicable, Logitech C525 webcam perched precariously on the top of one of my monitors.


1. Actually this isn’t quite the full picture (and is becoming even less so as I type) but it’s near enough for this exercise.

Tagged code, and software-tools-in-c++
Older posts are available in the archive or through tags.


Jez Higgins

Freelance software grandad
software created
extended or repaired

Follow me on Twitter
Applications, Libraries, Code
Talks & Presentations

Hire me
Contact

Older posts are available in the archive or through tags.

Feed