Jez Higgins

Freelance software grandad
software created
extended or repaired


Follow me on Twitter
Applications, Libraries, Code
Talks & Presentations

Hire me
Contact

Older posts are available in the archive or through tags.

Feed

Wednesday 30 December 2020 The Forest Road Reader, No 2.59 : STinC++ - quick sort

STinC++ rewrites the programs in Software Tools in Pascal using C++

We’re near the midpoint of Chapter 4 now. Having started by looking at bubble sort and Shell sort before addressing the practicalities of a text sort program, we step firmly back into the realms of Computer Science to look at quick sort.

To be honest, up until I prepped for this article, I didn’t understand how quick sort worked. I knew it had been invented by Tony Hoare, one of the great computer scientists. I knew too it was the algorithm you would expect to find at the heart any language’s library sort function [1]. In terms of how it worked, I knew quick sort involved chosing a pivot around which the sequence was divided into ever smaller subsequences, at the end of which it was somehow sorted, but I couldn’t have told you how it actually worked.

Kernighan and Plauger’s description is correct, but it didn’t really get me closer to be able to implement the algorithm.

Quick sort is best described as a recursive procedure. The essential idea is to partition the original set to be sorted by rearranging it into two groups - all those elements less than some arbitrary value chosen from the set, and all those greater than or equal to the value. Then the same partitioning process is applied to the two subsets in turn until each subset contains only one element. When all subsets have been partitioned, the original set is sorted.

They may, on this occasion, have fallen into the trap of being so familiar with the subject that their explanation is clear to them but not to this novice. I might just not be getting it. This Computerphile video got me over the hump.

Quick Sort

After watching it I did, at last, understand the procedure, although I wouldn’t implement a sort in the way described. I have some meta-knowledge here too, in that I’m aware you can implement quick sort in place, without additional memory allocation. Working in C++ means you can never pretend you haven’t got performance considerations lurking somewhere in the back of your mind 😃.

Fuelled up and ready to go

I’m on the sofa with my Dell laptop on one knee, a bottle of Bishops Finger on the coffee table next to me, and a half formed idea in my head of how to quick sort a sequence. Let’s see what goes wrong.

The best code is no code

I’m going to start with the simplest possible thing I can - a quick sort implementation that sorts an empty sequence.

template<class Iterator>
void quick_sort(Iterator& begin, Iterator& end) {
} // quick_sort

Obviously this is a bit silly, but you’ve got to start somewhere. There is something useful we can do next though. We know, from what we’ve been told, that quick sort is a recursive procedure, working on progressively smaller subsequences. Once a subsequence contains one or zero items, we’re at the limit and should stop the recursion.

template<class Iterator>
void quick_sort(Iterator begin, Iterator end) {
  if(std::distance(begin, end) <= 1)
    return;

  throw std::runtime_error("Oops");
} // quick_sort

Perfect.

Engage!

This is the place where I need to start thinking, and make my first moves to partition the sequence.

First, I need to choose what to use as the pivot value that will be used to partition the sequence into two. The obvious choices are the first value, pointed to by begin, or the last value, which is one before end. Of these, the first value is the easiest, so I’m choosing that.

  auto pivot = *begin;
  auto from = std::next(begin);

Let’s think, if we have a sequence

4 9 1 8 2 7

then the pivot value will be 4, and after partitioning we should end up with something like

1 2 4 9 8 7

If we have an iterator pointing to the left hand end of the sequence (from above, which I think I might rename) working up, and another pointing to the right hand end of the sequence working down then eventually they’ll meet in the middle.

4 9 1 8 2 7
  ^
          ^

While the left iterator points to a value lower than the pivot, it can advance. Here, it will stay put. While the right iterator points to a value higher than the pivot, it can move down. It is able to so let’s do it.

4 9 1 8 2 7
  ^
        ^

Now, neither can move, but they haven’t yet met. If we swap the two values they point to, they’ll both be freed up to move again.

4 2 1 8 9 7
  ^
        ^

Left can now advance to 1 and then to 8.

4 2 1 8 9 7
      ^
        ^

Right can move down to 8, where it meets left.

4 2 1 8 9 7
      ^
      ^

Blimey, this is working, more or less.

Steady on there, big man

Maybe left shouldn’t have gone all the way to 8. I’ll rewind slightly.

4 2 1 8 9 7
    ^
      ^

Perhaps this is the boundary condition, when the two iterators adjacent to each other. The left iterator points to where pivot fits in the sequence, so we can swap it into place.

1 2 4 8 9 7
    ^
      ^

Let me cast that into code.

<a few minute later>

Ok, that’s super fiddly. It’s much easier to let the two iterators meet each other, then wind left back a step afterwards.

  auto pivot = *begin;
  auto left = std::next(begin);
  auto right = std::prev(end);

  while (left != right) {
    while ((left != right) && (*left < pivot))
      left = std::next(left);

    while ((left != right) && (*right >= pivot))
      right = std::prev(right);

    if (left != right)
      std::iter_swap(left, right);
  }

  left = std::prev(left);
  std::iter_swap(left, begin);
Driven by test
  auto sample = std::vector { 4, 9, 1, 8, 2, 7 };
  stiX::quick_sort(sample);

  REQUIRE(sample == std::vector { 1, 2, 4, 8, 9, 7 });

Hmm. I do not love this code, especially the repeated left != right, but it works so far. I’m going to plug in the recursion and see what happens.

Fingers crossed
  quick_sort(begin, left);
  quick_sort(right, end);

Reader, I confess I let out a small exclamation of surprise. The sequence { 4, 9, 1, 8, 2, 7 } popped out all nicely sorted.

Moar tests

One swallow does not a summer make, nor does one test properly exercise a sorting algorithm. I have a pile of tests already hanging around, so I’m going to whammo those in and see what happens.

i should have called it quick not sort
Figure 1. 9 out of 13 tests failing. I should have called it quick-not-sort

Looks like I just got lucky with that first try.

Some of them are almost there,

---------------------------------------------------------------------------
Chapter 4 - quick sort
  sort { 3, 10, 9, 12, 8, 4, 1, 6, 5, 13, 2, 7, 11 }
---------------------------------------------------------------------------
/home/jez/work/jezuk/stiX/c++/chapter_4/0_sorts/quick_sort.cpp:70
...........................................................................

/home/jez/work/jezuk/stiX/c++/chapter_4/0_sorts/quick_sort.cpp:77: FAILED:
  REQUIRE( under_test == expected )
with expansion:
  { 1, 2, 3, 5, 6, 4, 7, 8, 9, 11, 10, 12, 13 }
  ==
  { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13 }

while others aren’t even close.

---------------------------------------------------------------------------
Chapter 4 - quick sort
  sort { 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1 }
---------------------------------------------------------------------------
/home/jez/work/jezuk/stiX/c++/chapter_4/0_sorts/quick_sort.cpp:70
...........................................................................

/home/jez/work/jezuk/stiX/c++/chapter_4/0_sorts/quick_sort.cpp:77: FAILED:
  REQUIRE( under_test == expected )
with expansion:
  { 2, 4, 6, 8, 9, 10, 7, 11, 5, 12, 3, 13, 1 }
  ==
  { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13 }
Where did it all go wrong?

We’re a long way down the page. Here’s the current state of the code.

  template<class Iterator>
  void quick_sort(Iterator begin, Iterator end) {
    if(std::distance(begin, end) <= 1)
      return;

    auto pivot = *begin;
    auto left = std::next(begin);
    auto right = std::prev(end);

    while (left != right) {
      while ((left != right) && (*left < pivot))
        left = std::next(left);

      while ((left != right) && (*right >= pivot)
        right = std::prev(right);

      if (left != right)
        std::iter_swap(left, right);
    }

    left = std::prev(left);
    std::iter_swap(left, begin);

    quick_sort(begin, left);
    quick_sort(right, end);
  } // quick_sort

There’s no obvious error in the code, at least to me. It’s not crashing, for example, nor are the results entirely wacky. There’s a certain regularity to the errors. Many have groups of three elements almost but not quite there, as in the first example above. I’m especially intrigued by the second example though. What is going on there?

We’re starting with a sequence in descending order.

13 12 11 10 9 8 7 6 5 4 3 2 1
    ^
                            ^

Since all the values are lower than the pivot value of 13, the left hand iterator is going to advance all the way to 1, while the right hand iterator isn’t going to move at all.

13 12 11 10 9 8 7 6 5 4 3 2 1
                            ^
                            ^

So far so good - the while loop is ok - so next, I swap the pivot into place [2].

  left = std::prev(left);
13 12 11 10 9 8 7 6 5 4 3 2 1
                          ^
                            ^

You can see where this is going …​

  std::iter_swap(begin, left);
2 12 11 10 9 8 7 6 5 4 3 13 1
                          ^
                            ^

Oh dear. It’s going to be all downhill from here. The two subsequences are { 2, 12, 11 …​ 5, 4, 3 } and { 1 }. The second, shorter, sequence is already 'sorted', but I guess we’re now going to see a similar effect when sorting the first sequence.

2 12 11 10 9 8 7 6 5 4 3
   ^
                       ^

Since all the values are greater than the pivot value of 2, the left iterator won’t move, while the right will whizz down to meet it.

2 12 11 10 9 8 7 6 5 4 3
   ^
   ^

Decrementing left leaves it pointing at the pivot.

2 12 11 10 9 8 7 6 5 4 3
^
   ^

so the subsequent swap is a no-op. The two subsequences are the no-op { 2 } and { 12, 11, 10, 9, 8, 7, 6, 5, 4, 3 }, which is essentially identical to where we started. And so on and so on and so on.

What a balls up.

What we do, we update our formulas

In this case, left has advanced all the way to the right hand end of the sequence but is still pointing at a value which should be in the left hand partition. Rather than pointing one beyond where the pivot should be placed, it’s pointing to exactly where it should be placed. This presumably, is also happening in other sequences, perhaps further down in the recursion. The effect is particularly pronounced here because the elements are in reverse sort order.

Rather than always decrementing left before placing the pivot, only decrement if left points to a 'right-hand' value. That should fix it.

  if (*left >= pivot)
    left = std::prev(left);
  std::iter_swap(left, begin);

<deep breath> And …​

All tests pass. Now that’s a quick sort.

That’ll do for now

There’s a little more to do extending the interface to match std::sort but between this, loading the dishwasher, putting the dogs to bed, and all that other real-life malarkey, it’s rather later now than I had anticipated it being when I set off. That said, I’m pretty pleased with how it’s gone. I thought my way into the code pretty well, and then when it wasn’t right I was able to think my way back out again. I confess I don’t often do quite so much thinking, or at least not so much thinking before taking to code. Sorting algorithms aren’t really the kind of work that lends themselves to iterative development. You have to do a bit of upfront design work.

Source Code

Source code for this little routine, indeed the whole project, is available in the stiX GitHub repository.

Endnotes

This whole endeavour relies on Software Tools in Pascal and Software Tools, both by Brian W Kernighan and PJ Plauger. I love these books and commend them to you. They are both still in print, but new copies are, frankly, just ridiculously expensive. Happily, there are plenty of second-hand copies floating round, or you can borrow them from The Internet Archive using the links above.

For this project I’ve been using JetBrain’s CLion, which I liked enough to buy and then renew a license for.

The test harness I’m using is Catch. I’ve been aware of Catch pretty much since it was first released, but this project is the first time really getting in and using it. I like it, and it’s what I use by default now.


1. This isn’t quite true. C++ standard libraries typically use Introsort, a hybrid of quick sort, heapsort, and insertion sort
2. I’ve fallen into Ron Jeffries habit of using 'we' for the good things and 'I' when it goes wrong. As he says, who else is there to blame?

Tagged code, and software-tools-in-c++


Jez Higgins

Freelance software grandad
software created
extended or repaired

Follow me on Twitter
Applications, Libraries, Code
Talks & Presentations

Hire me
Contact

Older posts are available in the archive or through tags.

Feed