Jez Higgins

Freelance software grandad
software created
extended or repaired


Follow me on Mastodon
Applications, Libraries, Code
Talks & Presentations

Hire me
Contact

Older posts are available in the archive or through tags.

Feed

Yr glygfa o Hescwm Uchaf, rhif 7 : To see a World in a Grain of Sand

In my piece about my sadness and disappointment about the candidates we were getting for interview, I talked about the refactoring exercise we give people, and the conversations we have afterwards.

I’m not able to show any of that code, but I am going to talk about some code here of the type we often see. According to the version history, it’s passed through a number of hands, and I want to be clear I know none of the people involved nor have I spoken to them. They are, though, exactly the type of person presenting themselves for interview, and so for my purposes here they are exemplars.

This is not about them, though. I hold them blameless, and wish them only happiness. This is about the places that they worked, about the wider trade, about a culture that says this is fine.

To see a World in a Grain of Sand

Here’s some Python code. It’s from a larger document processing pipeline. Documents come shoved into the system, get squished around a bit, have metadata added, some formatting fixups, then squirt out the other end as nice looking pdfs. Standard stuff.

Documents can have references to other documents, both within the existing corpus, and to a variety of external sources. These references have standard forms, and when we find something that looks like a document reference, we do a bit of work to make sure it’s absolutely clean and proper.

That’s where this function, normalise_reference, comes in.

def normalise_reference(reference_type, reference_match, accepted_form):
    if (
        (reference_type == "RefYearAbbrNum")
        | (reference_type == "RefYearAbbrNumTeam")
        | (reference_type == "YearAbbrNum")
    ):
        components = re.findall(r"\d+", reference_match)
        year = components[0]
        d1 = components[1]
        d2 = ""
        corrected_reference = accepted_form.replace("dddd", year).replace("d+", d1)

    elif (
        (reference_type == "RefYearAbbrNumNumTeam")
        | (reference_type == "RefYearAbrrNumStrokeNum")
        | (reference_type == "RefYearNumAbbrNum")
    ):
        components = re.findall(r"\d+", reference_match)
        year = components[0]
        d1 = components[1]
        d2 = components[2]
        corrected_reference = (
            accepted_form.replace("dddd", year).replace("d1", d1).replace("d2", d2)
        )

    elif (
        (reference_type == "AbbrNumAbbrNum")
        | (reference_type == "NumAbbrNum")
        | (reference_type == "EuroRefC")
        | (reference_type == "EuroRefT")
    ):
        components = re.findall(r"\d+", reference_match)
        year = ""
        d1 = components[0]
        d2 = components[1]
        corrected_reference = accepted_form.replace("d1", d1).replace("d2", d2)


    return corrected_reference, year, d1, d2

I have obfuscated identifiers in the code sample, but its structure and behaviour are as I found it.

I’d been kind of browsing around a git repository, looking at folder structure, getting the general picture. A chunk of the system is a Django webapp and thus has that shape, so I went digging for a bit of the meat underneath. This was almost the first thing I saw and, well, I kind of flinched. Poking around some more confirmed it’s not an anomaly. It is representative of the system.

You’ve probably had some kind of reaction of your own. This is what immediately leapt out at me:

  • The length

  • The width!

  • The visual repetition, both in the if conditions and in the bodies of the conditionals

  • The string literals

  • The string literal with the spelling mistake

  • The extraneous brackets in the second conditional body - written by someone else?

  • The extra line before the return - functionally, of course, it makes no difference, but still, urgh

Straightaway I’m thinking that more than one person has worked on this over time. That’s normal, of course, but I shouldn’t be able to tell. If I can, it’s a contra-indicator.

Looking a little longer, there’s a lot of repetition - in shape, and in detail. Looking a little longer still, I think the function parameters are in the wrong order. reference_type and accepted_form are correlated and originate within the system. They should go together. It’s reference_match which comes from the input document; it’s the only true variable and so, for me anyway, it should be the first parameter. I suspect this function only had two parameters initially, and the third was added without a great deal of thought to the aesthetics of the change.

That’s a lot to not like in not a lot of code.
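To make the repetition concrete, here’s one possible direction - a table-driven sketch, with reference_match moved to the front as the only true variable. The type names and placeholder strings come from the sample; everything else, including the behaviour around missing components, is my reading of the code above, so treat it as a sketch of the existing behaviour rather than a drop-in replacement.

```python
import re

# Each reference type maps to the placeholders its accepted form uses,
# in the order they should be filled. Type names are as found in the sample.
PLACEHOLDERS = {
    "RefYearAbbrNum": ("dddd", "d+"),
    "RefYearAbbrNumTeam": ("dddd", "d+"),
    "YearAbbrNum": ("dddd", "d+"),
    "RefYearAbbrNumNumTeam": ("dddd", "d1", "d2"),
    "RefYearAbrrNumStrokeNum": ("dddd", "d1", "d2"),  # sic - spelling as found
    "RefYearNumAbbrNum": ("dddd", "d1", "d2"),
    "AbbrNumAbbrNum": ("d1", "d2"),
    "NumAbbrNum": ("d1", "d2"),
    "EuroRefC": ("d1", "d2"),
    "EuroRefT": ("d1", "d2"),
}

def normalise_reference(reference_match, reference_type, accepted_form):
    placeholders = PLACEHOLDERS[reference_type]
    # Take only as many numeric components as this type's form needs
    used = re.findall(r"\d+", reference_match)[: len(placeholders)]
    corrected = accepted_form
    for placeholder, component in zip(placeholders, used):
        corrected = corrected.replace(placeholder, component)
    # Preserve the original 4-tuple shape for the existing call sites
    has_year = "dddd" in placeholders
    year = used[0] if has_year else ""
    rest = used[1:] if has_year else used
    d1 = rest[0]
    d2 = rest[1] if len(rest) > 1 else ""
    return corrected, year, d1, d2
```

Same behaviour, but adding an eleventh reference type is now one line in a table, not a fourth arm on the if/else ladder.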

But at least there are tests

And hurrah for that! There are tests for this function, tangled up in a source file with some unrelated tests that pull in a mock database connection and some other business, but they do exist.

There are two test functions, one checking well-formed references, the other malformed references, but, in fact, each function checks multiple cases.

It’s a start, but the test code is much the same as the code it exercises - long and repetitious - which isn’t, perhaps, that surprising. A quick visual check shows they’re deficient in other, more serious ways. There are ten reference types named in normalise_reference. The tests check seven of them and, in fact, there is a whole branch of the if/else ladder that isn’t exercised. That’s the branch I already suspect of being a later addition.

Curiously too, while normalise_reference returns a 4-tuple, the tests only check the corrected reference and the year, ignoring the other two values. That sent me off looking for the normalise_reference call sites, where all four elements of the tuple are used. Again, I’d suggest the 4-tuple came in after the tests were first written, and the tests were not updated to match. After all, they still passed.

I am sure these tests are post hoc. They did not inform the design and development of the code they support.
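For illustration, here’s the sort of characterising test I’d want to see - one row per reference type, pinning down all four returned values, not just two of them. The reference strings and accepted forms are invented, and the function from above is repeated verbatim so the sketch stands on its own.

```python
import re

# normalise_reference as found, repeated so this sketch is self-contained.
def normalise_reference(reference_type, reference_match, accepted_form):
    if (
        (reference_type == "RefYearAbbrNum")
        | (reference_type == "RefYearAbbrNumTeam")
        | (reference_type == "YearAbbrNum")
    ):
        components = re.findall(r"\d+", reference_match)
        year = components[0]
        d1 = components[1]
        d2 = ""
        corrected_reference = accepted_form.replace("dddd", year).replace("d+", d1)
    elif (
        (reference_type == "RefYearAbbrNumNumTeam")
        | (reference_type == "RefYearAbrrNumStrokeNum")
        | (reference_type == "RefYearNumAbbrNum")
    ):
        components = re.findall(r"\d+", reference_match)
        year = components[0]
        d1 = components[1]
        d2 = components[2]
        corrected_reference = (
            accepted_form.replace("dddd", year).replace("d1", d1).replace("d2", d2)
        )
    elif (
        (reference_type == "AbbrNumAbbrNum")
        | (reference_type == "NumAbbrNum")
        | (reference_type == "EuroRefC")
        | (reference_type == "EuroRefT")
    ):
        components = re.findall(r"\d+", reference_match)
        year = ""
        d1 = components[0]
        d2 = components[1]
        corrected_reference = accepted_form.replace("d1", d1).replace("d2", d2)
    return corrected_reference, year, d1, d2

# One row per reference type: invented reference strings and accepted forms,
# each with the full expected 4-tuple.
CASES = [
    ("RefYearAbbrNum", "[2019] ABC 123", "[dddd] ABC d+",
     ("[2019] ABC 123", "2019", "123", "")),
    ("RefYearNumAbbrNum", "[2019] 12 ABC 34", "[dddd] d1 ABC d2",
     ("[2019] 12 ABC 34", "2019", "12", "34")),
    ("EuroRefC", "C-123/45", "Case d1/d2",
     ("Case 123/45", "", "123", "45")),
    # ...and a row for each of the other seven types
]

def test_normalise_reference():
    for reference_type, reference, accepted_form, expected in CASES:
        assert normalise_reference(reference_type, reference, accepted_form) == expected

test_normalise_reference()
```

A table like this would have made the unexercised branch, and the ignored half of the tuple, impossible to miss.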

Miasma

If the phrase coming to mind is code smells, then I guess you’re right. This code is a stinky bouquet of bad odours, except they aren’t clues to some deeper problem with the code. We don’t need clues - it’s right out there front and centre. No, these smells emanate from within the organisation, from a failure to develop the programmers whose hands this code has passed through. The code works, let’s be clear, but there’s a clumsiness to it and a lack of care in its evolution. That’s a cultural and organisational failing.

I keep saying this is about organisations. I’m not saying these are bad places to work, where maladjusted managers delight in making their underlings squirm. Quite the contrary, I’ve worked at more than one of the organisations responsible for the code above and had a great time. But there is something wrong - an unacknowledged failure. An unknown failure even. There’s so much potential, and it’s just not being taken up.

I came across this code because I was talking about potentially working on it, going back into one of those organisations. That didn’t pan out, but had I been able I would absolutely have signed up for it. It’s fascinating stuff and right up a multiplicity of my alleys.

Let’s imagine for a moment that I was sitting down for my first day on this job. What would I do with this code? Well, at a guess, nothing. Well, nothing until I needed to, and then I’d spend a bit of time on it. But I’d absolutely be talking to my new colleagues about, well, everything.


Tagged the-trade

Yr glygfa o Hescwm Uchaf, rhif 6 : We’ve failed them

Over the past four or five months, I’ve been pulled into the recruiting process where I’m working. I say pulled in because I didn’t have any prior experience or training in recruiting, and because I’m not sure our process is particularly great. I don’t think it’s bad, necessarily, but it is quite drawn out. At least I think so, although people tell me it’s not unusual for a company of that type.

As part of the process, we give people a "refactoring exercise". It’s a small chunk of not very good real code, and we ask people to make it better. We’re deliberately a bit vague about what better might be, because we don’t want to steer them in any particular direction. They can take as much or as little time as they like and, although few do, are free to submit code, notes, tests, and whatever else they might wish to. Afterwards, we have a good read through, then have a call with the candidate to talk about what they’ve done, how they found it, what they were thinking about and aiming for, do a bit of pairing on the code, explore the paths untaken.

It’s in this conversation where I generally get involved and, actually, it can be a lot of fun. People take the code in all kinds of ways, and have lots of different ideas, and I’ve had some really good conversations with people.

When it’s not fun, though, it can be really quite depressing. People who look like they should be good candidates - six or seven or eight years in, working in the same sector, for well known companies, who are bright and pleasant - turn out to be pretty poor programmers.

We’re not looking for perfection. I’m not looking for people to reproduce what I did when I started - I’m over 50 and have been working four, five, even six times longer, for a start. I was able to treat the whole recruitment process in a way they can’t. What I’m looking for is promise.

When I did the refactoring exercise, I treated it as a piece of theatre. I took a day, and really sank myself into it, ending up submitting as radical a refactoring as I could, wrapped with a set of characterising unit tests, a pageful of notes, and a git archive with 70 or 80 commits in it. I was, in a very real way, showing off.

People earlier in their career don’t have that freedom, which is fine, and they often seem to lack confidence. I’ve had a number of candidates who take the code in a particular direction and don’t follow through. It’s like they’ve set off on a path, then rein themselves in. In our conversations, I try to encourage them to say yeah, let’s go down that path and see where it takes us. And some can, and some really can’t.

If, for example, someone says "I added this interface to make things more abstract" it’s natural for us to ask how it does that, why is this a good change to make. It’s kind of awkward when they can’t answer. I don’t mean they’re not expressing themselves well, I mean can’t answer.

We offer the exercise in Python, C++, C#, and probably something else I’ve forgotten. Each language cohort has its own particular tics, and immediately creating interfaces to important-looking classes seems to be some kind of C# reflex.

Some candidates make good looking large scale changes, but seem almost blind to the smaller scale things. I have had people expressly say they didn’t write tests because there was no need, but then stumble when asked how they know the code still works as before. For all the ways people do well, there are people who are simply floundering.

I do not blame a single one of them. I wish we could hire them, I really do, because I think I have colleagues here who could really show them what working with code can be, what a joy, what a challenge. I don’t doubt that all of them could be terrific programmers, who could really contribute, and have a good time doing it. But we can’t, or at least we can’t take them all.

They aren’t failing candidates, they’re candidates that have been failed.

They’ve been failed by the trade at large, because collectively we know, or at least have some pretty good ideas, about how people can learn and grow their skills.

They’ve been failed by their past employers, for not having the desire to help their own staff develop, not having the structures in place.

And they’ve been failed by me and my peers, who do know our stuff. It doesn’t take a huge amount of time to really make a difference - half an hour here, a day or two there, even just a suggestion can change someone’s course for the better. We should be able to do that in spite of our employers' apathy, and I think we have a responsibility to. Sadly, if the people presenting themselves for interview with us are a guide, we’ve been pretty remiss.


Tagged the-trade

Yr glygfa o Hescwm Uchaf, rhif 5 : Until It’s In Your Hands

STinC++ rewrites the programs in Software Tools in Pascal using C++

I’ve been noodling away on this little amusement of writing Kernighan and Plauger’s software tools, of Software Tools and Software Tools in Pascal fame, in C++ for over four years now. We’ll get there. At the moment (and by moment, I mean the past 15 months) I’m poking at edit, a cut down version of ed, the original Unix editor.

It’s actually at a place now where it’s usable - as usable as a line editor is anyway - so long as you don’t want to save anything. As I was finishing up for the evening, I was having a little play around and discovered that once I’d entered an erroneous command, every subsequent command, whether correct or not, errored as well.

Digging into the code, I quickly realised that the current line was being set to an error state, and just staying that way.

The fix was obvious, but then I looked again at the code and it just irked me.

auto line = stiX::getline(in); (1)

auto parsed_command = parse_command(line); (2)
auto command = parsed_command.compile(buffer_); (3)

buffer_.set_dot(command.dot); (4)

command(in, out, buffer_, filename_); (5)
  1. Read a line, which is some kind of command - 1,$p or something, normal vi talk

  2. Parse that line out

  3. Compile into a command. The command that comes back is always runnable. If the line we typed makes no sense, we return an error object. (Two patterns for the price of one 😮)

  4. Update the current line

  5. Execute the command

The obvious fix is to not update the current line if we’re in an error state.

if (command.dot != line_error)
  buffer_.set_dot(command.dot);

That would be fine.

Except.

Except.

buffer_.set_dot(command.dot);

is arse-backward.

Instead of the command operating on buffer_, buffer_ is reaching into the command and doing something to itself. My obvious fix of

if (command.dot != line_error)
  buffer_.set_dot(command.dot);

was just going to make things even more horrible. What’s the point of a null object if you have to put in a special case for it?

So, I went off to bed resolved to fix it the following evening.

Grinding

It rubbed me wrong all day, just there in the back of my head. A few minutes, and I’d flip it around from

buffer_.set_dot(command.dot);

to

command.apply_dot(buffer_);

or something like that, then it’d be all nice and neat, and I’d head into the next little piece of development happy.

All Change, Please! All Change!

I sat down, opened up CLion, loaded up the code, and now instead of thinking about one isolated line I was seeing it back in context

auto line = stiX::getline(in);

auto parsed_command = parse_command(line);
auto command = parsed_command.compile(buffer_);

buffer_.set_dot(command.dot);

command(in, out, buffer_, filename_);

What’s that?

buffer_.set_dot(command.dot);

command(in, out, buffer_, filename_);

buffer_ and command, followed by command and buffer_.

Doh

There shouldn’t be two calls. The set_dot call should be folded into the command execution. The conditional is inside command where it belongs, command is operating on buffer_, not the pushmi-pullyu arrangement there was before, and the null object is properly preserved. It’s the easier change, the cleaner change, the better change.

Sometimes the code surprises you

I’d been intending to do the obvious change, a 2-minute job, just to smooth things out, make it easier to read - but until you get the code in your hands, you never really know what it’s going to tell you or what you’re going to do with it.

Endnotes

This whole endeavour relies on Software Tools in Pascal and Software Tools, both by Brian W Kernighan and PJ Plauger. I love these books and commend them to you. They are both still in print, but new copies are, frankly, ridiculously expensive. Happily, there are plenty of second-hand copies around, or you can borrow them from The Internet Archive using the links above.

For this project I’ve been using JetBrains’ CLion, right now the new CLion Nova. It certainly feels quicker than CLion Classic which, since I was happy enough to spend money on that, is more than enough for me.


Tagged code, and software-tools-in-c++