Jez Higgins

Freelance software grandad
software created
extended or repaired


Follow me on Mastodon
Applications, Libraries, Code
Talks & Presentations

Hire me
Contact

Older posts are available in the archive or through tags.

Feed

To see a World in a Grain of Sand : Yr glygfa o Hescwm Uchaf, rhif 7

In my piece about my sadness and disappointment about the candidates we were getting for interview, I talked about the refactoring exercise we give people, and the conversations we have afterwards.

I’m not able to show any of that code, but I am going to talk about some code here of the type we often see. According to the version history, it’s passed through a number of hands, and I want to be clear I know none of the people involved nor have I spoken to them. They are, though, exactly the type of person presenting themselves for interview, and so for my purposes here they are exemplars.

This is not about them, though. I hold them blameless, and wish them only happiness. This is about the places that they worked, about the wider trade, about a culture that says this is fine.

To see a World in a Grain of Sand

Here’s some Python code. It’s from a larger document processing pipeline. Documents come shoved into the system, get squished around a bit, have metadata added, some formatting fixups, then squirt out the other end as nice looking pdfs. Standard stuff.

Documents can have references to other documents, both within the existing corpus, and to a variety of external sources. These references have standard forms, and when we find something that looks like a document reference, we do a bit of work to make sure it’s absolutely clean and proper.

That’s where this function, normalise_reference, comes in.

def canonicalise_reference(reference_type, reference_match, canonical_form):
    if (
        (reference_type == "RefYearAbbrNum")
        | (reference_type == "RefYearAbbrNumTeam")
        | (reference_type == "YearAbbrNum")
    ):
        components = re.findall(r"\d+", reference_match)
        year = components[0]
        d1 = components[1]
        d2 = ""
        corrected_reference = canonical_form.replace("dddd", year).replace("d+", d1)

    elif (
        (reference_type == "RefYearAbbrNumNumTeam")
        | (reference_type == "RefYearAbrrNumStrokeNum")
        | (reference_type == "RefYearNumAbbrNum")
    ):
        components = re.findall(r"\d+", reference_match)
        year = components[0]
        d1 = components[1]
        d2 = components[2]
        corrected_reference = (
            canonical_form.replace("dddd", year).replace("d1", d1).replace("d2", d2)
        )

    elif (
        (reference_type == "AbbrNumAbbrNum")
        | (reference_type == "NumAbbrNum")
        | (reference_type == "EuroRefC")
        | (reference_type == "EuroRefT")
    ):
        components = re.findall(r"\d+", reference_match)
        year = ""
        d1 = components[0]
        d2 = components[1]
        corrected_reference = canonical_form.replace("d1", d1).replace("d2", d2)


    return corrected_reference, year, d1, d2

I have obfuscated identifiers in the code sample, but its structure and behaviour are as I found it.

I’d been kind of browsing around a git repository, looking at folder structure, getting the general picture. A chunk of the system is a Django webapp and thus has that shape, so I went digging for a bit of the meat underneath. This was almost the first thing I saw and, well, I kind of flinched. Poking around some more confirmed it’s not an anomaly. It is representative of the system.

You’ve probably had some kind of reaction of your own. This is what immediately leapt out at me

  • The length

  • The width!

  • The visual repetition, both in the if conditions and in the bodies of the conditionals

  • The string literals

  • The string literal with the spelling mistake

  • The extraneous brackets in the second conditional body - written by someone else?

  • The extra line before the return - functionally, of course, it makes no difference, but still, urgh

Straightaway I’m thinking that more than one person has worked on this over time. That’s normal, of course, but I shouldn’t be able to tell. If I can, it’s a contra-indicator.

Looking a little longer, there’s a lot of repetition - in shape, and in detail. Looking a little longer still, and I think function parameters are in the wrong order. reference_type and canonical_form are correlated and originate within the system. They should go together. It’s reference_match which comes from the input document, it’s the only true variable and so, for me anyway, should be the first parameter. I suspect this function only had two parameters initially, and the third was added without a great deal of thought to the aesthetics of the change.

That’s a lot to not like in not a lot of code.

But at least there are tests

And hurrah for that! There are tests for this function, tangled up in a source file with some unrelated tests that pull in a mock database connection and some other business, but they do exist.

There are two test functions, one checking well-formed references, the other malformed references, but, in fact, each function checks multiple cases.

It’s a start, but the test code is much the same as the code it exercises - long and repetitious - which isn’t, perhaps, that surprising. A quick visual check shows they’re deficient in other, more serious ways. There are ten reference types named in canonicalise_reference. The tests check seven of them and, in fact, there is a whole branch of the if/else ladder that isn’t exercised. That’s the branch I already suspect of being a later addition.

Curiously too, while canonicalise_reference returns a 4-tuple, the tests only check the corrected reference and the year, ignoring the other two values. That sent me off looking for the canonicalise_reference call sites, where all four elements of the tuple are used. Again, I’d suggest the 4-tuple came in after the tests were first written and were not updated to match. After all, they still passed.

I am sure these tests post-hoc. They did not inform the design and development of the code they support.

Miasma

If the phrase coming to mind is code smells, then I guess you’re right. This code is a stinky bouquet of bad odours, except they aren’t clues to some deeper problem with the code. We don’t need clues - it’s right out there front and centre. No, these smells emanate from with the organisation, from a failure to develop the programmers whose hands this code has passed through. The code works, let’s be clear, but there’s a clumsiness to it and a lack of care in its evolution. That’s a cultural and organisational failing.

I keep saying this is about organisations. I’m not saying these are bad places to work, where maladjusted managers delight in making their underlings squirm. Quite the contrary, I’ve worked at more than one of the organisations responsible for the code above and had a great time. But there is something wrong - an unacknowledged failure. An unknown failure even. There so much potential, and it’s just not being taken up

I came across this code because I was talking about potentially work on it, going back into one of those organisations. That didn’t pan out, but had I been able I would absolutely have signed up for it. It’s fascinating stuff and right up a multiplicity of my alleys.

Let’s imagine for a moment that I was sitting down for my first day on this job, what would I do with this code? Well, at a guess, nothing. Well, nothing until I needed to, and then I’d spend a bit of time on it. But I’d absolutely be talking to my new colleagues about, well, everything.


Tagged the-trade


Jez Higgins

Freelance software grandad
software created
extended or repaired

Follow me on Mastodon
Applications, Libraries, Code
Talks & Presentations

Hire me
Contact

Older posts are available in the archive or through tags.

Feed