Jez Higgins

Freelance software grandad
software created
extended or repaired


Follow me on Mastodon
Applications, Libraries, Code
Talks & Presentations

Hire me
Contact

Older posts are available in the archive or through tags.

Feed

Let’s Predict! : Yr glygfa o Hescwm Uchaf, rhif 8

For a talk I’m thinking about for later this year or early next, I’ve decided I need a text prediction engine. The basic idea is that there are lots of things in software that seem kind of amazing, even to people who work in software. Compilers & interpreters, operating systems, windowing systems are obvious choices, as is lots of games stuff[1]. Those are, however, all pretty large topics. You can’t write an o/s in forty minutes. I can’t anyway.

What I think I can do is a text prediction engine. You know the kind of thing - it feeds you words as you type on your phone or in your word processor.

Having typed the word 'Hello', my phone keyboard has switched from spelling correction to text prediction, offering me a choice of 'world', 'there', and 'how'.

If I can explain how a ✨magic✨ piece of software works in forty minutes or an hour, demystify the whole thing, I can start to lift the lid for people on some of these other ✌︎hard✌︎ things.

Perhaps because my first programming language is C++, I’ve had people who work in C# or JavaScript or Python or similar say that all that low-level stuff is too complicated for them. I’ve also encountered a lot of C++ (and C and assembler) snobs who look down their noses at languages you can’t cause a segfault with, and at the people who use them.

The latter are probably beyond redemption, but it kind of boils my piss that there are programmers who’ve taken some of that snobbery onto themselves and accept that it’s founded in any kind of truth. C++ programmers aren’t on some rarefied mountain top. Programming’s programming. Software’s the most malleable medium we could wish to work in. Anybody can do anything in software.

Artificial Woo-woo

My phone keyboard asked me again last month if I wanted to turn on some kind of AI assistant to boost productivity and unlock creativity. I obviously said no[2], but it’s all part of stoking up the idea that text prediction is a difficult problem, best left to the big brains at Google or Apple or Microsoft with their shiny offices and free lunches in Silicon Valley and Redmond.

A text predictor in a few lines of JavaScript should pop that bubble. A bit. Hopefully.

Keeping it Real (or whatever today’s young people say)

Why use JavaScript? A couple of reasons:

  1. Everyone can read it. It is the Lingua Franca of the current age.

  2. People shit on it all the time[3] even though they should know better.

If you can do something in JavaScript, language snobs of all stripes just have to shut up. Meanwhile, everyone else can understand the code you’re showing them.

And we’re off

Anyway, as I type I’ve no idea what I’m doing beyond the phrase Markov chain, so here goes.

Markov chains are stochastic models of event sequences, where the probability of the next event only depends on the current event.

Helpful.

For text prediction, we take the current word - let’s say hello - and randomly select the word that comes next.

But if we randomly select a word, doesn’t that just mean it could be anything?

Well no, because I skipped over the model part.

Before we start predicting we build the model. We read a lot of text and, for every word we encounter, we make a note of the words that come immediately afterwards.

For hello, let’s say that it’s followed by world, Dolly, and everybody.

Our prediction for the word following hello will, therefore, be one of those three words.
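That note-taking step can be sketched in a few lines of JavaScript. Everything here - buildModel, its name, and the toy word list - is my own made-up illustration, not code from the talk:

```javascript
// Build a model by noting, for each word, the words that follow it.
function buildModel(words) {
  const model = new Map();
  for (let i = 0; i !== words.length - 1; ++i) {
    const word = words[i], next = words[i + 1];
    if (!model.has(word))
      model.set(word, []);
    model.get(word).push(next);
  }
  return model;
}

const model = buildModel(['hello', 'world', 'hello', 'Dolly', 'hello', 'everybody']);
console.log(model.get('hello'));  // [ 'world', 'Dolly', 'everybody' ]
```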

As we build the model, we don’t just note those following words, we also keep track of how often they occur. Imagine hello is followed by world three times, Dolly once, and everybody twice.

hello  ->  (world, 3)
           (everybody, 2)
           (Dolly, 1)

Our prediction is going to choose one of those three words, but the selection will be weighted by how often they occurred. Here we’d expect world to come up more often than everybody, for example.
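A weighted pick like that might look something like this sketch - weightedPick and the hard-coded counts are mine, for illustration, rather than anything from the real code:

```javascript
// Counts of the words seen following 'hello', as in the table above.
const followers = { world: 3, everybody: 2, Dolly: 1 };

// Pick a word at random, weighted by how often it occurred.
function weightedPick(counts) {
  const total = Object.values(counts).reduce((sum, n) => sum + n, 0);
  let roll = Math.random() * total;
  for (const [word, count] of Object.entries(counts)) {
    roll -= count;
    if (roll < 0)
      return word;
  }
}

console.log(weightedPick(followers));  // 'world' roughly half the time
```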

The particular words and their weights will depend on your text corpus[4]. The complete works of Jane Austen, for example, is going to end up with quite a different model to that of yesterday’s football commentary, which will be different again to a model built from what you’ve typed into your phone.

We can make things sound more artificial intelligency by saying training instead of building. Training makes things sound like Neo in the Matrix, while building sounds boring and mechanistic. Actually they mean exactly the same thing. We’re taking an input of some sort, which we use to construct (ie build or train) a representation of that input (a model).

The model we’ve made can then be used in various ways. We can categorise or classify things, for instance - yes this is a picture of cat, no that is a picture of a traffic light - by seeing how well new inputs match the model.

What we’re doing here is using input as a stimulus to generate new output seemingly out of the air. Our model, full of words and their frequencies, has captured some essence of the text it’s been built from. By giving it a word to start with, it will produce a second word for us. If we feed that word back in as the next input, it’ll generate a third word. And so on and so on.
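That feedback loop could be sketched like so. The toy model and the generate function are my own invention, deliberately built so each word has exactly one follower and the output is predictable:

```javascript
// A toy model where each word has exactly one possible follower.
const model = new Map([
  ['hello', { world: 1 }],
  ['world', { again: 1 }],
]);

// Stand-in for a weighted random choice among the followers.
function pick(counts) {
  const words = Object.keys(counts);
  return words[Math.floor(Math.random() * words.length)];
}

// Feed each predicted word back in as the next input,
// stopping at the requested length or at a word we've never seen.
function generate(start, length) {
  const output = [start];
  let word = start;
  while (output.length !== length && model.has(word)) {
    word = pick(model.get(word));
    output.push(word);
  }
  return output.join(' ');
}

console.log(generate('hello', 3));  // 'hello world again'
```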

We’ll see that in due course, I hope.

Code!

While writing this, I’ve been poking around with code at the same time. It took a little bit longer than 40 minutes (not an excuse, but I was watching the FIH ProLeague hockey at the same time), but I got there more easily than I’d expected.

The code implements the little data structure I sketched out above, with a small amount of extra scaffolding to let you populate it with pairs of words.
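The real chain.mjs isn’t reproduced here, so this is only my guess at its shape, reconstructed from the add and predict calls it gets used with - the counts-per-word structure and weighted pick are as sketched above:

```javascript
// A guess at chain.mjs: a factory returning an object with add and predict.
// In the real module, make_chain would presumably be an export.
function make_chain() {
  const model = new Map();

  return {
    // note another occurrence of 'next' following 'word'
    add(word, next) {
      if (!model.has(word))
        model.set(word, new Map());
      const counts = model.get(word);
      counts.set(next, (counts.get(next) ?? 0) + 1);
    },

    // weighted random choice among the words seen after 'word'
    predict(word) {
      const counts = model.get(word);
      if (!counts)
        return undefined;
      const total = [...counts.values()].reduce((sum, n) => sum + n, 0);
      let roll = Math.random() * total;
      for (const [next, count] of counts) {
        roll -= count;
        if (roll < 0)
          return next;
      }
    }
  };
}
```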

The Future

Here’s my silly hello world/everyone/Dolly example worked up into code.

import {make_chain} from "../src/chain.mjs";

const chain = make_chain()

chain.add('hello', 'world')
chain.add('hello', 'everyone')
chain.add('hello', 'Dolly')
chain.add('hello', 'everyone')
chain.add('hello', 'world')
chain.add('hello', 'world')

for (let i = 0; i !== 10; ++i)
  console.log('hello ' + chain.predict('hello'))

And here it is predicting

Console output from the example program. It’s ten lines of text reading hello world

The core of a text prediction engine in 66 lines of JavaScript[5]. Not bad.

The Future Future

A Markov chain text predictor that only knows how to predict from one word isn’t a great deal of use. We need more words. It was, however, bedtime when I’d wrapped the code up, and so rather late to feed it the complete works of Jane Austen.

That’s where I’ll go next time.


1. And not just fancy graphics business. I have no clue how you’d put together something like the computer opponents in Dominion, for example.
2. Because honestly, how much productivity boosting and creativity unlocking do you need to send WhatsApp messages? And, in any case, it can’t do that.
4. Text corpus being the technical term for a big ole pile of words
5. find ./src -print | grep mjs$ | xargs grep -v ^$ | wc -l

Tagged the-trade, and javascript
