Jez Higgins

Freelance software grandad
software created
extended or repaired


Follow me on Mastodon
Applications, Libraries, Code
Talks & Presentations

Hire me
Contact

Older posts are available in the archive or through tags.

Feed

STinC++ - Ed! Ed! Hooray! : Yr glygfa o Hescwm Uchaf, rhif 12

STinC++ rewrites the programs in Software Tools in Pascal using C++

Close on two years, on 29 August 2022, I started work on edit, the program described in Chapter 6 of Software Tools In Pascal. I was, I declared, excited to be getting in it. I meant it too, and so it might be reasonable to think I’d get on with things with a shade more urgency. Life is what it is, though, and so work proceeded rather in fits and starts. Indeed, at one point I thought I might just poke away it kind of forever but I’m only 55[1] so I’m a little young to be thinking like that. Twenty one months later I wrapped up edit a thousand lines of code and 270 commits after I started. The world absolutely doesn’t need another line-oriented text editor, but nonetheless I’ve really enjoyed writing one.

A Refresher

Even if you’re the most dedicated reader, you’ve probably forgotten what I was setting out to do. Hell, I forgot several times along the way and I was actually writing the code. Chapter 6 of Software Tools in Pascal opens with brief rational for building an editor - since we have pattern matching and a simple stream editor, we’ve got the building foundations of a more general text editor.

The editor we present here is modeled after the latest in a long family of conversational text editors that have achieved wide acceptance …​ Because edit is primarily intended for interactive use, it is streamlined and terse, but easy to use. This is especially important for a text editor: for most users it is the primary interface to the system …​ edit is not confined to conversational editing, however. It can be driven from prepared scripts and from other programs …​ It is a tool.

Kernighan and Plauger briefly discuss their approach to error handling. Since edit is used to maintain files, it can’t just core dump if something goes wrong, it needs to be robust, indeed graceful, to avoid the loss of valuable work. Since an editor is unavoidably a big program (their implementation runs to more than 1200 lines of Pascal) it’s important that program is well structured or it will all get “utterly out of hand”.

There follows a few pages describing how you would use edit. If you cross out edit and write in vim, it stands up as a nice concise introduction to line-oriented editing commands. I certainly learned things which had eluded me during 30+ years of exposure to vi. They finish this section with the manual.

PROGRAM

edit edit text files

USAGE

edit [file]

FUNCTION

edit is an interactive text editor that reads command lines from its input and writes display information, upon command, to its output. It works by reading text files on command into an internal "buffer" (which may be quite large), displaying and modifying the buffer contents by other commands, then writing all or part of the buffer to text files, also on command. The buffer is organisated as a sequence of lines, numbered from 1; lines are implicitly renumbered as text is added or deleted.

Context searches and substitutions are specified by writing text patterns, following the same rules for building patterns as used by find. Substitutions specify replacement text following the same rules as used by the program change.

Line numbers are formed by the following components:

n a decimal number

. the current line ("dot")

$ the last line

/pattern/ a forward context search

\pattern\ a backward context search

Components may be combined with + or -, as in, for example

.+1 sum of . and 1

$-5 five lines before $

Line numbers are separated by commas or semicolons; a semicolon sets the current line to the most recent line number preceding.

Commands may be preceded by an arbitrary number of lines numbers (except for e, f and q, which require none be present). The last one or two are used as needed. If two line numbers are needed and only one is specified, it is used for both. If no line numbers are specified, a default rule is applied:

(.) use the current line

(.+1) use the next line

(.,.) use the current line for both line numbers

(1,$) use all lines

In alphabetical order, the command and their default line numbers are

(.) a append text after line (text follows)

(.,.) c change text (text follows)

(.,.) dp delete text

e file edit file after discarding all previous text, remember file name

f file print file name, remember file name

(.) i insert text before line (text follows)

(.,.) m line3 p move text to after line3

(.,.) p print text

q quit

(.) r file read file, appending after line

(.,.) s/pat/new/gp substitute new for occurrence of pat (g implies for each occurrence across line)

(1,$) w file write file (leaves current state unaltered)

(.) =p print line number

(.,+1) newline print one line

The trailing p, which is optional, causes the last affected line to be printed. Dot is set to the last affected line, except for f, w, and =, for which it is unchanged.

Text entered with a, c and i is terminated with a line containing just a ..

The global prefixes cause repeated execution of a command, one for each line that matches (g) or does not match (x) a specified text pattern:

(1,$) g/pattern/command

(1,$) x/pattern/command

command can be anything but a, c, i or q, and may be preceded by line numbers as usual. Dot is set to the matched line before command is done.

If the command line argument file is present, then the editor behaves as if its input began with the command e file. The first filename used is remembered, so that a subsequent e, f, r, or w command can be written with no filename to refer to the remembered filename. A filename given with e or f replaces any remembered filename.

EXAMPLE

Don’t be silly.

Let’s take a moment

The remaining 45 pages of the chapter describe the edit program itself. I won’t, I hope, take that quite that long, but equally I’m not going to dig quite as deeply into it.

Reading the manual, it’s pretty clear, I think, that the core of the editor isn’t particular complex. We can add new lines into the editor and we can remove existing lines. The difference between inserting text before a particular line or appending it after is, in the grand scheme of things, barely worth worrying about. Moving lines about or changing existing lines are a fancy ways of writing a delete followed by an insert. The basic operations of edit are entirely straightforward and no danger lurks there. Even for a very large document, a standard container is going to be entirely fine.

For me, all the interest, all the excitement is in the conversation, in the commands the user enters to manipulate the text, particularly in line numbers. We’ve got integer literals, . as shorthand for the current line, $ for the last line, forwards or backwards search, along with basic arithmetic on any or all of those. Line indexes are optional, and the defaults differ from command to command. In addition, some commands have additional parameters. There’s lots going on there.

There is one final consideration - interpret or compile. Kernighan and Plauger’s edit interprets. It processes line numbers and commands immediately as it parses. Nothing wrong with that at all, and with the oft-discussed contraints of Pascal[2] an entirely reasonable one. However, this does mean you might get half way through processing some user input and then hit an error, leaving the program in a valid but unknown state. In the case of edit the worst that can happen is that current line or remembered search pattern is no longer what you expect, so it’s hardly catastrophic, but, particularly as a long time C++ programmer[3], it’s something I’d like to avoid.

Even before I put finger to keyboard therefore, I’d been leaning (because TDDing doesn’t mean you’re not allowed to think) towards a compilation model - parse out the command to some intermediate state, compile it into something that will do the work, and only then, if everything’s gone ok, pull the trigger.

It’s right there in the code.

void editor::handle_command(
    std::string_view line,
    std::istream&in,
    std::ostream&out) {
  auto parsed_command = parse_command(line);
  auto command = parsed_command.compile(buffer_);

  command(in, out, buffer_);
}

"But you said only pull the trigger if everything was ok" you might reasonably respond "looks like you’re just carrying on regardless". Well, yes and also no. In the event of an error, edit does nothing except produce some output, a single question mark.

$ ./edit
huh
?

In the code above, if parsing or compilation fails, compile returns an error command. When we run the error command, it’s responsible for popping out the ?. More successful parses produce more useful commands like inserting or deleting.

Three Steps To Ed-en

Step One: Parsing

Parsing can be something of dark art, but the edit commands are pretty straightforward and so we don’t need anything sophisticated. The commands have the general form of 0, 1, or N line indexes followed by a single letter command, and then, potentially, a bit extra on the end. Honestly, I put off dealing with the extra bits for quite some time, before just knuckling down.

  auto parsed_command = parse_command(line);

The parse_command is a free function and hands back the parsed result by value, so we don’t have any lifetime or memory management issues to worry about. parse_command itself simply hands over the work to a command_parser object. The sole purpose of command_parser is the bundle up all the moving parts - how much of the input have we consumed, what have we found so far, have we finished - so that we don’t have to pass half-a-dozen parameters down through the call chain.

parse_command(std::string_view const input) {
  auto parser = command_parser(input);

  return parser.parse();
}
parsed_command parse() {
  parse_line_numbers();

  parse_command_code();

  has_line_numbers_when_forbidden();

  line_number_defaults();

  return parse_result();
}

parse_line_numbers builds up a list of line_expression objects. A literal index, say 1 becomes an int_index(1), while something like /next/+1 becomes add_offset(forward_search('next'), 1). I’ve said objects, but there’s no class hierarchy here. Everything is captured in little lambda functions. Some, like dot_index_fn just return constants, int_index captures the integer value. More complex operations, like addition and subtraction add_offset capture other line expressions within themselves. It’s actual function composition in action.

We’re building up the little expression functions rather than working out what, for example, .;$ means straightaway for a couple of reasons. First, the value of line expressions depends on the state of the editor, and secondly, evaluating the line expressions has side effects. The current line can be updated, and any search patterns will be remembered. We don’t want to be changing the editor state when we still don’t know if the whole command makes sense. So, at this stage, build the line expressions, and we’ll work them out later.

Once the line index parsing is taken care off, everything else is pretty straightforward. We expect to next see a command letter, and that letter tells us what to look for after that. If we see p, then we’re looking for the end of the input. On the other hand if we have an s we look for a search pattern, replacement text, and perhaps the g flag.

stiX::parsed_command parse_result() const {
  if (is_error())
    return parse_error();

  return { indicies, code, extras };
}

Once everything’s parsed out, we’re ready to compile.

Step Two: Compiling

Compiling the command is, as the parsing result might suggest, a two phase process. First the line expressions are evaluated against the edit buffer. This is slightly more involved that might first appear as we need to propagate updates to the current line and the saved search pattern through the evaluation chain. It is the case, although I think most people would never use it, that one line expression might affect another through the side effect of updating the current line. I suspect this kind of manoeuvre was perhaps more useful when working with line printers, noisily carving output into fanfold paper[4]. In this modern age of when everyone has multiple virtual glass teletypes and screen editors, we’ve rather lost the art.

std::tuple<size_t, size_t, size_t, std::string>
eval_line_expressions(
  std::vector<stiX::line_expression_step> const& line_expressions,
  stiX::edit_buffer const& buffer
) {
  auto dot = buffer.dot();
  auto last_pattern = std::string { buffer.pattern() };
  auto line_numbers = std::vector<size_t> { };

  for (auto const& [expr, separator] : line_expressions) {
    auto [index, pattern] = index_or_error(expr, buffer, dot, last_pattern);
    dot = (separator == stiX::expression_separator::update) ? index : dot;
    last_pattern = !pattern.empty() ? pattern : last_pattern;
    line_numbers.push_back(index);
  }

  if (line_numbers.size() > 2)
    line_numbers.erase(line_numbers.begin(), line_numbers.end()-2);

  return { line_numbers.front(), line_numbers.back(), dot, last_pattern };
}

Using a tuple to return several values from a function still feels like slightly stinky C++, even though I wouldn’t think twice about it in a language like Python or JavaScript. I’d still be unlikely to use it in a function that formed part of an interface or API, a public member function say, but in this context it’s perfectly fine[5].

Line expressions evaluated, yielding actual numbers, we can now compile the main command.

stiX::action command_for_code(
  char const code,
  size_t const from_index,
  size_t const to_index,
  size_t const destination,
  stiX::command_extras const& extras) {

  auto const fn = command_map.find(code);

  return fn != command_map.cend()
    ? fn->second(from_index, to_index, destination, extras)
    : stiX::command::error;
}

The fn->second(from_index, to_index, destination, extras) call does the work of compiling the command. These compilation functions do any extra setup work that might be needed, then returns, as you’ve probably guessed, a lambda function wrapping everything we’ll need when we come to run the command. Again, everything the lambda needs is captured by value, the lambda function itself is a value, so yet again we’ve avoided any lifetime issues.

using action =
    std::function<void(std::istream&, std::ostream&, edit_buffer&)>;

action stiX::make_insert_action(
    size_t const,
    size_t const to_index,
    size_t const,
    command_extras const&) {
  return [to_index](std::istream& in, std::ostream&, edit_buffer& buffer) {
    insert_action(in, to_index, buffer);
  };
}

Under the covers, there are actually more actions available beyond those implementing the commands described in the edit manual. Updating the current line and storing a search pattern are both implemented as "hidden" actions. A compiled command is, in fact, a sequence of actions, all wrapped up together.

stiX::commands stiX::parsed_command::compile(
    stiX::edit_buffer const& buffer) const {
  auto const [from, to, updated_dot, last_pattern] =
      eval_line_expressions(line_expressions, buffer);

  if (is_error(from, to, code))
    return { command::error };

  auto destination = eval_destination_expression(
    extras.destination_expression,
    buffer,
    updated_dot,
    from,
    to,
    last_pattern);

  if (is_error(destination))
    return { command:: error };

  auto set_dot = command::update_dot(updated_dot);
  auto set_pattern = command::update_pattern(last_pattern);

  auto command = command_for_code(
      code,
      from,
      to,
      destination,
      extras);

  auto and_then = extras.and_print
    ? command::and_print
    : command::noop;

  return { set_dot, set_pattern, command, and_then } ;
}

This multiple action approach only occurred to me quite late on, as I was thinking about implementing g, the global command. I’d actually not quite understood what g does, and imagined that I’d be able to do most of the work up front, and then return several commands, one for each line the global search matched. You can’t, I quickly realised, but I’d made the switch from single action to multiple action. Once the change was in place though, even though it didn’t serve my original purpose, the possibilities just opened up. Suddenly everything started getting a lot simpler, things I’d put off because they seemed awkwardEp just started dropping into place, and I realised I could discard a whole layer of indirection. I’ve got faith in myself and in the process - I’m comfortable I can form the code I write into the shape it needs to have - but every now and again something almost magical happens and some small change just cracks everything open.

Step Three: Action

We’ve parsed out the command string entered by the user, evaluated our line expressions, built our sequence of lambda functions, what now?

Well, we just apply those lambdas to the edit buffer. parsed_command::compile returns a commands object, which provides the function call operator operator(). This simply calls down into each lambda in sequence, which themselves call the functions that do actual work.

void stiX::commands::operator()(
    std::istream& in,
    std::ostream& out,
    edit_buffer& buffer) const {
  for(auto const& c : commands_)
    c(in, out, buffer);
}

The action functions are all pretty straightforward. Indeed, many of them share implementations or are implemented in terms of each other actions. Append, insert and read file, for example, all share their implementation, while change is a combination of delete and insert.

void stiX::append_action(
    std::istream& in,
    size_t const after,
    edit_buffer& buffer) {
  read_lines(in, after, true, buffer);
}

void stiX::insert_action(
    std::istream& in,
    size_t const before,
    edit_buffer& buffer) {
  auto const adjust = (!buffer.empty() && before != 0) ? 1 : 0;
  read_lines(in, before-adjust, true, buffer);
}

void stiX::read_from_file_action(
    std::ostream& out,
    size_t after,
    std::string_view filename,
    edit_buffer& buffer) {
  auto f = std::string { !filename.empty() ? filename : buffer.filename() };
  if (filename_is_good(f, out)) {
    auto source = std::ifstream { f };
    read_lines(source, after, false, buffer);
  }
}

void read_lines(
    std::istream& source,
    size_t index,
    bool stop_on_stop,
    edit_buffer& buffer) {
  while(source.peek() != eof) {
    auto line = stiX::getline(source);

    if (stop_on_stop && line == ".")
      return;

    buffer.insert(index, line);
    ++index;
  }
}

There really is very little code involved in implementing the editor commands, perhaps only 150 lines or so. Even the global command, which involves calling back into the parser, and then recompiling commands is under 20 lines or so spread over three functions. Honestly, the thing that tripped me up the most, even quite late on, is that edit line indexes are 1-based rather 0-based, except that sometimes a 0 line index is ok, depending on the command, and whether the edit buffer is empty or not.

Getting to the point where we’re actually call those functions - the parsing and compiling - is the bulk of rest of the program, 650 lines or so. That’s the eternal truth of software - working out what people want done is much more complicated that actually doing it.

A Long Way To A Small Program

While I’m coming up on two years wall clock time writing edit, I have no real idea how much human CPU time I’ve spent on it. I usually work on this little project on a weekday evening, generally for an hour or so[6], sometimes while watching a bit of FIH Pro League hockey. I’m at the end of the day, the sport might get exciting at any moment, but I’m still trying to do good work, so I do the smallest little step I can that moves the code forward, and commit it. Even if it’s only a single line. Even if it’s a single character. Now I can go again with another little step, and commit. And, if I’m able, again, and again, until it’s time to take the dogs for a wee, put the breadmaker on, or any of the other closing down for the evening chores.

You can see this in the commit history. There are gaps where weeks pass with nothing, interspersed with periods where I work on the code for several evenings in a row, committing five or six or however many times, some times with only a couple of minutes between commits.

The more often I commit, the less I have to hold in my head. The code itself contains everything I know about design and implementation. When I come back, even after an extended period, I can pick it up, find the shape, and start working again.

Maybe I could do some maths on the commit timestamps and come up with an estimate for time spent, but does it really matter? STinC++ is a pleasure project and I have really enjoyed this chapter of it.

Where next?

Kernighan and Plauger suggest a number of exercises for extending edit beyond the code they present. They include small things like a variation on print to represent "invisible" characters like tabs and trailing spaces, or more helpful error messages than ?. They work an exercise for editing with files larger than can be accommodated in memory. This is a historical curiosity, and not something I’m especially worried about today. Even my unsophisticated use of std::vector should stand-up to the rigours of everyday use in 2024[7].

More interesting exercises include implementing shelling out to a process, catching interrupts without data loss, and then cannibalising edit to make a "stream editor". I’d not really thought about it before, but I’d sort of assumed sed predated ed but it seems not.

I don’t propose to leap into any of these straightaway, but I might loop back round to them at some point. I’ve also poked at the idea of making my edit even more ed-like - ^ instead of % in the regexes, filling out the command set, that kind of thing. We’ll see. Something else that might be fun is grappling a screen editor on top of edit, in the same way vim was evolved from ex.

We’ll see.

Kernighan and Plauger conclude by describing how they use edit to pull together the manuscript of Software Tools in Pascal. This involves using edit to build scripts to feed back into edit. Of course they do. This is a book of tools, and they’re putting the tools to use.

Ed-roboros

Throughout this I’ve described edit as ed-like. When Kernighan and Plauger wrote Software Tools and then Software Tools in Pascal, ed already existed on the Bell Labs UNIX systems and the program they presented provided a subset, albeit a large subset, of ed functionality, hence ed-like. Curiously though, the man page for both the BSD ed and GNU ed both reference Software Tools in Pascal. Indeed, the GNU ed info pages go so far as to say

GNU ed is based on the editor algorithm described in Brian W. Kernighan
and P. J. Plauger's book "Software Tools in Pascal", Addison-Wesley, 1981.

Perhaps edit isn’t ed-like, perhaps ed is edit-like.

Source code

Source code for this program, indeed the whole project, is available in the stiX GitHub repository. edit is the only program in Chapter 6.

Wait, I arrived late! What’s going on?

Software Tools In Pascal by Brian Kernighan and PJ Plauger is a book that I love. Since August 2019, I’ve been gradually working through the book, reimplementing their tools in C++. The book really is a delight and I commend it to you. It’s still in print, although wildly expensive so get yourself a second hand copy or borrow it from The Internet Archive.

Tools-wise, I’ve been using JetBrain’s CLion, which I like a lot, with CMake for build and Catch2 as my test framework.


1. As of 29 May, birthday fans!
4. I’d love to know the history of that particular feature. I’d put money on it being an case book study of iterative development in action.
5. Although if you look through the commit history you’ll see it took me some time to realise this
6. A development episode can be as short as 10 minutes, and only very rarely much longer than two hours
7. By which I mean of course, that it won’t be used at all when so many better alternatives exist.

Tagged code, and software-tools-in-c++


Jez Higgins

Freelance software grandad
software created
extended or repaired

Follow me on Mastodon
Applications, Libraries, Code
Talks & Presentations

Hire me
Contact

Older posts are available in the archive or through tags.

Feed