Jez Higgins

Freelance software grandad
software created
extended or repaired


Follow me on Twitter
My code on GitHub
Talks & Presentations

Hire me
Contact

Older posts are available in the archive or through tags.

Feed

Thursday 06 August 2020 The Forest Road Reader, No 2.51 : STinC++ - archive contents, delete, print

STinC++ rewrites the programs in Software Tools in Pascal using C++

At the end of my last installment, I had archive creation going, albeit with mocked up input. Putting together inputs and expected outputs as inline strings quickly became rather tedious, and I extended out my tests to pick up test cases from the filesystem. Subsequently, I was able to reuse the same test cases to drive archive from the very top, passing in the command line arguments. I was even able to reuse code from compare to verify the program output.

With archive creation out the way, the next step seemed to be listing the contents. I followed that with removing files from the archive, and then printing a file to standard out. Those are the -t, -d, and -p command line options respectively. Recall that the archive format is a header line

-h- name size

followed by the file contents, the next header line and contents, and so on.
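As a rough illustration of that header line, here is a minimal sketch of writing and reading one. The names archive_header, format_header, and parse_header_line are hypothetical stand-ins rather than the project's actual code, and the sketch assumes file names contain no whitespace.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical stand-in for the header type used in the post.
struct archive_header {
  std::string name;
  std::size_t filesize = 0;
};

// Hypothetical helper: format one header line as "-h- name size".
std::string format_header(archive_header const& header) {
  return "-h- " + header.name + " " + std::to_string(header.filesize);
}

// Hypothetical helper: split a header line back into its parts.
// Assumes the name contains no whitespace.
archive_header parse_header_line(std::string const& line) {
  std::istringstream in(line);
  std::string marker;
  archive_header header;
  in >> marker >> header.name >> header.filesize;
  // In this sketch a malformed header is treated as a programming error.
  assert(in && marker == "-h-");
  return header;
}
```

A real implementation would probably want to report a malformed header rather than assert, but that decision is outside what this sketch is trying to show.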

These three operations, which operate on an existing archive file, all have a similar shape

void table_archive(
  std::istream& archive_in,
  std::ostream& out
) {
  archive_in.peek();

  while (archive_in && !archive_in.eof()) {
    auto header_line = getline(archive_in);
    auto header = parse_header(header_line);

    out << header.name << '\t' << header.filesize << '\n';

    skip_entry(archive_in, header);

    archive_in.peek();
  }
} // table_archive

void delete_from_archive(
  std::istream& archive_in,
  std::vector<std::string> const& files_to_remove,
  std::ostream& archive_out
) {
  archive_in.peek();

  while (archive_in && !archive_in.eof()) {
    auto header_line = getline(archive_in);
    auto header = parse_header(header_line);

    if (of_interest(files_to_remove, header))
      skip_entry(archive_in, header);
    else {
      archive_out << header;
      copy_contents(archive_in, header, archive_out);
    }

    archive_in.peek();
  } // while ...
} // delete_from_archive

void print_files(
  std::istream& archive_in,
  std::vector<std::string> const& files,
  std::ostream& out
) {
  archive_in.peek();

  while(archive_in && !archive_in.eof()) {
    auto header_line = getline(archive_in);
    auto header = parse_header(header_line);

    if (of_interest(files, header))
      copy_contents(archive_in, header, out);
    else
      skip_entry(archive_in, header);

    archive_in.peek();
  } // while ...
} // print_files

Before I lined them up like this, I hadn’t realised just how similar they are. There’s a fairly obvious refactoring to do next time I touch the code.
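One possible shape for that refactoring, offered as a sketch rather than as what the project ends up doing, is to pull the shared loop into a single function and pass in the per-entry behaviour. Everything here is an assumption for illustration: for_each_entry is a made-up name, the helpers are simplified stand-ins for the real parse_header, skip_entry, copy_contents, and of_interest, and of_interest ignores the "no names means every file" rule. Only two of the three operations are shown; delete would follow the same pattern.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

struct archive_header {
  std::string name;
  std::size_t filesize = 0;
};

// Simplified stand-ins for the helpers used in the post.
archive_header parse_header(std::string const& line) {
  std::istringstream in(line);
  std::string marker;
  archive_header header;
  in >> marker >> header.name >> header.filesize;
  assert(in && marker == "-h-");
  return header;
}

void skip_entry(std::istream& in, archive_header const& header) {
  in.ignore(static_cast<std::streamsize>(header.filesize));
}

void copy_contents(std::istream& in, archive_header const& header, std::ostream& out) {
  std::string buffer(header.filesize, '\0');
  in.read(&buffer[0], static_cast<std::streamsize>(buffer.size()));
  out.write(buffer.data(), in.gcount());
}

bool of_interest(std::vector<std::string> const& files, archive_header const& header) {
  return std::find(files.begin(), files.end(), header.name) != files.end();
}

// Hypothetical: the loop shared by all three operations. The handler must
// consume exactly the entry's contents, by copying or by skipping.
template<typename EntryHandler>
void for_each_entry(std::istream& archive_in, EntryHandler handle) {
  archive_in.peek();
  while (archive_in && !archive_in.eof()) {
    std::string header_line;
    std::getline(archive_in, header_line);
    handle(archive_in, parse_header(header_line));
    archive_in.peek();
  }
}

void table_archive(std::istream& archive_in, std::ostream& out) {
  for_each_entry(archive_in, [&out](std::istream& in, archive_header const& header) {
    out << header.name << '\t' << header.filesize << '\n';
    skip_entry(in, header);
  });
}

void print_files(std::istream& archive_in,
                 std::vector<std::string> const& files,
                 std::ostream& out) {
  for_each_entry(archive_in, [&](std::istream& in, archive_header const& header) {
    if (of_interest(files, header))
      copy_contents(in, header, out);
    else
      skip_entry(in, header);
  });
}

// Small self-check against a two-entry archive held in a string.
void check_refactored_operations() {
  std::string const archive = "-h- a.txt 6\nhello\n-h- b.txt 4\nabc\n";

  std::istringstream for_table(archive);
  std::ostringstream table;
  table_archive(for_table, table);
  assert(table.str() == "a.txt\t6\nb.txt\t4\n");

  std::istringstream for_print(archive);
  std::ostringstream printed;
  print_files(for_print, {"b.txt"}, printed);
  assert(printed.str() == "abc\n");
}
```

Whether a lambda per operation reads better than three near-identical loops is a judgement call; the point of the sketch is only that the differences sit entirely inside the loop body.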

The archive_in.peek() is a cheeky lookahead that I’m using to set the end-of-file (technically end-of-input) flag before I try to read anything, rather than afterwards. Let’s imagine we’ve been churning through an archive file, and have just read the last header and contents. We now call archive_in.peek() which goes away and tries to find the next character of the input. There isn’t one, so the stream sets its end of file flag. Looping back up to the top of the while loop, we call archive_in.eof(), which now returns true causing us to break out of the loop. Without the peek(), that call to archive_in.eof() would return false because we hadn’t reached the end of the input yet. The subsequent call to getline() would hit end of input, so would return an empty string. We would then need to handle that inside the loop, and the whole thing starts to get a bit messy. I don’t think I’ve used peek() in this way before, but I wish I’d worked it out years ago.
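To see the difference in isolation, here is a small sketch that reads lines from a stream twice, once with the lookahead and once without. The function names are made up for the example, and it uses std::getline directly rather than the project's own getline helper. It relies on peek() setting the end-of-file flag when there is nothing left to read, which is what the common standard library implementations do.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// The loop shape from the post: look ahead before testing eof().
std::vector<std::string> read_lines_with_peek(std::istream& in) {
  std::vector<std::string> lines;
  in.peek();
  while (in && !in.eof()) {
    std::string line;
    std::getline(in, line);
    lines.push_back(line);
    in.peek();
  }
  return lines;
}

// The same loop without the lookahead, for contrast.
std::vector<std::string> read_lines_without_peek(std::istream& in) {
  std::vector<std::string> lines;
  while (in && !in.eof()) {
    std::string line;
    std::getline(in, line);  // the final call fails and leaves line empty
    lines.push_back(line);
  }
  return lines;
}

// Both loops see the same two-line input.
void check_peek_behaviour() {
  std::istringstream with("one\ntwo\n");
  std::istringstream without("one\ntwo\n");

  assert(read_lines_with_peek(with).size() == 2);
  // Without the lookahead we go round once more and collect an empty string.
  assert(read_lines_without_peek(without).size() == 3);
}
```

That spurious third element is the empty string the loop body would otherwise have to detect and handle.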

Almost There

With the create, list contents, delete, and print operations complete, the archive program is almost done, with only the extract and update operations to go. Extracting a file is essentially the same as printing, except we write to a file rather than to standard out. Updating can be achieved by, essentially, combining the delete and create operations. If we first remove the files to be updated from the archive, then append the new contents to the archive, we’ll have successfully updated it. That’s what I reckon anyway. I’m off to find out.

Source Code

Source code for this program, indeed the whole project, is available in the stiX GitHub repository. archive is the sixth program of Chapter 3.

Endnotes

This whole endeavour relies on Software Tools in Pascal and Software Tools, both by Brian W Kernighan and PJ Plauger. I love these books and commend them to you. They’re both still in print, but new copies are, frankly, just ridiculously expensive. Happily, there are plenty of second-hand copies floating round, or you can borrow them from The Internet Archive using the links above.

For this project I’ve been using JetBrains’ CLion, which I liked enough to buy a license.

The test harness I’m using is Catch. I’ve been aware of Catch pretty much since it was first released, but this project is the first time really getting in and using it. I like it and will use it again.


Tagged code, and software-tools-in-c++

Wednesday 15 July 2020 The Forest Road Reader, No 2.50 : STinC++ - archive create

STinC++ rewrites the programs in Software Tools in Pascal using C++

Up until now, when I sit down to write the next chunk of STinC++ I’ve already written the code. Consequently, I’m writing retrospectively, I know what I’ve done, which blind alleys I went down and came back from and so on. I can fit it all together and tell a nice little story. Not so today - I’m writing the code as I go.

Creating an archive

As I mentioned yesterday, archive is a pretty hefty program by the standards of what we’ve encountered so far, so I’m taking it one piece at a time. I’m starting with archive creation, because if we haven’t got an archive file to play around with none of the other functionality comes into play anyway.

In their description of the program, Kernighan and Plauger have taken one of the design decisions away from us

An archive is a concatenation of zero or more entities, each consisting of a header and an exact copy of the original file. The header format is

-h- name length

Our archive will, then, distribute its table of contents through the archive file. We can think about how this might make certain operations easier, perhaps at the expense of time efficiency. Given the time we’re talking about is probably going to be measured in the milliseconds, I’m happy with their choice. It feels more straightforward to implement. Let’s find out!

To begin with, I’m not going to worry about files at all. Here’s a test.

SECTION("no input files creates empty archive") {
  std::ostringstream archiveOut;

  create_archive(archiveOut);

  auto archive = archiveOut.str();
  REQUIRE(archive.empty());
}

All I know at the moment is I want my archive to write out to a stream. Not only have I said there’s no input, I haven’t even described how the input’s going to get in there. That’s not going to be enough, but I’ll find everything else out as I go.

Of course, it doesn’t compile yet but that’s ok. Let’s make it compile.

namespace stiX {
  void create_archive(
    std::ostream& archive_out
  ) {

  } // create_archive
}

Boom. Let’s commit that. Now what? What’s the next smallest step? Adding one file of zero length. Given what we know about our file structure, if our input is the zero length file nothing, the output archive file must be

-h- nothing 0

I don’t want to worry about where the file size comes from, so I’ll just pass that in along with the name. Something like this,

SECTION("one zero-length input file") {
  std::ostringstream archiveOut;

  auto input = stiX::input_file { "nothing", 0 };
  stiX::create_archive(input, archiveOut);

  auto archive = archiveOut.str();
  REQUIRE(archive == "-h- nothing 0\n");
}

In the Acknowledgments of his book Test-Driven Development By Example, Kent Beck writes

Finally, to the unknown author of the book which I read as a weird 12-year-old that suggested you type in the expected output tape from a real input tape, then code until the actual results matched the expected result, thank you, thank you, thank you.

That does appear to be pretty much the exercise I’m engaged in here. I’ll perhaps save you from every step of every single commit I make, lest I inadvertently reproduce the early chapters of that excellent book. If you’re feeling especially keen you can, of course, browse the commit history.

create.hpp
#include <cstddef>
#include <ostream>
#include <string>
#include <vector>

namespace stiX {
  struct input_file {
    std::string const name;
    size_t const filesize;
  };

  void write_header(
    stiX::input_file const& input,
    std::ostream& archive_out
  ); // write_header

  // FileReader is any callable that takes a file name and
  // returns something we can read the file's contents from
  template<typename FileReader>
  void write_contents(
    stiX::input_file const& input,
    std::ostream& archive_out,
    FileReader fileReader
  ) {
    auto inputStream = fileReader(input.name);
    // copy is the stream-to-stream copy from the very first program
    copy(inputStream, archive_out);
  } // write_contents

  template<typename FileReader>
  void create_archive(
    std::vector<input_file> const &input,
    std::ostream &archive_out,
    FileReader fileReader
  ) {
    for (auto const& i : input) {
      write_header(i, archive_out);
      write_contents(i, archive_out, fileReader);
    } // for ...
  } // create_archive
} // namespace stiX

Well, it’s past bedtime and a few tests later this is where I am. Parameterising create_archive on the FileReader function has exactly the same motivation and explanation as when I pulled the same trick while working on include. The copy function in the middle is our old friend from right back when we began, popping up again. I’m slightly unhappy with the fact that the input structure already knows the file size, but makes a call to read the file. It feels like both should be function calls, or both should be known when we invoke create_archive. That’ll resolve itself in due course, I’m sure.
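To make the FileReader idea concrete, here is a sketch of how a test might drive create_archive without touching the filesystem. It is an assumption-laden illustration, not the project's test code: in_memory_reader is a hypothetical test double, write_header is a guess at the obvious implementation, and the character-by-character loop stands in for the project's copy function.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

struct input_file {
  std::string name;
  std::size_t filesize;
};

// Assumed implementation of the header writer: "-h- name size".
void write_header(input_file const& input, std::ostream& archive_out) {
  archive_out << "-h- " << input.name << ' ' << input.filesize << '\n';
}

template<typename FileReader>
void create_archive(std::vector<input_file> const& inputs,
                    std::ostream& archive_out,
                    FileReader fileReader) {
  for (auto const& input : inputs) {
    write_header(input, archive_out);
    auto inputStream = fileReader(input.name);
    // Stand-in for the project's copy function.
    char c;
    while (inputStream.get(c))
      archive_out.put(c);
  }
}

// Hypothetical test double: "files" are just strings held in a map.
struct in_memory_reader {
  std::map<std::string, std::string> files;

  std::istringstream operator()(std::string const& name) const {
    return std::istringstream(files.at(name));
  }
};

// Usage: archive one empty "file" and one six-character "file".
std::string demo_archive() {
  in_memory_reader reader;
  reader.files["nothing"] = "";
  reader.files["hello.txt"] = "hello\n";

  std::vector<input_file> inputs = { {"nothing", 0}, {"hello.txt", 6} };

  std::ostringstream archive_out;
  create_archive(inputs, archive_out, reader);

  assert(archive_out.str() == "-h- nothing 0\n-h- hello.txt 6\nhello\n");
  return archive_out.str();
}
```

The production code would pass a callable that opens a real file instead; create_archive itself cannot tell the difference.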

Kernighan and Plauger also started with creating an archive. They worked outside-in, so they spend a little bit of time setting up application scaffolding to parse arguments, call placeholder functions, and what not, before getting into the business of creating and updating an archive. They treat the two as the same operation, which may turn out to be true for us too in due course. As before, their code deals directly with the file system, rather than with the abstractions C++ allows us. At this point though, we’re in broadly similar places.

Endnotes

This whole endeavour relies on Software Tools in Pascal and Software Tools, both by Brian W Kernighan and PJ Plauger. I love these books and commend them to you. They’re both still in print, but new copies are, frankly, just ridiculously expensive. Happily, there are plenty of second-hand copies floating round, or you can borrow them from The Internet Archive using the links above.

For this project I’ve been using JetBrains’ CLion, which I liked enough to buy a license.

The test harness I’m using is Catch. I’ve been aware of Catch pretty much since it was first released, but this project is the first time really getting in and using it. I like it and will use it again.


Tagged code, and software-tools-in-c++

Tuesday 14 July 2020 The Forest Road Reader, No 2.49 : STinC++ - archive

STinC++ rewrites the programs in Software Tools in Pascal using C++

And so, my friends, we’re into the final stretch of chapter 3. Chapter 3 broadly follows the pattern of its predecessors - presenting several small programs exploring various aspects of the matter under examination, before pulling the threads together into a slightly larger program. This chapter is all about files and file handling - opening files, reading their contents, creating and writing new files - and the last program of the chapter brings all that together in archive, a file archiving utility not unlike tar or cpio.

PROGRAM

archive maintain file archive

USAGE

archive -cmd aname [ file …​ ]

FUNCTION

archive manages any number of member files in a single file, aname, with sufficient information that members may be selectively added, extracted, replaced, or deleted from the collection. -cmd is a code that determines the operation to be performed.

-c create a new archive with the named members

-d delete named members from archive

-p print named members on standard output

-t print table of archive contents

-u update named members or add at end

-x extract named members from archive

In each case, the "named members" are the zero or more filenames given as arguments following aname. If no arguments follow, then the "named members" are taken as all of the files in the archives, except for the delete command -d, which is not so rash. archive complains if a file is named twice or cannot be accessed.

The -t command writes one line to the output for each named member, consisting of the member name and a string representation of the file length, separated by a blank.

The create command -c makes a new archive containing the named files. The update command -u replaces existing named members and adds new files onto the end of an existing archive. Create and update read from, and extract writes to, files whose names are the same as the member names in the archive. An intermediate version of the new archive file is first written to artemp; hence this filename should be avoided.

An archive is a concatenation of zero or more entities, each consisting of a header and an exact copy of the original file. The header format is
-h- name length

EXAMPLE

To replace two files in an existing archive, add a new one, then print the table of contents:

 archive -u archfile old1 old2 new1
 archive -t archfile

It is, frankly, a bit of a beast. The discussion of archive takes as many pages as do all the preceding programs in the chapter. I suspect I’ll probably end up with a similar ratio, but rather than write one monstrous great article (which would be tiring and, frankly, dispiriting for all of us) I’m going to take the excellent advice Kernighan and Plauger give at the start of the section.

The archive program is a natural for what we like to call "left-corner" construction. The idea is to nibble off a small, manageable corner of the program - a part that does something useful - and make that work. Once it does, more and more pieces are added until the whole thing is done. If care is taken with the original design, later pieces should fit in relatively smoothly. Debugging and testing are easier, for the pieces are only added one at a time. Furthermore, if you decide to scrap the whole thing at some point, you are scrapping only that fraction built so far.

They continue …​

The beauty of left-corner construction is that the program does some part of its job very early in the game. By implementing the most useful functions first, you get an idea of how valuable the program will be before investing any time in the difficult or esoteric services (which often prove to be unnecessary or unwanted anyway). You also ensure that the simpler and more common functions are handled simply, which leads to greater efficiency in the end.

Again, Software Tools in Pascal was written in the 1970s and published in 1981. Collectively it took us another twenty years for this kind of thing to become remotely mainstream, and even now there are people who’ll tell you this kind of incremental development, focusing on software that actually does stuff, is dangerous nonsense and what we really need is more design documents and planning meetings.

So anyway, I guess it makes sense to start with -c create an archive. Join me again tomorrow - same C++ time, same C++ channel - as I dive in and have a go.


Tagged code, and software-tools-in-c++

Friday 03 July 2020 Talk: Journey Into Space

nor(DEV):live - Journey Into Space with Jez Higgins

How I read an article by one of the original signatories of the manifesto for agile software development, and accidentally ended up writing a version of Asteroids for my phone.


Because no good deed goes unpunished, when I mentioned on Twitter that I had unexpectedly written a phone game, I was signed up to do a live stream for my friends at Norfolk Developers in about 5 seconds flat. I don’t think I’ve put together a talk in such a short space of time before, and I did lose my thread a little at the end, but it was fun to do and people seemed to enjoy it.

Thanks in particular to Alex and Shaun at NorDev for indulging my video pipelining shenanigans instead of insisting I just do a screen share like a normal person.

The articles that kicked all this off

  • Ron Jeffries' Asteroids articles. I’d strongly suggest you start at the beginning and work through because they’re just a delight, but if that seems a bit daunting at least read number 59. If you aren’t moved by that one, I don’t know what to say to you.

The WikiWikiWeb

The C2 Wiki still exists, with much of the content from the time I was talking about.


Tagged talk, nordev, code, and android

Friday 05 June 2020 The Forest Road Reader, No 2.48 : STinC++ - makecopy

STinC++ rewrites the programs in Software Tools in Pascal using C++

Deep into chapter 3, we’re now on our fifth file handling program and it takes a bit of a turn away from what we’ve done so far.

PROGRAM

makecopy copy a file to a new file

USAGE

makecopy old new

FUNCTION

makecopy copies the file old to a new instance of the file new, i.e. if new already exists it is truncated and rewritten, otherwise it is made to exist. The new file is an exact replica of the old.

EXAMPLE

To make a backup copy of a precious file

makecopy precious backup

BUGS

Copying a file onto itself is very system dependent and usually disastrous.

As you read this program description, you’re probably already sketching the source out in your head. I certainly was, not least because Kernighan and Plauger have established a pattern of setting out a problem at the start of a chapter, and then growing and tweaking the code we write to take us through solving the next problem and the next. The description even says 'truncated and rewritten', which is a pretty solid example of implementation detail leaking into documentation.

This program differs significantly from its predecessors, though. It’s entirely about the file system. We’re copying a file. We’re not changing the contents, we’re not even looking at it. We just need to manipulate the file, and if we can do that without cracking it open and messing around with its insides, well that would be lovely. And we can! In a development that’s taken nearly 20 years and required what seems like unreasonable amounts of high-quality brainpower, as of the 2017 standard C++ sports a spiffy filesystem library.

makecopy.cpp
#include <exception>
#include <filesystem>
#include <iostream>
#include <stdexcept>
#include <tuple>
#include "../../lib/arguments.hpp"

namespace fs = std::filesystem;

std::tuple<fs::path, fs::path> file_paths(int argc, char const* argv[]);

int main(int argc, char const* argv[]) {
  try {
    auto [source, destination] = file_paths(argc, argv);

    if (fs::exists(destination) && fs::is_regular_file(destination))
      fs::remove(destination); // (1)

    fs::copy_file(source, destination);
  } catch (const std::exception& fse) {
    std::cerr << fse.what() << '\n';
  }
}

std::tuple<fs::path, fs::path> file_paths(int argc, char const* argv[]) {
  auto filenames = stiX::make_arguments(argc, argv);

  if (filenames.size() != 2)
    throw std::runtime_error("Error: makecopy old new");

  auto source = fs::path(filenames[0]);
  auto destination = fs::path(filenames[1]);

  if (fs::equivalent(source, destination)) // (2)
    throw std::runtime_error("Error: source and destination are the same file");

  return std::make_tuple(source, destination); // (3)
} // file_paths
  1. The copy operation, fs::copy_file, fails if the destination exists, hence why I delete any existing file first.

  2. fs::equivalent is almost, but not quite, equals for paths. Two paths are equivalent if they point to the same file, even if one is, say, a file and the other a symlink, or one is a relative path and the other absolute, or whatever. If, underneath it all, they resolve to the same file, they are equivalent.

  3. I’m using a std::tuple as an ad hoc multivalue return. Using the magic of auto and structured binding, we can unpack the tuple directly into two separate variables at the call site and never really see the tuple at all. As I’m returning two values, I could have used a std::pair I suppose, but std::tuple just feels more suited here. To me, a pair says these two things belong together while tuple says these are just some things that I happen to have right now.
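Note 1 describes deleting the destination before copying. The filesystem library also provides fs::copy_options::overwrite_existing, which asks fs::copy_file to replace an existing destination itself. The sketch below shows that alternative alongside a tuple return unpacked with structured binding. It is not what makecopy.cpp does, and copy_replacing, scratch_paths, and slurp are hypothetical helper names invented for the example.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <string>
#include <tuple>

namespace fs = std::filesystem;

// Hypothetical helper: copy source over destination, replacing any
// existing file rather than failing.
void copy_replacing(fs::path const& source, fs::path const& destination) {
  fs::copy_file(source, destination, fs::copy_options::overwrite_existing);
}

// Hypothetical helper: two scratch file paths returned as a tuple.
std::tuple<fs::path, fs::path> scratch_paths() {
  auto dir = fs::temp_directory_path();
  return std::make_tuple(dir / "stinc_sketch_old.txt",
                         dir / "stinc_sketch_new.txt");
}

// Hypothetical helper: read a whole file into a string.
std::string slurp(fs::path const& path) {
  std::ifstream in(path, std::ios::binary);
  return std::string(std::istreambuf_iterator<char>(in),
                     std::istreambuf_iterator<char>());
}

// Usage: the destination already exists, and the copy replaces it.
void check_copy_replacing() {
  auto [source, destination] = scratch_paths();  // structured binding

  std::ofstream(source) << "precious";
  std::ofstream(destination) << "stale contents";

  copy_replacing(source, destination);
  assert(slurp(destination) == "precious");

  fs::remove(source);
  fs::remove(destination);
}
```

Either approach ends with the destination holding a copy of the source; the explicit remove in the post keeps the "is it a regular file" check visible, which the one-liner hides.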

But does it work?

It does, but you’ll have to build and run it yourself to prove it.

Partly because it’s so short, little more than a glorified wrapper around a library function, and partly because it’s all about filesystem manipulation, this is the first STinC++ program for which I wrote no tests. I could, I suppose, have shimmed out the std::filesystem functions I needed, done a bit of namespace manipulation, and thrown some tests around it but, what, really would that prove? There’s no 'business logic' to validate, no unit to isolate. I could try to test 'good' and 'bad' inputs I suppose, but filesystems are hard and any mock couldn’t hope to match real behaviour beyond the simplest.

Part of the raison d’etre of a standard library is to provide guarantees about the boundaries of your program. I’m happy to rely on those here. After all, this version doesn’t suffer from the bug described by Kernighan and Plauger, and can make a much stronger claim that 'the new file is an exact replica of the old' than their program could.

That’s All Very Neat, But Hasn’t It Rather Missed The Point

Having thoroughly covered reading files, Kernighan and Plauger’s pedagogic intent with makecopy was to introduce programmatic file creation. This version of makecopy does not do that. It’s not even close. I’m a little surprised to get this far into the book before the Pascal and C++ versions diverged so completely. Maybe we haven’t come as far as we thought.

Source code

Source code for this program, indeed the whole project, is available in the stiX GitHub repository. makecopy is the fifth program of Chapter 3.

Library Reference

  • The Filesystem library was added in C++17. It does look largely as you expect, and is one of the many Standard features born and nurtured in Boost.

  • Structured Binding Declaration, binding the specified names to subobjects or elements of the initializer, is one of the gifts of C++17 that will surely be giving for a long, long time.

  • std::tuple has been around since C++11, but I wouldn’t be surprised if it had largely passed you by until structured binding made it radically more convenient to use.

Endnotes

This whole endeavour relies on Software Tools in Pascal and Software Tools, both by Brian W Kernighan and PJ Plauger. I love these books and commend them to you. They’re both still in print, but new copies are, frankly, just ridiculously expensive. Happily, there are plenty of second-hand copies floating round, or you can borrow them from The Internet Archive using the links above.

For this project I’ve been using JetBrains’ CLion, which I liked enough to buy a license. CLion uses CMake to build projects. My previous flirtations with CMake, admittedly many years ago, weren’t a huge success. Not so this time - it’s easy to use and works a treat.


Tagged code, and software-tools-in-c++