Jez Higgins

Freelance software grandad
software created
extended or repaired


Follow me on Mastodon
Applications, Libraries, Code
Talks & Presentations

Hire me
Contact

Older posts are available in the archive or through tags.

Feed

Monday 24 August 2020 The Forest Road Reader, No 2.52 : STinC++ - archive update, extract

STinC++ rewrites the programs in Software Tools in Pascal using C++

Last time out, back at the start of August, I left myself with three jobs to do on archive - factor out the common loop in the table, delete, and print commands, implement the update command, and finally do the extract command.

Factoring out the loop

I realised I had three functions that all looked pretty similar.

void an_archive_operation(
  ...
) {
  archive_in.peek();

  while (archive_in && !archive_in.eof()) {
    auto header_line = getline(archive_in);
    auto header = parse_header(header_line);

    ... about three lines of code doing the actual thing ...

    archive_in.peek();
  }
}

The loop itself, reading each entry in the archive before we decide what to do with it, isn’t that complex but it’s not entirely trivial and that archive_in.peek() is a little bit clever-clogsy. We’ve written it three times already and with two operations yet to go, we could end up doing it twice more. Let’s go the other way, do the loop once and just drop the interesting part of each operation into the middle.

table.cpp, archive_file.cpp
void table_archive(
  std::istream& archive_in,
  std::ostream& out
) { // (1)
  read_archive( // (2)
    archive_in,
    [&out]( // (3)
      std::istream& archive_in,
      archive_file const& header
    ) {
      out << header.name << '\t' << header.filesize << '\n';

      skip_entry(archive_in, header);
    }
  );
} // table_archive

typedef std::function<
  void(
    std::istream&,
    archive_file const&
  )
> ArchiveReader; // (5)

void read_archive(
  std::istream& archive_in,
  ArchiveReader reader
) {
  archive_in.peek();

  while(archive_in && !archive_in.eof()) {
    auto header_line = getline(archive_in);
    auto header = parse_header(header_line);

    reader(archive_in, header); // (4)

    archive_in.peek();
  } // while ...
} // read_archive
  1. table_archive prints the archive’s table of contents to the standard output.

  2. read_archive implements our loop. Here at the call site, we need to hand it the work we need to do.

  3. I’ve chosen to wrap that work up in a lambda expression.

  4. Within read_archive we invoke the lambda using the normal function call syntax.

  5. This rather funky-looking typedef is the modern-day equivalent of a function pointer declaration. std::function is a general-purpose polymorphic function wrapper - it can wrap functions, function objects, lambda expressions, pointers to member functions, and more. You name it, std::function can wrap it.

Often in C++, you have the choice of doing something at compile time or at runtime. read_archive could have been written, almost identically, as a template function. Here, I made the runtime choice using std::function. For this application it doesn’t make any kind of noticeable difference - I’m trading an almost unmeasurably slower application against, perhaps, a triflingly longer build time.[1]

Update

The archive program’s update command replaces named members and adds new files onto the end of an existing archive. Kernighan and Plauger treat creating an archive and updating an archive as the same operation. Creation for them is an update where there’s nothing to replace. They also implement updating in place. If you’re updating, say, the second file in an archive, the new version will also be the second file in the archive.

For me, creation and updating are not the same. Updating is an operation on an archive that already exists. It doesn’t seem ok to create one if it isn’t there. The order of files within the archive doesn’t seem important either. As a user, we can’t say give me the first and third file in the archive, we can only refer to the contents by name. To update an archive, we can remove any existing versions from the archive, then add the new versions to the end of the archive. And hey, there’s code to do that already.

This piece of code is at a slightly higher level than that previously shown, so it’s got a bit of the file manipulation detail.

update.cpp
void update(std::string const& archive, std::vector<std::string> const& files) {
  auto working = working_file(); // (1)

  {
    auto archive_in = std::ifstream(archive);
    auto archive_out = std::ofstream(working);
    delete_from_archive(archive_in, files, archive_out);

    auto input_files = gather_input_files(files); // (2)
    append_archive(input_files, archive_out);
  } // (3)

  fs::rename(working, archive); // (4)
} // update

Beautiful. Really pleased with this.

  1. The C++ filesystem library has fs::temp_directory_path() but nothing to generate a temporary filename. working_file() simply creates a random filename and appends it to the temporary directory path.

  2. gather_input_files collects the named input files, ensures they exist and grabs their size. If a file is missing, we can deal with it here rather than later on when we’re down in the details.[2]

  3. These operations are in their own little block so that the archive_in and archive_out streams are destroyed, closing their underlying files before we move the modified working copy over the original.

  4. In C++ as in Unix, renaming a file is the same as moving it. Even if a file with the new name already exists, it’s overwritten.

Extract

That just left the extract operation. I started with a test

test_extract
  auto archive_in = std::istringstream(archive);

  auto filename = std::string();
  auto out = std::ostringstream();

  auto mock_writer = [&filename, &out](std::string const& f) -> std::ostream& {
    filename = f;
    return out;
  };

  stiX::extract_files(archive_in, to_extract, mock_writer);

  REQUIRE(out.str() == expected);

Just as I did with create, I’m not letting extract_files interact directly with the file system. Instead, I’m passing in a function that’ll do whatever needs to be done file-wise and hands back a stream. For the tests, I’m not going to let it go anywhere near a real filesystem.

Test in place, a quick cut/paste/edit of the print operation

extract.hpp
template<typename FileWriteOpener>
void extract_files(
  std::istream& archive_in,
  std::vector<std::string> const& files,
  FileWriteOpener file_opener
) {
  read_archive(
    archive_in,
    [&files, &file_opener](
      std::istream& archive_in,
      archive_file const& header
    ) {
      if (of_interest(files, header)) {
        auto& out = file_opener(header.name);
        copy_contents(archive_in, header, out);
      }
      else
        skip_entry(archive_in, header);
    }
  );
} // extract_files

Boom - tests pass. Just need to hook up to the real file creation function, and job done right?

hooking up extract
std::ofstream file_creator(
  std::string const& filename
);

void extract(
  std::string const& archive,
  std::vector<std::string> const& files
) {
  auto archive_in = std::ifstream(archive);
  extract_files(archive_in, files, file_creator);
} // extract

and … oh

/home/jez/work/jezuk/stiX/c++/chapter_3/6_archive/./extract.hpp:23:17: error: cannot bind non-const lvalue reference of type ‘std::basic_ofstream<char>&’ to an rvalue of type ‘std::basic_ofstream<char>’
   23 |           auto& out = file_opener(header.name);
      |                 ^~~

Ok, so instead of auto& out = … it should be auto out = ….

and … oh, again

/home/jez/work/jezuk/stiX/c++/chapter_3/6_archive/test/../extract.hpp:23:16: error: use of deleted function ‘std::basic_ostream<_CharT, _Traits>::basic_ostream(const std::basic_ostream<_CharT, _Traits>&) [with _CharT = char; _Traits = std::char_traits<char>]’
   23 |           auto out = file_opener(header.name);
      |                ^~~

Damned with a cannot bind non-const lvalue reference of type ‘std::basic_ofstream<char>&’ to an rvalue of type ‘std::basic_ofstream<char>’ if I do, damned with a use of deleted function ‘std::basic_ostream<_CharT, _Traits>::basic_ostream(const std::basic_ostream<_CharT, _Traits>&) if I don’t. This time the error has come while compiling the tests. It took me a little while to work this out. extract_files is a template function, and I’m instantiating it with two functions that have similar but ever so slightly different signatures.

close, but no cigar
std::ofstream file_creator(
  std::string const& filename
); // the real function

[&filename, &out](std::string const& f) -> std::ostream&
// the test mock, which is more or less
std::ostream& test_mock(std::string const& f);

One returns a stream, the other a stream reference. The rules for auto type deduction are the same as for template type deduction, which basically means the reference gets tossed away. If I force a reference, auto& out …, it can’t bind to the temporary object returned by file_creator. If I leave it as auto out …, it’ll try to copy the ostream that test_mock returns a reference to, which fails because ostream’s copy constructor is deleted and so streams can’t be copied.

What we need here is for out to take on the actual return type of the function parameter, and to do that we just change the rules!

decltype(auto) out = file_opener(header.name);

Now we’ll pick up the actual return type of the function, and everything clicks into place.

But wait, there’s more …

Printing and extracting are almost the same. The extract implementation is just a light edit of print. That can’t stand - it’s a big fat duplication that can be factored away.

Bang, and the duplication is gone
template<typename FileWriteOpener>
void extract_files_to_sink(
  std::istream& archive_in,
  std::vector<std::string> const& files,
  FileWriteOpener file_writer
) {
  read_archive(
    archive_in,
    [&files, &file_writer](
      std::istream& archive_in,
      archive_file const& header
    ) {
      if (of_interest(files, header)) {
        decltype(auto) out = file_writer(header.name);
        copy_contents(archive_in, header, out);
      }
      else
        skip_entry(archive_in, header);
    }
  );
} // extract_files_to_sink

std::ostream& send_to_stdout(std::string const&) {
  return std::cout;
} // send_to_stdout

void print_files(
  std::istream& archive_in,
  std::vector<std::string> const& files
) {
  extract_files_to_sink(
    archive_in,
    files,
    send_to_stdout
  );
} // print_files

std::ofstream file_creator(
  std::string const& filename
) {
  ...
} // file_creator

void extract_files(
  std::istream& archive_in,
  std::vector<std::string> const& files
) {
  extract_files_to_sink(
    archive_in,
    files,
    file_creator
  );
} // extract_files

And that, at last, is that

Six weeks (crikey) of start-stop (mainly stop) work after I started, archive is done, and so is chapter three. Blimey.

My first encounter with a file archiving program was in the summer of 1989 during which I had the use of a friend’s Amstrad CPC 6128. In between writing programs to calculate 4 colour Mandelbrot Sets (I’d set them running in the morning when I went out to work, returning several hours later to see what I’d find), I spent the time poking around his collection of floppy disks. Some of those disks had ARC files on them, and when I discovered those files had files inside them, it just blew my mind. Writing a file archiver of my own, albeit with a strong steer, is really quite satisfying.

Source Code

Source code for this program, indeed the whole project, is available in the stiX GitHub repository. archive is the sixth program of Chapter 3.

Endnotes

This whole endeavour relies on Software Tools in Pascal and Software Tools, both by Brian W Kernighan and PJ Plauger. I love these books and commend them to you. They’re both still in print, but new copies are, frankly, just ridiculously expensive. Happily, there are plenty of second-hand copies floating round, or you can borrow them from The Internet Archive using the links above.

For this project I’ve been using JetBrains’ CLion, which I liked enough to buy and renew a license.

The test harness I’m using is Catch. I’ve been aware of Catch pretty much since it was first released, but this project is the first time really getting in and using it. I like it, and it’ll be my first choice in the future.


1. No, I didn’t even try to measure either of those things.
2. Of course, it’s possible that a file might be deleted in the time between the call to gather_input_files and when we try to open it for reading, but I don’t feel that’s a case it’s reasonable to cover here. Or in most cases, come to that.

Tagged code, and software-tools-in-c++

