Jez Higgins

Freelance software grandad
software created
extended or repaired


Follow me on Mastodon
Applications, Libraries, Code
Talks & Presentations

Hire me
Contact

Older posts are available in the archive or through tags.

Feed

Wednesday 05 February 2020 The Forest Road Reader, No 2.43 : STinC++ - compare

STinC++ rewrites the programs in Software Tools in Pascal using C++

Here we are at the start of Software Tools in Pascal’s third chapter, entitled Files. It’s all very well reading one input and writing one output, but eventually we’re going to want to access “some kind of permanent file system where information can be kept for extended periods on secondary storage”.

In their opening paragraphs Kernighan and Plauger outline the territory - each operating system has its own jargon for describing file system operations, each has its own capabilities and limitations, some operations are easier on this system than that. This is still true, perhaps even more so given that there isn’t a one-to-one mapping of operating systems to filesystems, although happily most of us are able to ignore that in our daily programming lives.

System variations, while a pain, were not Kernighan and Plauger’s biggest obstacle - file system access was just not a thing Pascal concerned itself with. When I was taught Pascal some years after Software Tools in Pascal was written, the language support for file access could best be described as grudging. K&P handle the problem in the same way as they handled console i/o in earlier chapters, presenting a library of file access primitives. The primitives are, unsurprisingly, very similar to those provided in C, and much of the chapter is spent on describing their operation and the operation of file systems in general.

In keeping with their step-by-step approach, Chapter 3’s first program is compare, a file comparison program. Our program needs to read two files, but we can reuse the console output we’re familiar with.

Comparing two files

Because this is Software Tools in Pascal and not, say, The Art of Computer Programming, and the point of the chapter is file handling not algorithm development, the approach to file comparison is not sophisticated.

As long as the two lines are identical, no output will be produced; if [the files] end together, all is well, even if both are empty. If one file is a prefix of the other, then a message to that effect is printed. Finally, if the files differ anywhere in some line, the differing lines will be printed.

compare.cpp
void compare(
    std::string left_file_name,
    std::istream& left,
    std::string right_file_name,
    std::istream& right,
    std::ostream& out
) {
    size_t line_count = 1;
    do {
        auto left_line = getline(left);
        auto right_line = getline(right);

        if (left.eof() && !right.eof()) {
            out << end_of_file << left_file_name << '\n';
            return;
        }
        if (!left.eof() && right.eof()) {
            out << end_of_file << right_file_name << '\n';
            return;
        }

        if (left_line != right_line) {
            out << line_count << ':' << '\n'
                << left_line << '\n'
                << right_line << '\n';
        }

        ++line_count;
    } while (left && right);
} // compare

std::string getline(std::istream& in) {
    std::string line;
    std::getline(in, line);
    return line;
} // getline

It’s not often you get to write a do while loop. Exciting times! You can see this lines up exactly with Kernighan and Plauger’s description:

  • read a line from each input stream

  • check to see if one or other stream is at the end

  • compare the two lines, and output a message if they differ

  • repeat while there is still input to read

You’ll also note, file names aside, there’s no mention of files anywhere in the code. The input stream abstraction, familiar from our earlier endeavours shields us from it. If we really needed to get down into the details of a file - how big it was, or where are we in the file say - then, sure, we’d need to be dealing with a some kind of handle to the file itself, but here we’re not. We just need to be able to read each file until there’s nothing left, and the stream abstraction gives us exactly that.

I used the non-member std::getline in preference to the std::istream::getline member function because it deals in std::string rather than a char* and a count. We’re living in the 21st century and there’s no need to be messing around with raw memory pointers in a program like this. I then threw a little wrapper around std::getline because the out value by reference feels ugly.

To connect the compare function up to a couple of files, we just need a little bit of extra set up.

1_compare.cpp
int main(int argc, char const* argv[]) {
    auto arguments = stiX::make_arguments(argc, argv);
    if (arguments.size() != 2) {
        std::cout << argv[0] << " file1 file2" << std::endl;
        return 0;
    }

    auto left = openfile(arguments[0]);
    auto right = openfile(arguments[1]);

    if (!left || !right)
        return 0;

    stiX::compare( (3)
        arguments[0],
        left,
        arguments[1],
        right,
        std::cout
    );
}

std::ifstream openfile(std::string const& filename) {
    std::ifstream f(filename); (1)
    if (!f) (2)
        std::cout << "Could not open '" << filename << "'\n";
    return f;
}
  1. An std::ifstream is an input file stream. Passing a filename to the constructor attempts to open the file, ready for reading.

  2. An istream is good if it is readable and implicitly convertible to true. If our newly created stream is not good, we just give up and display an error message. (Note, however, that an istream that is not good is not necessarily bad.)

  3. With two open files, hand off to our compare function.

It does a job, but it’s no diff

The comparison algorithm used here is rudimentary at best. The program compare is mimicking this time is, of course, diff, which is another of Doug McIlroy’s many contributions to programmers the world over. The algorithm McIlroy and his colleague William Hunt developed, later described in (the short and very readable) An Algorithm for Differential File Comparison, is the basis of pretty much all file differencing and source code control tools, at least as far as I’m aware, as well as finding applications in funky things like analysing DNA sequences.

I once developed a three-way XML document diff and merge tool and it too used the Hunt-McIlroy algorithm, although I went the long way round to get it. I found a diff implementation written in Perl, which itself might have been a port from Smalltalk, and without any real understanding of how it worked translated that into Java. I then jabbed at for several days until it produced reasonable outputs. The end result was, I thought, a pretty decent bit of kit, although it absolutely failed to make my fortune.

Source code

Source code for this program, indeed the whole project, is available in the stiX GitHub repository. compare is first program of Chapter 3.

Library Reference

Endnotes

This whole endeavour relies Software Tools in Pascal and Software Tools, both by Brian W Kernighan and PJ Plauger. I love these books and commend them to you. They’re both still in print, but new copies are, frankly, just ridiculously expensive. Happily, here are plenty of second-hand copies floating round, or you can borrow them from the The Internet Archive using the links above.

For this project I’ve been using JetBrain’s CLion, which I liked enough to buy a license. CLion uses CMake to build projects. My previous flirtations with CMake, admittedly many years ago, weren’t a huge success. Not so this time - it’s easy to use and works a treat.

The test harness I’m using is Catch. I’ve been aware of Catch pretty much since it was first released, but this is my first time really using it. I like it and will use it again.


Tagged code, and software-tools-in-c++


Jez Higgins

Freelance software grandad
software created
extended or repaired

Follow me on Mastodon
Applications, Libraries, Code
Talks & Presentations

Hire me
Contact

Older posts are available in the archive or through tags.

Feed