STinC++ rewrites the programs in Software Tools in Pascal using C++
At the end of my last installment, I had archive creation going, albeit with mocked-up input. Putting together inputs and expected outputs as inline strings quickly became rather tedious, so I extended my tests to pick up test cases from the filesystem. I was then able to reuse the same test cases to drive archive from the very top, passing in the command line arguments. I was even able to reuse code from compare to verify the program output.
With archive creation out of the way, the next step seemed to be listing the contents. I followed that with removing files from the archive, and then printing a file to standard output. These are the -t, -d, and -p command line options respectively. Recall that the archive format is a header line followed by the file contents, then the next header line and contents, and so on.
Listing, deleting, and printing all work on an existing archive file, and all have a similar shape:
```cpp
// -t: list the name and size of every entry
void table_archive(
    std::istream& archive_in,
    std::ostream& out
) {
  archive_in.peek();
  while (archive_in && !archive_in.eof()) {
    auto header_line = getline(archive_in);
    auto header = parse_header(header_line);
    out << header.name << '\t' << header.filesize << '\n';
    skip_entry(archive_in, header);
    archive_in.peek();
  }
} // table_archive

// -d: copy every entry except the named ones to the new archive
void delete_from_archive(
    std::istream& archive_in,
    std::vector<std::string> const& files_to_remove,
    std::ostream& archive_out
) {
  archive_in.peek();
  while (archive_in && !archive_in.eof()) {
    auto header_line = getline(archive_in);
    auto header = parse_header(header_line);
    if (of_interest(files_to_remove, header))
      skip_entry(archive_in, header);
    else {
      archive_out << header;
      copy_contents(archive_in, header, archive_out);
    }
    archive_in.peek();
  } // while ...
} // delete_from_archive

// -p: copy the contents of the named entries to out
void print_files(
    std::istream& archive_in,
    std::vector<std::string> const& files,
    std::ostream& out
) {
  archive_in.peek();
  while (archive_in && !archive_in.eof()) {
    auto header_line = getline(archive_in);
    auto header = parse_header(header_line);
    if (of_interest(files, header))
      copy_contents(archive_in, header, out);
    else
      skip_entry(archive_in, header);
    archive_in.peek();
  } // while ...
} // print_files
```
Before I lined them up like this, I hadn’t realised just how similar they are. There’s a fairly obvious refactoring to do next time I touch the code.
archive_in.peek() is a cheeky lookahead that I’m using to set the end-of-file (technically end-of-input) flag before I try to read anything, rather than afterwards. Let’s imagine we’ve been churning through an archive file, and have just read the last header and contents. We now call archive_in.peek(), which goes away and tries to find the next character of the input. There isn’t one, so the stream sets its end-of-file flag. Looping back up to the top of the while loop, we call archive_in.eof(), which now returns true, causing us to break out of the loop. Because peek() sets only the end-of-file flag and not the fail flag, archive_in itself still tests true at that point; it’s the eof() check that ends the loop. Without the peek(), that call to archive_in.eof() would return false, because nothing would have tried to read past the end of the input yet. The subsequent call to getline() would then hit the end of input and return an empty string, which we’d have to handle inside the loop, and the whole thing starts to get a bit messy. I don’t think I’ve used peek() in this way before, but I wish I’d worked it out years ago.