
What is the idiomatic C++17 standard approach to reading binary files?

Tags: c++, file, io, c++17

Normally I would just use C-style file I/O, but I'm trying a modern C++ approach, including the C++17-specific features std::byte and std::filesystem.

Reading an entire file into memory, traditional method:

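#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

/* Returns a malloc'd buffer holding the whole file, or NULL on failure.
   The caller must free() it. */
char *readFileData(const char *path)
{
    FILE *f;
    struct stat fs;
    char *buf;

    if (stat(path, &fs) != 0)
        return NULL;

    f = fopen(path, "rb");
    if (f == NULL)
        return NULL;

    buf = (char *)malloc(fs.st_size);
    if (buf != NULL && fread(buf, 1, fs.st_size, f) != (size_t)fs.st_size) {
        free(buf);
        buf = NULL;
    }
    fclose(f);

    return buf;
}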

Reading an entire file into memory, modern approach:

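#include <cstddef>
#include <filesystem>
#include <fstream>
#include <memory>
#include <string>
using namespace std;
using namespace std::filesystem;

auto readFileData(string path)
{
    auto fileSize = file_size(path);
    auto buf = make_unique<byte[]>(fileSize);
    // std::char_traits<std::byte> is not provided by the standard, so read
    // through a plain ifstream and cast the buffer pointer instead.
    ifstream ifs(path, ios::binary);
    ifs.read(reinterpret_cast<char*>(buf.get()), fileSize);
    return buf;
}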

Does this look about right? Can this be improved?

Terry Brian asked Jul 15 '18 23:07



1 Answer

Personally I prefer std::vector<std::byte> to using std::string unless you are reading an actual text document. The problem with make_unique<byte[]>(fileSize); is that you instantly lose the size of the data and have to carry it in a separate variable. It is sometimes assumed to be a tiny fraction faster than a std::vector<std::byte> because it skips zero-initialization, but std::make_unique<T[]>(n) value-initializes its elements as well, and in any case that cost will almost always be overshadowed by the time taken reading off the disk.
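To illustrate the point about losing the size, here is a minimal sketch. The names RawBuffer, make_raw and sum_bytes are hypothetical and only exist for this comparison; they are not part of any library.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical wrapper: a unique_ptr<std::byte[]> does not know its length,
// so the size has to be stored and passed around next to it.
struct RawBuffer {
    std::unique_ptr<std::byte[]> data;
    std::size_t size;
};

RawBuffer make_raw(std::size_t n)
{
    return RawBuffer{std::make_unique<std::byte[]>(n), n};
}

// Consumer for the raw buffer: needs pointer and length as two arguments.
unsigned sum_bytes(std::byte const* p, std::size_t n)
{
    unsigned total = 0;
    for(std::size_t i = 0; i < n; ++i)
        total += std::to_integer<unsigned>(p[i]);
    return total;
}

// Consumer for the vector: the length travels with the data.
unsigned sum_bytes(std::vector<std::byte> const& v)
{
    return sum_bytes(v.data(), v.size());
}
```

With the raw buffer every call site looks like sum_bytes(raw.data.get(), raw.size), and nothing stops the two arguments from drifting apart. With the vector it is just sum_bytes(v).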

So for a binary file I use something like this:

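#include <cerrno>
#include <cstddef>
#include <cstring>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

std::vector<std::byte> load_file(std::string const& filepath)
{
    // open at the end so the first tellg() reports the end position
    std::ifstream ifs(filepath, std::ios::binary|std::ios::ate);

    if(!ifs)
        throw std::runtime_error(filepath + ": " + std::strerror(errno));

    auto end = ifs.tellg();
    ifs.seekg(0, std::ios::beg);

    // the size is the distance between the end and start positions
    auto size = std::size_t(end - ifs.tellg());

    if(size == 0) // avoid undefined behavior
        return {};

    std::vector<std::byte> buffer(size);

    if(!ifs.read(reinterpret_cast<char*>(buffer.data()), buffer.size()))
        throw std::runtime_error(filepath + ": " + std::strerror(errno));

    return buffer;
}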

This is the fastest method I know of. It also avoids two common mistakes in determining the size of the data in the file. First, the value of ifs.tellg() right after opening the file at the end is not necessarily the same as the file size, so the code takes the difference between the end and start positions instead. Second, ifs.seekg(0) is not theoretically the correct way to locate the start of the file (even though it works in practice most places), so the code uses ifs.seekg(0, std::ios::beg).

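The error message from std::strerror(errno) should be reliable on POSIX systems. The C++ standard itself does not require stream operations to set errno, so on other platforms (I think it works on Windows too, but I'm not sure) the message may be less meaningful.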
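If you want the error code as well as the text, one option is to capture errno immediately and wrap it in a std::system_error. This is only a sketch: open_binary is a hypothetical helper name, and it assumes the implementation sets errno when the open fails, which is typical but not something the C++ standard promises.

```cpp
#include <cerrno>
#include <fstream>
#include <string>
#include <system_error>

// Hypothetical helper: opens a file for binary reading or throws
// std::system_error. Assumes the implementation sets errno on a failed
// open (typical, but not guaranteed by the C++ standard).
std::ifstream open_binary(std::string const& filepath)
{
    errno = 0; // clear stale values so an old error is not misreported
    std::ifstream ifs(filepath, std::ios::binary);
    int const saved = errno; // capture before another call can overwrite it

    if(!ifs)
        throw std::system_error(saved, std::generic_category(), filepath);

    return ifs;
}
```

The caller can then catch std::system_error and inspect e.code() instead of parsing a message string.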

Obviously you can use std::filesystem::path const& filepath in place of std::string if you want.
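For completeness, here is a rough sketch of a variant that takes a std::filesystem::path and asks std::filesystem::file_size for the size, as the question tried to do. The name load_file_fs is hypothetical, and the sketch assumes the file does not change size between the size query and the read.

```cpp
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <ios>
#include <stdexcept>
#include <vector>

// Hypothetical variant using std::filesystem for the size query.
std::vector<std::byte> load_file_fs(std::filesystem::path const& filepath)
{
    // file_size throws std::filesystem::filesystem_error if the query fails
    auto const size =
        static_cast<std::size_t>(std::filesystem::file_size(filepath));

    std::ifstream ifs(filepath, std::ios::binary);
    if(!ifs)
        throw std::runtime_error(filepath.string() + ": cannot open file");

    std::vector<std::byte> buffer(size);

    // skip the read for empty files, where buffer.data() may be null
    if(size != 0 && !ifs.read(reinterpret_cast<char*>(buffer.data()),
                              static_cast<std::streamsize>(size)))
        throw std::runtime_error(filepath.string() + ": read failed");

    return buffer;
}
```

The trade-off is an extra filesystem query in exchange for not having to seek, so I would not expect a meaningful speed difference either way.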

Also, especially for pre-C++17 code, you can use std::vector<unsigned char> or std::vector<char> if you don't have or don't want to use std::byte.
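One way to support all of those element types with the same code is to make the byte type a template parameter. This is a sketch only: load_file_as is a hypothetical name, and the error messages are simplified compared with the function above.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical generalization: Byte may be char, unsigned char or,
// from C++17 on, std::byte.
template<typename Byte = unsigned char>
std::vector<Byte> load_file_as(std::string const& filepath)
{
    static_assert(sizeof(Byte) == 1, "Byte must be a one-byte type");

    std::ifstream ifs(filepath, std::ios::binary | std::ios::ate);
    if(!ifs)
        throw std::runtime_error(filepath + ": cannot open file");

    auto end = ifs.tellg();
    ifs.seekg(0, std::ios::beg);
    auto size = std::size_t(end - ifs.tellg());

    std::vector<Byte> buffer(size);

    // skip the read for empty files, where buffer.data() may be null
    if(size != 0 && !ifs.read(reinterpret_cast<char*>(buffer.data()),
                              static_cast<std::streamsize>(buffer.size())))
        throw std::runtime_error(filepath + ": read failed");

    return buffer;
}
```

Usage would be load_file_as<char>("data.bin") or load_file_as<std::byte>("data.bin"); nothing in the sketch itself needs C++17.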

Galik answered Nov 18 '22 10:11
