Parsing binary file too slow in C++ using memory-mapped files

I'm trying to parse a binary file one integer at a time and check whether each value fulfills a certain condition, but the loop is very slow.

I read that memory-mapped files are the fastest way to get a file's contents into memory, hence I'm using the following Boost-based code:

unsigned long long int get_file_size(const char *file_path) {
    const filesystem::path file{file_path};
    const auto generic_path = file.generic_path();
    return filesystem::file_size(generic_path);
}

boost::iostreams::mapped_file_source read_bytes(const char *file_path,
                                         const unsigned long long int offset,
                                         const unsigned long long int length) {
    boost::iostreams::mapped_file_params parameters;
    parameters.path = file_path;
    parameters.length = static_cast<size_t>(length);
    parameters.flags = boost::iostreams::mapped_file::mapmode::readonly;
    parameters.offset = static_cast<boost::iostreams::stream_offset>(offset);

    boost::iostreams::mapped_file_source file;

    file.open(parameters);
    return file;
}

boost::iostreams::mapped_file_source read_bytes(const char *file_path) {
    const auto file_size = get_file_size(file_path);
    const auto mapped_file_source = read_bytes(file_path, 0, file_size);
    return mapped_file_source;
}

My test case roughly looks as follows:

inline auto test_parsing_binary_file_performance() {
    const auto start_time = get_time();
    const std::filesystem::path input_file_path = "...";
    const auto mapped_file_source = read_bytes(input_file_path.string().c_str());
    const auto file_buffer = mapped_file_source.data();
    const auto file_buffer_size = mapped_file_source.size();
    LOG_S(INFO) << "File buffer size: " << file_buffer_size;
    auto printed_lap = (long) (file_buffer_size / (double) 1000);
    printed_lap = round_to_nearest_multiple(printed_lap, sizeof(int));
    LOG_S(INFO) << "Printed lap: " << printed_lap;
    std::vector<int> values;
    values.reserve(file_buffer_size / sizeof(int)); // Pre-allocate a large enough vector
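    // Iterate over every integer. The index is a std::size_t so that it cannot overflow on large files,
    // and the loop condition skips trailing bytes that do not form a complete int.
    for (std::size_t file_buffer_index = 0; file_buffer_index + sizeof(int) <= file_buffer_size; file_buffer_index += sizeof(int)) {
        const auto value = *(const int *) &file_buffer[file_buffer_index];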
        if (value >= 0x30000000 && value < 0x49000000 - sizeof(int) + 1) {
            values.push_back(value);
        }

        if (file_buffer_index % printed_lap == 0) {
            LOG_S(INFO) << std::setprecision(4) << file_buffer_index / (double) file_buffer_size * 100 << "%";
        }
    }

    LOG_S(INFO) << "Values found count: " << values.size();

    print_time_taken(start_time, false, "Parsing binary file");
}

The memory-mapped file reading finishes almost instantly as expected but iterating it integer-wise is way too slow on my machine despite excellent hardware (SSD etc.):

2020-12-20 13:04:35.124 (   0.019s) [main thread     ]Tests.hpp:387   INFO| File buffer size: 419430400
2020-12-20 13:04:35.124 (   0.019s) [main thread     ]Tests.hpp:390   INFO| Printed lap: 419432
2020-12-20 13:04:35.135 (   0.029s) [main thread     ]Tests.hpp:405   INFO| 0%
2020-12-20 13:04:35.171 (   0.065s) [main thread     ]Tests.hpp:405   INFO| 0.1%
2020-12-20 13:04:35.196 (   0.091s) [main thread     ]Tests.hpp:405   INFO| 0.2%
2020-12-20 13:04:35.216 (   0.111s) [main thread     ]Tests.hpp:405   INFO| 0.3%
2020-12-20 13:04:35.241 (   0.136s) [main thread     ]Tests.hpp:405   INFO| 0.4%
2020-12-20 13:04:35.272 (   0.167s) [main thread     ]Tests.hpp:405   INFO| 0.5%
2020-12-20 13:04:35.293 (   0.188s) [main thread     ]Tests.hpp:405   INFO| 0.6%
2020-12-20 13:04:35.314 (   0.209s) [main thread     ]Tests.hpp:405   INFO| 0.7%
2020-12-20 13:04:35.343 (   0.237s) [main thread     ]Tests.hpp:405   INFO| 0.8%
2020-12-20 13:04:35.366 (   0.261s) [main thread     ]Tests.hpp:405   INFO| 0.9%
2020-12-20 13:04:35.399 (   0.293s) [main thread     ]Tests.hpp:405   INFO| 1%
2020-12-20 13:04:35.421 (   0.315s) [main thread     ]Tests.hpp:405   INFO| 1.1%
2020-12-20 13:04:35.447 (   0.341s) [main thread     ]Tests.hpp:405   INFO| 1.2%
2020-12-20 13:04:35.468 (   0.362s) [main thread     ]Tests.hpp:405   INFO| 1.3%
2020-12-20 13:04:35.487 (   0.382s) [main thread     ]Tests.hpp:405   INFO| 1.4%
2020-12-20 13:04:35.520 (   0.414s) [main thread     ]Tests.hpp:405   INFO| 1.5%
2020-12-20 13:04:35.540 (   0.435s) [main thread     ]Tests.hpp:405   INFO| 1.6%
2020-12-20 13:04:35.564 (   0.458s) [main thread     ]Tests.hpp:405   INFO| 1.7%
2020-12-20 13:04:35.586 (   0.480s) [main thread     ]Tests.hpp:405   INFO| 1.8%
2020-12-20 13:04:35.608 (   0.503s) [main thread     ]Tests.hpp:405   INFO| 1.9%
2020-12-20 13:04:35.636 (   0.531s) [main thread     ]Tests.hpp:405   INFO| 2%
2020-12-20 13:04:35.658 (   0.552s) [main thread     ]Tests.hpp:405   INFO| 2.1%
2020-12-20 13:04:35.679 (   0.574s) [main thread     ]Tests.hpp:405   INFO| 2.2%
2020-12-20 13:04:35.702 (   0.597s) [main thread     ]Tests.hpp:405   INFO| 2.3%
2020-12-20 13:04:35.727 (   0.622s) [main thread     ]Tests.hpp:405   INFO| 2.4%
2020-12-20 13:04:35.769 (   0.664s) [main thread     ]Tests.hpp:405   INFO| 2.5%
2020-12-20 13:04:35.802 (   0.697s) [main thread     ]Tests.hpp:405   INFO| 2.6%
2020-12-20 13:04:35.831 (   0.726s) [main thread     ]Tests.hpp:405   INFO| 2.7%
2020-12-20 13:04:35.860 (   0.754s) [main thread     ]Tests.hpp:405   INFO| 2.8%
2020-12-20 13:04:35.887 (   0.781s) [main thread     ]Tests.hpp:405   INFO| 2.9%
2020-12-20 13:04:35.924 (   0.818s) [main thread     ]Tests.hpp:405   INFO| 3%
2020-12-20 13:04:35.956 (   0.850s) [main thread     ]Tests.hpp:405   INFO| 3.1%
2020-12-20 13:04:35.998 (   0.893s) [main thread     ]Tests.hpp:405   INFO| 3.2%
2020-12-20 13:04:36.033 (   0.928s) [main thread     ]Tests.hpp:405   INFO| 3.3%
2020-12-20 13:04:36.060 (   0.955s) [main thread     ]Tests.hpp:405   INFO| 3.4%
2020-12-20 13:04:36.102 (   0.997s) [main thread     ]Tests.hpp:405   INFO| 3.5%
2020-12-20 13:04:36.132 (   1.026s) [main thread     ]Tests.hpp:405   INFO| 3.6%
...
2020-12-20 13:05:03.456 (  28.351s) [main thread     ]Tests.hpp:410   INFO| Values found count: 10650389
2020-12-20 13:05:03.456 (  28.351s) [main thread     ]          benchmark.cpp:31    INFO| Parsing binary file took 28.341 second(s)

Parsing those 419 MB always takes around 28 - 70 seconds. Even compiling in Release mode does not really help. Is there any way to cut this time down? It doesn't seem like the operation I'm performing should be that inefficient.

Note that I'm compiling for Linux 64-bit using GCC 10.
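
Because the timer in the test case starts before the mapping is created and stops after the loop, the reported figure mixes both phases. A minimal sketch of timing each phase on its own, assuming a hypothetical helper named seconds_taken (get_time and print_time_taken are not shown in the post, so this uses std::chrono directly):

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper (not from the original post): runs `work` exactly once
// and returns the elapsed wall-clock time in seconds.
template<typename Callable>
double seconds_taken(Callable &&work) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<Callable>(work)();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(end - start).count();
}
```

Wrapping the mapping call and the scanning loop in two separate lambdas passed to seconds_taken would show which of the two actually accounts for the time.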

EDIT:
As suggested in the comments, I also tried a Boost.Interprocess mapping with advise(), but that does not help the performance either:

boost::interprocess::file_mapping file_mapping(input_file_path.string().data(), boost::interprocess::read_only);
boost::interprocess::mapped_region mapped_region(file_mapping, boost::interprocess::read_only);
mapped_region.advise(boost::interprocess::mapped_region::advice_sequential);
const auto file_buffer = (char *) mapped_region.get_address();
const auto file_buffer_size = mapped_region.get_size();
...

Lessons learned so far by taking into account the comments/answers:

  • Using advise(boost::interprocess::mapped_region::advice_sequential) does not help
  • Not calling reserve() or calling it with exactly the right size can double the performance
  • Iterating directly on int * is a bit slower than iterating on a char *
  • Using a std::set is a bit slower than a std::vector for collecting the results
  • The progress logging is insignificant for the performance
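
The loop that these observations refer to can be sketched in isolation as follows. This is only an illustration under assumptions: the function name collect_values_in_range and its parameters are hypothetical, the values are loaded with std::memcpy instead of a pointer cast to avoid aliasing and alignment problems, and reserve() is deliberately left out in line with the second point above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper (not from the original post): collects every int in
// the half-open range [lower, upper) from a raw byte buffer.
inline std::vector<int> collect_values_in_range(const char *buffer,
                                                std::size_t buffer_size,
                                                int lower, int upper) {
    assert(lower <= upper);
    std::vector<int> values;
    // The std::size_t index cannot overflow on files larger than 2 GiB, and
    // the condition skips trailing bytes that do not form a complete int.
    for (std::size_t index = 0; index + sizeof(int) <= buffer_size; index += sizeof(int)) {
        int value;
        std::memcpy(&value, buffer + index, sizeof(int)); // alias-safe load
        if (value >= lower && value < upper) {
            values.push_back(value);
        }
    }
    return values;
}
```

For the condition used in the test case, the call would be collect_values_in_range(file_buffer, file_buffer_size, 0x30000000, 0x48FFFFFD), since 0x49000000 - sizeof(int) + 1 equals 0x48FFFFFD when int is 4 bytes wide.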
asked Dec 20 '20 at 10:12 by BullyWiiPlaza


1 Answer

As hinted by xanatos, memory-mapped files are deceptive in terms of performance because creating the mapping does not actually read the entire file into memory. Pages are loaded lazily, so during processing the page faults trigger multiple disk accesses, which severely degrades performance.

In this case it is more efficient to read the entire file into memory first and then iterate through the memory:

inline std::vector<std::byte> load_file_into_memory(const std::filesystem::path &file_path) {
    std::ifstream input_stream(file_path, std::ios::binary | std::ios::ate);

    if (input_stream.fail()) {
        const auto error_message = "Opening " + file_path.string() + " failed";
        throw std::runtime_error(error_message);
    }

    auto current_read_position = input_stream.tellg();
    input_stream.seekg(0, std::ios::beg);

    auto file_size = std::size_t(current_read_position - input_stream.tellg());
    if (file_size == 0) {
        return {};
    }

    std::vector<std::byte> buffer(file_size);

    if (!input_stream.read(reinterpret_cast<char *>(buffer.data()), static_cast<std::streamsize>(buffer.size()))) {
        const auto error_message = "Reading from " + file_path.string() + " failed";
        throw std::runtime_error(error_message);
    }

    return buffer;
}

With this change the performance is much more acceptable: the whole run now takes roughly 3 to 15 seconds.

answered Sep 19 '22 at 15:09 by BullyWiiPlaza