I am using 2D Eigen::Arrays for a project, and I would like to keep using them even in the case of huge 2D arrays.
To avoid memory issues, I thought of using memory-mapped files to manage (read/modify/write) these arrays, but I cannot find working examples.
The closest example I have found is one based on boost::interprocess, but it uses shared memory (while I'd prefer persistent storage).
The lack of examples makes me wonder whether there is a better, mainstream alternative solution to my problem. Is this the case? A minimal example would be very handy.
EDIT:
This is a minimal example explaining my use case in the comments:
#include <Eigen/Dense>

int main()
{
    // Order of magnitude of the required arrays
    Eigen::Index rows = 50000;
    Eigen::Index cols = 40000;
    {
        // Array creation (this is where the memory-mapped file should be created)
        Eigen::ArrayXXf arr1 = Eigen::ArrayXXf::Zero( rows, cols );
        // Some operations on the array
        for(Eigen::Index i = 0; i < rows; ++i)
        {
            for(Eigen::Index j = 0; j < cols; ++j)
            {
                arr1( i, j ) = float(i * j);
            }
        }
        // The array goes out of scope, but the data are persistently stored in the file
    }
    {
        // This should actually use the data stored in the file
        Eigen::ArrayXXf arr2 = Eigen::ArrayXXf::Zero( rows, cols );
        // Manipulation of the array data
        for(Eigen::Index i = 0; i < rows; ++i)
        {
            for(Eigen::Index j = 0; j < cols; ++j)
            {
                arr2( i, j ) += 1.0f;
            }
        }
        // The array goes out of scope, but the data are persistently stored in the file
    }
}
So I googled boost memory mapped file and came upon boost::iostreams::mapped_file in the first result. Combined with the link to Eigen::Map from this comment, I tested the following:
#include <boost/iostreams/device/mapped_file.hpp>
#include <Eigen/Dense>
#include <cstddef>
#include <iostream>
#include <stdexcept>
#include <string>

int main()
{
    boost::iostreams::mapped_file file("foo.bin"); // opens read/write by default
    const std::size_t rows = 163840;
    const std::size_t columns = 163840;
    if (rows * columns * sizeof(float) > file.size()) {
        throw std::runtime_error("file of size " + std::to_string(file.size()) +
                                 " couldn't fit float Matrix of " + std::to_string(rows) +
                                 "x" + std::to_string(columns));
    }
    Eigen::Map<Eigen::MatrixXf> matrix(reinterpret_cast<float*>(file.data()), rows, columns);
    std::cout << matrix(0, 0) << ' ' << matrix(rows - 1, columns - 1) << std::endl;
    matrix(0, 0) = 0.5f;
    matrix(rows - 1, columns - 1) = 0.5f;
}
Using the following CMake configuration:
find_package(Boost REQUIRED COMPONENTS iostreams)
find_package(Eigen3 REQUIRED)
target_link_libraries(${PROJECT_NAME} Boost::iostreams Eigen3::Eigen)
Then I googled windows create dummy file and the first result gave me:
fsutil file createnew foo.bin 107374182400
Running the program twice gives:
0 0
0.5 0.5
without blowing up memory usage.
So it works like a charm.
I think it wouldn't be that hard to write your own class for this.
To initialize the array for the first time, create a file of size x * y * elem_size and memory-map it.
You could even add a small header with information such as the size, x, y, etc., so that when you reopen the file you have all the info you need.
Now you have one big memory block, and you could add a member function elem(x, y), get_elem() / set_elem() accessors, or an overloaded [] operator, and in that function calculate the position of the data element within the block.
Closing the file, or flushing in between, will save the data.
For really large files it could be better to map only portions of the file when they are needed, to avoid creating a very large page table.
Windows-specific (not sure if these are available on Linux):
If you don't need to keep the data on disk, you can open the file with the delete-on-close flag. This will only (temporarily) write to disk if memory becomes unavailable.
For sparse arrays, a sparse file could be used. Those files only use disk space for the blocks that contain data. All other blocks are virtual and default to all zeros.