
Eigen and huge dense 2D arrays

I am using 2D Eigen::Arrays for a project, and I would like to keep using them for huge 2D arrays as well.

To avoid memory issues, I thought of using memory-mapped files to manage (read/modify/write) these arrays, but I cannot find working examples.

The closest example I have found is one based on boost::interprocess, but it uses shared memory (whereas I would prefer persistent storage).

The lack of examples makes me wonder whether there is a better, mainstream alternative solution to my problem. Is this the case? A minimal example would be very handy.

EDIT:

This is a minimal example explaining my use case in the comments:

#include <Eigen/Dense>


int main()
{
    // Order of magnitude of the required arrays
    Eigen::Index rows = 50000;
    Eigen::Index cols = 40000;

    {
        // Array creation (this is where the memory mapped file should be created)
        Eigen::ArrayXXf arr1 = Eigen::ArrayXXf::Zero( rows, cols );

        // Some operations on the array
        for(Eigen::Index i = 0; i < rows; ++i)
        {
            for(Eigen::Index j = 0; j < cols; ++j)
            {
                arr1( i, j ) = float(i * j);
            }
        }

        // The array goes out of scope, but the data are persistently stored in the file
    }

    {
        // This should actually use the data stored in the file
        Eigen::ArrayXXf arr2 = Eigen::ArrayXXf::Zero( rows, cols );

        // Manipulation of the array data
        for(Eigen::Index i = 0; i < rows; ++i)
        {
            for(Eigen::Index j = 0; j < cols; ++j)
            {
                arr2( i, j ) += 1.0f;
            }
        }

        // The array goes out of scope, but the data are persistently stored in the file
    }

}
gmas80 asked Jun 30 '18 19:06


2 Answers

So I googled

boost memory mapped file

and came upon boost::iostreams::mapped_file in the first result.

Combining it with Eigen::Map (linked in a comment), I tested the following:

#include <boost/iostreams/device/mapped_file.hpp>
#include <Eigen/Dense>

#include <cstddef>
#include <iostream>
#include <stdexcept>
#include <string>

int main()
{
    // Map the existing file (read/write by default); it must already be large enough.
    boost::iostreams::mapped_file file("foo.bin");

    const std::size_t rows = 163840;
    const std::size_t columns = 163840;
    if (rows * columns * sizeof(float) > file.size()) {
        throw std::runtime_error("file of size " + std::to_string(file.size()) + " couldn’t fit float Matrix of " + std::to_string(rows) + "×"  + std::to_string(columns));
    }

    // View the mapped bytes as a column-major float matrix without copying them.
    Eigen::Map<Eigen::MatrixXf> matrix(reinterpret_cast<float*>(file.data()), rows, columns);

    std::cout << matrix(0, 0) << ' ' << matrix(rows - 1, columns - 1) << std::endl;
    matrix(0, 0) = 0.5f;
    matrix(rows - 1, columns - 1) = 0.5f;
}

built with CMake:

find_package(Boost REQUIRED COMPONENTS iostreams)
find_package(Eigen3 REQUIRED)
target_link_libraries(${PROJECT_NAME} Boost::iostreams Eigen3::Eigen)

Then I googled

windows create dummy file

and the first result gave me

fsutil file createnew foo.bin 107374182400
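The fsutil command is Windows-only. As a rough portable sketch (assuming C++17), the backing file can also be created from the program itself with std::filesystem::resize_file. The helper name create_backing_file is hypothetical and not part of Boost or Eigen.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <fstream>

// Hypothetical helper: make sure `path` exists and is exactly large enough
// for rows * cols floats. Returns the resulting file size in bytes.
std::uintmax_t create_backing_file(const std::filesystem::path& path,
                                   std::uintmax_t rows, std::uintmax_t cols)
{
    assert(rows > 0 && cols > 0);
    const std::uintmax_t bytes = rows * cols * sizeof(float);

    // Opening in append mode creates the file if it is missing and leaves
    // any existing contents untouched.
    { std::ofstream create(path, std::ios::binary | std::ios::app); }

    // Growing a file this way makes the new region read back as zeros.
    std::filesystem::resize_file(path, bytes);
    return std::filesystem::file_size(path);
}
```

Calling create_backing_file("foo.bin", 163840, 163840) before mapping would give the same 107374182400-byte file as the command above.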

Running the program twice gives (first run, then second run):

0 0

0.5 0.5

without blowing up memory usage.

So it works like a charm.

Darklighter answered Oct 27 '22 03:10


I think it wouldn't be that hard to write your own class for this.

To initialize the array for the first time, create a file of size x * y * elem_size and memory map it.

You could even add a small header with information such as size, x, y, etc., so that when you reopen the file you have all the info you need.

Now you have one big memory block and you could use a member function elem(x,y) or get_elem() / set_elem() or use the [] operator, and in that function calculate the position of the data element.

Closing the file, or committing in between, will save the data.
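To make the idea above concrete, here is a minimal sketch of such a class using POSIX mmap (so Linux/macOS only). The class name MappedFloatArray, the FileHeader layout and the column-major ordering are all assumptions made for this illustration, not an existing library API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical on-disk header stored in front of the data.
struct FileHeader {
    std::uint64_t rows;
    std::uint64_t cols;
};

class MappedFloatArray {
public:
    // rows/cols > 0: create (or overwrite) the file. rows == 0: reopen it.
    explicit MappedFloatArray(const std::string& path,
                              std::uint64_t rows = 0, std::uint64_t cols = 0)
    {
        const bool create = rows != 0 && cols != 0;
        const int fd = ::open(path.c_str(),
                              create ? (O_RDWR | O_CREAT | O_TRUNC) : O_RDWR, 0644);
        if (fd < 0) throw std::runtime_error("cannot open " + path);

        if (create) {
            bytes_ = sizeof(FileHeader) + rows * cols * sizeof(float);
            if (::ftruncate(fd, static_cast<off_t>(bytes_)) != 0) {
                ::close(fd);
                throw std::runtime_error("cannot resize " + path);
            }
        } else {
            struct stat st {};
            if (::fstat(fd, &st) != 0 ||
                static_cast<std::size_t>(st.st_size) < sizeof(FileHeader)) {
                ::close(fd);
                throw std::runtime_error("no valid header in " + path);
            }
            bytes_ = static_cast<std::size_t>(st.st_size);
        }

        base_ = ::mmap(nullptr, bytes_, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        ::close(fd);  // the mapping stays valid after the descriptor is closed
        if (base_ == MAP_FAILED) throw std::runtime_error("mmap failed for " + path);

        header_ = static_cast<FileHeader*>(base_);
        if (create) {
            header_->rows = rows;
            header_->cols = cols;
        } else if (bytes_ != sizeof(FileHeader) +
                                 header_->rows * header_->cols * sizeof(float)) {
            ::munmap(base_, bytes_);
            throw std::runtime_error("size mismatch in " + path);
        }
        data_ = reinterpret_cast<float*>(static_cast<char*>(base_) + sizeof(FileHeader));
    }

    MappedFloatArray(const MappedFloatArray&) = delete;
    MappedFloatArray& operator=(const MappedFloatArray&) = delete;

    // Flush dirty pages to the file, then drop the mapping.
    ~MappedFloatArray()
    {
        ::msync(base_, bytes_, MS_SYNC);
        ::munmap(base_, bytes_);
    }

    std::uint64_t rows() const { return header_->rows; }
    std::uint64_t cols() const { return header_->cols; }

    // Column-major position, chosen here to match Eigen's default layout.
    float& elem(std::uint64_t r, std::uint64_t c)
    {
        assert(r < header_->rows && c < header_->cols);
        return data_[c * header_->rows + r];
    }

private:
    std::size_t bytes_ = 0;
    void* base_ = nullptr;
    FileHeader* header_ = nullptr;
    float* data_ = nullptr;
};
```

Constructing MappedFloatArray("arr.bin", rows, cols) creates the file; constructing MappedFloatArray("arr.bin") later reads the dimensions back from the header and sees the previously written values.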

For really large files it could be better to map only portions of the file when they are needed to avoid the creation of a very large page table.
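Mapping only a portion could look roughly like the sketch below, again assuming POSIX mmap. The offset passed to mmap has to be a multiple of the page size, so the requested offset is rounded down and the difference is skipped when handing out the pointer. MappedWindow is a hypothetical name. With column-major storage, one column is a contiguous range starting at column * rows * sizeof(float), so a window can cover a single column or a block of columns.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical RAII view over bytes [offset, offset + length) of an open file.
class MappedWindow {
public:
    MappedWindow(int fd, std::size_t offset, std::size_t length)
    {
        assert(length > 0);
        const std::size_t page = static_cast<std::size_t>(::sysconf(_SC_PAGESIZE));
        const std::size_t aligned = offset - offset % page;  // round down to a page boundary
        slack_ = offset - aligned;
        mapped_bytes_ = length + slack_;
        base_ = ::mmap(nullptr, mapped_bytes_, PROT_READ | PROT_WRITE, MAP_SHARED,
                       fd, static_cast<off_t>(aligned));
        if (base_ == MAP_FAILED) throw std::runtime_error("mmap failed");
    }

    MappedWindow(const MappedWindow&) = delete;
    MappedWindow& operator=(const MappedWindow&) = delete;

    ~MappedWindow() { ::munmap(base_, mapped_bytes_); }

    // Pointer to the byte at the originally requested offset.
    char* data() { return static_cast<char*>(base_) + slack_; }

private:
    std::size_t slack_ = 0;
    std::size_t mapped_bytes_ = 0;
    void* base_ = nullptr;
};
```

The file must already be at least offset + length bytes long; touching mapped pages beyond the end of the file is an error.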

Windows specific (not sure if those are available in Linux):

  • If you don't need to keep the data on disk, you can open the file with the delete-on-close flag. The data will then only be written to disk (temporarily) if memory becomes unavailable.

  • For sparse arrays, a sparse file could be used. Those files only use disk space for the blocks that contain data. All other blocks are virtual and default to all zeros.
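On Linux there are rough counterparts to both points, sketched below under the assumption of a POSIX system. Unlinking a file right after opening it keeps the storage alive only until the mapping is released, and a file grown with ftruncate is typically sparse on common Linux filesystems (that part depends on the filesystem). The helper name map_scratch is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical helper: map `bytes` of zero-initialized scratch storage backed
// by a file that no longer has a name. Release it with munmap(ptr, bytes).
void* map_scratch(const std::string& path, std::size_t bytes)
{
    assert(bytes > 0);
    const int fd = ::open(path.c_str(), O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd < 0) throw std::runtime_error("cannot create " + path);

    // Remove the name immediately; the data lives on until it is unmapped.
    ::unlink(path.c_str());

    // Extending with ftruncate reads back as zeros and usually allocates
    // disk blocks only for pages that are actually written.
    if (::ftruncate(fd, static_cast<off_t>(bytes)) != 0) {
        ::close(fd);
        throw std::runtime_error("cannot resize " + path);
    }

    void* p = ::mmap(nullptr, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    ::close(fd);
    if (p == MAP_FAILED) throw std::runtime_error("mmap failed for " + path);
    return p;
}
```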

Danny_ds answered Oct 27 '22 02:10