I'm writing an application that should process large amounts of data (between 1 and 10 GB) as close to real time as possible.
The data sits in multiple binary files on the hard disk, each between a few KB and 128 MB. When the process starts, it is first decided which data is actually needed. Then some user settings are taken through the user interface, and the data is processed chunk by chunk: a file is loaded into memory, processed, and then cleared from memory. This processing should be fast, because the user can change some settings and have the same data reprocessed, and this interaction should be as fluent as possible.
Now the loading from disk is quite a bottleneck, and I would like to preload the data already at the stage where it is decided which files will be used. However, if I preload too much data, the OS will resort to virtual memory and I'll get plenty of page faults, making the processing even slower.
How can I determine how much data to preload in order to keep page faults low? Can I influence the OS somehow regarding which data I want to keep in memory?
Thanks!
//edit: I'm currently running on Windows 7 64-bit (the application is 32-bit, however), and the application does not need to run on just any computer, only on one specific machine, since this is a research project.
For general-case random access to large binary files, I would consider using the native OS file memory-mapping API. This will most probably be the most efficient solution from a performance perspective. There is also a system API available in most OSes to lock a page in memory, but I wouldn't use it. When doing something more specific, it is usually possible to add smart indexing so you know exactly what is where, and that alone solves most performance bottlenecks.
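A minimal sketch of what memory mapping looks like with the Win32 API, assuming a hypothetical ProcessChunk() standing in for your per-file processing (error handling reduced to early returns):

```cpp
#include <windows.h>
#include <cstddef>

// Placeholder for the application's actual per-file processing (hypothetical name).
static void ProcessChunk(const unsigned char* data, size_t size)
{
    (void)data; (void)size; // real work goes here
}

bool ProcessFileMapped(const wchar_t* path)
{
    HANDLE file = CreateFileW(path, GENERIC_READ, FILE_SHARE_READ, nullptr,
                              OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, nullptr);
    if (file == INVALID_HANDLE_VALUE)
        return false;

    LARGE_INTEGER size{};
    if (!GetFileSizeEx(file, &size)) { CloseHandle(file); return false; }

    HANDLE mapping = CreateFileMappingW(file, nullptr, PAGE_READONLY, 0, 0, nullptr);
    if (!mapping) { CloseHandle(file); return false; }

    // Map the whole file; the OS pages data in on demand as it is touched,
    // and the pages stay cached as long as there is no memory pressure.
    const unsigned char* view = static_cast<const unsigned char*>(
        MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, 0));
    if (view)
    {
        ProcessChunk(view, static_cast<size_t>(size.QuadPart));
        UnmapViewOfFile(view);
    }

    CloseHandle(mapping);
    CloseHandle(file);
    return view != nullptr;
}
```

Since your process is 32-bit, map one file at a time (128 MB fits easily) rather than trying to map the whole 10 GB data set, which would exceed the 2 GB user address space.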
And yes, there is no magic: if you need all 10 GB available in RAM because all of it is accessed equally often, put 16 GB of RAM in your box.
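If you want the application to pick its preload budget automatically rather than hard-coding it, one option (my suggestion, not something from the question) is to query the available physical memory with GlobalMemoryStatusEx and stay well below it, and also below what a 32-bit address space can hold. The 75% and 1 GB caps below are arbitrary example values:

```cpp
#include <windows.h>
#include <algorithm>
#include <cstdint>

// Hypothetical helper: derive a preload budget from currently available
// physical memory, capped for a 32-bit process.
std::uint64_t PreloadBudgetBytes()
{
    MEMORYSTATUSEX status{};
    status.dwLength = sizeof(status);
    if (!GlobalMemoryStatusEx(&status))
        return 0;

    // Leave headroom: use at most 75% of the free physical memory...
    std::uint64_t budget = status.ullAvailPhys * 3 / 4;

    // ...and never more than ~1 GB, so the 32-bit address space (2 GB of user
    // space by default) still has room for the rest of the application.
    return std::min<std::uint64_t>(budget, 1ull << 30);
}
```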
For a Windows platform specifically, I would recommend you look into the memory-mapped file APIs (CreateFileMapping / MapViewOfFile) and, only if you really must pin data, VirtualLock.
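If you do decide to pin specific buffers despite the caveat above, the relevant Win32 calls are SetProcessWorkingSetSize (VirtualLock can only lock as much as the minimum working set allows) followed by VirtualLock. A rough sketch, with the working-set figures chosen arbitrarily:

```cpp
#include <windows.h>
#include <cstddef>

// Sketch only: pin an already-allocated buffer in physical memory.
// The 256 MB / 512 MB working-set figures are example values, not requirements.
bool PinBuffer(void* buffer, size_t size)
{
    // Raise the working-set limits first, otherwise VirtualLock fails once the
    // default minimum working set is exhausted.
    const SIZE_T workingSet = 256 * 1024 * 1024;
    if (!SetProcessWorkingSetSize(GetCurrentProcess(), workingSet, workingSet * 2))
        return false;

    return VirtualLock(buffer, size) != 0;
}
```

Keep in mind that locking large amounts of memory takes those pages away from the rest of the system, which is exactly the kind of pressure that causes paging elsewhere, so the mapped-file approach above is usually the better first choice.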