
C++ heavy data processing and paging

I'm writing an application that should process large amounts of data (between 1 and 10 GB) as close to real time as possible.

The data is stored in multiple binary files on the hard disk, each between a few KB and 128 MB. When the process starts, it first decides which data is actually needed. Then some user settings are collected through the user interface, and the data is processed chunk by chunk: one file at a time is loaded into memory, processed, and then released. This processing should be fast because the user can change some settings, after which the same data is reprocessed, and this interaction should be as fluid as possible.

Loading from disk is currently a significant bottleneck, so I would like to start preloading the data as soon as it has been decided which files will be used. However, if I preload too much data, the OS will fall back on virtual memory and I'll get plenty of page faults, making the processing even slower.

How can I determine how much data to preload in order to keep page faults low? Can I influence the OS somehow regarding which data I want to keep in memory?

Thanks!

Edit: I'm currently running on Windows 7 64-bit (the application itself is 32-bit, however). The application does not need to run on arbitrary computers, only on one specific machine, since this is a research project.

Mat asked Nov 10 '10 09:11



2 Answers

For the general case of random access to large binary files, I would consider using the native OS file memory-mapping API. This will most probably be the most efficient solution from a performance perspective. Most operating systems also provide an API to lock pages in memory, but I wouldn't use it. For something more specific, it is usually possible to build a smart index so that you know exactly what is where, and that solves most performance bottlenecks.

And yes, there is no magic: if you need all 10 GB available in RAM because they are accessed equally often, put 16 GB of RAM in your box.
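If the RAM is not there, one simple way to keep page faults down is to cap preloading at a fixed byte budget and load everything else on demand. The sketch below is only an illustration of that idea, not something from the question: `FileInfo`, `planPreload`, and the budget value are hypothetical names. On Windows the budget could be derived from the available physical memory reported by `GlobalMemoryStatusEx` (the `ullAvailPhys` field), keeping in mind that a 32-bit process has far less address space than the 10 GB mentioned in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical description of one input file; the names are illustrative only.
struct FileInfo {
    std::string path;
    std::uint64_t sizeBytes;
};

// Walk the files in the order they will be processed and pick the longest
// prefix whose total size fits into budgetBytes. Returns the number of files
// to preload; the remaining files are loaded on demand during processing.
std::size_t planPreload(const std::vector<FileInfo>& files,
                        std::uint64_t budgetBytes) {
    std::uint64_t used = 0;
    std::size_t count = 0;
    for (const FileInfo& f : files) {
        // Stop as soon as the next file would push us over the budget.
        if (f.sizeBytes > budgetBytes - used) break;
        used += f.sizeBytes;
        ++count;
    }
    assert(used <= budgetBytes);
    return count;
}
```

Stopping at the first file that does not fit (instead of skipping it and trying smaller ones) keeps the preloaded set contiguous in processing order, so the data needed first is the data already in memory.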

bobah answered Sep 28 '22 03:09


For a Windows platform, I would recommend you look into:

  • MapViewOfFile function: maps a view of a file mapping into the address space of the calling process
  • I/O Completion Ports: an efficient threading model for processing multiple asynchronous I/O requests on a multiprocessor system
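As a rough illustration of the first item, here is a minimal sketch of a read-only file mapping. The `MappedFile` class and the `sumBytes` helper are hypothetical names made up for this example. The Windows branch uses `CreateFileA`, `CreateFileMappingA`, and `MapViewOfFile`; the `#else` branch is an assumption added only so that the sketch also builds on POSIX systems via `mmap`. It maps one whole file at a time, which suits files of up to 128 MB in a 32-bit process; mapping all 10 GB at once would not fit in a 32-bit address space.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <stdexcept>
#include <string>

#ifdef _WIN32
#include <windows.h>
#else
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#endif

// Read-only mapping of one whole file; the view is released in the destructor.
// Empty or missing files are rejected with an exception.
class MappedFile {
public:
    explicit MappedFile(const std::string& path) {
        if (!mapFile(path)) {
            release();  // free whatever was acquired before the failure
            throw std::runtime_error("cannot map " + path);
        }
    }
    ~MappedFile() { release(); }
    MappedFile(const MappedFile&) = delete;
    MappedFile& operator=(const MappedFile&) = delete;

    const unsigned char* data() const { return data_; }
    std::size_t size() const { return size_; }

private:
    bool mapFile(const std::string& path) {
#ifdef _WIN32
        file_ = CreateFileA(path.c_str(), GENERIC_READ, FILE_SHARE_READ, nullptr,
                            OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
        if (file_ == INVALID_HANDLE_VALUE) return false;
        LARGE_INTEGER bytes;
        if (!GetFileSizeEx(file_, &bytes)) return false;
        size_ = static_cast<std::size_t>(bytes.QuadPart);
        // Size 0/0 means "map the whole file"; this fails for empty files.
        mapping_ = CreateFileMappingA(file_, nullptr, PAGE_READONLY, 0, 0, nullptr);
        if (mapping_ == nullptr) return false;
        data_ = static_cast<const unsigned char*>(
            MapViewOfFile(mapping_, FILE_MAP_READ, 0, 0, 0));
        return data_ != nullptr;
#else
        fd_ = ::open(path.c_str(), O_RDONLY);
        if (fd_ < 0) return false;
        struct stat st;
        if (::fstat(fd_, &st) != 0 || st.st_size == 0) return false;
        size_ = static_cast<std::size_t>(st.st_size);
        void* p = ::mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd_, 0);
        if (p == MAP_FAILED) return false;
        data_ = static_cast<const unsigned char*>(p);
        return true;
#endif
    }

    void release() {
#ifdef _WIN32
        if (data_) UnmapViewOfFile(data_);
        if (mapping_) CloseHandle(mapping_);
        if (file_ != INVALID_HANDLE_VALUE) CloseHandle(file_);
#else
        if (data_) ::munmap(const_cast<unsigned char*>(data_), size_);
        if (fd_ >= 0) ::close(fd_);
#endif
    }

    const unsigned char* data_ = nullptr;
    std::size_t size_ = 0;
#ifdef _WIN32
    HANDLE file_ = INVALID_HANDLE_VALUE;
    HANDLE mapping_ = nullptr;
#else
    int fd_ = -1;
#endif
};

// Stand-in for the real processing step: touches every byte of the view.
std::uint64_t sumBytes(const MappedFile& f) {
    std::uint64_t sum = 0;
    for (std::size_t i = 0; i < f.size(); ++i) sum += f.data()[i];
    return sum;
}
```

With a mapping, pages are read from disk when they are first touched and can be dropped by the OS without being written to the page file, since the file itself backs them. Constructing a `MappedFile` for each selected file and passing it to the processing step would replace the explicit load-and-free cycle described in the question.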
icecrime answered Sep 28 '22 03:09