I'm writing an application that should process large amounts of data (between 1 and 10 GB) as close to real time as possible.
The data sits in multiple binary files on the hard disk, each between a few KB and 128 MB. When the process starts, it is first decided which data is actually needed. Then some user settings are taken through the user interface, and the data is processed chunk by chunk: a file is loaded into memory, processed, and then cleared from memory. This processing should be fast, because the user can change some settings and have the same data reprocessed, and this interaction should be as fluent as possible.
Now the loading from disk is quite a bottleneck, and I would like to preload the data already at the stage where it is decided which files will be used. However, if I preload too much data, the OS will resort to virtual memory and I'll get plenty of page faults, making the processing even slower.
How can I determine how much data to preload in order to keep page faults low? Can I influence the OS somehow regarding which data I want to keep in memory?
Thanks!
//edit: I'm currently running on Windows 7 64-bit (the application is 32-bit, however), and the application does not need to run on just any computer, only on one specific machine, since this is a research project.
For general-case random access to large binary files, I would consider using the native OS file memory-mapping API. This will most probably be the most efficient solution from a performance perspective. There is also a system API available in most OSes to lock a page in memory, but I wouldn't use it. When doing something more specific, it is usually possible to add smart indexing so you know exactly what is where, and that alone solves most performance bottlenecks.
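A minimal sketch of what memory mapping looks like with the Win32 API, assuming a hypothetical ProcessChunk() standing in for your per-file processing (error handling reduced to early returns):

```cpp
#include <windows.h>
#include <cstddef>

// Placeholder for the application's actual per-file processing (hypothetical name).
static void ProcessChunk(const unsigned char* data, size_t size)
{
    (void)data; (void)size; // real work goes here
}

bool ProcessFileMapped(const wchar_t* path)
{
    HANDLE file = CreateFileW(path, GENERIC_READ, FILE_SHARE_READ, nullptr,
                              OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, nullptr);
    if (file == INVALID_HANDLE_VALUE)
        return false;

    LARGE_INTEGER size{};
    if (!GetFileSizeEx(file, &size)) { CloseHandle(file); return false; }

    HANDLE mapping = CreateFileMappingW(file, nullptr, PAGE_READONLY, 0, 0, nullptr);
    if (!mapping) { CloseHandle(file); return false; }

    // Map the whole file; the OS pages data in on demand as it is touched,
    // and the pages stay cached as long as there is no memory pressure.
    const unsigned char* view = static_cast<const unsigned char*>(
        MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, 0));
    if (view)
    {
        ProcessChunk(view, static_cast<size_t>(size.QuadPart));
        UnmapViewOfFile(view);
    }

    CloseHandle(mapping);
    CloseHandle(file);
    return view != nullptr;
}
```

Since your process is 32-bit, map one file at a time (128 MB fits easily) rather than trying to map the whole 10 GB data set, which would exceed the 2 GB user address space.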
And yes, there is no magic: if you need all 10 GB available in RAM because all of it is accessed equally often, put 16 GB of RAM in your box.
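If you want the application to pick its preload budget automatically rather than hard-coding it, one option (my suggestion, not something from the question) is to query the available physical memory with GlobalMemoryStatusEx and stay well below it, and also below what a 32-bit address space can hold. The 75% and 1 GB caps below are arbitrary example values:

```cpp
#include <windows.h>
#include <algorithm>
#include <cstdint>

// Hypothetical helper: derive a preload budget from currently available
// physical memory, capped for a 32-bit process.
std::uint64_t PreloadBudgetBytes()
{
    MEMORYSTATUSEX status{};
    status.dwLength = sizeof(status);
    if (!GlobalMemoryStatusEx(&status))
        return 0;

    // Leave headroom: use at most 75% of the free physical memory...
    std::uint64_t budget = status.ullAvailPhys * 3 / 4;

    // ...and never more than ~1 GB, so the 32-bit address space (2 GB of user
    // space by default) still has room for the rest of the application.
    return std::min<std::uint64_t>(budget, 1ull << 30);
}
```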
For a Windows platform specifically, I would recommend you look into the memory-mapped file APIs (CreateFileMapping / MapViewOfFile) and, only if you really must pin data, VirtualLock.
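If you do decide to pin specific buffers despite the caveat above, the relevant Win32 calls are SetProcessWorkingSetSize (VirtualLock can only lock as much as the minimum working set allows) followed by VirtualLock. A rough sketch, with the working-set figures chosen arbitrarily:

```cpp
#include <windows.h>
#include <cstddef>

// Sketch only: pin an already-allocated buffer in physical memory.
// The 256 MB / 512 MB working-set figures are example values, not requirements.
bool PinBuffer(void* buffer, size_t size)
{
    // Raise the working-set limits first, otherwise VirtualLock fails once the
    // default minimum working set is exhausted.
    const SIZE_T workingSet = 256 * 1024 * 1024;
    if (!SetProcessWorkingSetSize(GetCurrentProcess(), workingSet, workingSet * 2))
        return false;

    return VirtualLock(buffer, size) != 0;
}
```

Keep in mind that locking large amounts of memory takes those pages away from the rest of the system, which is exactly the kind of pressure that causes paging elsewhere, so the mapped-file approach above is usually the better first choice.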