Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to optimize paging for large in memory database

I have an application where the entire database is implemented in memory using a stl-map for each table in the database.

Each item in the stl-map is a complex object with references to other items in the other stl-maps.

The application works with a large amount of data, so it uses more than 500 MByte RAM. Clients are able to contact the application and get a filtered version of the entire database. This is done by running through the entire database, and finding items relevant for the client.

When the application have been running for an hour or so, then Windows 2003 SP2 starts to page out parts of the RAM for the application (Eventhough there is 16 GByte RAM on the machine).

After the application have been partly paged out then a client logon takes a long time (10 mins) because it now generates a page fault for each pointer lookup in the stl-map. If running the client logon a second time right after then it is fast (few secs) because all the memory is now back in RAM.

I can see it is possible to tell Windows to lock memory in RAM, but this is generally only recommended for device drivers, and only for "small" amounts of memory.

I guess a poor mans solution could be to loop through the entire memory database, and thus tell Windows we are still interested in keeping the datamodel in RAM.

I guess another poor mans solution could be to disable the pagefile completely on Windows.

I guess the expensive solution would be a SQL database, and then rewrite the entire application to use a database layer. Then hopefully the database system will have implemented means to for fast access.

Are there other more elegant solutions ?

like image 389
Rolf Kristensen Avatar asked Jun 07 '10 13:06

Rolf Kristensen


People also ask

Does pagination improve performance?

Thanks to pagination, we can split our large dataset into chunks ( or pages ) that we can gradually fetch and display to the user, thus reducing the load on the database. Pagination also solves a lot of performance issues both on the client and server-side!

What is database pagination?

Pagination is the process of displaying the data on multiple pages rather than showing them on a single page. You usually do pagination when there is a database with numerous records. Dividing those records increases the readability of the data. It can retrieve this data as per the user's requests.

What do you mean by paging?

Paging is a function of memory management where a computer will store and retrieve data from a device's secondary storage to the primary storage. Memory management is a crucial aspect of any computing device, and paging specifically is important to the implementation of virtual memory.


2 Answers

This sounds like either a memory leak, or a serious fragmentation problem. It seems to me that the first step would be to figure out what's causing 500 Mb of data to use up 16 Gb of RAM and still want more.

Edit: Windows has a working set trimmer that actively attempts to page out idle data. The basic idea is that it goes through and marks pages as being available, but leaves the data in them (and the virtual memory manager knows what data is in them). If, however, you attempt to access that memory before it's allocated to other purposes, it'll be marked as being in use again, which will normally prevent it from being paged out.

If you really think this is the source of your problem, you can indirectly control the working set trimmer by calling SetProcessWorkingSetSize. At least in my experience, this is only rarely of much use, but you may be in one of those unusual situations where it's really helpful.

like image 107
Jerry Coffin Avatar answered Sep 28 '22 11:09

Jerry Coffin


As @Jerry Coffin said, it really sounds like your actual problem is a memory leak. Fix that.

But for the record, none of your "poor mans solutions" would work. At all.

Windows pages out some of your data because there's not room for it in RAM. Looping through the entire memory database would load in every byte of the data model, yes... which would cause other parts of it to be paged out. In the end, you'd generate a lot of page faults, and the only difference in the end would be which parts of the data structure are paged out.

Disabling the page file? Yes, if you think a hard crash is better than low performance. Windows doesn't page data out because it's fun. It does that to handle situations where it would otherwise run out of memory. If you disable the pagefile, the app will just crash when it would otherwise page out data.

If your dataset really is so big it doesn't fit in memory, then I don't see why an SQL database would be especially "expensive". Unlike your current solution, databases are optimized for this purpose. They're meant to handle datasets too large to fit in memory, and to do this efficiently.

It sounds like you have a memory leak. Fixing that would be the elegant, efficient and correct solution.

If you can't do that, then either

  • throw more RAM at the problem (the app ends up using 16GB? Throw 32 or 64GB at it then), or
  • switch to a format that's optimized for efficient disk access (A SQL database probably)
like image 29
jalf Avatar answered Sep 28 '22 12:09

jalf