Should I manage pages or just lean on virtual memory?

I'm writing a database-style thing in C (i.e. it will store and operate on about 500,000 records). I'm going to be running it in a memory-constrained environment (VPS) so I don't want memory usage to balloon. I'm not going to be handling huge amounts of data - perhaps up to 200MB in total, but I want the memory footprint to remain in the region of 30MB (pulling these numbers out of the air).

My instinct is to do my own page handling (real databases do this), but I have been advised to just allocate everything and let the OS do the VM paging for me. My numbers will never rise above this order of magnitude. Which is the better choice in this case?

Assuming the second choice, at what point would it be sensible for a program to do its own paging? Obviously RDBMSs that can handle gigabytes must do this, but there must be a point along the scale at which the question is worth asking.

Thanks!

Joe asked Jul 22 '10 10:07


2 Answers

Use malloc until it's running. Then and only then, start profiling. If you run into the same performance issues as the proprietary and mainstream "real databases", you will naturally begin to perform cache/page/alignment optimizations. These things can easily be slotted in after you have a working database, and are orthogonal to having a working database.
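As a rough illustration of the "malloc first" approach, here is a minimal sketch of a growable in-memory record array. The `struct record` layout and the `store_*` names are made up for this example; the 400-byte record size is just 200 MB divided by 500,000 records from the question, not a known schema.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record: 4-byte id plus 396 bytes of payload = 400 bytes. */
struct record {
    uint32_t id;
    char payload[396];
};

/* Hypothetical store: a plain heap array that doubles when full. */
struct store {
    struct record *items;
    size_t count;
    size_t capacity;
};

void store_init(struct store *s)
{
    assert(s != NULL);
    s->items = NULL;
    s->count = 0;
    s->capacity = 0;
}

/* Append a copy of *r. Returns 0 on success, -1 if realloc fails. */
int store_append(struct store *s, const struct record *r)
{
    assert(s != NULL && r != NULL);
    if (s->count == s->capacity) {
        size_t new_cap = s->capacity ? s->capacity * 2 : 1024;
        struct record *p = realloc(s->items, new_cap * sizeof *p);
        if (p == NULL)
            return -1;          /* old array is still valid */
        s->items = p;
        s->capacity = new_cap;
    }
    s->items[s->count++] = *r;
    return 0;
}

/* Return a pointer to record `index`, or NULL if out of range. */
struct record *store_get(struct store *s, size_t index)
{
    assert(s != NULL);
    return index < s->count ? &s->items[index] : NULL;
}

void store_free(struct store *s)
{
    assert(s != NULL);
    free(s->items);
    store_init(s);
}
```

Because all access goes through `store_append` and `store_get`, the storage behind them can be swapped for something page-aware later, if profiling shows it is needed, without touching the callers.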

Matt Joiner answered Sep 29 '22 14:09


The database management systems that perform their own paging also benefit from the investment of huge research efforts to make sure their paging algorithms function well under varying system and load conditions. Unless you have a similar set of resources at your disposal I'd recommend against taking that approach.

The OS paging system you have at your disposal has already benefited from the tuning efforts of many people.
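One way to lean on the OS pager, sketched here as an assumption rather than something spelled out above, is to map the data file with POSIX `mmap` and treat it as ordinary memory. The `mapped_open` and `mapped_close` names are hypothetical helpers for this example.

```c
#define _POSIX_C_SOURCE 200809L  /* for ftruncate under strict -std=c11 */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Map `size` bytes of the file at `path` read/write, creating the file if
 * needed and setting its length to exactly `size`. Returns NULL on failure. */
void *mapped_open(const char *path, size_t size)
{
    assert(path != NULL && size > 0);
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return NULL;
    }
    void *base = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                   /* the mapping remains valid after close */
    return base == MAP_FAILED ? NULL : base;
}

/* Flush dirty pages to the file and drop the mapping. Returns 0 on success. */
int mapped_close(void *base, size_t size)
{
    assert(base != NULL);
    if (msync(base, size, MS_SYNC) != 0)
        return -1;
    return munmap(base, size);
}
```

With a mapping like this, the kernel rather than the program decides how many of the mapped pages stay resident, so it does not by itself guarantee any particular memory footprint.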

There are, however, some things you can do to tune your OS to benefit database-type access (large sequential I/O operations) rather than the typical desktop tuning (a mix of sequential and random I/O).

In short, if you are a one-person team or a small team, you should probably make use of existing tools rather than trying to roll your own in that particular area.

Amardeep AC9MF answered Sep 29 '22 13:09