Why does loading cached objects increase the memory consumption drastically when computing them does not?

Relevant background info

I've built a small piece of software that can be customized via a config file. The config file is parsed and translated into a nested environment structure (e.g. .HIVE$db is an environment, .HIVE$db$user = "Horst", .HIVE$db$pw = "my password", .HIVE$regex$date = some regex for dates, etc.).
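A minimal sketch of the nested layout described above, assuming each node is a plain environment. The helper `new_node()` is hypothetical (not from the question); it gives every node `emptyenv()` as its parent, so no node keeps a link back to whatever frame created it.

```r
# Hypothetical helper: every config node is an environment whose parent is
# emptyenv(), so it does not point back at the frame that created it.
new_node <- function() new.env(parent = emptyenv())

.HIVE <- new_node()
.HIVE$db <- new_node()
.HIVE$db$user <- "Horst"
.HIVE$db$pw <- "my password"
.HIVE$regex <- new_node()
.HIVE$regex$date <- "^[0-9]{4}-[0-9]{2}-[0-9]{2}$"  # illustrative date pattern

grepl(.HIVE$regex$date, "2011-10-31")  # TRUE
```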

I've built routines that can handle those nested environments (e.g. look up the value of "db/user" or "regex/date", change it, etc.). The thing is that the initial parsing of the config files takes a long time and results in quite big objects (actually three to four of them, each between 4 and 16 MB). So I thought "No problem, let's just cache them by saving the object(s) to .Rdata files". This works, but "loading" the cached objects makes my Rterm process go through the roof with respect to RAM consumption (over 1 GB!!), and I still don't really understand why. This doesn't happen when I "compute" the objects anew, but that's exactly what I'm trying to avoid since it takes too long.
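The path-style lookups ("db/user", "regex/date") could be sketched roughly like this. `get_key()`, `set_key()` and `new_node()` are hypothetical names, not the original routines; `set_key()` creates missing intermediate nodes on the fly.

```r
# Hypothetical helper: a config node with no link to its creating frame.
new_node <- function() new.env(parent = emptyenv())

# Walk a slash-separated path such as "db/user" down the nested environments.
get_key <- function(root, path) {
  node <- root
  for (part in strsplit(path, "/", fixed = TRUE)[[1]]) {
    if (!is.environment(node) || !exists(part, envir = node, inherits = FALSE))
      stop("no such key: ", path)
    node <- get(part, envir = node, inherits = FALSE)
  }
  node
}

# Assign a value at a path, creating intermediate nodes as needed.
set_key <- function(root, path, value) {
  parts <- strsplit(path, "/", fixed = TRUE)[[1]]
  node <- root
  for (part in head(parts, -1)) {
    if (!exists(part, envir = node, inherits = FALSE))
      assign(part, new_node(), envir = node)
    node <- get(part, envir = node, inherits = FALSE)
  }
  assign(tail(parts, 1), value, envir = node)
  invisible(root)
}

hive <- new_node()
set_key(hive, "db/user", "Horst")
set_key(hive, "regex/date", "^[0-9]{4}-[0-9]{2}-[0-9]{2}$")
get_key(hive, "db/user")  # "Horst"
```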

I have thought about serializing it instead, but I haven't tested that yet, as I would need to refactor my code a bit. Plus I'm not sure whether it would affect the "loading back into R" part in the same way as loading .Rdata files does.
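For comparison, a sketch of both caching routes. As far as I know, `save()`/`load()` and `saveRDS()`/`readRDS()` go through the same serialization machinery, so an environment's parents are written and rebuilt the same way in either case; the practical difference is that `readRDS()` returns the single object rather than restoring it under its saved name. The object and file names here are illustrative only.

```r
# Illustrative config object (names are hypothetical).
hive <- new.env(parent = emptyenv())
hive$db <- new.env(parent = emptyenv())
hive$db$user <- "Horst"

# save()/load(): objects come back under their original names.
rdata <- tempfile(fileext = ".RData")
save(hive, file = rdata)
restored_env <- new.env()
load(rdata, envir = restored_env)  # creates `hive` inside restored_env

# saveRDS()/readRDS(): one object per file; you pick the name when reading.
rds <- tempfile(fileext = ".rds")
saveRDS(hive, rds)
hive2 <- readRDS(rds)

# Size of the serialized (uncompressed) representation, in bytes.
length(serialize(hive, connection = NULL))
```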

Question

Can anyone tell me why loading a previously computed object has such effects on memory consumption of my Rterm process (compared to computing it in every new process I start) and how best to avoid this?

If desired, I will also try to come up with an example. It's a bit tricky to reproduce my exact scenario, but I'll try.

asked Oct 31 '11 by Rappster


1 Answer

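It's likely because the environments you are creating are carrying around their ancestors. Saving an environment also saves its parent (enclosing) environments, except for well-known ones such as the global environment and package namespaces, which are stored only as references. Since new.env() uses the calling frame as its default parent, an environment created inside a parsing function drags that function's whole frame, local variables included, along with it. If you don't need the ancestor information, then set the parents of such environments to emptyenv() (or just don't use environments if you don't need them).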
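A small experiment illustrating the point about ancestors. It assumes (hypothetically) that the config is built inside a function that also holds a large intermediate object; `runif(1e6)` stands in for that data, roughly 8 MB of doubles. Because `new.env()` defaults to `parent = parent.frame()`, the first version's environment keeps the function's frame, and serializing it writes `raw` out as well.

```r
# Environment created with the default parent: the function's own frame.
make_leaky <- function() {
  raw <- runif(1e6)   # stand-in for large intermediate parsing data
  cfg <- new.env()    # parent defaults to this function's frame
  cfg$user <- "Horst"
  cfg
}

# Same thing, but with no link back to the frame.
make_lean <- function() {
  raw <- runif(1e6)
  cfg <- new.env(parent = emptyenv())
  cfg$user <- "Horst"
  cfg
}

leaky_bytes <- length(serialize(make_leaky(), NULL))  # includes `raw`
lean_bytes  <- length(serialize(make_lean(), NULL))   # just the config
c(leaky = leaky_bytes, lean = lean_bytes)
```

The exact byte counts depend on the R version, but the leaky variant should be several orders of magnitude larger, and loading it back rebuilds the whole frame in memory.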

Also note that formulas (and, of course, functions) have environments so watch out for those too.
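The same effect with a formula and a closure, both of which capture the frame they were created in. The function names are hypothetical. Resetting the environment to something small (here `globalenv()`, which is serialized only as a reference) removes the baggage, assuming the formula or function does not need any variables from the original frame.

```r
make_formula <- function() {
  raw <- runif(1e6)   # large local the formula never uses
  y ~ x               # the formula's environment is this function's frame
}
make_closure <- function() {
  raw <- runif(1e6)
  function(x) x + 1   # the closure's enclosure is this function's frame
}

f <- make_formula()
g <- make_closure()
exists("raw", envir = environment(f), inherits = FALSE)  # TRUE

f_big <- length(serialize(f, NULL))
g_big <- length(serialize(g, NULL))

# Drop the captured frames by pointing both at the global environment.
environment(f) <- globalenv()
environment(g) <- globalenv()
f_small <- length(serialize(f, NULL))
g_small <- length(serialize(g, NULL))
```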

answered Sep 18 '22 by G. Grothendieck