I've built a small piece of software that can be customized via a config file. The config file is parsed and translated into a nested environment structure (e.g. .HIVE$db = an environment, .HIVE$db$user = "Horst", .HIVE$db$pw = "my password", .HIVE$regex$date = some regex for dates, etc.).
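In R terms, the structure looks roughly like this (just an illustrative sketch; the real thing is built by the config parser, and the date regex here is a placeholder):

    .HIVE <- new.env()
    .HIVE$db <- new.env()
    .HIVE$db$user <- "Horst"
    .HIVE$db$pw <- "my password"
    .HIVE$regex <- new.env()
    .HIVE$regex$date <- "\\d{4}-\\d{2}-\\d{2}"   # placeholder for the actual date regex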
I've built routines that can handle those nested environments (e.g. look up the value of "db/user" or "regex/date", change it, etc.). The thing is that the initial parsing of the config files takes a long time and results in quite large objects (three to four of them, between 4 and 16 MB). So I thought "No problem, let's just cache them by saving the object(s) to .Rdata files". This works, but "loading" the cached objects makes the RAM consumption of my Rterm process go through the roof (over 1 GB!), and I still don't really understand why (this doesn't happen when I "compute" the objects anew, but that's exactly what I'm trying to avoid since it takes too long).
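The caching itself is nothing fancy, roughly along these lines (the file name is just an example):

    ## in the session that parses the config
    save(.HIVE, file = "hive_cache.Rdata")

    ## in a fresh Rterm session, instead of re-parsing
    load("hive_cache.Rdata")   # this is where RAM consumption explodes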
I already thought about maybe serializing it, but I haven't tested it as I would need to refactor my code a bit. Plus I'm not sure if it would affect the "loading back into R" part in just the same way as loading .Rdata files.
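What I had in mind there (untested so far; saveRDS()/readRDS() is just one way to do it) is roughly:

    ## write the object with R's serialization interface instead of save()
    saveRDS(.HIVE, file = "hive_cache.rds")

    ## read it back in a new session
    .HIVE <- readRDS("hive_cache.rds")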
Can anyone tell me why loading a previously computed object has such an effect on the memory consumption of my Rterm process (compared to computing it anew in every new process I start), and how best to avoid this?
If desired, I'll also try to come up with an example, though it's a bit tricky to reproduce my exact scenario.
It's likely because the environments you are creating are carrying around their ancestors. If you don't need the ancestor information, then set the parents of such environments to emptyenv() (or just don't use environments if you don't need them). Also note that formulas (and, of course, functions) have environments, so watch out for those too.