What is the correct way to profile memory in R code that contains calls to data.table functions? Let's say I want to determine the maximum memory usage during an expression.

This reference indicates that Rprofmem may not be the right choice:
https://cran.r-project.org/web/packages/profmem/vignettes/profmem.html
All memory allocations that are done via the native allocVector3() part of R's native API are logged, which means that nearly all memory allocations are logged. Any objects allocated this way are automatically deallocated by R's garbage collector at some point. Garbage collection events are not logged by profmem(). Allocations not logged are those done by non-R native libraries or R packages that use native code Calloc() / Free() for internal objects. Such objects are not handled by the R garbage collector.
The data.table source code contains plenty of calls to Calloc() and malloc(), so this suggests that Rprofmem will not measure all memory allocated by data.table functions. But if Rprofmem is not the right tool, how come Matthew Dowle uses it here: "R: loop over columns in data.table"?
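For concreteness, the pattern I have in mind is something like the following sketch (note that Rprofmem only records anything if R was compiled with --enable-memory-profiling):

    Rprofmem("Rprofmem.out", threshold = 10240)   # log allocations of >= 10 KB
    x <- data.table::data.table(a = runif(1e6))   # allocations via R's API are logged
    Rprofmem(NULL)                                # stop logging
    readLines("Rprofmem.out", n = 5)              # inspect the first few entries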
I've found a reference suggesting similar potential issues for gc() (which can be used to measure maximum memory usage between two calls to gc()):
https://r.789695.n4.nabble.com/Determining-the-maximum-memory-usage-of-a-function-td4669977.html
gc() is a good start. Call gc(reset = TRUE) before and gc() after your task, and you will see the maximum extra memory used by R in the interim. (This does not include memory malloced by compiled code, which is much harder to measure as it gets re-used.)
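Following that advice, the bracketing pattern would look roughly like this sketch, where the "max used" column of gc()'s output is the statistic of interest:

    gc(reset = TRUE)                    # reset the "max used" statistics
    x <- data.table::CJ(1:1e4, 1:1e3)   # the task to measure
    gc()                                # "max used" (Mb) now shows the peak since
                                        # the reset, for R-managed memory only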
Nothing I've found suggests that similar issues exist with Rprof(memory.profiling=TRUE). Does this mean that the Rprof approach will work for data.table even though it doesn't always use the R API to allocate memory?
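The usage I have in mind is roughly this sketch (Rprof's memory profiling likewise requires R to be compiled with --enable-memory-profiling):

    Rprof("Rprof.out", memory.profiling = TRUE)
    x <- data.table::CJ(1:1e4, 1:1e3)
    Rprof(NULL)                                  # stop profiling
    summaryRprof("Rprof.out", memory = "both")   # memory use alongside timings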
If Rprof(memory.profiling=TRUE) in fact is not the right tool for the job, what is? Would ssh.utils::mem.usage work?
This is not related to data.table. Recently there was a discussion on Twitter about the same dplyr behaviour: https://mobile.twitter.com/healthandstats/status/1182840075001819136

Measuring at the operating-system level sidesteps the whole problem, because the kernel sees every allocation regardless of whether it went through R's API or through malloc()/Calloc(). On Linux, GNU time reports the peak resident set size of the whole process:

    /usr/bin/time -v Rscript -e 'library(data.table); CJ(1:1e4, 1:1e4)' |& grep resident
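If you want the same number from inside R, a hedged sketch for Linux is to read the kernel's peak-RSS high-water mark from /proc. Note that VmHWM is a high-water mark for the whole process, so if the expression stays below an earlier peak the difference will be zero:

    peak_rss_kb <- function() {
      # VmHWM in /proc/self/status is the process's peak resident set size,
      # so it includes malloc()/Calloc() allocations that R-level tools miss
      line <- grep("^VmHWM:", readLines("/proc/self/status"), value = TRUE)
      as.numeric(gsub("[^0-9]", "", line))   # value is in kilobytes
    }
    library(data.table)
    before <- peak_rss_kb()
    x <- CJ(1:1e4, 1:1e3)
    cat("peak RSS grew by", (peak_rss_kb() - before) / 1024, "MB\n")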
There is also the interesting cgmemtime project, but it requires a bit more setup.
If you are on Windows, I suggest moving to Linux.