
Convert a data frame to a data.table without copy

I have a large data frame (in the order of several GB) that I'd like to convert to a data.table. Using as.data.table creates a copy of the data frame, which means I need available memory to be at least twice the size of the data. Is there a way to do the conversion without a copy?

Here's a simple example to demonstrate:

    library(data.table)
    N <- 1e6
    K <- 1e2
    data <- as.data.frame(rep(data.frame(rnorm(N)), K))

    gc(reset=TRUE)
    tracemem(data)
    data <- as.data.table(data)
    gc()

With output:

    library(data.table)
    # data.table 1.8.10  For help type: help("data.table")
    N <- 1e6
    K <- 1e2
    data <- as.data.frame(rep(data.frame(rnorm(N)), K))

    gc(reset=TRUE)
    #             used  (Mb) gc trigger   (Mb)  max used  (Mb)
    # Ncells    303759  16.3     597831   32.0    303759  16.3
    # Vcells 100442572 766.4  402928632 3074.2 100442572 766.4
    tracemem(data)
    # [1] "<0x363fda0>"
    data <- as.data.table(data)
    # tracemem[0x363fda0 -> 0x31e4260]: copy as.data.table.data.frame as.data.table

    gc()
    #             used  (Mb) gc trigger   (Mb)  max used   (Mb)
    # Ncells    304519  16.3     597831   32.0    306162   16.4
    # Vcells 100444242 766.4  322342905 2459.3 200933219 1533.0
ytsaig asked Dec 03 '13 07:12


People also ask

How do I turn a data frame into a table?

Use setDT(). It coerces a data frame or a list into a data.table by modifying the original object by reference, so no copy of the data is made.

What does setDT do in R?

setDT converts lists (both named and unnamed) and data.frames to data.tables by reference. That is, the input object is modified in place and no copy is made.
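As a rough illustration of the list case, the sketch below converts one named and one unnamed list in place. It assumes data.table v1.9.0 or later is installed; the variable names `named` and `unnamed` and their contents are made up for the example.

```r
library(data.table)  # assumes v1.9.0 or later, where setDT() exists

# A named list: the element names become the column names
named <- list(id = 1:3, label = c("a", "b", "c"))
setDT(named)          # no assignment needed; `named` itself is converted
is.data.table(named)

# An unnamed list: setDT() assigns default column names
unnamed <- list(1:3, c("x", "y", "z"))
setDT(unnamed)
names(unnamed)
```

Note that setDT() is called for its side effect, so the result does not need to be assigned back to the variable.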


1 Answer

This is available in data.table v1.9.0 and later. From NEWS:

o Following this S.O. post, a function setDT is now implemented that takes a list (named and/or unnamed), data.frame (or data.table) as input and returns the same object as a data.table by reference (without any copy). See ?setDT examples for more.

This is in accordance with the data.table naming convention: all set* functions modify their input by reference. The := operator is the only other construct that also modifies by reference.

    require(data.table) # v1.9.0+
    setDT(data) # converts data which is a data.frame to data.table *by reference*
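One way to check that the columns are not copied is to compare the memory address of a column before and after the conversion. The following is a minimal sketch, assuming data.table v1.9.0 or later and using its exported `address()` helper; the data frame `df` and its columns `a`, `b`, and `total` are hypothetical and much smaller than the data in the question.

```r
library(data.table)  # assumes v1.9.0 or later

# Small illustrative data frame (hypothetical columns `a` and `b`)
df <- data.frame(a = rnorm(1e5), b = rnorm(1e5))

# Record where the first column lives in memory before converting
addr_before <- address(df$a)

setDT(df)  # converts in place; the result is not assigned back

addr_after <- address(df$a)

is.data.table(df)                   # the same variable is now a data.table
identical(addr_before, addr_after)  # same address means the column was not copied

# `:=` likewise adds a column by reference, without copying `a` or `b`
df[, total := a + b]
```

If both checks return TRUE, the conversion only changed the object's attributes and left the column vectors where they were, which is why peak memory should stay close to the size of the data rather than doubling.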

See history for older (now outdated) answers.

Arun answered Sep 21 '22 19:09