Problems with ddply for splitting a large number of categories in R

Question

I recently asked a question (http://stackoverflow.com/questions/7669553/how-to-assign-number-of-repeats-to-dataframe-based-on-elements-of-an-identifying/7669607#7669607) about counting how many times an element repeats in a large data frame. The advice I received was very helpful and worked on a small number of rows, but I now need to run the operation at a much larger scale (over 255k rows, with around 100k "groups" being formed by ddply):

library(plyr)
system.time( data <- ddply(data, "uid", function(x) {x$time <- 1:nrow(x); x}) )

Here uid is the grouping variable, for which I need to count the number of repeats, giving output like:

uid    time
ny1    1
ny1    2
ny2    1
ny2    2
ny2    3

Trying to perform this operation on the larger data set results in R choking due to memory issues. Are there any obvious solutions to this? Thanks in advance (especially for patience as I'm a new "programmer").

joran · Accepted Answer

For truly large problems like this, you might try using data.table rather than plyr:

library(data.table)
data <- data.table(data)

data[,transform(.SD,time = NROW(.SD)), by = uid]

assuming the time column doesn't already exist.

I'm still in the process of learning data.table, so as I tinker with this it appears this may be simpler (and maybe faster):

data[,rep(.N, .N),by = uid]

.N appears to be an internal variable that holds the number of rows in each subgroup.
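
If what you need is the running index within each uid (1, 2, ... as in the sample output in the question) rather than the group size, something along these lines may work. This is only a sketch: the toy table dt and its values are made up for illustration, and it assumes a data.table version that supports := together with by.

```r
library(data.table)

# Toy data standing in for the real 255k-row data frame (hypothetical values)
dt <- data.table(uid = c("ny1", "ny1", "ny2", "ny2", "ny2"))

# seq_len(.N) counts 1..n within each uid group; := adds the column
# by reference instead of building a new table
dt[, time := seq_len(.N), by = uid]

print(dt)
```

Because := modifies the table in place, this should avoid most of the copying that makes the ddply version run out of memory, though I have not timed it on data of your size.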

nzcoops · Answer

I posted a new answer to your original question, "How to assign number of repeats to dataframe based on elements of an identifying vector in R?".

That will hopefully help you there and here.

Tags: r, large-data, transform, plyr

SMM
