 

Fast replacement of values in a data frame in R


I have a data frame of 150,000 rows and 2,000 columns containing values, some of them negative. I am replacing those negative values with 0, but it is extremely slow (~60 min or more):

df[df < 0] = 0

where df[, 1441:1453] looks like this (all columns are numeric):

  V1441 V1442 V1443 V1444 V1445 V1446 V1447 V1448 V1449 V1450 V1451 V1452 V1453
1     3     1     0     4     4    -2     0     3    12     5    17    34    27
2     0     1     0     7     0     0     0     1     0     0     0     0     0
3     0     2     0     1     2     3     6     1     2     1    -6     3     1
4     1     2     3     6     1     2     1    -6     3     1    -4     1     0
5     1     2     1    -6     3     1    -4     1     0     0     1     0     0
6     1     0     0     1     0     0     0     0     0     0     1     2     2

Is there a way to speed up this process? The way I am doing it is extremely slow; is there a faster approach? Thanks.

Benoit B., asked Oct 11 '12 09:10




2 Answers

Try converting your data frame to a matrix.

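df <- data.frame(a=rnorm(1000),b=rnorm(1000))  # example data
m <- as.matrix(df)      # all columns must be numeric
m[m<0] <- 0             # replace negatives in the matrix
df <- as.data.frame(m)  # convert back to a data frame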
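A minimal sketch checking, on a small made-up data frame, that the matrix round trip gives the same values as the original df[df < 0] = 0. It assumes every column is numeric: as.matrix() on a data frame containing any non-numeric column produces a character matrix, and the comparison with 0 would then no longer be numeric.

```r
set.seed(1)
# Small made-up data frame standing in for the real 150,000 x 2,000 one
df <- data.frame(a = rnorm(1000), b = rnorm(1000))

# Original approach: logical indexing on the data frame itself
res_df <- df
res_df[res_df < 0] <- 0

# Matrix approach: convert, replace, convert back
m <- as.matrix(df)
m[m < 0] <- 0
res_m <- as.data.frame(m)

# Both results should agree cell by cell
print(all.equal(res_df, res_m))
```

Only the speed should differ between the two, so a check like this is a cheap way to confirm the faster version before running it on the full data.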
Roland, answered Oct 08 '22 17:10


Both your original approach and the current answer create an object the same size as m (or df) when evaluating m < 0 (the matrix approach is quicker because there is less internal copying with [<- than with [<-.data.frame).
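To see the point about the temporary object, one can inspect what df < 0 returns on a small made-up data frame: a logical matrix with one entry per cell, i.e. the same shape as the data. For the real data that would be 150,000 x 2,000 = 300 million logicals, roughly 1.2 GB, since R stores each logical in 4 bytes. A per-column comparison, by contrast, only builds a vector of length nrow(df).

```r
set.seed(2)
# Hypothetical small data frame; the real one is 150,000 x 2,000
df <- data.frame(a = rnorm(1000), b = rnorm(1000), c = rnorm(1000))

# Whole-frame comparison: one logical matrix covering every cell
mask <- df < 0
print(is.matrix(mask))
print(dim(mask))          # same dimensions as df

# Per-column comparison: one logical vector of length nrow(df)
col_mask <- df$a < 0
print(length(col_mask))
```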

You can use lapply and replace instead; then you are only working with one vector of length nrow(df) at a time and not copying as much:

df <- as.data.frame(lapply(df, function(x) replace(x, x < 0, 0)))

The above code should be quite efficient.
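A variant of the same lapply/replace idea, sketched on made-up data: assigning into df[] keeps the object a data frame with its column and row names, so the as.data.frame() round trip is not needed. That round trip can also alter non-syntactic column names, since as.data.frame() on a list checks names by default. The column names below only mimic the question's.

```r
set.seed(42)
# Made-up data frame with columns named like the question's
df <- as.data.frame(matrix(rnorm(100 * 5), ncol = 5))
names(df) <- paste0("V", 1441:1445)

# Replace negatives one column at a time; df[] <- keeps df a data frame
df[] <- lapply(df, function(x) replace(x, x < 0, 0))
```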

If you use data.table, most of the memory and time inefficiency of the data.frame approach is removed. It would be ideal for a large-data situation like yours.

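library(data.table)
# replace negatives column by column; lapply() returns a plain list
DT <- lapply(df, function(x) replace(x, x < 0, 0))
# convert the list to a data.table by reference
# (setDT() rather than setattr(DT, 'class', ...), which skips row names)
setDT(DT)
# or convert a copy of df first, then replace within it:
# DT <- as.data.table(df)[, lapply(.SD, function(x) replace(x, x < 0, 0))]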

You could also set keys on all the columns and then replace by reference the key values less than 0.
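The keyed approach is not spelled out above. As a hedged alternative that also updates by reference, data.table's set() can overwrite the negative entries column by column without copying the table. This is a sketch that swaps in set() in place of keys, assuming the data.table package is installed and all columns are numeric; the data are made up.

```r
library(data.table)

set.seed(7)
# Made-up numeric data standing in for the real data
DT <- as.data.table(matrix(rnorm(1000 * 4), ncol = 4))

# For each column, find the rows holding negatives and overwrite them in place
for (j in names(DT)) {
  neg <- which(DT[[j]] < 0)
  if (length(neg) > 0) set(DT, i = neg, j = j, value = 0)
}
```

Because set() skips the overhead of [.data.table, calling it inside a loop over 2,000 columns should stay cheap; each iteration only allocates the logical vector and the row indices for one column.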

mnel, answered Oct 08 '22 19:10