Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Remove rows conditionally from a data.table in R

Tags:

r

data.table

I have a data.table with fields {id, menuitem, amount}.

This is transaction data - so, ids are unique, but menuitem repeats. Now, I want to remove all entries where menuitem == 'coffee'.

Also, want to delete all rows where amount <= 0;

What is the right way to do this in data.table?

I can use data$menuitem!='coffee' and then index int into data[] - but that is not necessarily efficient and does not take advantage of data.table.

Any pointers in the right direction are appreciated.

like image 201
Gopalakrishna Palem Avatar asked Mar 26 '14 08:03

Gopalakrishna Palem


2 Answers

In this scenario it is not so different than data.frame

data <- data[ menuitem != 'coffee' | amount > 0]  

Delete/add row by reference it is to be implemented. You find more info in this question

Regarding speed:

1 You can benefit from keys by doing something like:

setkey(data, menuitem) data <- data[!"coffee"] 

which will be faster than data <- data[ menuitem != 'coffee']. However to apply the same filters you asked in the question you'll need a rolling join (I've finished my lunch break I can add something later :-)).

2 Even without key data.table is much faster for relatively big table (similar speed for handful amount of rows)

dt<-data.table(id=sample(letters,1000000,T),var=rnorm(1000000)) df<-data.frame(id=sample(letters,1000000,T),var=rnorm(1000000)) library(microbenchmark) > microbenchmark(dt[ id == "a"], df[ df$id == "a",]) Unit: milliseconds                expr       min        lq    median        uq       max neval       dt[id == "a"]  24.42193  25.74296  26.00996  26.35778  27.36355   100  df[df$id == "a", ] 138.17500 146.46729 147.38646 149.06766 154.10051   100 
like image 74
Michele Avatar answered Sep 25 '22 22:09

Michele


try this:

data <- data[ !(menuitem == 'coffee' | amount <= 0),]  

Generally:

dt <- data.table(a=c(1,1,1,2,2,2,3,3,3),b=c(4,2,3,1,5,3,4,7,6)) dt #>    a b #> 1: 1 4 #> 2: 1 2 #> 3: 1 3 #> 4: 2 1 #> 5: 2 5 #> 6: 2 3 #> 7: 3 4 #> 8: 3 7 #> 9: 3 6 dt[a!=1,] #>    a b #> 1: 2 1 #> 2: 2 5 #> 3: 2 3 #> 4: 3 4 #> 5: 3 7 #> 6: 3 6 
like image 36
song.xiao Avatar answered Sep 23 '22 22:09

song.xiao