I have a data frame with multiple time series identified by uniquer id's. I would like to remove any time series that have only 0 values.
The data frame looks as follows,
id date value
AAA 2010/01/01 9
AAA 2010/01/02 10
AAA 2010/01/03 8
AAA 2010/01/04 4
AAA 2010/01/05 12
B 2010/01/01 0
B 2010/01/02 0
B 2010/01/03 0
B 2010/01/04 0
B 2010/01/05 0
CCC 2010/01/01 45
CCC 2010/01/02 46
CCC 2010/01/03 0
CCC 2010/01/04 0
CCC 2010/01/05 40
I want any time series with only 0 values to be removed so that the data frame look as follows,
id date value
AAA 2010/01/01 9
AAA 2010/01/02 10
AAA 2010/01/03 8
AAA 2010/01/04 4
AAA 2010/01/05 12
CCC 2010/01/01 45
CCC 2010/01/02 46
CCC 2010/01/03 0
CCC 2010/01/04 0
CCC 2010/01/05 40
This is a follow up to a previous question that was answered with a really great solution using the data.tables package.
R efficiently removing missing values from the start and end of multiple time series in 1 data frame
If dat
is a data.table
, then this is easy to write and read :
dat[,.SD[any(value!=0)],by=id]
.SD
stands for Subset of Data. This answer explains .SD
very well.
Picking up on Gabor's nice use of ave
, but without repeating the same variable name (DF
) three times, which can be a source of typo bugs if you have a lot of long or similar variable names, try :
dat[ ave(value!=0,id,FUN=any) ]
The difference in speed between those two may be dependent on several factors including: i) number of groups ii) size of each group and iii) the number of columns in the real dat
.
Try this. No packages are used.
DF[ ave(DF$value != 0, DF$id, FUN = any), ]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With