Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Removing time series with only zero values from a data frame

I have a data frame with multiple time series identified by uniquer id's. I would like to remove any time series that have only 0 values.

The data frame looks as follows,

id   date          value
AAA  2010/01/01    9
AAA  2010/01/02    10
AAA  2010/01/03    8
AAA  2010/01/04    4
AAA  2010/01/05    12
B    2010/01/01    0
B    2010/01/02    0
B    2010/01/03    0
B    2010/01/04    0
B    2010/01/05    0
CCC  2010/01/01    45
CCC  2010/01/02    46
CCC  2010/01/03    0
CCC  2010/01/04    0
CCC  2010/01/05    40

I want any time series with only 0 values to be removed so that the data frame look as follows,

id   date          value
AAA  2010/01/01    9
AAA  2010/01/02    10
AAA  2010/01/03    8
AAA  2010/01/04    4
AAA  2010/01/05    12
CCC  2010/01/01    45
CCC  2010/01/02    46
CCC  2010/01/03    0
CCC  2010/01/04    0
CCC  2010/01/05    40

This is a follow up to a previous question that was answered with a really great solution using the data.tables package.

R efficiently removing missing values from the start and end of multiple time series in 1 data frame

like image 964
sizeight Avatar asked Jan 17 '23 13:01

sizeight


2 Answers

If dat is a data.table, then this is easy to write and read :

dat[,.SD[any(value!=0)],by=id]

.SD stands for Subset of Data. This answer explains .SD very well.

Picking up on Gabor's nice use of ave, but without repeating the same variable name (DF) three times, which can be a source of typo bugs if you have a lot of long or similar variable names, try :

dat[ ave(value!=0,id,FUN=any) ]

The difference in speed between those two may be dependent on several factors including: i) number of groups ii) size of each group and iii) the number of columns in the real dat.

like image 144
Matt Dowle Avatar answered Jan 21 '23 14:01

Matt Dowle


Try this. No packages are used.

DF[ ave(DF$value != 0, DF$id, FUN = any), ]
like image 29
G. Grothendieck Avatar answered Jan 21 '23 14:01

G. Grothendieck