Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to add the condition that there is a minimum amount of data to calculate a variable in `data.table`

Tags:

r

data.table

I have two dataframes. The first one df summarises detections of several individuals (ID) of a fish species overtime (DateTime). As an example:

options("digits.secs" = 3)

df<- data.frame(DateTime=c("2017-08-05 14:03:55.300","2017-08-05 16:18:12.100","2017-08-05 20:34:31.540","2017-08-05 16:18:14.355","2017-08-05 20:34:33.605"),
    ID= c("A","B","C","B","C"))

df
                 DateTime ID
1 2017-08-05 14:03:55.300  A
2 2017-08-05 16:18:12.100  B
3 2017-08-05 20:34:31.540  C
4 2017-08-05 16:18:14.355  B
5 2017-08-05 20:34:33.605  C

The other dataframe Activity has activity info about those individuals over time. This data is at high temporal resolution. That is, it is 11 data per second (11 hertz). As a reproducible example:

set.seed(100)
fmt <- "%Y-%m-%d %H:%M:%OS"

DateTime = seq(from=as.POSIXct("2017-08-05 14:03:55.100", format=fmt, tz="UTC"), by=1/11, length.out=80)
ID = rep("A", each=80)
x= sample(seq(from = -1, to = 1, by = 0.01), size = 80, replace = TRUE)
y= sample(seq(from = -1, to = 1, by = 0.01), size = 80, replace = TRUE)
z= sample(seq(from = -1, to = 1, by = 0.01), size = 80, replace = TRUE)
Activity1<- data.frame(DateTime,ID, x, y, z)

DateTime = seq(from=as.POSIXct("2017-08-05 16:18:11.900", format=fmt, tz="UTC"),by=1/5, length.out=40)
ID = rep("B", each=40)
x= sample(seq(from = -1, to = 1, by = 0.01), size = 40, replace = TRUE)
y= sample(seq(from = -1, to = 1, by = 0.01), size = 40, replace = TRUE)
z= sample(seq(from = -1, to = 1, by = 0.01), size = 40, replace = TRUE)
Activity2<- data.frame(DateTime,ID, x, y, z)

DateTime = seq(from=as.POSIXct("2017-08-05 16:18:19.703", format=fmt, tz="UTC"),by=1/11, length.out=40)
ID = rep("B", each=40)
x= sample(seq(from = -1, to = 1, by = 0.01), size = 40, replace = TRUE)
y= sample(seq(from = -1, to = 1, by = 0.01), size = 40, replace = TRUE)
z= sample(seq(from = -1, to = 1, by = 0.01), size = 40, replace = TRUE)
Activity3<- data.frame(DateTime,ID, x, y, z)

DateTime = seq(from=as.POSIXct("2017-08-05 20:34:31.240", format=fmt, tz="UTC"),by=1/11, length.out=80)
ID = rep("C", each=80)
x= sample(seq(from = -1, to = 1, by = 0.01), size = 80, replace = TRUE)
y= sample(seq(from = -1, to = 1, by = 0.01), size = 80, replace = TRUE)
z= sample(seq(from = -1, to = 1, by = 0.01), size = 80, replace = TRUE)
Activity4<- data.frame(DateTime,ID, x, y, z)
Activity<- rbind(Activity1,Activity2,Activity3,Activity4)

head(Activity)
                 DateTime ID     x     y     z
1 2017-08-05 14:03:55.099  A  0.01 -0.16 -1.00
2 2017-08-05 14:03:55.190  A  0.11  0.55 -0.69
3 2017-08-05 14:03:55.281  A  0.50  0.79  1.00
4 2017-08-05 14:03:55.372  A  0.97 -0.76  0.24
5 2017-08-05 14:03:55.463  A -0.97 -0.59  0.20
6 2017-08-05 14:03:55.554  A -0.46  0.42 -0.88

I am using the code below to calculate the variables VeDBA and RMS in df. I use data from Activity to calculate them, and I add those variables to the dataframe df. As a summary, for each row of df data, I calculate VeDBA and RMS using 2 seconds of data (which is 22 rows since I have 11 data per second in the dataframe Activity) from the Activity data frame and using the DateTime of the data frame df as the start time.

library(data.table)
setDT(df)[, DateTime := as.POSIXct(DateTime, format=fmt, tz="UTC")][,
    c("start", "end") := .(DateTime, DateTime+2)]
setDT(Activity)[, DateTime := as.POSIXct(DateTime, format=fmt, tz="UTC")]

df<- Activity[df, on=.(ID, DateTime>=start, DateTime<=end),
    by=.EACHI, .(
        DateTime=i.DateTime,
        ID=i.ID, 
        VeDBA=sum(sqrt(x^2 + y^2 + z^2)) / .N,
        RMS=sqrt((sum(x^2) + sum(y^2) + sum(z^2)) / .N))][, 
            (1L:3L) := NULL][]

The problem is that sometimes I have 5 data per second instead of 11 in the dataframe Activity. So I want to include some kind of code that indicates than, for instance, when I have less than 14 data in the mentioned 2 second period, I want to appear NA for VeDBA and RMS in df.

With the mentioned code I get this so far:

df
                  DateTime ID     VeDBA       RMS
1: 2017-08-05 14:03:55.299  A 0.9919576 1.0264458
2: 2017-08-05 16:18:12.099  B 0.9375138 0.9573975
3: 2017-08-05 20:34:31.539  C 0.9294209 0.9764383
4: 2017-08-05 16:18:14.355  B 0.7542922 0.7886634
5: 2017-08-05 20:34:33.605  C 1.0041628 1.0395891

And I would like to get this:

df
                  DateTime ID     VeDBA       RMS
1: 2017-08-05 14:03:55.299  A 0.9919576 1.0264458
2: 2017-08-05 16:18:12.099  B NA        NA        # Between 16:18:12.099 and 16:18:14.099 there is only 10 data instead of 22
3: 2017-08-05 20:34:31.539  C 0.9294209 0.9764383
4: 2017-08-05 16:18:14.355  B NA        NA
5: 2017-08-05 20:34:33.605  C 1.0041628 1.0395891

Does anyone know how to modify the code I am using with data.table to get those NAs?

like image 261
Dekike Avatar asked Dec 04 '25 00:12

Dekike


1 Answers

The modification below returns NA if .N is below the given threshhold n_min:

n_min <- 14L
Activity[df, on = .(ID, DateTime >= start, DateTime <= end),
         by = .EACHI, .(
           DateTime = i.DateTime,
           ID = i.ID,
           .N,  # inserted just to verify the result, to be omitted in production code
           VeDBA = if (.N < n_min) NA_real_ else sum(sqrt(x ^ 2 + y ^ 2 + z ^ 2)) / .N,
           RMS = if (.N < n_min) NA_real_ else sqrt((sum(x ^ 2) + sum(y ^ 2) + sum(z ^ 2)) / .N)
         )][,
            (1L:3L) := NULL][]
                  DateTime ID  N     VeDBA       RMS
1: 2017-08-05 14:03:55.299  A 22 0.8777660 0.9181305
2: 2017-08-05 16:18:12.099  B 10        NA        NA
3: 2017-08-05 20:34:31.539  C 22 0.8807835 0.9383084
4: 2017-08-05 16:18:14.355  B 10        NA        NA
5: 2017-08-05 20:34:33.605  C 22 1.0587765 1.1023549

Please, note that column N has been inserted just to verify the result. Also note that the output differs from OP's expected result although set.seed(100) was used to create the data.

if() can be used here because there is only one .N value per .EACHI group.

like image 157
Uwe Avatar answered Dec 06 '25 14:12

Uwe



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!