This is similar to this dplyr lag post and this dplyr mutate lag post, but neither of those asks this question about defaulting to the input value. I am using dplyr to mutate a new field that is a lagged offset of another field (which I have converted to POSIXct). The goal is that, for a given ip, I'd like some summary statistics on the delta between all the times it shows up in my list. I also have about 12 million rows.
The data look like this (prior to mutation):
ip          hour       snap
192.168.1.2 2017070700    0
192.168.1.2 2017070700   15
192.168.1.4 2017070700    0
192.168.1.4 2017070701   45
192.168.1.4 2017070702   30
192.168.1.7 2017070700   15
'hour' is an integer but should be a timestamp.
'snap' is one of four 'snapshot' values representing 15-minute increments.
Here's the data.frame creation code:
test <- data.frame(
  ip   = c("192.168.1.2", "192.168.1.2", "192.168.1.4", "192.168.1.4", "192.168.1.4", "192.168.1.7"),
  hour = c(2017070700, 2017070700, 2017070700, 2017070701, 2017070702, 2017070700),
  snap = c(0, 15, 0, 45, 30, 15)
)
There are hundreds and sometimes thousands of timestamps per ip. The code below uses dplyr to:
a) zero-pad snap to two characters,
b) paste hour and snap into a single date-time string,
c) convert that string to POSIXct,
d) group by ip,
e) create a lagged copy of the timestamp, and
f) compute the difference between each timestamp and its lag.
These letters refer to the comments at the end of each line.
timedelta <- test %>%
  mutate(snap = formatC(snap, width = 2, flag = 0)) %>%                   # a)
  mutate(fulldateint = paste(hour, snap, sep = "")) %>%                   # b)
  mutate(fulldate = as.POSIXct(strptime(fulldateint, "%Y%m%d%H%M"))) %>%  # c)
  group_by(ip) %>%                                                        # d)
  mutate(shifted = dplyr::lag(fulldate, default = fulldate)) %>%          # e)
  mutate(diff = fulldate - shifted)                                       # f)
After mutation, the data should look like this:
  ip          hour       snap  fulldateint  fulldate            shifted             diff
  <fctr>      <dbl>      <chr> <chr>        <dttm>              <dttm>              <time>
1 192.168.1.2 2017070700 00    201707070000 2017-07-07 00:00:00 2017-07-07 00:00:00    0 secs
2 192.168.1.2 2017070700 15    201707070015 2017-07-07 00:15:00 2017-07-07 00:00:00  900 secs
3 192.168.1.4 2017070700 00    201707070000 2017-07-07 00:00:00 2017-07-07 00:00:00    0 secs
4 192.168.1.4 2017070701 45    201707070145 2017-07-07 01:45:00 2017-07-07 00:00:00 6300 secs
5 192.168.1.4 2017070702 30    201707070230 2017-07-07 02:30:00 2017-07-07 01:45:00 2700 secs
6 192.168.1.7 2017070700 15    201707070015 2017-07-07 00:15:00 2017-07-07 00:15:00    0 secs
And if I could get lag to default to its original value, the 'delta-T' would always be 0 when it doesn't have a previous value (which is the desired result).
However, dplyr::lag(fulldate, default=fulldate)
throws the error
Error in mutate_impl(.data, dots) :
Column `shifted` must be length 2 (the group size) or one, not 3
It does work if I use fulldate[1], but then I lose the group_by(ip) result, which is necessary. Is it possible to make lag reference its own input within dplyr?
Note: I would really prefer an answer using dplyr rather than data.table, if possible, since dplyr has been our primary data-munging library, and also because I'd like to suggest to Mr. Wickham that he take this under consideration if it truly has no solution in the existing dplyr library.
In the OP's code

  ... %>%
  group_by(ip) %>%                                               # d)
  mutate(shifted = dplyr::lag(fulldate, default = fulldate)) %>% # e)
  ...

the default= argument must have length one, while fulldate has the length of the whole group. Replacing default = fulldate with default = first(fulldate) should work in this case, since within each group the first element is the one with no previous value, and so is exactly where the default gets applied.
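To illustrate, here is a minimal sketch of the corrected pipeline using the test data frame from above. The summarise() call at the end is only an assumed example of the per-ip summary statistics the OP mentions; the column names mean_gap_secs and max_gap_secs are made up for illustration.

timedelta <- test %>%
  mutate(snap = formatC(snap, width = 2, flag = 0)) %>%
  mutate(fulldateint = paste(hour, snap, sep = "")) %>%
  mutate(fulldate = as.POSIXct(strptime(fulldateint, "%Y%m%d%H%M"))) %>%
  group_by(ip) %>%
  mutate(shifted = dplyr::lag(fulldate, default = first(fulldate))) %>%  # length-one default per group
  mutate(diff = fulldate - shifted)

# assumed example: per-ip summary statistics on the deltas, in seconds
timedelta %>%
  summarise(mean_gap_secs = mean(as.numeric(diff, units = "secs")),
            max_gap_secs  = max(as.numeric(diff, units = "secs")))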
Related cases: for defaulting to the value at the other end, dplyr::lead(x, default = last(x)) works the same way. For a shift of more than one position (n greater than 1), a single default= value cannot reproduce "default to the input value" for all of the first n rows, so we'd probably need to switch to if_else or case_when or similar. (I'm not sure about the current tidyverse idiom.)
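As a hedged sketch of that if_else approach for a lag of two positions, reusing the grouped timedelta result from above (the column names shifted2 and diff2 are just illustrative):

timedelta %>%
  mutate(shifted2 = dplyr::lag(fulldate, n = 2),                          # NA for the first two rows of each ip
         shifted2 = dplyr::if_else(is.na(shifted2), fulldate, shifted2),  # fall back to the row's own timestamp
         diff2    = fulldate - shifted2)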