Was trying to figure out a way to use dplyr to count, for each id, the number of occurrences within 1 hour ahead of each row. Tried using a for loop but it doesn't give me the desired result. Went through Stack Overflow and tried various methods but to no avail. Any advice or help is greatly appreciated. Thanks
Dataset: https://drive.google.com/file/d/1U186SeBWYyTnJVgUPmow7yknr6K9vu8i/view?usp=sharing
id date_time count
1 1 2019-12-27 00:00:00 NA
2 2 2019-12-27 00:00:00 NA
3 2 2019-12-27 00:55:00 NA
4 2 2019-12-27 01:00:00 NA
5 2 2019-12-28 01:00:00 NA
6 3 2019-12-27 22:00:00 NA
7 3 2019-12-27 22:31:00 NA
8 3 2019-12-28 14:32:00 NA
Desired Output
id date_time count
1 1 2019-12-27 00:00:00 1 #Count = 1 since there are no other cases 1 hour ahead but itself, only 1 case of id=1
2 2 2019-12-27 00:00:00 3 #Count = 3 as there are 3 cases from 00:00 to 01:00 on 27/12
3 2 2019-12-27 00:55:00 2 #Count = 2 as there are 2 cases from 00:55 to 01:55 on 27/12
4 2 2019-12-27 01:00:00 1 #Count = 1 as only itself from 01:00 to 02:00 on 27/12
5 2 2019-12-28 01:00:00 1 #Count = 1 as only itself from 01:00 to 02:00 on 28/12
6 3 2019-12-27 22:00:00 2
7 3 2019-12-27 22:31:00 1
8 3 2019-12-28 14:32:00 1
My code (I'm stuck):
library(tidyverse)
data <- read.csv('test.csv')
data$date_time <- as.POSIXct(data$date_time)
data$count <- NA
data %>%
  group_by(id) %>%
  arrange(date_time, .by_group = TRUE)
#Doesn't give the desired output
for (i in 1:nrow(data)){
  data$count[i] <- nrow(data[data$date_time <= data$date_time[i] + 1*60*60 & data$date_time >= data$date_time[i], ])
}
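For reference, the loop above counts rows across every id because it never filters on id (e.g. the single id = 1 row also picks up the id = 2 rows between 00:00 and 01:00). A minimal sketch of a fix, assuming data as read above with date_time already converted to POSIXct:
for (i in 1:nrow(data)) {
  same_id   <- data$id == data$id[i]
  in_window <- data$date_time >= data$date_time[i] &
               data$date_time <= data$date_time[i] + 60*60
  data$count[i] <- sum(same_id & in_window)  # sum of a logical vector is the count
}
The answers below avoid the explicit loop altogether.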
If OP is only looking for a tidyverse solution, I am happy to delete this. Here is an approach using a data.table non-equi join:
DT[, onehrlater := date_time + 60*60]
DT[, count :=
DT[DT, on=.(id, date_time>=date_time, date_time<=onehrlater),
by=.EACHI, .N]$N
]
How to read this:
1) DT[, onehrlater := date_time + 60*60] creates a new POSIXct column that is one hour later; := updates the original dataset by reference.
2) DT[DT, on=.(id, date_time>=date_time, date_time<=onehrlater)] performs a self non-equi join, so that all rows with i) the same id, ii) a date_time at or after this row's date_time, and iii) a date_time at or before this row's date_time plus one hour are joined to this row.
3) by=.EACHI, .N returns the count of those matching rows for each row; $N accesses the output of this self non-equi join, and DT[, count := ...] updates the original dataset by reference.
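To see the intermediate result that $N is pulled from, you can run the join on its own after creating onehrlater (a sketch; DT is defined under data: below):
# by = .EACHI gives one output row per row of i (the DT inside the brackets);
# N is the number of matching rows in the outer DT and is what gets assigned to count above
DT[DT, on = .(id, date_time >= date_time, date_time <= onehrlater),
   by = .EACHI, .N]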
output:
id date_time onehrlater count
1: 1 2019-12-27 00:00:00 2019-12-27 01:00:00 1
2: 2 2019-12-27 00:00:00 2019-12-27 01:00:00 3
3: 2 2019-12-27 00:55:00 2019-12-27 01:55:00 2
4: 2 2019-12-27 01:00:00 2019-12-27 02:00:00 1
5: 2 2019-12-28 01:00:00 2019-12-28 02:00:00 1
6: 3 2019-12-27 22:00:00 2019-12-27 23:00:00 2
7: 3 2019-12-27 22:31:00 2019-12-27 23:31:00 1
8: 3 2019-12-28 14:32:00 2019-12-28 15:32:00 1
data:
library(data.table)
DT <- fread("id date_time
1 2019-12-27T00:00:00
2 2019-12-27T00:00:00
2 2019-12-27T00:55:00
2 2019-12-27T01:00:00
2 2019-12-28T01:00:00
3 2019-12-27T22:00:00
3 2019-12-27T22:31:00
3 2019-12-28T14:32:00")
DT[, date_time := as.POSIXct(date_time, format="%Y-%m-%dT%T")]
The question can be solved using a non-equi self join (in data.table speak). Unfortunately, this is not yet available with dplyr, AFAIK.
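(A row-wise count can still be emulated within the tidyverse, e.g. with purrr::map_int inside mutate. This is only a sketch, not part of the original answer; it assumes dat as read below and is quadratic within each id:)
library(dplyr)
library(purrr)
dat %>%
  group_by(id) %>%
  # for each date_time in the group, count group members within [.x, .x + 1 hour]
  mutate(count = map_int(date_time,
                         ~ sum(date_time >= .x & date_time <= .x + 3600))) %>%
  ungroup()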
Here is an implementation using SQL:
library(sqldf)
sqldf("
select d2.id, d2.date_time, count(d1.date_time) as count
from dat as d1, dat as d2
where d1.id = d2.id and d1.date_time between d2.date_time and (d2.date_time + 60*60)
group by d2.id, d2.date_time")
  id           date_time count
1  1 2019-12-27 00:00:00     1
2  2 2019-12-27 00:00:00     3
3  2 2019-12-27 00:55:00     2
4  2 2019-12-27 01:00:00     1
5  2 2019-12-28 01:00:00     1
6  3 2019-12-27 22:00:00     2
7  3 2019-12-27 22:31:00     1
8  3 2019-12-28 14:32:00     1
# reading directly from google drive, see https://stackoverflow.com/a/33142446/3817004
dat <- data.table::fread(
"https://drive.google.com/uc?id=1U186SeBWYyTnJVgUPmow7yknr6K9vu8i&export=download")[
, date_time := anytime::anytime(date_time)]
Maybe fuzzyjoin might be helpful here. You can create a time range for each row of data (setting end_time to 3600 seconds, i.e. 1 hour, after each time). Then you can do a fuzzy join of the data with itself, where a date_time that falls within this range is counted as within the hour.
library(tidyverse)
library(fuzzyjoin)

df %>%
  mutate(row_id = row_number(),
         end_time = date_time + 3600) %>%
  fuzzy_inner_join(df,
                   by = c("id", "date_time" = "date_time", "end_time" = "date_time"),
                   match_fun = list(`==`, `<=`, `>=`)) %>%
  group_by(row_id) %>%
  summarise(id = first(id.x),
            date_time = first(date_time.x),
            count = n())
Output
# A tibble: 8 x 4
row_id id date_time count
<int> <int> <dttm> <int>
1 1 1 2019-12-27 00:00:00 1
2 2 2 2019-12-27 00:00:00 3
3 3 2 2019-12-27 00:55:00 2
4 4 2 2019-12-27 01:00:00 1
5 5 2 2019-12-28 01:00:00 1
6 6 3 2019-12-27 22:00:00 2
7 7 3 2019-12-27 22:31:00 1
8 8 3 2019-12-28 14:32:00 1