 

Aggregate data by equally spaced time intervals in R

My dataset looks something like this:

Section Time  x
s3      9:35  2
s4      9:35  2
s1      9:36  1
s2     10:01  1
s8     11:00  2

I want to group the data by section into hourly intervals and sum the x values that fall within each interval.

My expected output is:

 Section Time          x
 s1      9:00-10:00    1
 s2      9:00-10:00    0
 s3      9:00-10:00    2
 s4      9:00-10:00    2
 s8      9:00-10:00    0
 s1      10:00-11:00   0
 s2      10:00-11:00   1
 s3      10:00-11:00   0
 s4      10:00-11:00   0
 s8      10:00-11:00   1

I tried to adapt the solution from this post on Stack Overflow, but my query below fails with the following error. Here x is my data frame.

data.frame(value = tapply(cbind(x$x),
                     list(sec= x$section,cut(x$Time, breaks="1 hour")),
                       sum))

Error in cut.default(x$Time, breaks = "1 hour") : 'x' must be numeric

I am not even sure whether this approach is right, as I have never worked with time data in R. Any help on how to achieve this would be greatly appreciated.

asked Feb 10 '23 19:02 by user3050590


2 Answers

I think the problem is that your Time column is stored as text (character or factor) rather than as a date-time, so cut() dispatches to cut.default(), which only accepts numeric input.
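A minimal sketch of how the cut() idea from the question could work once the strings are parsed into date-times, so that cut() uses its POSIXct method instead of cut.default(). It assumes all times fall on the same day (as.POSIXct() fills in the current date) and uses UTC to avoid daylight-saving gaps; the name df, the explicit hourly breaks, and the "09:00-10:00" labels are my own choices, not part of the question:

```r
# Example data from the question (df is a hypothetical name)
df <- data.frame(Section = c("s3", "s4", "s1", "s2", "s8"),
                 Time    = c("9:35", "9:35", "9:36", "10:01", "11:00"),
                 x       = c(2, 2, 1, 1, 2),
                 stringsAsFactors = FALSE)

# Parse "H:MM" strings into POSIXct; the current date is filled in
tm <- as.POSIXct(df$Time, format = "%H:%M", tz = "UTC")

# Hourly break points from the hour of the earliest time to one hour past the latest
first <- as.POSIXct(trunc(min(tm), "hours"), tz = "UTC")
last  <- as.POSIXct(trunc(max(tm), "hours"), tz = "UTC") + 3600
brks  <- seq(first, last, by = "hour")

# Readable window labels such as "09:00-10:00"
labs <- paste(format(head(brks, -1), "%H:%M", tz = "UTC"),
              format(tail(brks, -1), "%H:%M", tz = "UTC"), sep = "-")

# cut() now dispatches to cut.POSIXt; right = FALSE gives [start, end) windows
bins <- cut(tm, breaks = brks, labels = labs, right = FALSE)

# Sum x per Section and window; tapply() returns NA for empty cells
tab <- tapply(df$x, list(Section = df$Section, Time = bins), sum)
tab[is.na(tab)] <- 0

# Long format: one row per Section/window combination, zeros included
out <- as.data.frame(as.table(tab), responseName = "x")
out
```

Because the windows are closed on the left, the 11:00 reading lands in 11:00-12:00 rather than 10:00-11:00; the table step keeps every Section/window pair, so the zero rows from the expected output appear as well.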

Anyway, here is a quick and dirty approach using dplyr:

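library(dplyr)
# Example data (Time stored as plain "H:MM" strings)
x <- data.frame(section = c("s3", "s4", "s1", "s2", "s8", "s1", "s2", "s3"),
                Time = c("9:35", "9:35", "9:36", "10:01", "11:00", "9:45", "10:05", "10:05"),
                x = c(2, 2, 1, 1, 2, 6, 2, 4), stringsAsFactors = FALSE)
x %>% 
  rowwise %>%   # one row at a time, so strsplit(...)[[1]] refers to the current row
  mutate(aux = as.numeric(strsplit(Time, ":")[[1]][1]),    # hour part of "H:MM"
         time = paste0(aux, ":00-", aux+1, ":00")) %>%      # e.g. "9:00-10:00"
  select(-aux, -Time) %>% 
  ungroup %>% 
  group_by(time, section) %>%   # one group per hourly window and section
  summarise(x = sum(x)) %>%     # total x within each group
  ungroup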
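The result above only contains the time/section pairs that actually occur, while the question also asks for zero rows. A possible extension, assuming the tidyr package is installed and dplyr is 1.0.0 or newer (for `.groups`): complete() adds the missing combinations and `fill` sets their x to 0. This sketch also swaps rowwise()/strsplit() for the vectorized sub() and zero-pads the hours with sprintf() so the labels sort chronologically:

```r
library(dplyr)
library(tidyr)

# Same example data as in the answer
x <- data.frame(section = c("s3", "s4", "s1", "s2", "s8", "s1", "s2", "s3"),
                Time = c("9:35", "9:35", "9:36", "10:01", "11:00", "9:45", "10:05", "10:05"),
                x = c(2, 2, 1, 1, 2, 6, 2, 4), stringsAsFactors = FALSE)

res <- x %>%
  # sub() works on the whole column, so rowwise() is not needed
  mutate(hour = as.integer(sub(":.*$", "", Time)),
         time = sprintf("%02d:00-%02d:00", hour, hour + 1L)) %>%
  group_by(time, section) %>%
  summarise(x = sum(x), .groups = "drop") %>%
  # add every missing time/section combination with x = 0
  complete(time, section, fill = list(x = 0))
res
```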
answered Feb 12 '23 08:02 by Tutuchan


Here is an alternative version:

library(plyr)
# Build the example data; stringsAsFactors = FALSE keeps every column as character
m1 <- as.data.frame(matrix(c("s3","9:35",2,"s4","9:35",2,"s1","9:36",1,"s2","10:01",1,"s8","11:00",2),
                           byrow = TRUE, ncol = 3), stringsAsFactors = FALSE)
colnames(m1) <- c("Section", "Time", "x")
# Hour part of each "H:MM" string
hours <- as.numeric(sapply(strsplit(m1$Time, ":"), function(x) x[1]))
# Zero-padded hourly window for both ends, e.g. "09:00-10:00"
m1$Time <- sprintf("%02d:00-%02d:00", hours, hours + 1)
# x came from a character matrix, so convert it to numbers
m1$x <- as.numeric(m1$x)
# Sum x over rows that share the same time window and Section
m1 <- ddply(m1, .(Time, Section), summarise, x = sum(x))
m1 <- m1[, c("Section", "Time", "x")]

This gives the following data frame:

> m1
  Section        Time x
1      s1 09:00-10:00 1
2      s3 09:00-10:00 2
3      s4 09:00-10:00 2
4      s2 10:00-11:00 1
5      s8 11:00-12:00 2

The trick here, as in @Tutuchan's suggestion, is to ignore that these values are actually times (as a POSIXct object would represent them) and instead treat them simply as character strings. I hope this helps.

Update / Edit

As I mentioned in a comment, the earlier version of my code did not sum x over rows with the same Section falling into the same time window. The updated version above fixes this, although I gave up on doing everything in base R and used the plyr package instead.
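For readers who would rather stay in base R, the summation step can also be done with aggregate(). A minimal sketch using the same string-based window labels; sub() and sprintf() are my substitutions for the sapply()/strsplit() and padding steps above:

```r
# Example data from the question
m1 <- data.frame(Section = c("s3", "s4", "s1", "s2", "s8"),
                 Time    = c("9:35", "9:35", "9:36", "10:01", "11:00"),
                 x       = c(2, 2, 1, 1, 2),
                 stringsAsFactors = FALSE)

# Hour part of each "H:MM" string, turned into a zero-padded window label
hours   <- as.integer(sub(":.*$", "", m1$Time))
m1$Time <- sprintf("%02d:00-%02d:00", hours, hours + 1L)

# Sum x for every Section/Time pair that occurs (formula interface of aggregate)
res <- aggregate(x ~ Section + Time, data = m1, FUN = sum)
res
```

Like the ddply() result, this lists only the combinations that actually occur.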

answered Feb 12 '23 09:02 by RHertel