
Number of Observations by Day in R

Tags:

r

I am working with a dataframe that looks like this:

date<-c("2012-02-01", "2012-02-01", "2012-02-03", "2012-02-04", "2012-02-04", "2012-02-05", "2012-02-09", "2012-02-12", "2012-02-12")
var<-c("a","b","c","d","e","f","g","h","i")
df1<-data.frame(date,var)

I would like to create a second dataframe that tabulates the number of observations I have each day. In that dataframe, the dates that are not mentioned would get a zero, resulting in something like this:

date<-c("2012-02-01","2012-02-02","2012-02-03","2012-02-04","2012-02-05","2012-02-06","2012-02-07","2012-02-08","2012-02-09","2012-02-10","2012-02-11","2012-02-12")
num<-c(2,0,1,2,1,0,0,0,1,0,0,2)
df2<-data.frame(date,num)

I have tried a number of things with the aggregate function, but can't figure out how to include the dates with no observations (the zeros).

Joe Ripberger asked Oct 30 '12 00:10



1 Answer

Here is an approach using data.table:

library(data.table)
DF1 <- as.data.table(df1)
# coerce date to a date object
DF1[, date := as.IDate(as.character(date), format = '%Y-%m-%d')]
# setkey for joining
setkey(DF1, date)

# join DF1 to a data.table containing every date
# from the minimum date to the maximum date
# nomatch = NA keeps the dates that have no match in DF1
# .N is the number of rows of DF1 matched by each date;
# this is 0 when there are no matches
# by = .EACHI evaluates .N once per date (needed in data.table >= 1.9.4;
# without it .N is a single total)
DF2 <- DF1[J(DF1[, seq(min(date), max(date), by = 1)]), .N, by = .EACHI, nomatch = NA]
DF2

          date N
 1: 2012-02-01 2
 2: 2012-02-02 0
 3: 2012-02-03 1
 4: 2012-02-04 2
 5: 2012-02-05 1
 6: 2012-02-06 0
 7: 2012-02-07 0
 8: 2012-02-08 0
 9: 2012-02-09 1
10: 2012-02-10 0
11: 2012-02-11 0
12: 2012-02-12 2
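
The same join logic can be sketched in base R, for readers who would rather stay with aggregate as in the question. This is only a minimal sketch under stated assumptions: the names obs_dates, all_days, counts and df2_merge are made up for illustration, and the sample data is copied from the question.

```r
# Sample data from the question
df1 <- data.frame(
  date = c("2012-02-01", "2012-02-01", "2012-02-03", "2012-02-04", "2012-02-04",
           "2012-02-05", "2012-02-09", "2012-02-12", "2012-02-12"),
  var = c("a", "b", "c", "d", "e", "f", "g", "h", "i")
)
obs_dates <- as.Date(as.character(df1$date))

# One row per calendar day between the first and last observation
all_days <- data.frame(date = seq(min(obs_dates), max(obs_dates), by = "day"))

# Count the rows for each date that actually occurs in the data
counts <- aggregate(num ~ date,
                    data = data.frame(date = obs_dates, num = 1L),
                    FUN = sum)

# A left join keeps every calendar day; days without a match get NA,
# which is then replaced by 0
df2_merge <- merge(all_days, counts, by = "date", all.x = TRUE)
df2_merge$num[is.na(df2_merge$num)] <- 0L
```

aggregate alone only sees the dates present in df1, which is why the zeros go missing; the merge against the full calendar is what brings them back.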

An approach using reshape2::dcast

First make sure that your date column is a factor with a level for every day that you wish to tabulate; dcast with drop = FALSE then keeps the days that have no observations:

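library(reshape2)
# give the factor a level for every day between the first and last date
days <- seq(min(as.Date(as.character(df1$date))), max(as.Date(as.character(df1$date))), by = 1)
df1$date <- factor(df1$date, levels = as.character(days))

# count rows per level; drop = FALSE keeps the levels with no observations
df2 <- dcast(df1, date ~ ., fun.aggregate = length, value.var = "var", drop = FALSE)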
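
The factor-levels idea does not depend on reshape2. As a rough base R sketch (table() is swapped in for dcast here, and the names obs_dates, all_days, tab and df2_table are hypothetical), counting a factor that has a level for every day gives the zeros directly:

```r
# Sample data from the question
df1 <- data.frame(
  date = c("2012-02-01", "2012-02-01", "2012-02-03", "2012-02-04", "2012-02-04",
           "2012-02-05", "2012-02-09", "2012-02-12", "2012-02-12"),
  var = c("a", "b", "c", "d", "e", "f", "g", "h", "i")
)
obs_dates <- as.Date(as.character(df1$date))
all_days <- seq(min(obs_dates), max(obs_dates), by = "day")

# table() reports a count for every factor level, including unused ones
tab <- table(factor(as.character(obs_dates), levels = as.character(all_days)))

# Turn the table into a two-column data frame like the one asked for
df2_table <- data.frame(date = names(tab), num = as.vector(tab))
```

Here the date column of df2_table holds the dates as text; wrap it in as.Date() if a Date column is needed.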
mnel answered Nov 09 '22 06:11
