 

Plot time data in R to various resolutions (to the minute, to the hour, to the second, etc.)

I have some data in CSV like:

"Timestamp", "Count"
"2009-07-20 16:30:45", 10
"2009-07-20 16:30:45", 15
"2009-07-20 16:30:46", 8
"2009-07-20 16:30:46", 6
"2009-07-20 16:30:46", 8
"2009-07-20 16:30:47", 20

I can read it into R using read.csv. I'd like to plot:

  1. Number of entries per second, so:
    "2009-07-20 16:30:45", 2
    "2009-07-20 16:30:46", 3
    "2009-07-20 16:30:47", 1
    
  2. Average value per second:
    "2009-07-20 16:30:45", 12.5
    "2009-07-20 16:30:46", 7.333
    "2009-07-20 16:30:47", 20
    
  3. Same as 1 and 2, but aggregated by minute and then by hour.
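The expected numbers in items 1 and 2 can be double-checked with base R's tapply(). This is a small sketch that types the six sample rows in by hand and keeps the timestamps as plain strings:

```r
# The six sample rows from the question, entered by hand
ts    <- c("2009-07-20 16:30:45", "2009-07-20 16:30:45",
           "2009-07-20 16:30:46", "2009-07-20 16:30:46",
           "2009-07-20 16:30:46", "2009-07-20 16:30:47")
count <- c(10, 15, 8, 6, 8, 20)

# Item 1: number of entries per distinct timestamp (i.e. per second)
n_per_sec <- tapply(count, ts, length)

# Item 2: average Count per second, e.g. (8 + 6 + 8) / 3 = 7.333 for 16:30:46
mean_per_sec <- tapply(count, ts, mean)

print(n_per_sec)
print(round(mean_per_sec, 3))
```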

Is there some way to do this (collect by second/min/etc & plot) in R?

asked Aug 10 '09 at 18:08 by ayman



2 Answers

Read your data, and convert it into a zoo object:

R> library(zoo)
R> X <- read.csv("/tmp/so.csv")
R> X <- zoo(X$Count, order.by=as.POSIXct(as.character(X[,1])))

Note that this will show warnings because of non-unique timestamps.
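To see which rows trigger that warning, the raw timestamps can be checked with base R before building the zoo object. This is a sketch on the sample rows; the UTC time zone is an assumption, since the question gives none:

```r
# Sample timestamps from the question (UTC assumed)
ts <- as.POSIXct(c("2009-07-20 16:30:45", "2009-07-20 16:30:45",
                   "2009-07-20 16:30:46", "2009-07-20 16:30:46",
                   "2009-07-20 16:30:46", "2009-07-20 16:30:47"), tz = "UTC")

# TRUE for every timestamp that already appeared earlier in the vector
dup <- duplicated(ts)
print(ts[dup])

# Number of distinct timestamps, i.e. the number of groups that
# aggregating by the index itself will produce
n_groups <- length(unique(ts))
print(n_groups)
```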

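Task 1, using aggregate with length to count the entries in each group (force, which simply returns its argument, is passed as the by argument, so each distinct timestamp forms its own group):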

R> aggregate(X, force, length)
2009-07-20 16:30:45 2009-07-20 16:30:46 2009-07-20 16:30:47 
                  2                   3                   1 

Task 2, using aggregate with mean:

R> aggregate(X, force, mean)
2009-07-20 16:30:45 2009-07-20 16:30:46 2009-07-20 16:30:47 
             12.500               7.333              20.000 

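Task 3 works the same way, except that the by argument must map each timestamp to a coarser index, for example a function that truncates it to the minute, such as function(t) as.POSIXct(trunc(t, "mins")), or to the hour with "hours" instead of "mins". You can call plot on the result from aggregate: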

plot(aggregate(X, force, mean))
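For readers without zoo, here is a base-R sketch of tasks 1 to 3: cut() on POSIXct times bins them by "sec", "min" or "hour", and tapply() then counts and averages each bin. The helper name summarise_at and the UTC time zone are assumptions for illustration, not part of the answer above:

```r
# Sample rows from the question; UTC is an assumed time zone
ts <- as.POSIXct(c("2009-07-20 16:30:45", "2009-07-20 16:30:45",
                   "2009-07-20 16:30:46", "2009-07-20 16:30:46",
                   "2009-07-20 16:30:46", "2009-07-20 16:30:47"), tz = "UTC")
count <- c(10, 15, 8, 6, 8, 20)

# Hypothetical helper: bin timestamps with cut() at the given resolution,
# then count the entries and average the values in each bin
summarise_at <- function(ts, x, unit) {
  bins <- cut(ts, breaks = unit)
  data.frame(
    Time    = as.POSIXct(levels(bins), tz = "UTC"),
    N       = as.vector(tapply(x, bins, length)),
    Average = as.vector(tapply(x, bins, mean))
  )
}

per_sec  <- summarise_at(ts, count, "sec")
per_min  <- summarise_at(ts, count, "min")
per_hour <- summarise_at(ts, count, "hour")
print(per_sec)
print(per_min)
```

With this sample, all rows fall into a single minute and a single hour. Bins that contain no rows come out as NA, and any of the resulting data frames can be plotted, e.g. plot(per_sec$Time, per_sec$Average, type = "h").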
answered Oct 19 '22 at 20:10 by Dirk Eddelbuettel


Averaging the data is easy with the plyr package.

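library(plyr)
# Assumed location of the question's CSV file
dataset <- read.csv("/tmp/so.csv")
# One row per distinct Timestamp: mean of Count and number of rows
Second <- ddply(dataset, "Timestamp", function(x){
    c(Average = mean(x$Count), N = nrow(x))
})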

To do the same by minute or hour, you first need to add fields with that information.

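library(chron)
# minutes(), hours() and dates() need a chron date-time rather than text;
# convert via POSIXct (GMT assumed so the clock times are unchanged)
dataset$Timestamp <- as.chron(as.POSIXct(as.character(dataset$Timestamp), tz = "GMT"))
dataset$Minute <- minutes(dataset$Timestamp)
dataset$Hour <- hours(dataset$Timestamp)
dataset$Day <- dates(dataset$Timestamp)
# aggregate by hour
Hour <- ddply(dataset, c("Day", "Hour"), function(x){
    c(Average = mean(x$Count), N = nrow(x))
})
# aggregate by minute
Minute <- ddply(dataset, c("Day", "Hour", "Minute"), function(x){
    c(Average = mean(x$Count), N = nrow(x))
})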
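If chron is not installed, equivalent Day/Hour/Minute fields can be derived with base R's format(), and aggregate() can stand in for ddply. This is a sketch under stated assumptions (sample rows typed in, UTC time zone), not the answer's original code:

```r
# Sample rows from the question; UTC is an assumed time zone
d <- data.frame(
  Timestamp = as.POSIXct(c("2009-07-20 16:30:45", "2009-07-20 16:30:45",
                           "2009-07-20 16:30:46", "2009-07-20 16:30:46",
                           "2009-07-20 16:30:46", "2009-07-20 16:30:47"),
                         tz = "UTC"),
  Count = c(10, 15, 8, 6, 8, 20)
)

# Grouping fields taken from the formatted timestamp
d$Day    <- format(d$Timestamp, "%Y-%m-%d")
d$Hour   <- as.integer(format(d$Timestamp, "%H"))
d$Minute <- as.integer(format(d$Timestamp, "%M"))

# Average and number of rows per minute, joined on the grouping fields;
# the shared "Count" column becomes Count.Average and Count.N
keys <- c("Day", "Hour", "Minute")
avg <- aggregate(Count ~ Day + Hour + Minute, data = d, FUN = mean)
n   <- aggregate(Count ~ Day + Hour + Minute, data = d, FUN = length)
by_minute <- merge(avg, n, by = keys, suffixes = c(".Average", ".N"))
print(by_minute)
```

Dropping Minute from the formulas and from keys gives the per-hour version.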
answered Oct 19 '22 at 19:10 by Thierry