Plot time data in R to various resolutions (to the minute, to the hour, to the second, etc.)

Tags:

time

plot

r

statistics

I have some data in CSV like:

"Timestamp", "Count"
"2009-07-20 16:30:45", 10
"2009-07-20 16:30:45", 15
"2009-07-20 16:30:46", 8
"2009-07-20 16:30:46", 6
"2009-07-20 16:30:46", 8
"2009-07-20 16:30:47", 20

I can read it into R using read.csv. I'd like to plot:

1. Number of entries per second, so:

"2009-07-20 16:30:45", 2
"2009-07-20 16:30:46", 3
"2009-07-20 16:30:47", 1

2. Average value per second:

"2009-07-20 16:30:45", 12.5
"2009-07-20 16:30:46", 7.333
"2009-07-20 16:30:47", 20

3. Same as 1 and 2, but by minute and then by hour.

Is there some way to do this (collect by second/min/etc & plot) in R?

535

asked Aug 10 '09 18:08

ayman

2 Answers

Read your data, and convert it into a zoo object:

R> library(zoo)
R> X <- read.csv("/tmp/so.csv")
R> X <- zoo(X$Count, order.by=as.POSIXct(as.character(X[,1])))

Note that this will show warnings because of non-unique timestamps.

Task 1 using aggregate with length to count:

R> aggregate(X, force, length)
2009-07-20 16:30:45 2009-07-20 16:30:46 2009-07-20 16:30:47 
                  2                   3                   1

Task 2 using aggregate:

R> aggregate(X, force, mean)
2009-07-20 16:30:45 2009-07-20 16:30:46 2009-07-20 16:30:47 
             12.500               7.333              20.000

Task 3 can be done the same way by aggregating on a coarser index, for example the timestamps truncated to the minute or to the hour. You can call plot on the result from aggregate:

plot(aggregate(X, force, mean))
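
For Task 3, one possible sketch is to pass aggregate a function that truncates each timestamp to the start of its minute or hour. The helper names to_minute and to_hour are made up for this illustration, the data is read inline rather than from /tmp/so.csv, and the last two rows are invented so that more than one minute and one hour appear. Treat it as an assumption-laden example rather than the only way to do it.

```r
library(zoo)

# The six rows from the question, plus two made-up rows (the last two)
# so that the minute and hour groupings have more than one bucket.
csv_text <- '"Timestamp","Count"
"2009-07-20 16:30:45",10
"2009-07-20 16:30:45",15
"2009-07-20 16:30:46",8
"2009-07-20 16:30:46",6
"2009-07-20 16:30:46",8
"2009-07-20 16:30:47",20
"2009-07-20 16:31:02",4
"2009-07-20 17:05:10",12'

X <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# zoo() warns about the non-unique timestamps; the warning is suppressed
# here because aggregate() collapses the duplicates anyway.
# tz = "UTC" is an assumption that keeps the example reproducible.
X <- suppressWarnings(
  zoo(X$Count, order.by = as.POSIXct(X$Timestamp, tz = "UTC"))
)

# Hypothetical helpers: map each timestamp to the start of its minute/hour.
to_minute <- function(t) as.POSIXct(trunc(t, "mins"))
to_hour   <- function(t) as.POSIXct(trunc(t, "hours"))

# Same two aggregations as Tasks 1 and 2, on the coarser index.
count_per_minute <- aggregate(X, to_minute, length)
mean_per_minute  <- aggregate(X, to_minute, mean)
count_per_hour   <- aggregate(X, to_hour, length)
mean_per_hour    <- aggregate(X, to_hour, mean)
```

Any of these results can be passed to plot in the same way, for example plot(mean_per_minute).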

187

answered Oct 19 '22 20:10

Dirk Eddelbuettel

Averaging the data is easy with the plyr package.

library(plyr)
Second <- ddply(dataset, "Timestamp", function(x){
    c(Average = mean(x$Count), N = nrow(x))
})

To do the same by minute or hour, first add columns that hold that information.

library(chron)
dataset$Minute <- minutes(dataset$Timestamp)
dataset$Hour <- hours(dataset$Timestamp)
dataset$Day <- dates(dataset$Timestamp)
#aggregate by hour
Hour <- ddply(dataset, c("Day", "Hour"), function(x){
    c(Average = mean(x$Count), N = nrow(x))
})
#aggregate by minute
Minute <- ddply(dataset, c("Day", "Hour", "Minute"), function(x){
    c(Average = mean(x$Count), N = nrow(x))
})
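
If plyr and chron are not available, roughly the same per-minute summary and a plot can be sketched in base R. This is an alternative technique, not the approach above: it assumes the Timestamp column has already been converted to POSIXct, builds the data frame inline, and adds two made-up rows (the last two) so that more than one minute shows up.

```r
# Rows from the question; the last two rows are made up for illustration.
dataset <- data.frame(
  Timestamp = as.POSIXct(c(
    "2009-07-20 16:30:45", "2009-07-20 16:30:45",
    "2009-07-20 16:30:46", "2009-07-20 16:30:46",
    "2009-07-20 16:30:46", "2009-07-20 16:30:47",
    "2009-07-20 16:31:02", "2009-07-20 17:05:10"
  ), tz = "UTC"),  # tz = "UTC" is an assumption for reproducibility
  Count = c(10, 15, 8, 6, 8, 20, 4, 12)
)

# Truncate each timestamp to the start of its minute.
# Use "hours" instead of "mins" to group by hour.
dataset$Minute <- as.POSIXct(trunc(dataset$Timestamp, "mins"))

# Average per minute, then the number of entries per minute.
# Both calls group on the same column, so the rows line up.
by_minute <- aggregate(Count ~ Minute, data = dataset, FUN = mean)
names(by_minute)[2] <- "Average"
by_minute$N <- aggregate(Count ~ Minute, data = dataset, FUN = length)$Count

# Plot the average against time; swap in by_minute$N to plot the counts.
plot(by_minute$Minute, by_minute$Average, type = "b",
     xlab = "Minute", ylab = "Average count")
```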

answered Oct 19 '22 19:10

Thierry
