I have a dataframe of approximately 10 million rows spanning about 570 days. After using strptime to convert the dates and times, the data looks like this:
                 date     X1
1 2004-01-01 07:43:00 1.2587
2 2004-01-01 07:47:52 1.2585
3 2004-01-01 17:46:14 1.2586
4 2004-01-01 17:56:08 1.2585
5 2004-01-01 17:56:15 1.2585
I would like to compute the average value for each calendar day (as in days of the year, not days of the week) and then plot the results. For example: get all rows with the day "2004-01-01", compute the average price, then do the same for "2004-01-02", and so on.
Similarly I would be interested in finding the average monthly value, or hourly price, but I imagine I can work these out once I know how to get average daily price.
My biggest difficulty here is extracting the day of the year from the date variable automatically. How can I cycle through all 365 days and compute the average value for each day, storing it in a list?
I was able to find the average value for day of the week using the weekdays() function, but I couldn't find anything similar for this.
Here's a solution using dplyr and lubridate. First, simplify the date by rounding it down to the nearest day using floor_date (see the comment by thelatemail below), then group_by date and calculate the mean value using summarize:
library(dplyr)
library(lubridate)
df %>%
mutate(date = floor_date(date, unit = "day")) %>%  # unit defaults to seconds, so request days explicitly
group_by(date) %>%
summarize(mean_X1 = mean(X1))
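To make the daily pipeline above reproducible, here is a minimal sketch on a small made-up data frame. The values and the second day are hypothetical, chosen only so that there are two groups. It assumes the date column is stored as POSIXct: strptime returns POSIXlt, which some dplyr versions reject as a column type, so converting with as.POSIXct first is the safer route. The final line draws the plot the question asks for.

```r
library(dplyr)
library(lubridate)

# Hypothetical sample data in the same shape as the question's rows
df <- data.frame(
  date = as.POSIXct(
    c("2004-01-01 07:43:00", "2004-01-01 07:47:52", "2004-01-01 17:46:14",
      "2004-01-02 17:56:08", "2004-01-02 17:56:15"),
    tz = "UTC"
  ),
  X1 = c(1.2587, 1.2585, 1.2586, 1.2585, 1.2585)
)

# Round every timestamp down to midnight of its day, then average per day
daily <- df %>%
  mutate(day = floor_date(date, unit = "day")) %>%
  group_by(day) %>%
  summarize(mean_X1 = mean(X1))

# One point per calendar day
plot(daily$day, daily$mean_X1, type = "l", xlab = "Day", ylab = "Mean X1")
```

Here daily has one row per calendar day, so the two hypothetical days give two rows.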
Using the lubridate package, you can use a similar method to get the average by month, week, or hour. For example, to calculate the average by month:
df %>%
mutate(date = month(date)) %>%
group_by(date) %>%
summarize(mean_X1 = mean(X1))
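One thing to watch with month(): it returns only the month number (1 to 12), so in data spanning about 570 days the same month from two different years lands in one group. If each month should be kept separate per year, a possible variation is to round down to the start of the month with floor_date instead. The sketch below uses made-up values and the hypothetical column name month_start.

```r
library(dplyr)
library(lubridate)

# Hypothetical data: two readings in January 2004, one in January 2005
df <- data.frame(
  date = as.POSIXct(
    c("2004-01-15 10:00:00", "2004-01-20 12:00:00", "2005-01-10 09:00:00"),
    tz = "UTC"
  ),
  X1 = c(1.0, 2.0, 4.0)
)

# floor_date keeps the year, so January 2004 and January 2005 stay separate
monthly <- df %>%
  mutate(month_start = floor_date(date, unit = "month")) %>%
  group_by(month_start) %>%
  summarize(mean_X1 = mean(X1))
```

With month(date) the three rows would fall into a single group; here they form two, one per year-month.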
And by hour:
df %>%
mutate(date = hour(date)) %>%
group_by(date) %>%
summarize(mean_X1 = mean(X1))
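If you would rather avoid extra packages, the same daily grouping can be sketched in base R: as.Date plays the role that weekdays() played for day of the week, and aggregate computes the per-group mean. The data below is made up, and the column name day is only illustrative. The tz argument is passed explicitly so that the day boundary does not depend on the session's time zone settings.

```r
# Hypothetical sample data, stored as POSIXct
df <- data.frame(
  date = as.POSIXct(
    c("2004-01-01 07:43:00", "2004-01-01 07:47:52", "2004-01-01 17:46:14",
      "2004-01-02 17:56:08", "2004-01-02 17:56:15"),
    tz = "UTC"
  ),
  X1 = c(1.2587, 1.2585, 1.2586, 1.2585, 1.2585)
)

# Drop the time-of-day part, keeping only the calendar date
df$day <- as.Date(df$date, tz = "UTC")

# Mean of X1 within each calendar day; the result has columns day and X1
daily_base <- aggregate(X1 ~ day, data = df, FUN = mean)
```

For an hour-of-day grouping, format(df$date, "%H") can be used as the grouping column in the same way.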