I have daily data and I'd like to count the number of times per year the daily measurement is within a certain range. The data is also grouped by factor, so I need to know number of days per year for each factor that is within a certain range (e.g. 15 to 18)
I have a large dataset of over 100 years but here is some data just for this example:
Date <- seq(as.Date("2010/01/01"), by = "day", length.out = 1095)
People <- sample.int(20, 1095, replace = TRUE)
Country <- sample(x = c("Australia", "Canada", "France"), size = 1095, replace = TRUE)
mydf <- data.frame(Date, People, Country)
I would like to know the number of times per year the value of "People" is between 15 and 18 for each country.
So my output would be a new dataframe, something like:
myDate People Country
2010 45 Australia
2010 10 Canada
2010 24 France
2011 33 Australia
2011 100 Canada
2011 4 France
2012 21 Australia
2012 66 Canada
2012 211 France
Any help would be greatly appreciated as I'm struggling with this and have looked for answers but I can't find a solution that involves the date and factor.
And here is the requisite base solution. Key points: convert dates to character year values with format.Date
and the by-grouping needs to be a list-object:
aggregate( mydf['People'], list(mydf[['Country']], format(mydf$Date, "%Y") ),
FUN=function(d) sum( d >=15 & d <=18) )
Group.1 Group.2 People
1 Australia 2010 25
2 Canada 2010 22
3 France 2010 24
4 Australia 2011 27
5 Canada 2011 19
6 France 2011 33
7 Australia 2012 19
8 Canada 2012 33
9 France 2012 24
If you want the resulting dataframe to have different column names then add those to the list inside the by-group definition:
aggregate( mydf['People'], list(Cntry=mydf[['Country']], Yr=format(mydf$Date, "%Y") ),
function(d) sum( d >=15 & d <=18) )
Cntry Yr People
1 Australia 2010 25
2 Canada 2010 22
3 France 2010 24
4 Australia 2011 27
5 Canada 2011 19
6 France 2011 33
7 Australia 2012 19
8 Canada 2012 33
9 France 2012 24
You could achieve it with lubridate
and dplyr
. Use year()
to grab the year and group by the year and the country. The last step is a conditional summing up:
library(dplyr)
library(lubridate)
mydf %>%
group_by(year = year(Date), Country) %>%
summarise(p = sum(between(People, 15, 18)))
year Country p
<dbl> <fct> <int>
1 2010. Australia 22
2 2010. Canada 34
3 2010. France 26
4 2011. Australia 21
5 2011. Canada 30
6 2011. France 13
7 2012. Australia 28
8 2012. Canada 31
9 2012. France 23
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With