Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

R - counting number of daily values within a specific range per factor and summarising by year

Tags:

r

I have daily data and I'd like to count the number of times per year the daily measurement is within a certain range. The data is also grouped by factor, so I need to know number of days per year for each factor that is within a certain range (e.g. 15 to 18)

I have a large dataset of over 100 years but here is some data just for this example:

Date <- seq(as.Date("2010/01/01"), by = "day", length.out = 1095)
People <- sample.int(20, 1095, replace = TRUE)
Country <- sample(x = c("Australia", "Canada", "France"), size = 1095, replace = TRUE)

mydf <- data.frame(Date, People, Country)

I would like to know the number of times per year the value of "People" is between 15 and 18 for each country.

So my output would be a new dataframe, something like:

myDate    People    Country
2010      45        Australia
2010      10        Canada
2010      24        France
2011      33        Australia
2011      100       Canada
2011      4         France
2012      21        Australia
2012      66        Canada
2012      211       France

Any help would be greatly appreciated as I'm struggling with this and have looked for answers but I can't find a solution that involves the date and factor.

like image 252
helphelp Avatar asked Jan 02 '23 15:01

helphelp


2 Answers

And here is the requisite base solution. Key points: convert dates to character year values with format.Date and the by-grouping needs to be a list-object:

aggregate( mydf['People'], list(mydf[['Country']], format(mydf$Date, "%Y") ), 
                           FUN=function(d) sum( d >=15 & d <=18) )
    Group.1 Group.2 People
1 Australia    2010     25
2    Canada    2010     22
3    France    2010     24
4 Australia    2011     27
5    Canada    2011     19
6    France    2011     33
7 Australia    2012     19
8    Canada    2012     33
9    France    2012     24

If you want the resulting dataframe to have different column names then add those to the list inside the by-group definition:

 aggregate( mydf['People'], list(Cntry=mydf[['Country']], Yr=format(mydf$Date, "%Y") ), 
                function(d) sum( d >=15 & d <=18) )
      Cntry   Yr People
1 Australia 2010     25
2    Canada 2010     22
3    France 2010     24
4 Australia 2011     27
5    Canada 2011     19
6    France 2011     33
7 Australia 2012     19
8    Canada 2012     33
9    France 2012     24
like image 76
IRTFM Avatar answered May 08 '23 21:05

IRTFM


You could achieve it with lubridate and dplyr. Use year() to grab the year and group by the year and the country. The last step is a conditional summing up:

library(dplyr)
library(lubridate)

mydf %>%
  group_by(year = year(Date), Country) %>%
  summarise(p = sum(between(People, 15, 18)))


This could yield
   year Country       p
  <dbl> <fct>     <int>
1 2010. Australia    22
2 2010. Canada       34
3 2010. France       26
4 2011. Australia    21
5 2011. Canada       30
6 2011. France       13
7 2012. Australia    28
8 2012. Canada       31
9 2012. France       23
like image 34
Jan Avatar answered May 08 '23 23:05

Jan