Summarizing temperature data based on a vector of temperature thresholds

Tags: r, dplyr

I have a data frame with daily average temperature data in it, structured like so:

 'data.frame':  4666 obs. of  6 variables:
 $ Site : chr  "EB" "FFCE" "IB" "FFCE" ...
 $ Date : Date, format: "2013-01-01" "2013-01-01" "2013-01-01" "2014-01-01" ... 
 $ Day  : int  1 1 1 1 1 1 1 1 1 1 ...
 $ Year : int  2013 2013 2013 2014 2014 2014 2014 2015 2015 2015 ...
 $ Month: int  1 1 1 1 1 1 1 1 1 1 ...
 $ Temp : num  28.5 28.3 28.3 27 27.8 ...

I am attempting to produce a summary table that totals the number of days per year, per site, above certain temperature thresholds, e.g. 25 °C, 26 °C. I can achieve this manually with dplyr like so:

library(dplyr)

# maxthreshold is a single cut-off value defined elsewhere in the session (not shown)
Days_above <- Site_Daily_average %>% 
  group_by(Year, Site) %>% 
  summarise("23" = sum(Temp > 23), "24" = sum(Temp > 24), "25" = sum(Temp > 25),
            "26" = sum(Temp > 26), "27" = sum(Temp > 27), "28" = sum(Temp > 28),
            "29" = sum(Temp > 29), "30" = sum(Temp > 30), "31" = sum(Temp > 31),
            "ABOVE THRESHOLD" = sum(Temp > maxthreshold)) %>% 
  as.data.frame()

This produces a table like the following:

   Year Site  23  24  25  26  27  28  29 30 31 ABOVE THRESHOLD
1  2012   EB 142 142 142  91  64  22   0  0  0               0
2  2012 FFCE 238 238 238 210 119  64   0  0  0               0
3  2012   IB 238 238 238 218 138  87   1  0  0               0
4  2013   EB 115 115 115 115 115 109  44  0  0               0
5  2013 FFCE 223 223 216 197 148 114  94  0  0               0
6  2013   IB 365 365 365 348 299 194 135  3  0               0

...

However, as you can see, the code is fairly verbose. The problem I am having is producing the same output for a finer sequence of temperature thresholds, e.g. Tempclasses = seq(16, 32, 0.25).

Typing that out manually would take a long time, since the sequence has 65 values. I feel like this is a very simple calculation, and there should be a way to use dplyr to take each value in the threshold vector, perform this count, and produce the output as a complete table. Sorry if that was unclear, as I am relatively new to R. Any suggestions would be welcome, thank you.
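One way to keep the wide layout of the hand-written summarise() while driving it from a vector is sketched below. It assumes a data frame shaped like Site_Daily_average (the five-row data set is made up for illustration): rlang::expr() builds one sum(Temp > t) expression per threshold, the thresholds become the column names, and !!! splices the whole list into summarise() as if each argument had been typed by hand.

```r
library(dplyr)
library(rlang)

# Hypothetical toy data in the same shape as the question's Site_Daily_average
Site_Daily_average <- data.frame(
  Site = c("EB", "EB", "EB", "IB", "IB"),
  Year = 2013L,
  Temp = c(23.5, 25.2, 28.1, 24.9, 26.0)
)

Tempclasses <- seq(16, 32, 0.25)  # 65 thresholds

# One unevaluated expression sum(Temp > t) per threshold, named after the threshold
count_exprs <- lapply(Tempclasses, function(t) expr(sum(Temp > !!t)))
names(count_exprs) <- Tempclasses

# Splice the named expressions into summarise(), one output column per threshold
Days_above <- Site_Daily_average %>%
  group_by(Year, Site) %>%
  summarise(!!!count_exprs, .groups = "drop") %>%
  as.data.frame()
```

An extra column such as the question's "ABOVE THRESHOLD" could be appended before splicing, e.g. count_exprs[["ABOVE THRESHOLD"]] <- expr(sum(Temp > !!maxthreshold)), where maxthreshold is the asker's own variable.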

K.west asked May 22 '18 02:05


1 Answer

Here's a tidyverse approach, using the built-in mtcars data set for illustration (counting, for each cyl group, the cars with mpg above each threshold):

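library(tidyverse)

mtcars %>% 
  # Bin mpg into 5-mpg intervals (10,15], (15,20], ..., each labelled by its lower bound
  mutate(threshold = cut(mpg, 
                         breaks=seq(10, max(mtcars$mpg)+10, 5), 
                         labels=seq(10, max(mtcars$mpg)+5, 5))) %>% 
  # Count cars in each cyl/bin combination
  group_by(cyl, threshold) %>% 
  tally %>% 
  ungroup %>% 
  # Add missing bins with a count of 0 so every cyl has every threshold
  complete(threshold, nesting(cyl), fill=list(n=0)) %>% 
  # Cumulative sum from the highest bin down = number of cars with mpg above each threshold
  arrange(desc(threshold)) %>% 
  group_by(cyl) %>% 
  mutate(N_above = cumsum(n)) %>% 
  select(-n) %>% 
  arrange(cyl, threshold)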
   threshold cyl N_above
1         10   4      11
2         15   4      11
3         20   4      11
4         25   4       6
5         30   4       4
6         35   4       0
7         10   6       7
8         15   6       7
9         20   6       3
10        25   6       0
11        30   6       0
12        35   6       0
13        10   8      14
14        15   8       8
15        20   8       0
16        25   8       0
17        30   8       0
18        35   8       0
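The cumulative sum works because cut() uses right-closed intervals by default, so a car falls in the bin labelled t exactly when t < mpg <= t + 5, and summing bins from the top counts the cars with mpg > t. As a cross-check (a swapped-in technique, not the answer's method), the same numbers can be computed directly by pairing every row with every threshold via tidyr::expand_grid() and counting; this skips the binning at the cost of a table with nrow(data) × length(thresholds) rows.

```r
library(dplyr)
library(tidyr)

thresholds <- seq(10, 35, 5)  # same values as the cut() labels above

# Pair every car with every threshold, then count cars with mpg strictly above it
direct <- mtcars %>%
  expand_grid(threshold = thresholds) %>%
  group_by(cyl, threshold) %>%
  summarise(N_above = sum(mpg > threshold), .groups = "drop")

direct
```

For the question's data, the same pattern would use Temp in place of mpg, group_by(Year, Site, threshold), and seq(16, 32, 0.25) as the thresholds.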

If you want the final data in wide format, replace the final arrange() with spread():

... %>%
select(-n) %>% 
spread(threshold, N_above)
  cyl 10 15 20 25 30 35
1   4 11 11 11  6  4  0
2   6  7  7  3  0  0  0
3   8 14  8  0  0  0  0
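In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(), which does the same reshaping with named arguments. A minimal sketch, using a long table typed in from the answer's printed output above (so it runs without the rest of the pipeline):

```r
library(tibble)
library(tidyr)

# Long result from the answer, typed in from its printed output: threshold, cyl, N_above
long <- tibble(
  threshold = rep(seq(10, 35, 5), times = 3),
  cyl       = rep(c(4, 6, 8), each = 6),
  N_above   = c(11, 11, 11, 6, 4, 0,
                7, 7, 3, 0, 0, 0,
                14, 8, 0, 0, 0, 0)
)

# One row per cyl, one column per threshold (equivalent to spread(threshold, N_above))
wide <- pivot_wider(long, id_cols = cyl, names_from = threshold, values_from = N_above)

wide
```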
eipi10 answered Oct 23 '22 15:10