
Count values higher than a certain threshold by group

I have a large historical weather station CSV dataset (daily wind speed data from a set of weather stations in one region), and I need to compute the average number of days per month on which wind speed is higher than 6 m/s for each station. The stations do not all contain data for the same number of years. An example of the dataset is shown below.

head(windspeed_PR)

     STN Year Month Day WDSP WDSP.ms
1 860110 1974     6  19  9.3   4.784
2 860110 1974     7  13 19.0   9.774
3 860110 1974     7  22  9.9   5.093
4 860110 1974     8  20  9.5   4.887
5 860110 1974     9  10  3.3   1.698
6 860110 1974    10  10  6.6   3.395

So I need to count how many WDSP.ms values are higher than 6 for each Month of each Year and each station (STN), and then calculate the average number of such days per month for each station.

Could I please have suggestions on how to compute this value (preferably in R)?

Xavier de Lamo asked May 20 '15 22:05



1 Answer

This is fairly straightforward.

Using dplyr:

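library(dplyr)
windspeed_PR %>%
    group_by(STN, Year, Month) %>%       # one group per station, year, and month
    summarize(n_days = n(),              # number of measurements in the group
              n_gt6 = sum(WDSP.ms > 6),  # measurements above 6 m/s
              p_gt6 = n_gt6 / n_days)    # proportion of measurements above 6 m/s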

This will return, for each combination of station, year, and month, the number of measurements, the number of measurements greater than 6, and their quotient (the proportion of measurements greater than 6).

It's not clear to me from your question whether you want this further summarized (say, collapsing years), but it should form a good starting point for any additional work.
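If collapsing years is what is wanted, one possible sketch is to take the per-station, per-year, per-month counts and average them over years. The `toy` data frame below is made up for illustration and only mimics the shape of `windspeed_PR`; `avg_days_gt6` and `n_years` are names chosen here, not from the question. It assumes dplyr 1.0.0 or later (for the `.groups` argument), and it adds `na.rm = TRUE`, which is an assumption about how missing readings should be treated.

```r
library(dplyr)

# Toy data in the same shape as windspeed_PR (values invented for illustration)
toy <- data.frame(
  STN     = c(1, 1, 1, 1, 1, 1, 2, 2),
  Year    = c(1974, 1974, 1974, 1975, 1975, 1975, 1974, 1974),
  Month   = c(7, 7, 7, 7, 7, 7, 7, 7),
  Day     = c(1, 2, 3, 1, 2, 3, 1, 2),
  WDSP.ms = c(7.0, 9.8, 5.1, 6.5, 4.9, 3.4, 2.0, 8.1)
)

monthly_avg <- toy %>%
  # Step 1: count days above 6 m/s for each station, year, and month
  group_by(STN, Year, Month) %>%
  summarize(n_gt6 = sum(WDSP.ms > 6, na.rm = TRUE), .groups = "drop") %>%
  # Step 2: average those counts over the years available for each station and month
  group_by(STN, Month) %>%
  summarize(avg_days_gt6 = mean(n_gt6),
            n_years = n(),   # number of station-years contributing to the mean
            .groups = "drop")

monthly_avg
```

Note that a station, year, and month combination with no rows at all in the data does not appear in step 1, so it is left out of the average rather than counted as zero days. Whether that is appropriate depends on why the rows are missing.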

Gregor Thomas answered Nov 15 '22 05:11
