I need help identifying the longest consecutive sequence of values (=1) within groups of observations in R.
I have data on monthly rainfall in towns. I need to identify the longest period in each year when the monthly rainfall is above the annual average (rain_above = 1). If there are two periods of equal length in each year, I would like to identify the period with the largest total rainfall.
Some example data:
df1 <- data.frame(cbind(town=c("A","A","A","A","A","A","A","A","A","A","A","A",
"A","A","A","A","A","A","A","A","A","A","A","A",
"B","B","B","B","B","B","B","B","B","B","B","B",
"B","B","B","B","B","B","B","B","B","B","B","B"),
year=c(2000,2000,2000,2000,2000,2000,2000,2000,2000,2000,2000,2000,
2001,2001,2001,2001,2001,2001,2001,2001,2001,2001,2001,2001,
2000,2000,2000,2000,2000,2000,2000,2000,2000,2000,2000,2000,
2001,2001,2001,2001,2001,2001,2001,2001,2001,2001,2001,2001),
month=c(1,2,3,4,5,6,7,8,9,10,11,12,
1,2,3,4,5,6,7,8,9,10,11,12,
1,2,3,4,5,6,7,8,9,10,11,12,
1,2,3,4,5,6,7,8,9,10,11,12),
rain_above =c(0,0,0,1,1,1,1,1,0,0,0,0,
0,0,0,0,1,1,1,1,1,0,0,0,
0,1,1,1,1,0,0,0,1,1,0,0,
1,1,1,0,0,0,1,1,1,0,0,0),
rain = c(4.5,4,5,7.1,7.7,8,7.4,7.9,5.1,4.9,4.6,4.4,
4.4,4,4.8,5.1,7.2,7.4,7.4,7.1,7.6,5.4,5.1,5,
7.3,11.3,11.5,11.6,11.1,6.5,6.4,6.2,9.9,10.2,5.4,5.5,
10.4,10.9,11.4,7.8,7.3,7.2,9.8,9.9,10,7.2,6.9,6.6)))
In df, town A has a rainy season in 2000 between month 4 and month 8. This is the only period of rain_above = 1.
Town B has a rainy season in 2001 between month 1 and 3. Even though there are two periods of equal length (3 months), the first period in this year has a larger total rainfall.
View(df)
df
town year month rain_above rain
1 A 2000 1 0 4.5
2 A 2000 2 0 4
3 A 2000 3 0 5
4 A 2000 4 1 7.1
5 A 2000 5 1 7.7
6 A 2000 6 1 8
7 A 2000 7 1 7.4
8 A 2000 8 1 7.9
9 A 2000 9 0 5.1
10 A 2000 10 0 4.9
11 A 2000 11 0 4.6
12 A 2000 12 0 4.4
13 A 2001 1 0 4.4
14 A 2001 2 0 4
15 A 2001 3 0 4.8
16 A 2001 4 0 5.1
17 A 2001 5 1 7.2
18 A 2001 6 1 7.4
19 A 2001 7 1 7.4
20 A 2001 8 1 7.1
21 A 2001 9 1 7.6
22 A 2001 10 0 5.4
23 A 2001 11 0 5.1
24 A 2001 12 0 5
25 B 2000 1 0 7.3
26 B 2000 2 1 11.3
27 B 2000 3 1 11.5
28 B 2000 4 1 11.6
29 B 2000 5 1 11.1
30 B 2000 6 0 6.5
31 B 2000 7 0 6.4
32 B 2000 8 0 6.2
33 B 2000 9 1 9.9
34 B 2000 10 1 10.2
35 B 2000 11 0 5.4
36 B 2000 12 0 5.5
37 B 2001 1 1 10.4
38 B 2001 2 1 10.9
39 B 2001 3 1 11.4
40 B 2001 4 0 7.8
41 B 2001 5 0 7.3
42 B 2001 6 0 7.2
43 B 2001 7 1 9.8
44 B 2001 8 1 9.9
45 B 2001 9 1 10
46 B 2001 10 0 7.2
47 B 2001 11 0 6.9
48 B 2001 12 0 6.6
I would like to produce an indicator variable for rainy season, that is =1 during the longest period of months with above-average rain that has the largest total amount of rain, and =0 otherwise:
df1
town year month rain_above rain season
1 A 2000 1 0 4.5 0
2 A 2000 2 0 4 0
3 A 2000 3 0 5 0
4 A 2000 4 1 7.1 1
5 A 2000 5 1 7.7 1
6 A 2000 6 1 8 1
7 A 2000 7 1 7.4 1
8 A 2000 8 1 7.9 1
9 A 2000 9 0 5.1 0
10 A 2000 10 0 4.9 0
11 A 2000 11 0 4.6 0
12 A 2000 12 0 4.4 0
13 A 2001 1 0 4.4 0
14 A 2001 2 0 4 0
15 A 2001 3 0 4.8 0
16 A 2001 4 0 5.1 0
17 A 2001 5 1 7.2 1
18 A 2001 6 1 7.4 1
19 A 2001 7 1 7.4 1
20 A 2001 8 1 7.1 1
21 A 2001 9 1 7.6 1
22 A 2001 10 0 5.4 0
23 A 2001 11 0 5.1 0
24 A 2001 12 0 5 0
25 B 2000 1 0 7.3 0
26 B 2000 2 1 11.3 1
27 B 2000 3 1 11.5 1
28 B 2000 4 1 11.6 1
29 B 2000 5 1 11.1 1
30 B 2000 6 0 6.5 0
31 B 2000 7 0 6.4 0
32 B 2000 8 0 6.2 0
33 B 2000 9 1 9.9 0
34 B 2000 10 1 10.2 0
35 B 2000 11 0 5.4 0
36 B 2000 12 0 5.5 0
37 B 2001 1 1 10.4 1
38 B 2001 2 1 10.9 1
39 B 2001 3 1 11.4 1
40 B 2001 4 0 7.8 0
41 B 2001 5 0 7.3 0
42 B 2001 6 0 7.2 0
43 B 2001 7 1 9.8 0
44 B 2001 8 1 9.9 0
45 B 2001 9 1 10 0
46 B 2001 10 0 7.2 0
47 B 2001 11 0 6.9 0
48 B 2001 12 0 6.6 0
Any help is appreciated!
I could not find something better than this. Use rle to get the longest sequence per town and year, and then use data.table::rleid and sum to check whose consecutive season has the highest rainfall:
library(dplyr)
df1 %>%
group_by(town, year) %>%
mutate(rle = with(rle(rain_above),
rep(+(values == 1 & lengths == max(lengths)), lengths))) %>%
group_by(gp = data.table::rleid(rle), .add = T) %>%
mutate(sum_rain = sum(rain)) %>%
ungroup(gp) %>%
mutate(rain_season = +(sum_rain == max(sum_rain[rle == 1])))
output
# A tibble: 48 × 9
# Groups: town, year [4]
town year month rain_above rain rle gp sum_rain rain_season
<chr> <dbl> <dbl> <dbl> <dbl> <int> <int> <dbl> <int>
1 A 2000 1 0 4.5 0 1 13.5 0
2 A 2000 2 0 4 0 1 13.5 0
3 A 2000 3 0 5 0 1 13.5 0
4 A 2000 4 1 7.1 1 2 38.1 1
5 A 2000 5 1 7.7 1 2 38.1 1
6 A 2000 6 1 8 1 2 38.1 1
7 A 2000 7 1 7.4 1 2 38.1 1
8 A 2000 8 1 7.9 1 2 38.1 1
9 A 2000 9 0 5.1 0 3 19 0
10 A 2000 10 0 4.9 0 3 19 0
11 A 2000 11 0 4.6 0 3 19 0
12 A 2000 12 0 4.4 0 3 19 0
13 A 2001 1 0 4.4 0 3 18.3 0
14 A 2001 2 0 4 0 3 18.3 0
15 A 2001 3 0 4.8 0 3 18.3 0
16 A 2001 4 0 5.1 0 3 18.3 0
17 A 2001 5 1 7.2 1 4 36.7 1
18 A 2001 6 1 7.4 1 4 36.7 1
19 A 2001 7 1 7.4 1 4 36.7 1
20 A 2001 8 1 7.1 1 4 36.7 1
21 A 2001 9 1 7.6 1 4 36.7 1
22 A 2001 10 0 5.4 0 5 15.5 0
23 A 2001 11 0 5.1 0 5 15.5 0
24 A 2001 12 0 5 0 5 15.5 0
25 B 2000 1 0 7.3 0 5 7.3 0
26 B 2000 2 1 11.3 1 6 45.5 1
27 B 2000 3 1 11.5 1 6 45.5 1
28 B 2000 4 1 11.6 1 6 45.5 1
29 B 2000 5 1 11.1 1 6 45.5 1
30 B 2000 6 0 6.5 0 7 50.1 0
31 B 2000 7 0 6.4 0 7 50.1 0
32 B 2000 8 0 6.2 0 7 50.1 0
33 B 2000 9 1 9.9 0 7 50.1 0
34 B 2000 10 1 10.2 0 7 50.1 0
35 B 2000 11 0 5.4 0 7 50.1 0
36 B 2000 12 0 5.5 0 7 50.1 0
37 B 2001 1 1 10.4 1 8 32.7 1
38 B 2001 2 1 10.9 1 8 32.7 1
39 B 2001 3 1 11.4 1 8 32.7 1
40 B 2001 4 0 7.8 0 9 22.3 0
41 B 2001 5 0 7.3 0 9 22.3 0
42 B 2001 6 0 7.2 0 9 22.3 0
43 B 2001 7 1 9.8 1 10 29.7 0
44 B 2001 8 1 9.9 1 10 29.7 0
45 B 2001 9 1 10 1 10 29.7 0
46 B 2001 10 0 7.2 0 11 20.7 0
47 B 2001 11 0 6.9 0 11 20.7 0
48 B 2001 12 0 6.6 0 11 20.7 0
You may try it with rleid from data.table like below
library(data.table)
setDT(df1)[
,
`:=`(sum_rain = sum(rain), grplen = .N),
.(town, year, rleid(rain_above))
][
, rain_season := +(sum_rain == max(sum_rain) & grplen == max(grplen)),
.(town, year)
][
,
grplen := NULL
][]
which gives
town year month rain_above rain sum_rain rain_season
1: A 2000 1 0 4.5 13.5 0
2: A 2000 2 0 4.0 13.5 0
3: A 2000 3 0 5.0 13.5 0
4: A 2000 4 1 7.1 38.1 1
5: A 2000 5 1 7.7 38.1 1
6: A 2000 6 1 8.0 38.1 1
7: A 2000 7 1 7.4 38.1 1
8: A 2000 8 1 7.9 38.1 1
9: A 2000 9 0 5.1 19.0 0
10: A 2000 10 0 4.9 19.0 0
11: A 2000 11 0 4.6 19.0 0
12: A 2000 12 0 4.4 19.0 0
13: A 2001 1 0 4.4 18.3 0
14: A 2001 2 0 4.0 18.3 0
15: A 2001 3 0 4.8 18.3 0
16: A 2001 4 0 5.1 18.3 0
17: A 2001 5 1 7.2 36.7 1
18: A 2001 6 1 7.4 36.7 1
19: A 2001 7 1 7.4 36.7 1
20: A 2001 8 1 7.1 36.7 1
21: A 2001 9 1 7.6 36.7 1
22: A 2001 10 0 5.4 15.5 0
23: A 2001 11 0 5.1 15.5 0
24: A 2001 12 0 5.0 15.5 0
25: B 2000 1 0 7.3 7.3 0
26: B 2000 2 1 11.3 45.5 1
27: B 2000 3 1 11.5 45.5 1
28: B 2000 4 1 11.6 45.5 1
29: B 2000 5 1 11.1 45.5 1
30: B 2000 6 0 6.5 19.1 0
31: B 2000 7 0 6.4 19.1 0
32: B 2000 8 0 6.2 19.1 0
33: B 2000 9 1 9.9 20.1 0
34: B 2000 10 1 10.2 20.1 0
35: B 2000 11 0 5.4 10.9 0
36: B 2000 12 0 5.5 10.9 0
37: B 2001 1 1 10.4 32.7 1
38: B 2001 2 1 10.9 32.7 1
39: B 2001 3 1 11.4 32.7 1
40: B 2001 4 0 7.8 22.3 0
41: B 2001 5 0 7.3 22.3 0
42: B 2001 6 0 7.2 22.3 0
43: B 2001 7 1 9.8 29.7 0
44: B 2001 8 1 9.9 29.7 0
45: B 2001 9 1 10.0 29.7 0
46: B 2001 10 0 7.2 20.7 0
47: B 2001 11 0 6.9 20.7 0
48: B 2001 12 0 6.6 20.7 0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With