Summarise to return the length by group

Tags:

r

dplyr

I want to summarise the data frame below so that it returns the length of each dry spell within each month, from which I can get the maximum dry spell length. This is what my data frame looks like:

   day month year  rr spell spell1
     1     1 1981  0   dry      1
     2     1 1981  0   dry      1
     3     1 1981  0   dry      1
     4     1 1981  1.1 dry      0
     5     1 1981  0   dry      1
     6     1 1981  0   dry      1
     7     1 1981  0   dry      1
     8     1 1981  0   dry      1
     9     1 1981  2.7 dry      0
    10     1 1981  0   dry      1

This is the output I need:

 month year  spell_length
     1 1981      3
     1 1981      4
     1 1981      1

This is what I have done so far:

library(dplyr)

group_by(df, year, month, spell1) %>% 
    summarise(spell_length = sum(spell1, na.rm = TRUE))

And this is the result (run on my full data set):

  year month spell1 spell_length
  <int> <int>  <dbl>  <dbl>
1  1981     1      1     31
2  1981     2      0      0
3  1981     2      1     27
4  1981     3      0      0
5  1981     3      1     25
6  1981     4      0      0

Data:

df <- read.table(header = TRUE, text = "day month year  rr spell spell1
1     1 1981  0   dry      1
2     1 1981  0   dry      1
3     1 1981  0   dry      1
4     1 1981  1.1 dry      0
5     1 1981  0   dry      1
6     1 1981  0   dry      1
7     1 1981  0   dry      1
8     1 1981  0   dry      1
9     1 1981  2.7 dry      0
10     1 1981  0   dry      1")

asked May 10 '19 at 08:05 by ahmad bello



2 Answers

One option is to group by the run-length id of 'spell1' (rleid from data.table creates a new grouping id each time the value in that column changes), filter out the rows where 'spell1' is 0, and count the remaining rows in each group with n():

library(dplyr)
library(data.table)
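df %>%
    # rleid() starts a new id each time spell1 changes, so every run is its own group
    group_by(year, month, grp = rleid(spell1)) %>%
    filter(spell1 == 1) %>%            # keep only the dry runs
    summarise(spell_length = n()) %>%  # rows per run = spell length
    ungroup() %>%
    select(-grp)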
# A tibble: 3 x 3
#   year month spell_length
#  <int> <int>        <int>
#1  1981     1            3
#2  1981     1            4
#3  1981     1            1
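
To see what the grouping id looks like, here is a minimal sketch that applies rleid to the spell1 values from the question's sample data. The vector is typed in by hand for illustration, and the base R line is an assumed equivalent for numeric input, not something data.table documents as its implementation.

```r
library(data.table)

# spell1 values copied from the sample data in the question
spell1 <- c(1, 1, 1, 0, 1, 1, 1, 1, 0, 1)

# rleid() gives every run of identical consecutive values its own id
run_id <- rleid(spell1)
run_id
# [1] 1 1 1 2 3 3 3 3 4 5

# Base R sketch of the same idea: bump the id wherever the value
# differs from the previous one
run_id_base <- cumsum(c(TRUE, diff(spell1) != 0))
```

Runs 1, 3 and 5 are the dry runs, so after dropping the rows where spell1 is 0, counting the rows per id gives 3, 4 and 1.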

Or use rle from base R:

rl1 <- rle(df$spell1)        # run lengths and the value of each run
rl1$lengths[rl1$values > 0]  # keep the lengths of the dry runs only
#[1] 3 4 1
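
The rle call above treats the whole column as one sequence, so with several months of data a run could span a month boundary. A possible way to keep the base R approach per month is sketched below; the helper name dry_runs and the use of split are my own choices for illustration, not part of the original answer.

```r
df <- read.table(header = TRUE, text = "day month year  rr spell spell1
1     1 1981  0   dry      1
2     1 1981  0   dry      1
3     1 1981  0   dry      1
4     1 1981  1.1 dry      0
5     1 1981  0   dry      1
6     1 1981  0   dry      1
7     1 1981  0   dry      1
8     1 1981  0   dry      1
9     1 1981  2.7 dry      0
10     1 1981  0   dry      1")

# Hypothetical helper: lengths of the runs where spell1 is 1
dry_runs <- function(x) {
  r <- rle(x)
  r$lengths[r$values == 1]
}

# One spell1 vector per year-month combination (names look like "1981.1")
groups <- split(df$spell1, list(df$year, df$month), drop = TRUE)

# Dry spell lengths within each month
runs <- lapply(groups, dry_runs)
runs[["1981.1"]]
# [1] 3 4 1

# Longest dry spell per month; 0 if a month has no dry day at all
longest <- sapply(runs, function(r) if (length(r) > 0) max(r) else 0L)
```

For the sample data, longest holds a single value, 4, for January 1981.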

NOTE: This solution also works when the 'spell1' values are different

answered Sep 18 '22 at 07:09 by akrun


Using dplyr, we can start a new group at every 0 in spell1 with cumsum and then sum spell1 within each group, which counts the dry days in each run.

library(dplyr)

df %>%
  group_by(month, year, group = cumsum(spell1 == 0)) %>%
  summarise(spell_length = sum(spell1)) %>%
  ungroup() %>%
  select(-group)

#    month  year spell_length
#   <int> <int>        <int>
#1     1  1981            3
#2     1  1981            4
#3     1  1981            1
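
One thing to watch with the cumsum grouping: if two wet days (spell1 == 0) occur in a row, the first of them forms a group of its own and shows up as a spell of length 0. The sketch below uses a small made-up data frame (wet_pair is hypothetical, not from the question) and adds a filter step to drop those empty groups; it assumes dplyr 1.0.0 or later for the .groups argument.

```r
library(dplyr)

# Hypothetical data with two wet days in a row (rows 3 and 4)
wet_pair <- data.frame(
  month  = 1,
  year   = 1981,
  spell1 = c(1, 1, 0, 0, 1, 1, 1)
)

result <- wet_pair %>%
  # cumsum(spell1 == 0) is 0 0 1 2 2 2 2, so group 1 holds only a wet day
  group_by(month, year, group = cumsum(spell1 == 0)) %>%
  summarise(spell_length = sum(spell1), .groups = "drop") %>%
  filter(spell_length > 0) %>%   # drop groups that contain no dry day
  select(-group)

result$spell_length
# [1] 2 3
```

Without the filter step, the same pipeline would return 2, 0 and 3.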

answered Sep 19 '22 at 07:09 by Ronak Shah