Summarise to return the length by group

Tags:

r

dplyr

I want to add a new column to the data frame below that calculates the maximum dry spell length for each month. This is what my data frame looks like:

   day month year  rr spell spell1
     1     1 1981  0   dry      1
     2     1 1981  0   dry      1
     3     1 1981  0   dry      1
     4     1 1981  1.1 dry      0
     5     1 1981  0   dry      1
     6     1 1981  0   dry      1
     7     1 1981  0   dry      1
     8     1 1981  0   dry      1
     9     1 1981  2.7 dry      0
    10     1 1981  0   dry      1

This is the output I need:

 month year  spell_length
     1 1981      3
     1 1981      4
     1 1981      1

This is what I have done so far:

library(dplyr)

group_by(df, year, month, spell1) %>% 
    summarise(spell_length = sum(spell1, na.rm = TRUE))

And this is the result:

  year month spell1 spell_length
  <int> <int>  <dbl>  <dbl>
1  1981     1      1     31
2  1981     2      0      0
3  1981     2      1     27
4  1981     3      0      0
5  1981     3      1     25
6  1981     4      0      0
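A quick way to see why this attempt returns monthly totals (31, 27, 25) instead of individual spells: grouping by 'spell1' itself puts every dry day of a month into one group, no matter how many wet days separate them. A minimal sketch on the 10-row sample, rebuilt with only the columns needed ('attempt' and 'n_days' are illustrative names):

```r
library(dplyr)

# The question's 10-row sample, keeping only the columns used here
df <- data.frame(
  day    = 1:10,
  month  = 1L,
  year   = 1981L,
  spell1 = c(1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 1L)
)

# Grouping by spell1 lumps all dry days (spell1 == 1) of a month together,
# so the sum is the month's total number of dry days, not each run's length
attempt <- df %>%
  group_by(year, month, spell1) %>%
  summarise(n_days = sum(spell1), .groups = "drop")

attempt$n_days
# [1] 0 8
```

The 8 dry days are really three separate spells (3, 4 and 1 days), which is why a run identifier is needed in the grouping.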

Data:

df <- read.table(header = TRUE, text = "day month year  rr spell spell1
1     1 1981  0   dry      1
2     1 1981  0   dry      1
3     1 1981  0   dry      1
4     1 1981  1.1 dry      0
5     1 1981  0   dry      1
6     1 1981  0   dry      1
7     1 1981  0   dry      1
8     1 1981  0   dry      1
9     1 1981  2.7 dry      0
10    1 1981  0   dry      1")


asked May 10 '19 08:05

ahmad bello

2 Answers

One option would be to group by the run-length id of 'spell1' (rleid from data.table creates a new grouping id whenever the value in that column changes), filter out the rows where 'spell1' is 0, and get the number of rows in each group with n():

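library(dplyr)
library(data.table)

df %>%
    # a new grp id starts every time spell1 switches between 0 and 1
    group_by(year, month, grp = rleid(spell1)) %>%
    # keep only the dry-day runs
    filter(spell1 == 1) %>%
    # the number of rows in each run is the spell length
    summarise(spell_length = n()) %>%
    ungroup() %>%
    select(-grp)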
# A tibble: 3 x 3
#   year month spell_length
#  <int> <int>        <int>
#1  1981     1            3
#2  1981     1            4
#3  1981     1            1
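To see what rleid contributes, here it is applied to the 'spell1' values of the sample on their own: every change of value starts a new id, so each dry spell gets its own group before filter() drops the wet-day groups. A minimal illustration (the vector 'spell1' stands in for the column):

```r
library(data.table)

# spell1 column from the question's sample: 1 = dry day, 0 = wet day
spell1 <- c(1, 1, 1, 0, 1, 1, 1, 1, 0, 1)

# rleid() assigns a new id every time the value changes
ids <- rleid(spell1)
ids
# [1] 1 1 1 2 3 3 3 3 4 5

# Counting rows per id among the dry days only (ids 1, 3 and 5)
runs <- table(ids[spell1 == 1])
as.vector(runs)
# [1] 3 4 1
```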

Or use rle from base R

rl1 <- rle(df$spell1)
rl1$lengths[rl1$values > 0]
#[1] 3 4 1

NOTE: Because this counts run lengths instead of summing 'spell1', it also works when the non-zero 'spell1' values are something other than 1.
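Note that the rle() call above runs over the whole 'spell1' column, ignoring month and year, so a dry spell that continues from the end of one month into the next would be counted as a single run. A base R sketch that applies the same idea within each year-month and then takes the per-month maximum the question asks for ('by_month', 'dry_runs' and 'max_spell' are illustrative names):

```r
# The question's 10-row sample, keeping only the columns used here
df <- data.frame(
  month  = 1L,
  year   = 1981L,
  spell1 = c(1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 1L)
)

# Split spell1 by year and month so runs never cross a month boundary
by_month <- split(df$spell1, list(df$year, df$month), drop = TRUE)

# Within each month, keep the lengths of the runs of dry days (spell1 > 0)
dry_runs <- lapply(by_month, function(x) {
  r <- rle(x)
  r$lengths[r$values > 0]
})

# Longest dry spell per month; a month without dry days gets 0
# instead of max(integer(0)), which would be -Inf with a warning
max_spell <- vapply(dry_runs,
                    function(l) if (length(l)) max(l) else 0L,
                    integer(1))
max_spell
```

The list element names come from split() combining year and month, so the sample's only month shows up as "1981.1".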

answered Sep 18 '22 07:09

akrun

Using dplyr, we can start a new group at every occurrence of 0 with cumsum and then sum 'spell1' in each group; since 'spell1' is 1 on a dry day, each group's sum is the length of its dry spell.

library(dplyr)

df %>%
  group_by(month, year, group = cumsum(spell1 == 0)) %>%
  summarise(spell_length = sum(spell1)) %>%
  ungroup() %>%
  select(-group)

#    month  year spell_length
#   <int> <int>        <int>
#1     1  1981            3
#2     1  1981            4
#3     1  1981            1
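The grouping is easiest to see on the 'spell1' values alone: cumsum(spell1 == 0) goes up by one at each wet day, so every group starts with at most one 0 followed by a run of 1s. One thing to watch, assuming the full data can contain consecutive wet days: two 0s in a row produce a group whose sum is 0. A base R sketch with tapply ('spell1' and 'x' are illustrative vectors):

```r
# spell1 column from the question's sample: 1 = dry day, 0 = wet day
spell1 <- c(1, 1, 1, 0, 1, 1, 1, 1, 0, 1)

# Group id increases at every wet day
group <- cumsum(spell1 == 0)
group
# [1] 0 0 0 1 1 1 1 1 2 2

# Summing spell1 per group gives the dry spell lengths
lengths_by_group <- tapply(spell1, group, sum)
as.vector(lengths_by_group)
# [1] 3 4 1

# Two wet days in a row create a group with a sum of 0
x <- c(1, 0, 0, 1, 1)
x_lengths <- tapply(x, cumsum(x == 0), sum)
x_lengths[x_lengths > 0]  # drop the zero-length groups
```

In the dplyr pipeline the same clean-up would be a filter(spell_length > 0) after summarise().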

answered Sep 19 '22 07:09

Ronak Shah
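The question also mentions adding a new column holding the maximum dry spell for each month. Combining the rleid() grouping from the first answer with mutate() instead of summarise() keeps every row. This is a sketch, and the column names 'run_length' and 'max_dry_spell' are hypothetical:

```r
library(dplyr)
library(data.table)

# The question's 10-row sample, keeping only the columns used here
df <- data.frame(
  day    = 1:10,
  month  = 1L,
  year   = 1981L,
  spell1 = c(1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 1L)
)

df_max <- df %>%
  # one group per run of identical spell1 values within a month
  group_by(year, month, grp = rleid(spell1)) %>%
  # every dry day carries the length of its spell; wet days get 0
  mutate(run_length = if_else(spell1 == 1, n(), 0L)) %>%
  # regroup by month only and take the longest spell
  group_by(year, month) %>%
  mutate(max_dry_spell = max(run_length)) %>%
  ungroup() %>%
  select(-grp)

df_max
```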
