Finding ranges in runs of numbers

Tags:

r

I'm trying to find runs of consecutive years in a data frame (ideally using plyr).

I'd like to get from this:

require(plyr)

dat<-data.frame(
  name=c(rep("A", 11), rep("B", 11)),
  year=c(2000:2010, 2000:2005, 2007:2011)
  )

To this:

out<-data.frame(
  name=c("A", "B", "B"),
  range=c("2000-2010", "2000-2005", "2007-2011"))

It's easy enough to identify whether each group has a continuous run of years:

ddply(dat, .(name), summarise,
      continuous=(max(year)-min(year))+1==length(year))

How do I go about breaking down group "B" into two ranges?

Any ideas or strategies would be really appreciated.

Thanks

905

asked Aug 16 '13 15:08

Ed G

1 Answer

Whether you use a function from "plyr" or from base R, you first need to establish some groups. Because the years within a run increase by exactly 1, one way to detect where a new run starts is to look for where diff is not equal to 1. diff returns a vector one element shorter than its input, so we prepend a 1 for the first row and take the cumsum of the result, which gives every run its own id. This relies on the rows being sorted by name and year, as they are in your example.

Putting that mouthful of an explanation into practice, you can try something like this:

dat$id2 <- cumsum(c(1, diff(dat$year) != 1))
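
To see what each step contributes, it may help to run the pieces separately on the years of group "B" from the question. This is only an illustrative sketch; the variable names years, steps, breaks and run_id are made up for the example.

```r
# Years for group "B" from the question: there is a gap between 2005 and 2007
years <- c(2000:2005, 2007:2011)

# diff() gives the difference between neighbouring elements,
# so the result is one element shorter than the input
steps <- diff(years)

# TRUE wherever the step is not exactly 1, i.e. where a new run starts
breaks <- steps != 1

# Prepend 1 for the first element, then accumulate to get one id per run
run_id <- cumsum(c(1, breaks))
```

Every year up to 2005 ends up with one id and every year from 2007 onwards with the next one, which is what the grouping step needs.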

From here, you can use aggregate or your favorite grouping function to get the output you're looking for.

aggregate(year ~ name + id2, dat, function(x) paste(min(x), max(x), sep = "-"))
#   name id2      year
# 1    A   1 2000-2010
# 2    B   2 2000-2005
# 3    B   3 2007-2011
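
Since the question mentions plyr, the same grouping should also work with ddply. A minimal sketch, assuming plyr is installed and that id2 is built exactly as above; the result name res is made up here, and the column name range is borrowed from the desired output in the question.

```r
library(plyr)

# Data from the question
dat <- data.frame(
  name = c(rep("A", 11), rep("B", 11)),
  year = c(2000:2010, 2000:2005, 2007:2011)
)

# Run id: increases by one every time the year does not step by exactly 1
dat$id2 <- cumsum(c(1, diff(dat$year) != 1))

# One row per (name, id2) combination, with the first and last year pasted together
res <- ddply(dat, .(name, id2), summarise,
             range = paste(min(year), max(year), sep = "-"))
```

If you don't want the helper column in the result, you can drop it afterwards with res$id2 <- NULL.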

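If you prefer range over min and max, you need to change sep to collapse, because range(x) returns a single vector of length 2 and collapse is what joins the elements of one vector into one string: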

aggregate(year ~ name + id2, dat, function(x) paste(range(x), collapse = "-"))
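
One caveat: diff above runs over the whole year column, so the boundary between "A" and "B" is only picked up because the year happens to jump there (2010 to 2000). If one name could end in the year right before the next name starts, a safer variant is to compute the run id within each name. A sketch using base R's ave, assuming nothing beyond the two columns in the question (the column name run and the result name out are made up for the example):

```r
# Data from the question
dat <- data.frame(
  name = c(rep("A", 11), rep("B", 11)),
  year = c(2000:2010, 2000:2005, 2007:2011)
)

# Make sure years are increasing within each name
dat <- dat[order(dat$name, dat$year), ]

# Run id computed separately for each name, so ids restart at 1 per name
dat$run <- ave(dat$year, dat$name,
               FUN = function(y) cumsum(c(1, diff(y) != 1)))

# Collapse each (name, run) pair to a "first-last" string
out <- aggregate(year ~ name + run, dat,
                 function(x) paste(range(x), collapse = "-"))
```

Here the ids are only unique within a name, which is why both name and run go on the right-hand side of the formula.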

145

answered Oct 24 '22 03:10

A5C1D2H2I1M1N2O1R2T1
