Extracting first and last positions in a dataset

Tags:

dplyr

I have this dataset that I'm trying to transform to get the "from" and "to" positions within a particular grouping of data points that pass a test.

Here's how the data looks:

pos <- seq(from = 10, to = 100, by = 10)
test <- c(1, 1, 1, 0, 0, 0, 1, 1, 1, 0)
df <- data.frame(pos, test)

Positions 10, 20, and 30, as well as 70, 80, and 90, pass the test (because test = 1), but the rest of the points don't. The answer I'm looking for is a data frame that looks something like the "answer" data frame in the code below:

peaknum <- c(1, 2)
from <- c(10, 70)
to <- c(30, 90)
answer <- data.frame(peaknum, from, to)

Any suggestions as to how I can transform the dataset? I'm stumped.

Thanks, Steve

asked Mar 17 '16 19:03

Steven

2 Answers

We can use data.table. The rleid function creates run-length group ids ('peaknum') that identify each run of adjacent rows sharing the same 'test' value. Using 'peaknum' as the grouping variable, we get the 'min' and 'max' of 'pos', while specifying 'test==1' in 'i' to subset the rows. If needed, the 'peaknum' values can be changed to a sequence ('seq_len(.N)').

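library(data.table)
# Label each run of identical adjacent 'test' values, keep rows with test == 1,
# take the min/max of 'pos' per run, then renumber the runs 1, 2, ...
# The trailing [] prints the result, since := returns it invisibly.
setDT(df)[, peaknum := rleid(test)][test == 1,
   list(from = min(pos), to = max(pos)), by = peaknum][, peaknum := seq_len(.N)][]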
#   peaknum from to
#1:       1   10 30
#2:       2   70 90
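
To see why the final renumbering step is needed, it can help to look at the run ids on their own. The sketch below is a base R stand-in for what rleid returns (it is not data.table's implementation) and assumes the same pos and test vectors as in the question; run_id and passing_ids are names introduced here purely for illustration.

```r
# Sample data from the question
pos  <- seq(from = 10, to = 100, by = 10)
test <- c(1, 1, 1, 0, 0, 0, 1, 1, 1, 0)

# Base R stand-in for rleid(): number each run of identical adjacent values
r <- rle(test)
run_id <- rep(seq_along(r$lengths), times = r$lengths)
run_id
# [1] 1 1 1 2 2 2 3 3 3 4

# Run ids of the rows that pass the test: 1 and 3 rather than 1 and 2,
# which is why the answer renumbers 'peaknum' at the end
passing_ids <- unique(run_id[test == 1])
passing_ids
# [1] 1 3
```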

answered Sep 21 '22 21:09

akrun

We can do it with dplyr, though separating the runs is a little ugly:

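library(dplyr)
df %>% 
  # run id per row; seq_along() stays correct even if test has only one run
  group_by(peaknum = rep(seq_along(rle(test)[['lengths']]), rle(test)[['lengths']])) %>% 
  filter(test == 1) %>%                  # keep only rows that pass the test
  summarise(from = min(pos), 
            to = max(pos)) %>%
  mutate(peaknum = seq_along(peaknum))   # renumber the runs 1, 2, ...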

# Source: local data frame [2 x 3]

#   peaknum  from    to
#     (int) (dbl) (dbl)
# 1       1    10    30
# 2       2    70    90

What it does:

The first group_by uses rle to add a column that numbers each run of repeated values in test, and groups by it for summarise later.
filter cuts the rows down to only those where test is 1.
summarise collapses each group to one row holding the min and max of pos.
Lastly, mutate cleans up the numbering of peaknum.
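
The same steps can also be sketched in base R without any grouping, by turning the run lengths into start and end row indices. This is an alternative technique rather than either answer's method, and it assumes pos is sorted in ascending order so that the first and last row of a run hold its min and max; run_end, run_start, keep, and result are illustrative names.

```r
# Sample data from the question
pos  <- seq(from = 10, to = 100, by = 10)
test <- c(1, 1, 1, 0, 0, 0, 1, 1, 1, 0)

r <- rle(test)

# Row index where each run ends and starts
run_end   <- cumsum(r$lengths)
run_start <- run_end - r$lengths + 1

# Keep only the runs whose value passes the test
keep <- r$values == 1

result <- data.frame(
  peaknum = seq_len(sum(keep)),      # renumbered 1, 2, ...
  from    = pos[run_start[keep]],    # position at the first row of each run
  to      = pos[run_end[keep]]       # position at the last row of each run
)
result
#   peaknum from to
# 1       1   10 30
# 2       2   70 90
```

If pos were not sorted, taking min and max within each run, as both answers do, would be the safer choice.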

answered Sep 20 '22 21:09

alistaire
