Extract rows from a data.frame based on common values with a list

Tags: list, r, filtering

I'm looking for an easy way to filter rows from a data.frame based on a list of numeric sequences.

Here's an example:

My initial data frame:

data <- data.frame(x=c(0,1,2,0,1,2,3,4,5,12,2,0,10,11,12,13),y="other_data")

My list:

list1 <- list(1:5,10:13)

My goal is to keep only the rows of "data" where consecutive values of the "x" column match one of the numeric sequences in "list1" exactly. So the output data.frame should be:

finaldata <- data.frame(x=c(1:5,10:13),y="other_data")

Any ideas on how to do this?


asked Sep 17 '15 11:09

jeff6868

1 Answer

I started with a custom function that selects the rows for one sequence. From there it's easy to extend it to the whole list with lapply.

# function that takes a sequence and a vector v, and returns a logical
# vector marking the elements of v that belong to a complete run of the sequence
get_row_indices <- function(sequence, v) {
  # run lengths of "is this element of v one of the sequence's values?"
  rle_d <- rle(v %in% sequence)
  # a run is complete when all its elements are in the sequence and
  # the run is exactly as long as the sequence
  select <- rep(rle_d$values & rle_d$lengths == length(sequence), rle_d$lengths)
  return(select)
}

# add a row ID to data to show which rows were selected
data$row_id <- seq_len(nrow(data))
res <- do.call(rbind, lapply(list1, function(x) {
  data[get_row_indices(sequence = x, v = data$x), ]
}))

res

> res
    x          y row_id
5   1 other_data      5
6   2 other_data      6
7   3 other_data      7
8   4 other_data      8
9   5 other_data      9
13 10 other_data     13
14 11 other_data     14
15 12 other_data     15
16 13 other_data     16
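To see why the run-length check works, you can run the steps inside get_row_indices() one at a time for the first sequence, 1:5. This walkthrough only illustrates the logic above and is not a separate method. The intermediate variable names (in_seq, runs, keep_run, keep_row) are illustrative and not part of the original answer.

```r
# Data from the question
data <- data.frame(x = c(0, 1, 2, 0, 1, 2, 3, 4, 5, 12, 2, 0, 10, 11, 12, 13),
                   y = "other_data")
sequence <- 1:5

# Step 1: flag which values of x belong to the sequence at all
in_seq <- data$x %in% sequence
# Step 2: collapse the flags into runs of consecutive TRUE/FALSE values
runs <- rle(in_seq)
print(runs)
# Step 3: a run qualifies only if it is TRUE and exactly as long as the sequence
keep_run <- runs$values & runs$lengths == length(sequence)
# Step 4: expand the per-run decision back to one flag per row
keep_row <- rep(keep_run, runs$lengths)
which(keep_row)
```

The short run 1, 2 at rows 2-3 and the lone 2 at row 11 are rejected because their runs are shorter than the sequence. Only the run at rows 5-9 survives.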

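Note that the rle() approach only checks that a run of the right length consists of values taken from the sequence. It does not check their order, so a run such as 5, 4, 3, 2, 1 would also be selected for 1:5. If order matters, a sliding-window comparison is one possible alternative.

The sketch below is an illustrative addition, not part of the original answer. The helper match_exact_sequence() is hypothetical, not a base R function. The sketch also combines the per-sequence flags with Reduce(), so the data is subset once and the rows keep their original order.

```r
# Hypothetical helper: flags elements of v that lie in a window matching
# `sequence` exactly, element by element and in the same order.
match_exact_sequence <- function(sequence, v) {
  n <- length(sequence)
  keep <- logical(length(v))            # all FALSE to start
  if (n == 0 || n > length(v)) return(keep)
  for (start in seq_len(length(v) - n + 1)) {
    idx <- start:(start + n - 1)
    if (all(v[idx] == sequence)) keep[idx] <- TRUE
  }
  keep
}

data <- data.frame(x = c(0, 1, 2, 0, 1, 2, 3, 4, 5, 12, 2, 0, 10, 11, 12, 13),
                   y = "other_data")
list1 <- list(1:5, 10:13)

# OR the per-sequence flags together, then subset once: rows stay in their
# original order and are never duplicated, even if sequences overlap
keep <- Reduce(`|`, lapply(list1, match_exact_sequence, v = data$x))
finaldata <- data[keep, ]
finaldata
```

There is one trade-off. Unlike the rle() version, this does not require the match to be bounded by values outside the sequence, so 1, 2, 3, 4, 5, 5 would still select the first five rows. The loop costs roughly O(length(v) * length(sequence)) comparisons, which should be fine for data of this size.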

answered Sep 27 '22 18:09

Heroka
