
Extract rows from a data.frame based on common values with a list

Tags: list, r, filtering

I'm looking for an easy way to filter rows from a data.frame, based on a list of numeric sequences.

Here's an example:

My initial data frame:

data <- data.frame(x=c(0,1,2,0,1,2,3,4,5,12,2,0,10,11,12,13),y="other_data")

My list:

list1 <- list(1:5,10:13)

My goal is to keep only the rows of "data" where the "x" column contains exactly the numeric sequences in "list1". So the output data.frame should be:

finaldata <- data.frame(x=c(1:5,10:13),y="other_data")

Any ideas for doing this?

asked Sep 17 '15 11:09 by jeff6868



1 Answer

I started with a custom function to get the subset for one sequence, then it's easy to extend with lapply.

# Function that takes a sequence and a vector and returns a logical
# vector marking the elements of v that belong to a complete sequence.
# Note: it checks membership and run length, not the order of the values.
get_row_indices <- function(sequence, v) {
  # Run-length encode whether each element of v is in the sequence
  rle_d <- rle(v %in% sequence)
  # A run is complete when its values are in the sequence and
  # its length equals the length of the sequence
  select <- rep(length(sequence) == rle_d$lengths & rle_d$values, rle_d$lengths)

  return(select)
}
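
To see what the function does internally, here is a small sketch of the intermediate run-length encoding for the first sequence. It uses the vector from the question; the names x, target, runs, keep_run and keep are only illustrative and are not part of the function above.

```r
# Example vector from the question and the first target sequence
x <- c(0, 1, 2, 0, 1, 2, 3, 4, 5, 12, 2, 0, 10, 11, 12, 13)
target <- 1:5

# Run-length encode the membership test
runs <- rle(x %in% target)
runs$lengths  # 1 2 1 5 1 1 5
runs$values   # FALSE TRUE FALSE TRUE FALSE TRUE FALSE

# Keep only runs that are TRUE and exactly as long as the target
keep_run <- runs$values & runs$lengths == length(target)

# Expand the per-run decision back to one value per element of x
keep <- rep(keep_run, runs$lengths)
which(keep)   # 5 6 7 8 9
```

The last run also has length 5, but its value is FALSE, so it is dropped. That is why both conditions are combined with &.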


# Add a row ID to data to show the selection
data$row_id <- 1:nrow(data)
res <- do.call(rbind, lapply(list1, function(x) {
  return(data[get_row_indices(sequence = x, v = data$x), ])
}))

res

> res
    x          y row_id
5   1 other_data      5
6   2 other_data      6
7   3 other_data      7
8   4 other_data      8
9   5 other_data      9
13 10 other_data     13
14 11 other_data     14
15 12 other_data     15
16 13 other_data     16
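
One caveat: the run-length test only confirms that a run has the right length and that all its values are in the sequence, so a run like 5,4,3,2,1 would also be kept for 1:5. If the order matters, a sliding-window comparison is one possible alternative. This is only a sketch: match_exact_sequence is a hypothetical helper, not part of the answer above, and it behaves slightly differently (for instance, it also finds the sequence inside a longer run of matching values).

```r
# Hypothetical helper: TRUE for every element of v that belongs to an
# exact, in-order occurrence of `sequence`.
match_exact_sequence <- function(sequence, v) {
  n <- length(sequence)
  keep <- rep(FALSE, length(v))
  if (n == 0 || n > length(v)) return(keep)
  for (start in seq_len(length(v) - n + 1)) {
    window <- start:(start + n - 1)
    # isTRUE() guards against NA values in v
    if (isTRUE(all(v[window] == sequence))) keep[window] <- TRUE
  }
  keep
}

data <- data.frame(x = c(0, 1, 2, 0, 1, 2, 3, 4, 5, 12, 2, 0, 10, 11, 12, 13),
                   y = "other_data")
list1 <- list(1:5, 10:13)

# Combine the per-sequence masks with OR, then subset once
keep <- Reduce(`|`, lapply(list1, match_exact_sequence, v = data$x))
strict_res <- data[keep, ]
strict_res$x  # 1 2 3 4 5 10 11 12 13
```

Because the rows are subset once with a combined mask, they come back in their original row order rather than in the order of list1. For the example data both give the same result.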
answered Sep 27 '22 18:09 by Heroka