I'm looking for an easy way to filter rows from a data.frame based on a list of numeric sequences.
Here's an example:
My initial data frame:
data <- data.frame(x=c(0,1,2,0,1,2,3,4,5,12,2,0,10,11,12,13),y="other_data")
My list:
list1 <- list(1:5,10:13)
My goal is to keep only the rows of "data" where the "x" column contains exactly the same numeric sequences as "list1", as complete consecutive runs. So the output data.frame should be:
finaldata <- data.frame(x=c(1:5,10:13),y="other_data")
Any ideas for doing this?
I started with a custom function that gets the subset for a single sequence; from there it's easy to extend to the whole list with lapply.
# Function that takes a sequence and a vector and returns a logical
# vector marking the positions of v that belong to a complete run
get_row_indices <- function(sequence, v) {
  # run lengths of whether each element of v is in the sequence
  rle_d <- rle(v %in% sequence)
  # a run is kept if its elements are in the sequence and its
  # length equals the length of the sequence
  # (note: this checks membership and run length, not element order)
  select <- rep(length(sequence) == rle_d$lengths & rle_d$values, rle_d$lengths)
  return(select)
}
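To make the rle step easier to follow, here is a small walkthrough on the question's own "x" values for the sequence 1:5. The variable names (v, runs, keep_run) are only illustrative, not part of the answer.

```r
# Values of data$x from the question
v <- c(0, 1, 2, 0, 1, 2, 3, 4, 5, 12, 2, 0, 10, 11, 12, 13)

# Runs of consecutive TRUE/FALSE for "is this element in 1:5?"
runs <- rle(v %in% 1:5)
runs$lengths  # 1 2 1 5 1 1 5
runs$values   # FALSE TRUE FALSE TRUE FALSE TRUE FALSE

# Keep only TRUE runs whose length equals the sequence length (5)
keep_run <- runs$values & runs$lengths == 5

# Expand the per-run decision back to one flag per element
which(rep(keep_run, runs$lengths))  # 5 6 7 8 9
```

The short run 1, 2 at positions 2 and 3 is dropped because its length is 2, not 5.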
# add a row ID to data to show which rows were selected
data$row_id <- seq_len(nrow(data))
res <- do.call(rbind, lapply(list1, function(x) {
  return(data[get_row_indices(sequence = x, v = data$x), ])
}))
res
> res
    x          y row_id
5   1 other_data      5
6   2 other_data      6
7   3 other_data      7
8   4 other_data      8
9   5 other_data      9
13 10 other_data     13
14 11 other_data     14
15 12 other_data     15
16 13 other_data     16
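One caveat with the rle-based approach: it tests membership and run length only, so a run such as 5, 4, 3, 2, 1 would also be kept. If element order matters, a stricter check is possible. The following is only a sketch under that assumption; find_exact_runs is a hypothetical helper name. Its behaviour also differs slightly in that it keeps a matching window even when the surrounding values belong to the sequence, and it returns rows in their original order.

```r
# Hypothetical helper: TRUE for positions of v that are part of a
# contiguous window equal to `sequence`, element by element
find_exact_runs <- function(sequence, v) {
  n <- length(sequence)
  keep <- rep(FALSE, length(v))
  if (n == 0 || n > length(v)) return(keep)
  # slide a window of length n over v and compare it to the sequence
  for (start in seq_len(length(v) - n + 1)) {
    window <- start:(start + n - 1)
    # isTRUE() guards against NA values in v
    if (isTRUE(all(v[window] == sequence))) keep[window] <- TRUE
  }
  keep
}

data <- data.frame(x = c(0, 1, 2, 0, 1, 2, 3, 4, 5, 12, 2, 0, 10, 11, 12, 13),
                   y = "other_data")
list1 <- list(1:5, 10:13)

# Combine the per-sequence flags with a logical OR, then subset once
keep <- Reduce(`|`, lapply(list1, find_exact_runs, v = data$x))
finaldata <- data[keep, ]
```

On the example data this selects rows 5 to 9 and 13 to 16, the same rows as the rle version.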