
How to filter a table's rows based on an external vector?

Tags: r, filter

(1) I have a large table read into R with more than 10,000 rows and 10 columns.

(2) The 3rd column of the table contains the hospital names. Some of them appear twice or more.

(3) I have a vector of hospital names, e.g. 10 of them, that need to be studied further.

(4) Could you show me how to extract all the rows of the table in (1) whose hospital name is listed in the vector in (3)?

Here is a shorter example of my input file:

    Patients Treatment Hospital Response
    1        A         YYY      Good
    2        B         YYY      Dead
    3        A         ZZZ      Good
    4        A         WWW      Good
    5        C         UUU      Dead

I have a vector of hospitals that I am interested in studying further, i.e. YYY and UUU. How can I generate an output table like the following with R?

    Patients Treatment Hospital Response
    1        A         YYY      Good
    2        B         YYY      Dead
    5        C         UUU      Dead
Catherine asked Apr 07 '11 16:04




2 Answers

Use the %in% operator.

    # Sample data
    dat <- data.frame(patients = 1:5, treatment = letters[1:5],
                      hospital = c("yyy", "yyy", "zzz", "www", "uuu"), response = rnorm(5))

    # List of hospitals we want to do further analysis on
    goodHosp <- c("yyy", "uuu")

You can either index directly into your data.frame object:

    dat[dat$hospital %in% goodHosp, ]

or use the subset command:

    subset(dat, hospital %in% goodHosp)
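For reference, here is a minimal sketch of the same idea applied to the sample table from the question (it is an illustration, not part of the original answer). The names `patients`, `wanted`, `keep` and `selected` are made up for this example, and the inline text stands in for the real input file, which is assumed to be whitespace-separated with a header row.

```r
# The question's sample table, read from inline text
# (a stand-in for the real input file)
patients <- read.table(text = "
Patients Treatment Hospital Response
1 A YYY Good
2 B YYY Dead
3 A ZZZ Good
4 A WWW Good
5 C UUU Dead
", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical vector of hospitals to study further
wanted <- c("YYY", "UUU")

# %in% gives one TRUE/FALSE per row: TRUE where the hospital is in `wanted`
keep <- patients$Hospital %in% wanted

# Use the logical vector as a row index; the empty slot after the comma
# means "keep all columns"
selected <- patients[keep, ]
print(selected)
```

This should keep patients 1, 2 and 5, matching the output table asked for in the question. Duplicated hospital names are not a problem, because every row is tested on its own.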
Chase answered Sep 22 '22 05:09


Using dplyr

Setting up the data, using @Chase's sample data:

    # Sample data
    df <- data.frame(patients = 1:5, treatment = letters[1:5],
                     hospital = c("yyy", "yyy", "zzz", "www", "uuu"), response = rnorm(5))

    # List of hospitals we want to do further analysis on
    goodHosp <- c("yyy", "uuu")

Now filter the data using dplyr's filter():

    library(dplyr)
    df %>% filter(hospital %in% goodHosp)
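If you want to convince yourself that the dplyr version selects the same rows as base R indexing, something like the following sketch may help (it is an illustration, not part of the original answer). The names `base_rows` and `dplyr_rows` are made up for this example, and the dplyr part only runs if the package is installed.

```r
# Same sample data as above; set.seed() only makes the random
# response column reproducible
set.seed(1)
df <- data.frame(patients = 1:5, treatment = letters[1:5],
                 hospital = c("yyy", "yyy", "zzz", "www", "uuu"), response = rnorm(5))
goodHosp <- c("yyy", "uuu")

# Base R indexing with %in%
base_rows <- df[df$hospital %in% goodHosp, ]

# dplyr::filter() with the same condition, skipped if dplyr is not installed
if (requireNamespace("dplyr", quietly = TRUE)) {
  dplyr_rows <- dplyr::filter(df, hospital %in% goodHosp)
  # Compare the patient IDs rather than row names, since the two
  # approaches are not guaranteed to label rows the same way
  print(identical(base_rows$patients, dplyr_rows$patients))
}
```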
RK1 answered Sep 24 '22 05:09