(1) I have a large table read into R with more than 10,000 rows and 10 columns.
(2) The 3rd column of the table contains the hospital names. Some of them appear more than once.
(3) I have a vector of hospital names, e.g. 10 of them, that need to be studied further.
(4) Could you show me how to extract all the rows of the table in step 1 whose hospital names are listed in step 3?
Here is a shorter example of my input file:

    Patients  Treatment  Hospital  Response
    1         A          YYY       Good
    2         B          YYY       Dead
    3         A          ZZZ       Good
    4         A          WWW       Good
    5         C          UUU       Dead
I have a vector of hospitals that I am interested in studying further, i.e. YYY and UUU. How can I generate an output table like the following with R?
    Patients  Treatment  Hospital  Response
    1         A          YYY       Good
    2         B          YYY       Dead
    5         C          UUU       Dead
If we have a vector and a data frame, and the data frame has a column whose values also occur in the vector, then we can create a subset of the data frame based on that vector. This can be done with single square brackets and the %in% operator.
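As a minimal sketch of that idea, the snippet below rebuilds the example table from the question and filters it in two steps. The names patients, wanted, keep and result are made up for illustration and do not come from the original post.

```r
# Example data mirroring the table in the question
patients <- data.frame(
  Patients  = 1:5,
  Treatment = c("A", "B", "A", "A", "C"),
  Hospital  = c("YYY", "YYY", "ZZZ", "WWW", "UUU"),
  Response  = c("Good", "Dead", "Good", "Good", "Dead")
)

# Hypothetical vector of hospitals to study further
wanted <- c("YYY", "UUU")

# %in% tests each hospital name for membership in `wanted`,
# giving one TRUE/FALSE per row
keep <- patients$Hospital %in% wanted
print(keep)
# TRUE TRUE FALSE FALSE TRUE

# A logical vector in the row position of [ , ] keeps the TRUE rows;
# leaving the column position empty keeps all columns
result <- patients[keep, ]
print(result)
```

The result contains patients 1, 2 and 5, which matches the output table requested in the question.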
By using bracket notation on an R data frame we can select rows by column value, by index, by name, by condition, etc. You can also use the base R function subset() to get the same results. Besides these, the dplyr package provides the function dplyr::filter() to get rows from a data frame.
Filter Rows by Condition: in pandas (Python, not R) you can use df[df["Courses"] == 'Spark'] to filter the rows of a DataFrame by a condition. Note that this expression returns a new DataFrame with the selected rows. You can also write the above statement with a variable.
Use the %in% operator.
    # Sample data
    dat <- data.frame(patients = 1:5,
                      treatment = letters[1:5],
                      hospital = c("yyy", "yyy", "zzz", "www", "uuu"),
                      response = rnorm(5))

    # List of hospitals we want to do further analysis on
    goodHosp <- c("yyy", "uuu")
You can either index directly into your data.frame object:
    dat[dat$hospital %in% goodHosp, ]
or use the subset command:
    subset(dat, hospital %in% goodHosp)
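It may help to see why %in% is used here rather than ==. The sketch below reuses the hospital values and goodHosp from the sample data above; by_equals and by_in are names made up for this illustration.

```r
hospital <- c("yyy", "yyy", "zzz", "www", "uuu")
goodHosp <- c("yyy", "uuu")

# == compares position by position and recycles the shorter vector
# (yyy, uuu, yyy, uuu, yyy), so it is not a membership test.
# R also warns that the lengths are not multiples of each other;
# the warning is suppressed here only to keep the output clean.
by_equals <- suppressWarnings(hospital == goodHosp)
print(by_equals)
# TRUE FALSE FALSE FALSE FALSE

# %in% asks, for each element, whether it occurs anywhere in goodHosp
by_in <- hospital %in% goodHosp
print(by_in)
# TRUE TRUE FALSE FALSE TRUE
```

With ==, the second "yyy" row and the "uuu" row would be dropped silently, so %in% is the safer choice whenever the right-hand side holds more than one value.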
Using dplyr
Setting up the data, using @Chase's sample data:

    # Sample data
    df <- data.frame(patients = 1:5,
                     treatment = letters[1:5],
                     hospital = c("yyy", "yyy", "zzz", "www", "uuu"),
                     response = rnorm(5))

    # List of hospitals we want to do further analysis on
    goodHosp <- c("yyy", "uuu")
Now filter the data using dplyr's filter:

    library(dplyr)
    df %>% filter(hospital %in% goodHosp)