I have a matrix that I would like to subset and eventually use to make a plot. The data is a list of counts for specific blood markers for each patient in a population. It looks like this:
df <- data.frame(MarkerID=c("Class","A123","A124"),
MarkerName=c("","X","Y"),
Patient.1=c(0,1,5),
Patent.2=c(1,2,6),
Patent.3=c(0,3,7),
Patient.4=c(1,4,8))
I would like to make a data frame of all of the patients (columns 3-6) that have a class value of zero (1st row) and a second data frame of all of the patients with a class value of 1.
In the past I have used the subset function to select rows based on the values in a column. Is it possible to select a subset of columns based on the values in a row?
I've tried this:
x <- subset(data, data[1,] == 0)
However, when I run dim(x), the number of columns is the same as in dim(data) but the number of rows is different. Any ideas on how I can make this return just those columns whose value in row 1 is 0?
Roland,
Yes. Your example df is what the data frame looks like. There are ~30,000 markers and >400 patients in the data frame, so I didn't post the dput(head(data)). Thanks for the reshaping tip, I'll give that a try.
Your example code did work to subset the columns based on the rows. Using
data[,c(TRUE,TRUE,data[1,-(1:2)]==1)]
on the data, I was able to get a data frame with all of the rows and only the columns with the indicated class.
Your data is not arranged in a good way. It would be better to reshape it.
In the absence of input data this is just a guess:
df <- data.frame(MarkerID=c("Class","A123","A124"),
MarkerName=c("","X","Y"),
Patient.1=c(0,1,5),
Patent.2=c(1,2,6),
Patent.3=c(0,3,7),
Patient.4=c(1,4,8))
# MarkerID MarkerName Patient.1 Patent.2 Patent.3 Patient.4
#1 Class 0 1 0 1
#2 A123 X 1 2 3 4
#3 A124 Y 5 6 7 8
df[,c(TRUE,TRUE,df[1,-(1:2)]==0)]
# MarkerID MarkerName Patient.1 Patent.3
#1 Class 0 0
#2 A123 X 1 3
#3 A124 Y 5 7
Here c(TRUE,TRUE,df[1,-(1:2)]==0) creates a logical vector that is TRUE for the first two columns and for those columns that have a 0 in the first row. Then I subset the columns based on this vector.
df[,c(TRUE,TRUE,df[1,-(1:2)]==1)]
# MarkerID MarkerName Patent.2 Patient.4
#1 Class 1 1
#2 A123 X 2 4
#3 A124 Y 6 8
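The two calls above differ only in the class value, so the selection could be wrapped in a small function. This is only a sketch: the helper name select_by_class and its arguments are made up for illustration, and it assumes the class labels sit in the first row and the identifier columns come first, as in the example df.

```r
df <- data.frame(MarkerID=c("Class","A123","A124"),
                 MarkerName=c("","X","Y"),
                 Patient.1=c(0,1,5),
                 Patent.2=c(1,2,6),
                 Patent.3=c(0,3,7),
                 Patient.4=c(1,4,8))

# Hypothetical helper: keep the identifier columns plus every patient
# column whose value in the class row equals class_value.
select_by_class <- function(d, class_value, id_cols = 1:2, class_row = 1) {
  # unlist() turns the one-row data frame into a plain named vector
  cls <- unlist(d[class_row, -id_cols])
  keep <- names(cls)[cls == class_value]
  d[, c(names(d)[id_cols], keep), drop = FALSE]
}

class0 <- select_by_class(df, 0)  # patients with class 0
class1 <- select_by_class(df, 1)  # patients with class 1
names(class0)
names(class1)
```

Selecting by column name rather than by a positional logical vector keeps the result readable when there are several hundred patient columns.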
This would reshape your data into a more common format (for statistical software):
library(reshape2)
df2 <- merge(melt(df[1,],variable.name="Patient",value.name="class")[-(1:2)],
melt(df[-1,],variable.name="Patient"),all=TRUE)
# Patient class MarkerID MarkerName value
#1 Patent.2 1 A123 X 2
#2 Patent.2 1 A124 Y 6
#3 Patent.3 0 A123 X 3
#4 Patent.3 0 A124 Y 7
#5 Patient.1 0 A123 X 1
#6 Patient.1 0 A124 Y 5
#7 Patient.4 1 A123 X 4
#8 Patient.4 1 A124 Y 8
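If reshape2 is not available, roughly the same long layout can be put together in base R. This is a sketch under the same assumptions as the example df (class labels in row 1, identifiers in the first two columns); the object name long is made up here, and the rows come out in column order rather than in the sorted order that merge produces.

```r
df <- data.frame(MarkerID=c("Class","A123","A124"),
                 MarkerName=c("","X","Y"),
                 Patient.1=c(0,1,5),
                 Patent.2=c(1,2,6),
                 Patent.3=c(0,3,7),
                 Patient.4=c(1,4,8))

patients <- names(df)[-(1:2)]     # patient column names
cls <- unlist(df[1, patients])    # class label per patient (named vector)
markers <- df[-1, ]               # marker rows without the class row

# Build one block of rows per patient and stack the blocks.
long <- do.call(rbind, lapply(patients, function(p) {
  data.frame(Patient = p,
             class = cls[[p]],
             MarkerID = markers$MarkerID,
             MarkerName = markers$MarkerName,
             value = markers[[p]])
}))
long
```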
You could then use subset:
subset(df2,class==0)
# Patient class MarkerID MarkerName value
#3 Patent.3 0 A123 X 3
#4 Patent.3 0 A124 Y 7
#5 Patient.1 0 A123 X 1
#6 Patient.1 0 A124 Y 5
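Since the question asks for one data frame per class, split() on the class column would return both at once. A minimal sketch: df2 is typed in by hand from the output shown above so that the snippet runs without reshape2, and by_class is just an illustrative name.

```r
# Long-format data copied from the merge/melt output above.
df2 <- data.frame(Patient=c("Patent.2","Patent.2","Patent.3","Patent.3",
                            "Patient.1","Patient.1","Patient.4","Patient.4"),
                  class=c(1,1,0,0,0,0,1,1),
                  MarkerID=rep(c("A123","A124"),4),
                  MarkerName=rep(c("X","Y"),4),
                  value=c(2,6,3,7,1,5,4,8))

# One data frame per class value, stored in a list named by class.
by_class <- split(df2, df2$class)
by_class[["0"]]   # same rows as subset(df2, class==0)
by_class[["1"]]
```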