I want to get a subset of the observations in each column of my data frame, based on a threshold. Let me explain the question in a little more detail.
I have a data frame with the methylation levels of 35 patients affected by lung adenocarcinoma. This is a subset of my data:
> df.met[1:5,1:5]
A2BP1 A2M A2ML1 A4GALT AAAS
paciente6 0.36184475 0.4555788 0.6422624 0.08051388 0.15013343
paciente7 0.47566878 0.7329827 0.4938048 0.45487573 0.10827520
paciente8 0.17455497 0.7528387 0.5686839 0.37018038 0.12423923
paciente9 0.04830471 0.5166676 0.8878207 0.08881092 0.11779075
paciente10 0.16757806 0.7896194 0.5408747 0.35315243 0.09234602
Now I need to get another object (with the same number of columns, but a smaller number of rows, different in every column) containing only the values greater than 0.1 in each column of the initial data frame.
My intention is to obtain an object like this one (I don't know if it is possible...):
           A2BP1      A2M       A2ML1     A4GALT     AAAS
paciente6  0.36184475 0.4555788 0.6422624            0.15013343
paciente7  0.47566878 0.7329827 0.4938048 0.45487573 0.10827520
paciente8  0.17455497 0.7528387 0.5686839 0.37018038 0.12423923
paciente9             0.5166676 0.8878207            0.11779075
paciente10 0.16757806 0.7896194 0.5408747 0.35315243
In other words, I want to drop the values smaller than 0.1 from my data frame.
Thank you so much!
The most general way to subset a data frame by rows and/or columns is the base R [ ] extraction operator (see ?Extract), written with square brackets rather than the usual parentheses. For a data frame named d, the general form is d[rows, columns].
How do you subset a data frame by column value or by name in R? With base R df[] notation or subset(), you can subset a data frame (data.frame) by column value or by column name.
To keep only the rows in which every value is greater than 0.1, you may need
df.met[!rowSums(df.met <= 0.1),,drop=FALSE]
# A2BP1 A2M A2ML1 A4GALT AAAS
#paciente7 0.4756688 0.7329827 0.4938048 0.4548757 0.1082752
#paciente8 0.1745550 0.7528387 0.5686839 0.3701804 0.1242392
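The one-liner above packs several steps together, so here is a minimal sketch that spells them out one at a time. The intermediate names (low, n_low, keep, kept) are only illustrative, and the data are the five rows shown in the question.

```r
# Sample data from the question (first five patients and genes)
df.met <- data.frame(
  A2BP1  = c(0.36184475, 0.47566878, 0.17455497, 0.04830471, 0.16757806),
  A2M    = c(0.4555788, 0.7329827, 0.7528387, 0.5166676, 0.7896194),
  A2ML1  = c(0.6422624, 0.4938048, 0.5686839, 0.8878207, 0.5408747),
  A4GALT = c(0.08051388, 0.45487573, 0.37018038, 0.08881092, 0.35315243),
  AAAS   = c(0.15013343, 0.1082752, 0.12423923, 0.11779075, 0.09234602),
  row.names = c("paciente6", "paciente7", "paciente8", "paciente9", "paciente10")
)

# Logical matrix: TRUE wherever a value is at or below the threshold
low <- df.met <= 0.1

# Number of low values per patient (row)
n_low <- rowSums(low)

# Keep a row only if it has no low values; !n_low is the same test,
# because 0 is coerced to FALSE and any positive count to TRUE
keep <- n_low == 0

# drop = FALSE keeps the result a data frame even if one column remains
kept <- df.met[keep, , drop = FALSE]
print(kept)
```

Only paciente7 and paciente8 survive, because each of the other patients has at least one value at or below 0.1.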
Based on the edit, you can set the values <= 0.1 to NA instead
is.na(df.met) <- df.met <= 0.1
df.met
# A2BP1 A2M A2ML1 A4GALT AAAS
#paciente6 0.3618447 0.4555788 0.6422624 NA 0.1501334
#paciente7 0.4756688 0.7329827 0.4938048 0.4548757 0.1082752
#paciente8 0.1745550 0.7528387 0.5686839 0.3701804 0.1242392
#paciente9 NA 0.5166676 0.8878207 NA 0.1177907
#paciente10 0.1675781 0.7896194 0.5408747 0.3531524 NA
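An equivalent way to write the same replacement, which some readers may find easier to follow, is to index the data frame with the logical matrix and assign NA. The sketch below works on a copy (masked is a hypothetical name) so the original stays untouched, and shows that summaries can then skip the masked values with na.rm = TRUE.

```r
# Sample data from the question (first five patients and genes)
df.met <- data.frame(
  A2BP1  = c(0.36184475, 0.47566878, 0.17455497, 0.04830471, 0.16757806),
  A2M    = c(0.4555788, 0.7329827, 0.7528387, 0.5166676, 0.7896194),
  A2ML1  = c(0.6422624, 0.4938048, 0.5686839, 0.8878207, 0.5408747),
  A4GALT = c(0.08051388, 0.45487573, 0.37018038, 0.08881092, 0.35315243),
  AAAS   = c(0.15013343, 0.1082752, 0.12423923, 0.11779075, 0.09234602),
  row.names = c("paciente6", "paciente7", "paciente8", "paciente9", "paciente10")
)

# Work on a copy so df.met itself is not modified
masked <- df.met
masked[masked <= 0.1] <- NA

# Columns stay numeric, so numeric summaries still work
col_means <- colMeans(masked, na.rm = TRUE)

# How many values survive the threshold in each column
n_kept <- colSums(!is.na(masked))

print(masked)
print(n_kept)
```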
Using data.table
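library(data.table) # v1.9.5+
# convert to a data.table by reference; row names become the first column, rn
setDT(df.met, keep.rownames=TRUE)[]
# columns 2 onwards hold the values; set() replaces them in place
for(j in 2:ncol(df.met)){
  set(df.met, i=which(df.met[[j]] <= 0.1), j=j, value=NA)
}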
df.met
# rn A2BP1 A2M A2ML1 A4GALT AAAS
#1: paciente6 0.3618447 0.4555788 0.6422624 NA 0.1501334
#2: paciente7 0.4756688 0.7329827 0.4938048 0.4548757 0.1082752
#3: paciente8 0.1745550 0.7528387 0.5686839 0.3701804 0.1242392
#4: paciente9 NA 0.5166676 0.8878207 NA 0.1177907
#5: paciente10 0.1675781 0.7896194 0.5408747 0.3531524 NA
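A data frame or data.table has to stay rectangular, so it cannot hold a different number of rows in each column, which is why the approaches above fill the gaps with NA. If truly ragged columns are wanted, as the question describes, one possibility is a list with one named vector per gene. This is only a sketch of that idea; kept_by_gene is a hypothetical name, and the data are again the five rows from the question.

```r
# Sample data from the question (first five patients and genes)
df.met <- data.frame(
  A2BP1  = c(0.36184475, 0.47566878, 0.17455497, 0.04830471, 0.16757806),
  A2M    = c(0.4555788, 0.7329827, 0.7528387, 0.5166676, 0.7896194),
  A2ML1  = c(0.6422624, 0.4938048, 0.5686839, 0.8878207, 0.5408747),
  A4GALT = c(0.08051388, 0.45487573, 0.37018038, 0.08881092, 0.35315243),
  AAAS   = c(0.15013343, 0.1082752, 0.12423923, 0.11779075, 0.09234602),
  row.names = c("paciente6", "paciente7", "paciente8", "paciente9", "paciente10")
)

# For each column, attach the patient names and keep only values above 0.1
kept_by_gene <- lapply(df.met, function(x) {
  x <- setNames(x, rownames(df.met))
  x[x > 0.1]
})

# Each element can have a different length
print(lengths(kept_by_gene))
print(kept_by_gene$A4GALT)
```

Each list element keeps the patient names, so it is still possible to tell which observations passed the threshold for a given gene.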
Data used in the examples:
df.met <- structure(list(A2BP1 = c(0.36184475, 0.47566878, 0.17455497,
0.04830471, 0.16757806), A2M = c(0.4555788, 0.7329827, 0.7528387,
0.5166676, 0.7896194), A2ML1 = c(0.6422624, 0.4938048, 0.5686839,
0.8878207, 0.5408747), A4GALT = c(0.08051388, 0.45487573, 0.37018038,
0.08881092, 0.35315243), AAAS = c(0.15013343, 0.1082752, 0.12423923,
0.11779075, 0.09234602)), .Names = c("A2BP1", "A2M", "A2ML1",
"A4GALT", "AAAS"), class = "data.frame", row.names = c("paciente6",
"paciente7", "paciente8", "paciente9", "paciente10"))
To match your desired output (values <= 0.1 replaced by empty fields) you could do:
library(dplyr)
df.met %>%
  add_rownames("pacientes") %>%
  mutate_each(funs(replace(., . <= 0.1, "")))
Which gives:
# Source: local data frame [5 x 6]
#
#    pacientes      A2BP1       A2M     A2ML1     A4GALT       AAAS
# 1  paciente6 0.36184475 0.4555788 0.6422624            0.15013343
# 2  paciente7 0.47566878 0.7329827 0.4938048 0.45487573  0.1082752
# 3  paciente8 0.17455497 0.7528387 0.5686839 0.37018038 0.12423923
# 4  paciente9            0.5166676 0.8878207            0.11779075
# 5 paciente10 0.16757806 0.7896194 0.5408747 0.35315243
Note: This will convert all columns to character. You should instead do:
df.met %>%
  add_rownames("pacientes") %>%
  mutate_each(funs(replace(., . <= 0.1, NA)))
This will preserve your initial data structure (all value columns stay numeric).
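In recent dplyr releases, add_rownames(), mutate_each() and funs() are deprecated. A rough modern equivalent, assuming dplyr 1.0.0 or later and the tibble package are installed, could look like the sketch below; res is a hypothetical name for the result.

```r
library(dplyr)
library(tibble)

# Sample data from the question (first five patients and genes)
df.met <- data.frame(
  A2BP1  = c(0.36184475, 0.47566878, 0.17455497, 0.04830471, 0.16757806),
  A2M    = c(0.4555788, 0.7329827, 0.7528387, 0.5166676, 0.7896194),
  A2ML1  = c(0.6422624, 0.4938048, 0.5686839, 0.8878207, 0.5408747),
  A4GALT = c(0.08051388, 0.45487573, 0.37018038, 0.08881092, 0.35315243),
  AAAS   = c(0.15013343, 0.1082752, 0.12423923, 0.11779075, 0.09234602),
  row.names = c("paciente6", "paciente7", "paciente8", "paciente9", "paciente10")
)

res <- df.met %>%
  # move the row names into a regular column called pacientes
  rownames_to_column("pacientes") %>%
  # in every column except pacientes, replace values <= 0.1 with NA
  mutate(across(-pacientes, ~ replace(.x, .x <= 0.1, NA)))

print(res)
```

As with the NA version above, the value columns remain numeric.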