Subset rows based on a specific threshold value

Tags:

subset

I want to subset the observations in each column of my data frame based on a threshold. Let me explain the question in a bit more detail.

I have a data frame with the methylation levels of 35 patients affected by lung adenocarcinoma. This is a subset of my data:

> df.met[1:5,1:5]
                A2BP1       A2M     A2ML1     A4GALT       AAAS
paciente6  0.36184475 0.4555788 0.6422624 0.08051388 0.15013343
paciente7  0.47566878 0.7329827 0.4938048 0.45487573 0.10827520
paciente8  0.17455497 0.7528387 0.5686839 0.37018038 0.12423923
paciente9  0.04830471 0.5166676 0.8878207 0.08881092 0.11779075
paciente10 0.16757806 0.7896194 0.5408747 0.35315243 0.09234602

Now I need another object (with the same number of columns, but fewer rows, and a different number of rows in each column) that contains only the values greater than 0.1 from every column of the initial data frame.

I would like to obtain an object like this one (I don't know whether it is possible...):

            A2BP1       A2M     A2ML1     A4GALT       AAAS
paciente6  0.36184475 0.4555788 0.6422624            0.15013343
paciente7  0.47566878 0.7329827 0.4938048 0.45487573 0.10827520
paciente8  0.17455497 0.7528387 0.5686839 0.37018038 0.12423923
paciente9             0.5166676 0.8878207            0.11779075
paciente10 0.16757806 0.7896194 0.5408747 0.35315243

In other words, I want to drop the values smaller than 0.1 from my data frame.

Thank you so much!

asked Jun 21 '15 16:06

Dani

2 Answers

You may need

df.met[!rowSums(df.met <= 0.1),,drop=FALSE]
#           A2BP1       A2M     A2ML1    A4GALT      AAAS
#paciente7 0.4756688 0.7329827 0.4938048 0.4548757 0.1082752
#paciente8 0.1745550 0.7528387 0.5686839 0.3701804 0.1242392
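To see why only paciente7 and paciente8 survive, it can help to split that one-liner into steps. This is a minimal sketch in base R that rebuilds the five-row sample from the question; the intermediate names `below`, `n_below`, `keep` and `kept` are illustrative only and do not appear in the original answer.

```r
# Rebuild the sample shown in the question (same values as the data block).
df.met <- data.frame(
  A2BP1  = c(0.36184475, 0.47566878, 0.17455497, 0.04830471, 0.16757806),
  A2M    = c(0.4555788, 0.7329827, 0.7528387, 0.5166676, 0.7896194),
  A2ML1  = c(0.6422624, 0.4938048, 0.5686839, 0.8878207, 0.5408747),
  A4GALT = c(0.08051388, 0.45487573, 0.37018038, 0.08881092, 0.35315243),
  AAAS   = c(0.15013343, 0.1082752, 0.12423923, 0.11779075, 0.09234602),
  row.names = paste0("paciente", 6:10)
)

below   <- df.met <= 0.1   # logical matrix: TRUE where a value is at or below 0.1
n_below <- rowSums(below)  # how many such values each patient has
keep    <- n_below == 0    # equivalent to !rowSums(df.met <= 0.1)

# drop = FALSE keeps the result a data frame even if only one column remains
kept <- df.met[keep, , drop = FALSE]
kept
```

Negating `rowSums()` works because R treats 0 as FALSE and any positive count as TRUE, so `!rowSums(...)` is TRUE only for rows with no value at or below the threshold.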

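Update

Based on the edit to the question, the values at or below the threshold can be set to NA instead of dropping whole rows: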

is.na(df.met) <- df.met <= 0.1
df.met
#              A2BP1       A2M     A2ML1    A4GALT      AAAS
#paciente6  0.3618447 0.4555788 0.6422624        NA 0.1501334
#paciente7  0.4756688 0.7329827 0.4938048 0.4548757 0.1082752
#paciente8  0.1745550 0.7528387 0.5686839 0.3701804 0.1242392
#paciente9         NA 0.5166676 0.8878207        NA 0.1177907
#paciente10 0.1675781 0.7896194 0.5408747 0.3531524        NA
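If the goal really is a different number of values in each column, a data frame cannot hold that, because all of its columns must have the same length. One possible alternative is a list with one vector per gene. The sketch below is plain base R and starts from the original, unmodified sample; the name `above` is hypothetical.

```r
# Rebuild the original sample (before any values were replaced by NA).
df.met <- data.frame(
  A2BP1  = c(0.36184475, 0.47566878, 0.17455497, 0.04830471, 0.16757806),
  A2M    = c(0.4555788, 0.7329827, 0.7528387, 0.5166676, 0.7896194),
  A2ML1  = c(0.6422624, 0.4938048, 0.5686839, 0.8878207, 0.5408747),
  A4GALT = c(0.08051388, 0.45487573, 0.37018038, 0.08881092, 0.35315243),
  AAAS   = c(0.15013343, 0.1082752, 0.12423923, 0.11779075, 0.09234602),
  row.names = paste0("paciente", 6:10)
)

# For each column, label the values with the patient names and keep
# only those strictly greater than the threshold.
above <- lapply(df.met, function(x) {
  x <- setNames(x, rownames(df.met))
  x[x > 0.1]
})

sapply(above, length)  # number of retained values per gene
above$A4GALT           # named vector: paciente7, paciente8, paciente10
```

The NA approach above is usually easier to work with afterwards, since the result stays rectangular; the list form is only worth it when the ragged lengths themselves matter.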

Using data.table

library(data.table) # v1.9.5+
# Convert to a data.table by reference; the row names become the column `rn`
setDT(df.met, keep.rownames=TRUE)[]

# Skip column 1 (`rn`) and set values <= 0.1 to NA in place, column by column
for(j in 2:ncol(df.met)){
  set(df.met, i=which(df.met[[j]] <= 0.1), j=j, value=NA)
}

df.met
#          rn     A2BP1       A2M     A2ML1    A4GALT      AAAS
#1:  paciente6 0.3618447 0.4555788 0.6422624        NA 0.1501334
#2:  paciente7 0.4756688 0.7329827 0.4938048 0.4548757 0.1082752
#3:  paciente8 0.1745550 0.7528387 0.5686839 0.3701804 0.1242392
#4:  paciente9        NA 0.5166676 0.8878207        NA 0.1177907
#5: paciente10 0.1675781 0.7896194 0.5408747 0.3531524        NA

data

df.met <- structure(list(A2BP1 = c(0.36184475, 0.47566878, 0.17455497, 
0.04830471, 0.16757806), A2M = c(0.4555788, 0.7329827, 0.7528387, 
0.5166676, 0.7896194), A2ML1 = c(0.6422624, 0.4938048, 0.5686839, 
0.8878207, 0.5408747), A4GALT = c(0.08051388, 0.45487573, 0.37018038, 
0.08881092, 0.35315243), AAAS = c(0.15013343, 0.1082752, 0.12423923, 
0.11779075, 0.09234602)), .Names = c("A2BP1", "A2M", "A2ML1", 
"A4GALT", "AAAS"), class = "data.frame", row.names = c("paciente6", 
"paciente7", "paciente8", "paciente9", "paciente10"))

answered Oct 21 '22 15:10

akrun

To match your desired output (values <= 0.1 replaced by empty fields) you could do:

library(dplyr)
df.met %>% 
  add_rownames("pacientes") %>%
  mutate_each(funs(replace(., . <= 0.1, "")))

Which gives:

# Source: local data frame [5 x 6]
#
#    pacientes      A2BP1       A2M     A2ML1     A4GALT       AAAS
# 1  paciente6 0.36184475 0.4555788 0.6422624            0.15013343
# 2  paciente7 0.47566878 0.7329827 0.4938048 0.45487573  0.1082752
# 3  paciente8 0.17455497 0.7528387 0.5686839 0.37018038 0.12423923
# 4  paciente9            0.5166676 0.8878207            0.11779075
# 5 paciente10 0.16757806 0.7896194 0.5408747 0.35315243

Note: This will convert all columns to character. You should instead do:

df.met %>% 
  add_rownames("pacientes") %>%
  mutate_each(funs(replace(., . <= 0.1, NA)))

This preserves your initial data structure (all columns stay numeric).
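Since this answer was written, `add_rownames()`, `mutate_each()` and `funs()` have been deprecated in dplyr. A rough equivalent with current verbs might look like the sketch below. It assumes dplyr 1.0.0 or later with the tibble package installed, and the result name `res` is illustrative only.

```r
library(dplyr)

# Rebuild the sample from the question.
df.met <- data.frame(
  A2BP1  = c(0.36184475, 0.47566878, 0.17455497, 0.04830471, 0.16757806),
  A2M    = c(0.4555788, 0.7329827, 0.7528387, 0.5166676, 0.7896194),
  A2ML1  = c(0.6422624, 0.4938048, 0.5686839, 0.8878207, 0.5408747),
  A4GALT = c(0.08051388, 0.45487573, 0.37018038, 0.08881092, 0.35315243),
  AAAS   = c(0.15013343, 0.1082752, 0.12423923, 0.11779075, 0.09234602),
  row.names = paste0("paciente", 6:10)
)

res <- df.met %>%
  # move the row names into a regular column (replaces add_rownames())
  tibble::rownames_to_column("pacientes") %>%
  # apply the replacement to numeric columns only, leaving `pacientes` untouched
  mutate(across(where(is.numeric), ~ replace(.x, .x <= 0.1, NA)))

res
```

Restricting `across()` to `where(is.numeric)` avoids comparing the character column `pacientes` against 0.1 at all, which the original version did implicitly.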

answered Oct 21 '22 13:10

Steven Beaupré
