Subset rows with (1) ALL and (2) ANY columns larger than a specific value

Tags:

r

r-faq

subset

I have a data frame with an id column and some (potentially many) columns with values, here 'v1', 'v2':

```
df <- data.frame(id = c(1:5), v1 = c(0,15,9,12,7), v2 = c(9,32,6,17,11))
#   id v1 v2
# 1  1  0  9
# 2  2 15 32
# 3  3  9  6
# 4  4 12 17
# 5  5  7 11
```

1. How can I extract rows where ALL values are larger than a certain value, say 10? This should return:
```
#   id v1 v2
# 2  2 15 32
# 4  4 12 17
```
2. How can I extract rows where ANY (at least one) value is larger than 10? This should return:
```
#   id v1 v2
# 2  2 15 32
# 4  4 12 17
# 5  5  7 11
```

275

asked Mar 24 '12 23:03

Rock

1 Answer

See the functions all() and any() for the first and second parts of your question, respectively. The apply() function runs a function over the rows or columns of a matrix or data frame (MARGIN = 1 is rows, MARGIN = 2 is columns). Note that I use apply() on df[, -1] to ignore the id variable when doing the comparisons.
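As a quick illustration of the MARGIN argument, here is a minimal sketch on a small matrix. The object `m` and the result names are made up for this example and are not part of the question:

```r
# A small 2 x 3 matrix, invented purely for illustration
m <- matrix(1:6, nrow = 2)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

row_max <- apply(m, MARGIN = 1, max)  # one result per row
col_max <- apply(m, MARGIN = 2, max)  # one result per column

row_max  # 5 6
col_max  # 2 4 6
```

With MARGIN = 1 the function receives each row in turn, which is the form used in the subsetting calls for this question.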

Part 1:

```
> df <- data.frame(id=c(1:5), v1=c(0,15,9,12,7), v2=c(9,32,6,17,11))
> df[apply(df[, -1], MARGIN = 1, function(x) all(x > 10)), ]
  id v1 v2
2  2 15 32
4  4 12 17
```

Part 2:

```
> df[apply(df[, -1], MARGIN = 1, function(x) any(x > 10)), ]
  id v1 v2
2  2 15 32
4  4 12 17
5  5  7 11
```

To see what is going on, x > 10 returns a logical vector for each row (via apply()), indicating whether each element is greater than 10. all() returns TRUE if all elements of the input vector are TRUE and FALSE otherwise. any() returns TRUE if any of the elements in the input is TRUE and FALSE if all are FALSE.
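To make that concrete, here is a small sketch that applies the same comparison by hand to two rows of the example data. The vector names `row2` and `row5` are only for illustration:

```r
row2 <- c(15, 32)  # v1 and v2 from the row with id 2
row5 <- c(7, 11)   # v1 and v2 from the row with id 5

row2 > 10          # TRUE TRUE
row5 > 10          # FALSE TRUE

all_row2 <- all(row2 > 10)  # TRUE: every value exceeds 10
all_row5 <- all(row5 > 10)  # FALSE: 7 does not exceed 10
any_row5 <- any(row5 > 10)  # TRUE: 11 exceeds 10
```

This is why the row with id 5 appears in the ANY result but not in the ALL result.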

I then use the logical vector resulting from the apply() call

```
> apply(df[, -1], MARGIN = 1, function(x) all(x > 10))
[1] FALSE  TRUE FALSE  TRUE FALSE
> apply(df[, -1], MARGIN = 1, function(x) any(x > 10))
[1] FALSE  TRUE FALSE  TRUE  TRUE
```

to subset df (as shown above).
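If there are many value columns, a vectorized variant may be worth considering. The following is a sketch of an alternative technique, not the apply() approach above: it compares the whole block of value columns at once and counts the TRUE values per row with rowSums(). The names `vals`, `n_above`, `all_rows` and `any_rows` are introduced here for illustration, and the sketch assumes the value columns contain no NA values.

```r
df <- data.frame(id = 1:5, v1 = c(0, 15, 9, 12, 7), v2 = c(9, 32, 6, 17, 11))

# Logical matrix with one column per value column;
# drop = FALSE keeps a data frame even if only one value column remains
vals <- df[, -1, drop = FALSE] > 10

# Number of values above 10 in each row
n_above <- rowSums(vals)

all_rows <- df[n_above == ncol(vals), ]  # every value column exceeds 10
any_rows <- df[n_above > 0, ]            # at least one value column exceeds 10
```

For the example data, `all_rows` should contain the rows with id 2 and 4, and `any_rows` the rows with id 2, 4 and 5, matching the apply() results.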

200

answered Sep 19 '22 03:09

Gavin Simpson
