subset rows with (1) ALL and (2) ANY columns larger than a specific value

Tags: r, r-faq, subset

I have a data frame with an id column and some (potentially many) columns with values, here 'v1', 'v2':

df <- data.frame(id = c(1:5), v1 = c(0,15,9,12,7), v2 = c(9,32,6,17,11))
#   id v1 v2
# 1  1  0  9
# 2  2 15 32
# 3  3  9  6
# 4  4 12 17
# 5  5  7 11
  1. How can I extract rows where ALL values are larger than a certain value, say 10, which should return:

    #   id v1 v2
    # 2  2 15 32
    # 4  4 12 17
    
  2. How can I extract rows where ANY (at least one) value is larger than 10, which should return:

    #   id v1 v2
    # 2  2 15 32
    # 4  4 12 17
    # 5  5  7 11
    
asked Mar 24 '12 23:03 by Rock



1 Answer

See the functions all() and any() for the first and second parts of your question, respectively. The apply() function runs a function over the rows or columns of a matrix or data frame (MARGIN = 1 is rows, MARGIN = 2 is columns). Note that I call apply() on df[, -1] so that the id variable is ignored in the comparisons.
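As a minimal sketch of what MARGIN does, here is apply() on a small matrix. The matrix m is made up for illustration and is not part of the question's data:

```r
# Illustrative matrix (an assumption, not from the question): 2 rows, 3 columns
m <- matrix(1:6, nrow = 2)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

apply(m, MARGIN = 1, sum)  # one result per row:    9 12
apply(m, MARGIN = 2, sum)  # one result per column: 3 7 11
```

With MARGIN = 1 the function receives one row at a time, which is why it suits a per-row test such as all(x > 10).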

Part 1:

> df <- data.frame(id=c(1:5), v1=c(0,15,9,12,7), v2=c(9,32,6,17,11))
> df[apply(df[, -1], MARGIN = 1, function(x) all(x > 10)), ]
  id v1 v2
2  2 15 32
4  4 12 17

Part 2:

> df[apply(df[, -1], MARGIN = 1, function(x) any(x > 10)), ]
  id v1 v2
2  2 15 32
4  4 12 17
5  5  7 11

To see what is going on: x > 10 returns a logical vector for each row (via apply()), indicating whether each element is greater than 10. all() returns TRUE if all elements of the input vector are TRUE and FALSE otherwise. any() returns TRUE if any of the elements of the input is TRUE and FALSE if all are FALSE.
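The same steps can be tried on a single row outside of apply(). This sketch uses the values of row 5 from the question (v1 = 7, v2 = 11):

```r
x <- c(7, 11)   # v1 and v2 for the row with id 5
x > 10          # FALSE  TRUE
all(x > 10)     # FALSE: not every element exceeds 10
any(x > 10)     # TRUE: at least one element exceeds 10
```

This is why row 5 appears in the ANY result but not in the ALL result.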

I then use the logical vector resulting from the apply() call

> apply(df[, -1], MARGIN = 1, function(x) all(x > 10))
[1] FALSE  TRUE FALSE  TRUE FALSE
> apply(df[, -1], MARGIN = 1, function(x) any(x > 10))
[1] FALSE  TRUE FALSE  TRUE  TRUE

to subset df (as shown above).
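If the threshold or the all/any choice needs to vary, the two calls can be wrapped in a small function. This is only a sketch: the name subset_by_threshold and its arguments are hypothetical and not part of the answer above. It also uses drop = FALSE, which I am adding as an assumption so that the code still works when only one value column is left (df[, -1] alone would collapse to a plain vector in that case, and apply() would fail).

```r
# Hypothetical helper (not from the original answer): keep the rows of `data`
# whose value columns are greater than `threshold` in all or in any column.
subset_by_threshold <- function(data, threshold, mode = c("all", "any"),
                                id_cols = 1) {
  mode <- match.arg(mode)
  # drop = FALSE keeps a data frame even if a single value column remains
  values <- data[, -id_cols, drop = FALSE]
  test <- if (mode == "all") all else any
  keep <- apply(values, MARGIN = 1, function(x) test(x > threshold))
  data[keep, , drop = FALSE]
}

df <- data.frame(id = 1:5, v1 = c(0, 15, 9, 12, 7), v2 = c(9, 32, 6, 17, 11))
subset_by_threshold(df, 10, "all")$id   # 2 4
subset_by_threshold(df, 10, "any")$id   # 2 4 5
```

The sketch assumes the value columns contain no NA; with missing values, all() and any() can return NA and the row index would need extra handling.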

answered Sep 19 '22 03:09 by Gavin Simpson