I have a data frame with an id column and some (potentially many) columns with values, here 'v1', 'v2':
df <- data.frame(id = c(1:5), v1 = c(0,15,9,12,7), v2 = c(9,32,6,17,11))
# id v1 v2
# 1 1 0 9
# 2 2 15 32
# 3 3 9 6
# 4 4 12 17
# 5 5 7 11
How can I extract rows where ALL values are larger than a certain value, say 10, which should return:
# id v1 v2
# 2 2 15 32
# 4 4 12 17
How can I extract rows where ANY (at least one) value is larger than 10:
# id v1 v2
# 2 2 15 32
# 4 4 12 17
# 5 5 7 11
Using bracket notation on an R data frame, you can select rows by column value, by index, by name, or by condition. The base R function subset() gives the same results, and the dplyr package provides filter() for the same purpose.
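As a rough illustration of the selection styles mentioned above, here is a minimal sketch on the data frame from the question. The names by_bracket, by_subset and by_index are made up for this example, and the dplyr call is only shown as a comment because it assumes the package is installed.

```r
# Example data frame from the question
df <- data.frame(id = 1:5, v1 = c(0, 15, 9, 12, 7), v2 = c(9, 32, 6, 17, 11))

# Select rows by a condition on one column, using bracket notation
by_bracket <- df[df$v1 > 10, ]

# The same selection with subset(); column names are looked up inside df
by_subset <- subset(df, v1 > 10)

# Select rows by position
by_index <- df[c(2, 4), ]

# With dplyr installed, the equivalent would be: dplyr::filter(df, v1 > 10)

print(by_bracket)
```

All three results contain the rows with id 2 and 4, since those are the only rows where v1 exceeds 10.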
To remove rows that contain an NA in R, you can use na.omit() (base R) or drop_na() (tidyr).
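A small sketch of the base R route, assuming a made-up data frame df_na with a few missing values (it is not part of the question). complete.cases() is shown alongside because it exposes the logical vector that decides which rows are kept.

```r
# Hypothetical data frame with missing values, for illustration only
df_na <- data.frame(id = 1:4, v1 = c(0, NA, 9, 12), v2 = c(9, 32, NA, 17))

# na.omit() drops every row that contains at least one NA
clean <- na.omit(df_na)

# complete.cases() returns one logical per row: TRUE if the row has no NA
keep <- complete.cases(df_na)

print(clean)
```

Subsetting with df_na[keep, ] selects the same rows as na.omit(df_na).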
See the functions all() and any() for the first and second parts of your question, respectively. The apply() function can be used to run functions over rows or columns (MARGIN = 1 is rows, MARGIN = 2 is columns, etc.). Note that I use apply() on df[, -1] to ignore the id variable when doing the comparisons.
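To make the MARGIN argument concrete, here is a minimal sketch that applies max() once per row and once per column of the value columns. The names vals, row_max and col_max are just for this illustration.

```r
# Example data frame from the question
df <- data.frame(id = 1:5, v1 = c(0, 15, 9, 12, 7), v2 = c(9, 32, 6, 17, 11))

# Drop the id column so only the value columns are compared
vals <- df[, -1]

# MARGIN = 1: the function is called once per row (5 results)
row_max <- apply(vals, MARGIN = 1, max)

# MARGIN = 2: the function is called once per column (2 results)
col_max <- apply(vals, MARGIN = 2, max)

print(row_max)
print(col_max)
```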
Part 1:
> df <- data.frame(id=c(1:5), v1=c(0,15,9,12,7), v2=c(9,32,6,17,11))
> df[apply(df[, -1], MARGIN = 1, function(x) all(x > 10)), ]
id v1 v2
2 2 15 32
4 4 12 17
Part 2:
> df[apply(df[, -1], MARGIN = 1, function(x) any(x > 10)), ]
id v1 v2
2 2 15 32
4 4 12 17
5 5 7 11
To see what is going on, x > 10 returns a logical vector for each row (via apply()) indicating whether each element is greater than 10. all() returns TRUE if all elements of the input vector are TRUE and FALSE otherwise. any() returns TRUE if any of the elements in the input is TRUE and FALSE if all are FALSE.
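The same logic can be checked by hand on a single row. This sketch uses the values of row 5 (v1 = 7, v2 = 11); the variable names are only for illustration.

```r
# Values of row 5 from the example data frame
x <- c(v1 = 7, v2 = 11)

# Element-wise comparison gives one logical per value
above <- x > 10

# all() is TRUE only if every element is TRUE
all_above <- all(above)

# any() is TRUE if at least one element is TRUE
any_above <- any(above)

print(above)
print(all_above)
print(any_above)
```

Row 5 therefore fails the ALL test but passes the ANY test, which matches the expected results in the question.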
I then use the logical vector resulting from the apply() call
> apply(df[, -1], MARGIN = 1, function(x) all(x > 10))
[1] FALSE TRUE FALSE TRUE FALSE
> apply(df[, -1], MARGIN = 1, function(x) any(x > 10))
[1] FALSE TRUE FALSE TRUE TRUE
to subset df (as shown above).
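For data frames with many rows, a vectorised variant may be worth trying. This is not part of the answer above, just a sketch that assumes all value columns are numeric: it compares the whole block of values at once and counts the TRUEs per row with rowSums(). The names above, n_above, all_rows and any_rows are made up for this example.

```r
# Example data frame from the question
df <- data.frame(id = 1:5, v1 = c(0, 15, 9, 12, 7), v2 = c(9, 32, 6, 17, 11))

# Compare all value columns at once; the result is a logical matrix.
# drop = FALSE keeps a data frame even if only one value column remains.
above <- df[, -1, drop = FALSE] > 10

# Number of values above the threshold in each row
n_above <- rowSums(above)

# ALL values above 10: the count equals the number of value columns
all_rows <- df[n_above == ncol(above), ]

# ANY value above 10: the count is at least one
any_rows <- df[n_above > 0, ]

print(all_rows)
print(any_rows)
```

This avoids calling an R function once per row, at the cost of building the full logical matrix in memory.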