 

How to sort and filter data.frame in R?

I understand how to sort a data frame:

df[order(df$Height),]

and I understand how to filter (or subset) a data frame matching some predicate:

df[df$Weight > 120,]

But how do I sort and filter at the same time (for example, order by Height and filter by Weight)?

asked Mar 16 '12 02:03 by User


People also ask

How do I sort a data frame in RStudio?

To sort a data frame in R, use the order() function. By default, sorting is ascending. For descending order, pass decreasing = TRUE, or prefix a numeric sorting variable with a minus sign.
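A minimal sketch of both ways to get descending order, on a small made-up data frame (column names mirror the question, values are illustrative only). The minus-sign trick only works on numeric columns; decreasing = TRUE works on anything order() can sort.

```r
# Small illustrative data frame (values are made up)
df <- data.frame(Height = c(170, 155, 182, 160),
                 Name   = c("b", "d", "a", "c"))

asc  <- df[order(df$Height), ]                     # ascending (the default)
desc <- df[order(df$Height, decreasing = TRUE), ]  # descending via the argument
neg  <- df[order(-df$Height), ]                    # descending via minus sign (numeric only)

# A minus sign fails on a character column; use decreasing = TRUE instead
by_name_desc <- df[order(df$Name, decreasing = TRUE), ]
```

Here asc has heights 155, 160, 170, 182, while desc and neg both run 182, 170, 160, 155.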

What does filter() do in R?

The filter() function from the dplyr package (not base R's stats::filter(), which applies linear filters to time series) subsets a data frame based on a provided condition. Rows for which the condition evaluates to TRUE are kept; rows for which it evaluates to FALSE or NA are dropped.
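The NA behaviour is worth knowing in base R too, because plain logical indexing with [ does not drop rows whose condition is NA; it returns an all-NA row instead. A small sketch with made-up data, comparing [ against subset() and which(), both of which drop NA conditions the way filter() does:

```r
# Made-up data with one missing weight
df <- data.frame(Weight = c(130, NA, 110, 125))

cond <- df$Weight > 120  # TRUE, NA, FALSE, TRUE

plain    <- df[cond, , drop = FALSE]         # NA condition -> an extra all-NA row
with_sub <- subset(df, Weight > 120)         # subset() treats NA as FALSE
with_wh  <- df[which(cond), , drop = FALSE]  # which() keeps only TRUE positions
```

plain ends up with three rows (one of them NA), while with_sub and with_wh both keep just the weights 130 and 125.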

What is the difference between filter and select in R?

filter() operates on rows, whereas select() operates on columns (both come from dplyr). With the built-in mtcars dataset, for example, filter() retains the rows that meet a criterion of interest, while select() retains columns by name.
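dplyr isn't needed to see the distinction: base R's subset() does both jobs, with its condition argument playing the role of filter() and its select argument playing the role of select(). A sketch on mtcars, as a base-R analogue rather than the dplyr calls themselves:

```r
cars_df <- datasets::mtcars  # built-in dataset, 32 rows x 11 columns

fast_cars <- subset(cars_df, mpg > 25)                      # rows, like filter()
mpg_cyl   <- subset(cars_df, select = c(mpg, cyl))          # columns, like select()
both      <- subset(cars_df, mpg > 25, select = c(mpg, cyl))  # rows and columns
```

fast_cars keeps all 11 columns but only the 6 cars above 25 mpg; mpg_cyl keeps all 32 cars but only two columns.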


2 Answers

Either in two steps:

 df1 <- df[df$weight > 120, ]      # keep rows with weight > 120
 df2 <- df1[order(df1$height), ]   # sort the remaining rows by height
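If you do this often, the two steps can be wrapped in a small function. filter_then_sort() below is a hypothetical helper, not part of base R. It takes the condition as a logical vector so it assumes nothing about column names, and it uses which() so rows with an NA condition are dropped rather than returned as NA rows.

```r
# Hypothetical helper: filter rows, then sort the survivors by one column
filter_then_sort <- function(data, keep, by, decreasing = FALSE) {
  stopifnot(is.logical(keep), length(keep) == nrow(data))
  kept <- data[which(keep), , drop = FALSE]                       # step 1: filter
  kept[order(kept[[by]], decreasing = decreasing), , drop = FALSE]  # step 2: sort
}

# Usage with made-up data
df <- data.frame(weight = c(130, 110, 125, 140),
                 height = c(180, 170, 150, 160))
res      <- filter_then_sort(df, df$weight > 120, "height")
res_desc <- filter_then_sort(df, df$weight > 120, "height", decreasing = TRUE)
```

res keeps original rows 3, 4 and 1 (heights 150, 160, 180); res_desc lists the same rows in reverse.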

or, if you must, in one step, although it really is not any cleaner.

Data first:

R> set.seed(42)
R> df <- data.frame(weight=rnorm(10, 120, 10), height=rnorm(10, 160, 20))
R> df
   weight height
1   133.7  186.1
2   114.4  205.7
3   123.6  132.2
4   126.3  154.4
5   124.0  157.3
6   118.9  172.7
7   135.1  154.3
8   119.1  106.9
9   140.2  111.2
10  119.4  186.4

And one way of doing it is double-subsetting:

R> subset(df, weight > 120)[order(subset(df, weight > 120)$height),]
  weight height
9  140.2  111.2
3  123.6  132.2
7  135.1  154.3
4  126.3  154.4
5  124.0  157.3
1  133.7  186.1
R> 

I'd go with the two-step.
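Another one-step form (not from the answer above) sorts first and filters second. Because subset() keeps rows in their existing order, this yields the same rows as the two-step version while stating the condition only once. A sketch using the same simulated data:

```r
set.seed(42)
df <- data.frame(weight = rnorm(10, 120, 10), height = rnorm(10, 160, 20))

# Sort all rows by height, then keep those with weight > 120;
# subset() preserves the already-sorted row order.
one_step <- subset(df[order(df$height), ], weight > 120)

# The two-step version, for comparison
df1      <- df[df$weight > 120, ]
two_step <- df1[order(df1$height), ]
```

Both give rows 9, 3, 7, 4, 5, 1, matching the double-subsetting output above.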

answered Oct 05 '22 08:10 by Dirk Eddelbuettel


The package data.table allows you to do this in one short line of code:

Borrowing Dirk Eddelbuettel's example, set up some data:

set.seed(42)
df <- data.frame(weight=rnorm(10, 120, 10), height=rnorm(10, 160, 20))

Convert the data.frame to a data.table and subset on weight, ordering by height:

library(data.table)
dt <- data.table(df)

dt[weight>120][order(height)]

       weight   height
[1,] 140.1842 111.1907
[2,] 123.6313 132.2228
[3,] 135.1152 154.3149
[4,] 126.3286 154.4242
[5,] 124.0427 157.3336
[6,] 133.7096 186.0974
answered Oct 05 '22 08:10 by Andrie