Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Sorting one variable in a data frame by id

I have a data frame with lot of company information separated by an id variable. I want to sort one of the variables and repeat it for every id. Let's take this example,

df <- structure(list(id = c(110, 110, 110, 90, 90, 90, 90, 252, 252
), var1 = c(26, 21, 54, 10, 18, 9, 16, 54, 39), var2 = c(234, 
12, 43, 32, 21, 19, 16, 34, 44)), .Names = c("id", "var1", "var2"
), row.names = c(NA, -9L), class = "data.frame")

Which looks like this

df
   id var1 var2
1 110   26  234
2 110   21   12
3 110   54   43
4  90   10   32
5  90   18   21
6  90    9   19
7  90   16   16
8 252   54   34
9 252   39   44

Now, I want to sort the data frame according to var1 by the vector id. Easiest solution I can think of is using apply function like this,

> apply(df, 2, sort)
       id var1 var2
 [1,]  90    9   12
 [2,]  90   10   16
 [3,]  90   16   19
 [4,]  90   18   21
 [5,] 110   21   32
 [6,] 110   26   34
 [7,] 110   39   43
 [8,] 252   54   44
 [9,] 252   54  234

However, this is not the output I am seeking. The correct output should be,

   id var1 var2
1 110   21   12
2 110   26  234
3 110   54   43
4  90    9   19
5  90   10   32
6  90   16   16
7  90   18   21
8 252   39   44
9 252   54   34

Group by id and sort by var1 column and keep original id column order.

Any idea how to sort like this?

like image 541
small_lebowski Avatar asked Nov 29 '22 21:11

small_lebowski


2 Answers

Note. As mentioned by Moody_Mudskipper, there is no need to use tidyverse and can also be done easily with base R:

df[order(ordered(df$id, unique(df$id)), df$var1), ]

A one-liner tidyverse solution w/o any temp vars:

library(tidyverse)
df %>% arrange(ordered(id, unique(id)), var1)
#    id var1 var2
# 1 110   26  234
# 2 110   21   12
# 3 110   54   43
# 4  90   10   32
# 5  90   18   21
# 6  90    9   19
# 7  90   16   16
# 8 252   54   34
# 9 252   39   44

Explanation of why apply(df, 2, sort) does not work

What you were trying to do is to sort each column independently. apply runs over the specified dimension (2 in this case which corresponds to columns) and applies the function (sort in this case).

apply tries to further simplify the results, in this case to a matrix. So you are getting back a matrix (not a data.frame) where each column is sorted independently. For example this row from the apply call:

# [1,]  90    9   12

does not even exist in the original data.frame.

like image 123
thothal Avatar answered Dec 04 '22 14:12

thothal


Another base R option using order and match

df[with(df, order(match(id, unique(id)), var1, var2)), ]
#   id var1 var2
#2 110   21   12
#1 110   26  234
#3 110   54   43
#6  90    9   19
#4  90   10   32
#7  90   16   16
#5  90   18   21
#9 252   39   44
#8 252   54   34
like image 20
markus Avatar answered Dec 04 '22 14:12

markus