Sort dataframe in R (based on column values)

I would like to sort a data frame in R in a partly reversed, custom order based on the character values in one column.

I have the following sample dataset:

# Sample data
df <- read.table(text="id value
                 cx-01    1
                 cx-01    2
                 cx-02    1
                 cx-02    2
                 cx-02    3
                 cx-03    1
                 cx-03    2 
                 px-01    1
                 px-01    2
                 px-02    1
                 px-02    2
                 px-02    3
                 px-03    1
                 px-03    2
                 rx-01    1
                 rx-01    2
                 rx-02    1
                 rx-02    2
                 rx-02    3
                 rx-03    1
                 rx-03    2", header=TRUE)

Expected output:

      id value
1  cx-03     2
2  cx-03     1
3  cx-02     3
4  cx-02     2
5  cx-02     1
6  cx-01     2
7  cx-01     1
8  rx-03     2
9  rx-03     1
10 rx-02     3
11 rx-02     2
12 rx-02     1
13 rx-01     2
14 rx-01     1
15 px-03     2
16 px-03     1
17 px-02     3
18 px-02     2
19 px-02     1
20 px-01     2
21 px-01     1

I tried base R's order() function, but without success. I also tried the arrange() function from the plyr package, but could not get the rows into the desired order.

Is it possible to sort the labels in the first column based on a self-provided sequence (so not alphabetically)?

user213544 asked Jun 15 '26 06:06


2 Answers

Using with() and order() from base R

# sample data
df <- read.table(text="id value
                 cx-01    1
                 cx-01    2
                 cx-02    1
                 cx-02    2
                 cx-02    3
                 cx-03    1
                 cx-03    2 
                 px-01    1
                 px-01    2
                 px-02    1
                 px-02    2
                 px-02    3
                 px-03    1
                 px-03    2
                 rx-01    1
                 rx-01    2
                 rx-02    1
                 rx-02    2
                 rx-02    3
                 rx-03    1
                 rx-03    2", header=TRUE, stringsAsFactors=FALSE)

# split each id at "-" into prefix (X1) and number (X2), and keep value (df.value) alongside
col.ord <- data.frame(do.call(rbind, strsplit(df$id, "-")), df$value, stringsAsFactors = FALSE)

# reorder data frame
df[with(col.ord, order(X1, -as.integer(X2), -df.value)), ]
#>       id value
#> 7  cx-03     2
#> 6  cx-03     1
#> 5  cx-02     3
#> 4  cx-02     2
#> 3  cx-02     1
#> 2  cx-01     2
#> 1  cx-01     1
#> 14 px-03     2
#> 13 px-03     1
#> 12 px-02     3
#> 11 px-02     2
#> 10 px-02     1
#> 9  px-01     2
#> 8  px-01     1
#> 21 rx-03     2
#> 20 rx-03     1
#> 19 rx-02     3
#> 18 rx-02     2
#> 17 rx-02     1
#> 16 rx-01     2
#> 15 rx-01     1

Created on 2019-04-27 by the reprex package (v0.2.1)
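
Note that the output above orders the prefixes alphabetically (cx, px, rx), while the question asks for cx, rx, px. A minimal base R sketch of one way to get that order is to rank each prefix with match() against a user-supplied vector. The names `prefix.order`, `parts`, and `sorted` are illustrative, and the sample data is rebuilt compactly rather than read from text:

```r
# Rebuild the question's sample data: 2, 3, 2 rows per id
counts <- rep(c(2L, 3L, 2L), times = 3)
ids <- paste0(rep(c("cx", "px", "rx"), each = 3), "-0", 1:3)
df <- data.frame(id = rep(ids, times = counts),
                 value = sequence(counts),
                 stringsAsFactors = FALSE)

# Split "cx-01" into prefix ("cx") and number ("01")
parts <- do.call(rbind, strsplit(df$id, "-", fixed = TRUE))

# Assumed custom order of the prefixes (taken from the expected output)
prefix.order <- c("cx", "rx", "px")

# Sort by position in the custom order, then number and value descending
ord <- order(match(parts[, 1], prefix.order),
             -as.integer(parts[, 2]),
             -df$value)
sorted <- df[ord, ]
rownames(sorted) <- NULL  # renumber rows 1..n
sorted
```

A prefix that is missing from `prefix.order` gets NA from match(), and order() places NA last by default, so unknown prefixes would end up at the bottom.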

cropgen answered Jun 17 '26 20:06


We can arrange on the letter part and the numeric part of 'id' separately, and arrange 'value' in descending order. The letter part follows a custom order, so either convert it to a factor with the levels specified in that order, or use match() against a vector listing the prefixes in the expected order to get each row's position in that order.

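library(tidyverse)
df %>%
   # rank the letter prefix by its position in the custom order
   arrange(match(str_remove(id, "-\\d+"), c("cx", "rx", "px")),
          # parse_number() reads "-03" as -3, so ascending order puts 03 first
          readr::parse_number(as.character(id)), desc(value))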
#      id value
#1  cx-03     2
#2  cx-03     1
#3  cx-02     3
#4  cx-02     2
#5  cx-02     1
#6  cx-01     2
#7  cx-01     1
#8  rx-03     2
#9  rx-03     1
#10 rx-02     3
#11 rx-02     2
#12 rx-02     1
#13 rx-01     2
#14 rx-01     1
#15 px-03     2
#16 px-03     1
#17 px-02     3
#18 px-02     2
#19 px-02     1
#20 px-01     2
#21 px-01     1
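
The factor alternative mentioned above could look roughly like the following. This is a sketch, assuming dplyr is installed; the helper columns `prefix` and `num` are illustrative names that are dropped again at the end, and the sample data is rebuilt compactly:

```r
library(dplyr)

# Rebuild the question's sample data: 2, 3, 2 rows per id
counts <- rep(c(2L, 3L, 2L), times = 3)
ids <- paste0(rep(c("cx", "px", "rx"), each = 3), "-0", 1:3)
df <- data.frame(id = rep(ids, times = counts),
                 value = sequence(counts),
                 stringsAsFactors = FALSE)

sorted <- df %>%
  mutate(
    # factor levels encode the custom prefix order
    prefix = factor(sub("-.*$", "", id), levels = c("cx", "rx", "px")),
    # numeric suffix, e.g. "cx-03" -> 3
    num = as.integer(sub("^.*-", "", id))
  ) %>%
  # factors sort by level position, not alphabetically
  arrange(prefix, desc(num), desc(value)) %>%
  select(id, value)
sorted
```

Compared with match(), the factor keeps the custom order attached to the column itself, which can be convenient if the same order is needed again later, for example in a plot.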
akrun answered Jun 17 '26 21:06