I am trying to order a data frame using dplyr::arrange. The issue is that the column I am trying to sort on contains a fixed string followed by a number, as generated, for instance, by the dummy code below.
dummydf <- data.frame(values = rnorm(100), sortcol = paste0("ABC", sample(1:100, 100, replace = FALSE)))
By default, dummydf %>% arrange(sortcol) sorts the column alphanumerically, i.e. character by character, so ABC10 to ABC19 and ABC100 all come before ABC2. This is of course not the desired result:
      values sortcol
 0.708081720    ABC1
 0.041348322   ABC10
 1.730962886  ABC100
 0.423480861   ABC11
-1.545837266   ABC12
-1.345539947   ABC13
-0.078998792   ABC14
 0.088712174   ABC15
 0.670583024   ABC16
 1.238837680   ABC17
-1.459044293   ABC18
-2.028535223   ABC19
 0.779514385    ABC2
 1.360509910   ABC20
In this example, I would like to sort the column the way gtools::mixedsort would, so that ABC2 directly follows ABC1 instead of being preceded by ABC10 to ABC19 and ABC100. On a plain vector, mixedsort(as.character(dummydf$sortcol)) does exactly that.
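At the vector level, gtools also offers mixedorder(), which returns the row indices in mixed ("natural") order rather than the sorted values. A minimal sketch, assuming the gtools package is installed, of applying that permutation to the rows directly, either with base indexing or with dplyr::slice(). The set.seed() call is added only to make the example reproducible.

```r
library(dplyr)
library(gtools)

set.seed(42)  # added only for reproducibility
dummydf <- data.frame(values = rnorm(100),
                      sortcol = paste0("ABC", sample(1:100, 100, replace = FALSE)))

# mixedorder() returns row indices in mixed (natural) order: ABC1, ABC2, ..., ABC100
idx <- mixedorder(as.character(dummydf$sortcol))

# Base R: reorder the rows by that index vector
sorted_base <- dummydf[idx, ]

# dplyr: slice() returns the rows in the order of the indices it is given
sorted_dplyr <- dummydf %>% slice(idx)

head(sorted_dplyr$sortcol)
```

Because slice() takes row positions rather than a sort key, this sidesteps arrange() entirely; the trade-off is that the mixed order cannot be combined with other sort keys in the same call.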
Now, I am aware I could do this by using sub inside arrange: dummydf %>% arrange(as.numeric(sub("ABC", "", sortcol))), but that only works because my prefix is a fixed string (although I suppose a regex could be used to capture the trailing digits after any prefix).
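A sketch of the more generic regex variant hinted at above, assuming every value consists of a non-numeric prefix followed by trailing digits. The helpers prefix_of() and trailing_number() are hypothetical names introduced only for illustration. Sorting first by the prefix and then by the numeric suffix lets a single arrange() call handle several different prefixes.

```r
library(dplyr)

# Hypothetical helpers (illustrative only, not part of dplyr or gtools).
# Assumption: every value ends in a run of digits.
prefix_of <- function(x) sub("[0-9]+$", "", x)  # "ABC12" -> "ABC"
trailing_number <- function(x) {
  # lazy .*? so the capture group gets the whole trailing number, not just its last digit
  as.numeric(sub("^.*?([0-9]+)$", "\\1", x, perl = TRUE))  # "ABC12" -> 12
}

df <- data.frame(values = 1:6,
                 sortcol = c("XY3", "ABC10", "ABC2", "XY20", "ABC1", "XY1"),
                 stringsAsFactors = FALSE)

# Sort by prefix first, then numerically by the trailing number
sorted <- df %>% arrange(prefix_of(sortcol), trailing_number(sortcol))
sorted$sortcol  # "ABC1" "ABC2" "ABC10" "XY1" "XY3" "XY20"
```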
I am just wondering: is there a more "elegant" and generic way to get this done with dplyr::arrange, in the same fashion as gtools::mixedsort?
Kind regards,
FM
arrange() orders the rows of a data frame by the values of selected columns. Unlike other dplyr verbs, arrange() largely ignores grouping; you need to explicitly mention grouping variables (or use .by_group = TRUE) in order to group by them, and functions of variables are evaluated once per data frame, not once per group.
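A small sketch of the grouping behaviour described above, using a made-up data frame: after group_by(), a plain arrange() ignores the groups, while .by_group = TRUE sorts by the grouping variable first.

```r
library(dplyr)

df <- data.frame(grp = c("b", "a", "b", "a"),
                 x   = c(1, 4, 2, 3),
                 stringsAsFactors = FALSE)

grouped <- df %>% group_by(grp)

# Grouping is ignored: rows are ordered by x only
by_x <- grouped %>% arrange(x)

# .by_group = TRUE: ordered by grp first, then by x within each group
by_grp_then_x <- grouped %>% arrange(x, .by_group = TRUE)
```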
dplyr's arrange() function can sort by more than one variable: pass the variable names as separate arguments, and the rows are ordered by the first one, with ties broken by the second, and so on. The order of the arguments therefore matters.
The dplyr package provides arrange() for sorting the rows of a data frame; its basic signature is arrange(.data, ...). Sorting is ascending by default; wrap a variable in desc() to sort it in descending order.
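For example, with a made-up data frame, the following sorts by team in ascending order and, within each team, by score in descending order. Swapping the two arguments would give a different row order.

```r
library(dplyr)

df <- data.frame(team  = c("B", "A", "B", "A"),
                 score = c(10, 7, 15, 9),
                 stringsAsFactors = FALSE)

# Ascending by team, then descending by score within each team
res <- df %>% arrange(team, desc(score))
res
```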
dplyr, an R package that is part of the tidyverse suite, provides a set of tools for manipulating tabular data. Its core "data munging" functions include select(), mutate(), filter(), summarise(), and arrange().
Here's a functional solution making use of the mysterious identity order(order(x)) == rank(x):
library(dplyr)
# Rank of each element under mixed ordering; arrange() then sorts by that rank
mixedrank <- function(x) order(gtools::mixedorder(x))
dummydf %>% arrange(mixedrank(sortcol))
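The identity is less mysterious than it looks: order(x) lists the indices of x from smallest to largest value, and ordering that permutation again inverts it, giving each element's position in the sorted order, i.e. its rank. That is why mixedrank() works: arrange() needs one sort key per row rather than a permutation, and inverting mixedorder() turns the permutation into such a key. A base-R check with made-up vectors; note that with ties the exact match is rank(x, ties.method = "first"), because order() is stable, whereas the default rank() averages tied ranks.

```r
x <- c(30, 10, 20)
order(x)         # 2 3 1: indices of the smallest, middle and largest value
order(order(x))  # 3 1 2: position of each element in the sorted order
rank(x)          # 3 1 2

y <- c(5, 1, 5)
order(order(y))                 # 2 1 3
rank(y)                         # 2.5 1.0 2.5 (tied ranks averaged)
rank(y, ties.method = "first")  # 2 1 3
```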