
Using gtools::mixedsort or alternatives with dplyr::arrange

Tags:

r

dplyr

I am trying to order a data frame with dplyr::arrange. The issue is that the column I am sorting on contains a fixed string followed by a number, as generated for instance by the dummy code below.

  dummydf<-data.frame(values=rnorm(100),sortcol=paste0("ABC",sample(1:100,100,replace=FALSE)))

By default, dummydf %>% arrange(sortcol) returns a data frame sorted lexicographically (character by character, so "ABC10" comes before "ABC2"), which is of course not the desired result:

values sortcol
0.708081720    ABC1
0.041348322   ABC10
1.730962886  ABC100
0.423480861   ABC11
-1.545837266   ABC12
-1.345539947   ABC13
-0.078998792   ABC14
0.088712174   ABC15
0.670583024   ABC16
1.238837680   ABC17
-1.459044293   ABC18
-2.028535223   ABC19
0.779514385    ABC2
1.360509910   ABC20

In this example, I would like to sort the column as gtools::mixedsort would, making sure ABC2 directly follows ABC1 and is not preceded by ABC10 through ABC19 and ABC100. mixedsort(as.character(dummydf$sortcol)) would do the trick.

Now, I am aware I could do this by using sub in my arrange call: dummydf %>% arrange(as.numeric(sub("ABC","",sortcol))), but that only works because the prefix is fixed (although I suppose a regex could capture the trailing digits after any prefix).
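The regex idea mentioned above can be sketched in base R roughly as follows. This is only an illustration, not a replacement for gtools::mixedsort: the helper name trailing_number_rank is hypothetical, and it assumes each value is an arbitrary prefix followed by at most one trailing run of digits.

```r
# Hypothetical helper: rank strings by their non-numeric prefix first,
# then by the trailing number interpreted numerically.
trailing_number_rank <- function(x) {
  x <- as.character(x)                       # also handles factor columns
  prefix <- sub("[0-9]+$", "", x)            # everything before the trailing digits
  digits <- substring(x, nchar(prefix) + 1)  # the trailing digits, "" if there are none
  number <- suppressWarnings(as.numeric(digits))  # NA when there are no digits
  order(order(prefix, number))               # turn the ordering into per-element ranks
}

x <- c("ABC10", "ABC2", "ABC1", "ABC100")
x[order(trailing_number_rank(x))]
```

With dplyr loaded, the same helper could be used as dummydf %>% arrange(trailing_number_rank(sortcol)). Values with several embedded numbers (for example "A2B10") are not handled by this sketch.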

I am just wondering: is there a more "elegant" and generic way to get this done with dplyr::arrange, in the same fashion as gtools::mixedsort?

Kind regards,

FM

FM Kerckhof asked Sep 03 '15 14:09





1 Answer

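Here's a functional solution making use of the identity order(order(x)) == rank(x), which holds when x has no ties (with ties, the left-hand side matches rank(x, ties.method = "first")). arrange sorts rows by the value of a key, so it needs a rank per row rather than the permutation that gtools::mixedorder returns.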

mixedrank = function(x) order(gtools::mixedorder(x))
dummydf %>% dplyr::arrange(mixedrank(sortcol))
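To see why wrapping an ordering in order() yields ranks, here is a small base R check. It is only a sketch on made-up vectors and uses plain order() instead of gtools::mixedorder, so it runs without extra packages.

```r
x <- c("b", "d", "a", "c")

perm <- order(x)    # permutation that sorts x: 3 1 4 2
rnk  <- order(perm) # position of each element in the sorted result: 2 4 1 3

# Without ties this is exactly rank(x)
all(rnk == rank(x))

# With ties, order(order(y)) agrees with ties.method = "first"
y <- c(2, 1, 2)
tied <- order(order(y))
all(tied == rank(y, ties.method = "first"))
```

One assumption worth checking in your own session: if sortcol is a factor (the data.frame default before R 4.0), passing as.character(sortcol) to mixedrank, as the question does with mixedsort, avoids depending on how mixedorder treats non-character input.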
eric_kernfeld answered Sep 29 '22 13:09
