Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

R: Sort columns of a data frame by a vector of column names

I have a data.frame that looks like this: enter image description here

which has 1000+ columns with similar names.

And I have a vector of those column names that looks like this: enter image description here

The vector is sorted by the cluster_id (which goes up to 11).

I want to sort the columns in the data frame such that the columns are in the order of the names in the vector.

A simple example of what I want is that:

Data:

 A    B    C
 1    2    3
 4    5    6

Vector: c("B","C","A")

Sorted:

 B    C    A
 2    3    1
 5    6    4

Is there a fast way to do this?

like image 631
TYZ Avatar asked Apr 17 '14 19:04

TYZ


People also ask

How to sort a data frame according to a vector in R?

Alternatively to the R code of Example 1, we can also use the functions of the dplyr package to sort our data frame according to a vector. Now, we can use the left_join function to order our data frame rows according to the values in our vector:

How do you sort a column by Index in a Dataframe?

Sorting by Column Index. Similar to the above method, it’s also possible to sort based on the numeric index of a column in the data frame, rather than the specific name. Instead of using the with() function, we can simply pass the order() function to our dataframe.

How to sort by the column of index 1 in R?

We indicate that we want to sort by the column of index 1 by using the dataframe [,1] syntax, which causes R to return the levels (names) of that index 1 column. In other words, similar to when we passed in the z vector name above, order is sorting based on the vector values that are within column of index 1:

How to sort by multiple columns using vector names?

In some cases, it may be desired to sort by multiple columns. Thankfully, doing so is very simple with the previously described methods. To sort multiple columns using vector names, simply add additional arguments to the order () function call as before:


Video Answer


2 Answers

UPDATE, with reproducible data added by OP:

df <- read.table(h=T, text="A    B    C
    1    2    3
    4    5    6")
vec <- c("B", "C", "A")
df[vec]

Results in:

  B C A
1 2 3 1
2 5 6 4

As OP desires.


How about:

df[df.clust$mutation_id]

Where df is the data.frame you want to sort the columns of and df.clust is the data frame that contains the vector with the column order (mutation_id).

This basically treats df as a list and uses standard vector indexing techniques to re-order it.

like image 95
BrodieG Avatar answered Oct 11 '22 17:10

BrodieG


Brodie's answer does exactly what you're asking for. However, you imply that your data are large, so I will provide an alternative using "data.table", which has a function called setcolorder that will change the column order by reference.

Here's a reproducible example.

Start with some simple data:

mydf <- data.frame(A = 1:2, B = 3:4, C = 5:6)
matches <- data.frame(X = 1:3, Y = c("C", "A", "B"), Z = 4:6)
mydf
#   A B C
# 1 1 3 5
# 2 2 4 6
matches
#   X Y Z
# 1 1 C 4
# 2 2 A 5
# 3 3 B 6

Provide proof that Brodie's answer works:

out <- mydf[matches$Y]
out
#   C A B
# 1 5 1 3
# 2 6 2 4

Show a more memory efficient way to do the same thing.

library(data.table)
setDT(mydf)
mydf
#    A B C
# 1: 1 3 5
# 2: 2 4 6

setcolorder(mydf, as.character(matches$Y))
mydf
#    C A B
# 1: 5 1 3
# 2: 6 2 4
like image 25
A5C1D2H2I1M1N2O1R2T1 Avatar answered Oct 11 '22 17:10

A5C1D2H2I1M1N2O1R2T1