R: Sort columns of a data frame by a vector of column names

Tags: sorting, dataframe, r, vector

I have a data.frame that looks like this (screenshot: https://i.stack.imgur.com/9AbPm.jpg), which has 1000+ columns with similar names.

And I have a vector of those column names that looks like this (screenshot: https://i.stack.imgur.com/H83J6.jpg).

The vector is sorted by the cluster_id (which goes up to 11).

I want to sort the columns in the data frame such that the columns are in the order of the names in the vector.

A simple example of what I want:

Data:

 A    B    C
 1    2    3
 4    5    6

Vector: c("B","C","A")

Sorted:

 B    C    A
 2    3    1
 5    6    4

Is there a fast way to do this?

asked Apr 17 '14 19:04

TYZ

2 Answers

UPDATE, with reproducible data added by OP:

df <- read.table(header = TRUE, text="A    B    C
    1    2    3
    4    5    6")
vec <- c("B", "C", "A")
df[vec]

Results in:

  B C A
1 2 3 1
2 5 6 4

This is the order the OP wants.

How about:

df[df.clust$mutation_id]

where df is the data.frame whose columns you want to reorder and df.clust is the data frame whose mutation_id column holds the desired column order.

This works because a data.frame is a list of columns, so standard vector indexing with a character vector of names returns the columns in that order.
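A sketch of the same idea applied to a setup shaped like the question's. The column names mut1 to mut4 and the cluster_id values are hypothetical, since the real data only appears in screenshots. It also checks that every name in the vector exists before indexing, and appends any columns the vector does not mention, which you may or may not want:

```r
# Hypothetical stand-ins for the question's data (the real data is only
# shown in screenshots): columns named after mutations
df <- data.frame(mut1 = 1:2, mut2 = 3:4, mut3 = 5:6, mut4 = 7:8)

# Hypothetical lookup table, already sorted by cluster_id; strings are
# stored as a factor here, as data.frame() did by default before R 4.0.0
df.clust <- data.frame(
  mutation_id = c("mut3", "mut1", "mut4"),
  cluster_id  = c(1, 1, 2),
  stringsAsFactors = TRUE
)

# Index by the labels, not by the factor's integer codes
vec <- as.character(df.clust$mutation_id)

# df[vec] would stop with "undefined columns selected" on an unknown name;
# report every missing name at once instead
missing_cols <- setdiff(vec, names(df))
if (length(missing_cols) > 0) {
  stop("Columns not found in df: ", paste(missing_cols, collapse = ", "))
}

# Listed columns first, in vec's order; unlisted columns keep their
# original relative order at the end
new_order <- c(vec, setdiff(names(df), vec))
sorted <- df[new_order]
names(sorted)
# [1] "mut3" "mut1" "mut4" "mut2"
```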

answered Oct 11 '22 17:10

BrodieG

Brodie's answer does exactly what you're asking for. However, you imply that your data are large, so I will provide an alternative using "data.table", which has a function called setcolorder that will change the column order by reference.

Here's a reproducible example.

Start with some simple data:

mydf <- data.frame(A = 1:2, B = 3:4, C = 5:6)
matches <- data.frame(X = 1:3, Y = c("C", "A", "B"), Z = 4:6)
mydf
#   A B C
# 1 1 3 5
# 2 2 4 6
matches
#   X Y Z
# 1 1 C 4
# 2 2 A 5
# 3 3 B 6

Provide proof that Brodie's answer works:

out <- mydf[matches$Y]
out
#   C A B
# 1 5 1 3
# 2 6 2 4
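A caveat worth checking, assuming matches$Y (or mutation_id) might be a factor, as string columns were by default in data.frame() before R 4.0.0: indexing with a factor uses its integer codes, not its labels. In the example above that happens to select the right columns, because the levels A, B, C are in the same order as the columns of mydf. With columns in a different order (mydf2 and ord below are made up for illustration), the two forms disagree:

```r
# Made-up data whose columns are NOT in alphabetical order
mydf2 <- data.frame(B = 1:2, C = 3:4, A = 5:6)

# A factor version of the desired order: levels A, B, C, so codes 3, 1, 2
ord <- factor(c("C", "A", "B"))

names(mydf2[ord])                # uses the codes: columns 3, 1, 2
# [1] "A" "B" "C"
names(mydf2[as.character(ord)])  # uses the labels
# [1] "C" "A" "B"
```

Wrapping the vector in as.character(), as the setcolorder() call that follows does, avoids the problem.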

Show a more memory-efficient way to do the same thing:

library(data.table)
setDT(mydf)
mydf
#    A B C
# 1: 1 3 5
# 2: 2 4 6

setcolorder(mydf, as.character(matches$Y))
mydf
#    C A B
# 1: 5 1 3
# 2: 6 2 4
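Because setcolorder() changes the object in place, every name bound to that data.table sees the new order, and there is nothing to assign. A minimal sketch with hypothetical object names (dt, same_dt, backup); take a data.table::copy() first if you need to keep the original order:

```r
library(data.table)

dt <- data.table(A = 1:2, B = 3:4, C = 5:6)
same_dt <- dt        # a second name for the same object, not a copy
backup  <- copy(dt)  # an explicit deep copy

setcolorder(dt, c("C", "A", "B"))  # reorders in place; no assignment

names(dt)       # [1] "C" "A" "B"
names(same_dt)  # [1] "C" "A" "B"  (same object, so it changed too)
names(backup)   # [1] "A" "B" "C"  (the copy is unaffected)
```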

answered Oct 11 '22 17:10

A5C1D2H2I1M1N2O1R2T1