 

How can I order a dataframe by the second column in R? [duplicate]

Tags: dataframe, r

Possible Duplicate:
How to sort a dataframe by column(s) in R

I was just wondering if someone could help me out. I have what I thought should be an easy problem to solve.

I have the table below:

SampleID    Cluster
R0132F041p  1
R0132F127   1
R0132F064   1
R0132F068p  1
R0132F015   2
R0132F094   3
R0132F105   1
R0132F013   2
R0132F114   1
R0132F014   2
R0132F039p  3
R0132F137   1
R0132F059   1
R0132F138p  2
R0132F038p  2

and I would like to sort it by Cluster to get the result below:

SampleID    Cluster
R0132F041p  1
R0132F127   1
R0132F064   1
R0132F068p  1
R0132F105   1
R0132F114   1
R0132F137   1
R0132F059   1
R0132F015   2
R0132F013   2
R0132F014   2
R0132F138p  2
R0132F038p  2
R0132F094   3
R0132F039p  3

I have tried the following R code:

data <- read.table('Table.txt', header = TRUE, row.names = 1, sep = '\t')
data <- data.frame(data)
data <- data[order(data$Cluster), ]
write.table(data, file = 'OrderedTable.txt', append = TRUE, quote = FALSE, sep = '\t', na = 'NA', dec = '.', row.names = TRUE, col.names = FALSE)

and get the following output:

1   1
2   1
3   1
4   1
5   1
6   1
7   1
8   1
9   2
10  2
11  2
12  2
13  2
14  3
15  3

Why have the SampleIDs been replaced by the numbers 1-15, and what do these numbers represent? I have read the ?order help page, but it seems to explain sort.list better than order(). If anyone could help me out on this I would be very grateful.

asked Nov 14 '12 12:11 by sinead




2 Answers

The short answer is that you did it perfectly; you're just having some difficulty with reading and writing files. Going through your code:

data<-read.table('Table.txt', header=TRUE,row.names=1,sep='\t') 

The above line reads in your data fine, but row.names=1 tells it to use the first column as the row names. So your SampleIDs are now row names instead of being their own column. If you type data, head(data) or str(data) immediately after running this line, this should be clear. Just omit the row.names argument and it will read properly.
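To see the effect of row.names=1 without needing Table.txt, here is a small sketch that inlines three rows from the question through read.table()'s text argument (the inlined data is only a stand-in for the real file):

```r
# Three rows from the question, inlined so the sketch runs without Table.txt
txt <- "SampleID\tCluster
R0132F041p\t1
R0132F015\t2
R0132F094\t3"

# row.names = 1: the first column becomes the row names, leaving only Cluster
with_rn <- read.table(text = txt, header = TRUE, row.names = 1, sep = "\t")
str(with_rn)
rownames(with_rn)

# No row.names argument: SampleID stays an ordinary column next to Cluster
without_rn <- read.table(text = txt, header = TRUE, sep = "\t")
str(without_rn)
```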

data <- data.frame(data) 

You don't need the above line because read.table() already returns a data frame. You can confirm that with str(data) as well.
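A quick way to check this, sketched with two rows from the question inlined via the text argument instead of reading Table.txt:

```r
# Two rows from the question, inlined for illustration
df <- read.table(text = "SampleID Cluster\nR0132F041p 1\nR0132F015 2",
                 header = TRUE)

is.data.frame(df)   # TRUE: read.table() already returns a data frame
class(df)           # "data.frame", so wrapping it in data.frame() adds nothing
```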

data <- data[order(data$Cluster),] 

The above line is perfect.
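Since the question mentions confusion about what order() does, here is a short sketch of what it returns compared with sort(). It uses five rows taken from the question's table:

```r
cluster <- c(1L, 2L, 3L, 1L, 2L)

# order() returns row positions, smallest value first (ties keep original order)
order(cluster)   # 1 4 2 5 3
# sort() returns the values themselves
sort(cluster)    # 1 1 2 2 3

# Indexing rows with order() therefore reorders whole rows of a data frame
df <- data.frame(SampleID = c("R0132F041p", "R0132F015", "R0132F094",
                              "R0132F105", "R0132F013"),
                 Cluster = cluster,
                 stringsAsFactors = FALSE)
sorted <- df[order(df$Cluster), ]
sorted$SampleID  # "R0132F041p" "R0132F105" "R0132F015" "R0132F013" "R0132F094"
```

Because SampleID is a second column here, the row subset stays a data frame and the IDs travel with their cluster numbers.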

write.table(data, file = 'OrderedTable.txt', append = TRUE, quote = FALSE, sep = '\t', na = 'NA', dec = '.', row.names = TRUE, col.names = FALSE)

Here you included the argument col.names = FALSE, which is why your file doesn't have column names. You also don't need or want append = TRUE. If you look at help(write.table), you see it is "only relevant if file is a character string", and when it does apply, the new rows are added to the end of any existing OrderedTable.txt instead of overwriting it, so each run of the script piles another copy of the table onto the file.
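A sketch of a write.table() call without append and with column names kept. It writes to a temporary file instead of OrderedTable.txt, and row.names = FALSE (so no extra column of row numbers is written once SampleID is a regular column) is an assumption about the output you want:

```r
sorted <- data.frame(SampleID = c("R0132F041p", "R0132F105", "R0132F015"),
                     Cluster = c(1L, 1L, 2L),
                     stringsAsFactors = FALSE)

out <- tempfile(fileext = ".txt")  # stand-in for 'OrderedTable.txt'
# col.names defaults to TRUE, so the header line is written;
# append defaults to FALSE, so an existing file is overwritten
write.table(sorted, file = out, quote = FALSE, sep = "\t", row.names = FALSE)
readLines(out)

# Reading the file back gives the same two columns
back <- read.table(out, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
```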

The numbers 1-15 in your result look like row numbers. You don't explain how you look at the resulting file, so I cannot be sure. You likely read your file in a way that doesn't parse the row.names and is showing row numbers instead. If you make certain your SampleIDs column does not get assigned to be names of rows, you'll probably be fine.
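One mechanism that may account for the numbers exactly (an addition to this answer, worth checking against your own data): after row.names=1, data has a single Cluster column, and indexing a one-column data frame with [i, ] drops the result to a plain vector by default, which loses the row names. write.table() then turns that vector back into a data frame whose rows are numbered from 1. The sketch below uses four rows from the question:

```r
# The question's layout after read.table(..., row.names = 1): one column only
one_col <- data.frame(Cluster = c(1L, 2L, 3L, 1L),
                      row.names = c("R0132F041p", "R0132F015",
                                    "R0132F094", "R0132F105"))

# With a single column, [i, ] drops to a plain vector and the IDs are gone
dropped <- one_col[order(one_col$Cluster), ]
is.data.frame(dropped)   # FALSE

# write.table() wraps the vector in a new data frame numbered 1, 2, 3, ...
write.table(dropped, quote = FALSE, sep = "\t", col.names = FALSE)

# drop = FALSE keeps the data frame and its row names
kept <- one_col[order(one_col$Cluster), , drop = FALSE]
rownames(kept)
```

Omitting row.names=1, as suggested above, avoids the issue too, because the data frame then has two columns and is never dropped.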

answered Sep 18 '22 21:09 by MattBagg


Have a look at the arrange function of the plyr package.

library(plyr)
# arrange() returns a sorted copy, so assign it back before writing
data <- arrange(data, Cluster)
write.table(data, "ordered_data.txt")
answered Sep 18 '22 21:09 by Markus