Possible Duplicate:
How to sort a dataframe by column(s) in R
I was just wondering if someone could help me out. I have what I thought should be an easy problem to solve.
I have the table below:
SampleID Cluster
R0132F041p 1
R0132F127 1
R0132F064 1
R0132F068p 1
R0132F015 2
R0132F094 3
R0132F105 1
R0132F013 2
R0132F114 1
R0132F014 2
R0132F039p 3
R0132F137 1
R0132F059 1
R0132F138p 2
R0132F038p 2
and I would like to sort/order it by Cluster to get the results as below:
SampleID Cluster
R0132F041p 1
R0132F127 1
R0132F064 1
R0132F068p 1
R0132F105 1
R0132F114 1
R0132F137 1
R0132F059 1
R0132F015 2
R0132F013 2
R0132F014 2
R0132F138p 2
R0132F038p 2
R0132F094 3
R0132F039p 3
I have tried the following R code:
data<-read.table('Table.txt', header=TRUE,row.names=1,sep='\t')
data <- data.frame(data)
data <- data[order(data$Cluster),]
write.table(data, file = 'OrderedTable.txt', append = TRUE,quote=FALSE, sep = '\t', na ='NA', dec = '.', row.names = TRUE, col.names = FALSE)
and get the following output:
1 1
2 1
3 1
4 1
5 1
6 1
7 1
8 1
9 2
10 2
11 2
12 2
13 2
14 3
15 3
Why have the SampleIDs been replaced by the numbers 1-15, and what do these numbers represent? I have read the ?order help page, but it seems to explain sort.list better than order(). If anyone could help me out on this I would be very grateful.
The short answer is that you did the sorting correctly. You are just having some difficulty with reading and writing files. Going through your code:
data<-read.table('Table.txt', header=TRUE,row.names=1,sep='\t')
The above line is reading in your data fine, but the row.names=1 told it to use the first column as names for rows. So now your SampleIDs are row names instead of being their own column. If you type data or head(data) or str(data) immediately after running this line, this should be clear. Just omit that row.names argument and it will read properly.
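To make the effect of row.names=1 visible, here is a minimal sketch that reads the same small table both ways. The inline text is a three-row stand-in for Table.txt, and the names with_rownames and as_column are made up for this illustration.

```r
# Small tab-separated sample standing in for Table.txt (a few rows from the question).
txt <- "SampleID\tCluster\nR0132F041p\t1\nR0132F015\t2\nR0132F094\t3\n"

# With row.names = 1 the SampleID column is consumed as row names.
with_rownames <- read.table(text = txt, header = TRUE, row.names = 1, sep = "\t")
str(with_rownames)   # a single variable: Cluster

# Without it, SampleID stays an ordinary column.
as_column <- read.table(text = txt, header = TRUE, sep = "\t")
str(as_column)       # two variables: SampleID and Cluster
```

Comparing the two str() outputs shows where the SampleIDs end up in each case.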
data <- data.frame(data)
You don't need this above line because read.table() produces a dataframe. You can see that with str(data) as well.
data <- data[order(data$Cluster),]
The above line is fine, as long as SampleID is a regular column rather than the row names.
write.table(data, file = 'OrderedTable.txt', append = TRUE, quote=FALSE, sep = '\t', na ='NA', dec = '.', row.names = TRUE, col.names = FALSE)
Here you included the argument col.names = FALSE, which is why your file doesn't have column names. You also don't need/want append=TRUE. If you look at help(write.table), you see it is "only relevant if file is a character string". With append=TRUE, each run adds its rows to the end of any existing OrderedTable.txt instead of replacing the file, so rerunning the script piles up repeated rows, which may cause a later read.table() to complain.
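As a rough sketch of the write step with those arguments changed, the snippet below writes a small data frame with a header and without the automatic row names, then reads it back. The data frame sorted and the temporary path out_file are placeholders for this example, not part of the original code.

```r
# Hypothetical three-row data frame, already ordered by Cluster.
sorted <- data.frame(SampleID = c("R0132F041p", "R0132F015", "R0132F094"),
                     Cluster  = c(1L, 2L, 3L))

# Temporary path, used here in place of 'OrderedTable.txt'.
out_file <- tempfile(fileext = ".txt")

# Overwrite rather than append, keep the header, skip the automatic row names.
write.table(sorted, file = out_file, quote = FALSE, sep = "\t",
            row.names = FALSE, col.names = TRUE)

# Reading the file back should give both columns again.
check <- read.table(out_file, header = TRUE, sep = "\t")
print(check)
```

Leaving out append means a rerun replaces the file instead of adding to it.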
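The numbers 1-15 in your result are automatic row numbers. Because row.names=1 turned SampleID into row names, data has only the Cluster column, and subsetting the rows of a one-column data frame with data[order(data$Cluster),] returns a plain vector by default, which carries no row names. write.table() then coerces that vector back to a data frame, numbers its rows 1-15, and writes those numbers because row.names = TRUE. If you make certain your SampleIDs column does not get assigned to be names of rows, you'll be fine. Alternatively, adding drop = FALSE to the subsetting keeps the data frame and its row names.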
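One way to check where the SampleIDs go is to repeat the sort on a tiny one-column data frame. This is only a sketch with made-up object names (one_col, dropped, kept) and four rows borrowed from the question.

```r
# One-column data frame with SampleIDs as row names, as row.names = 1 would produce.
one_col <- data.frame(Cluster = c(1L, 2L, 3L, 1L),
                      row.names = c("R0132F041p", "R0132F015", "R0132F094", "R0132F127"))

# Default row subsetting drops the single remaining column to a bare vector.
dropped <- one_col[order(one_col$Cluster), ]
is.data.frame(dropped)   # FALSE: the row names are gone along with the data frame

# drop = FALSE keeps the data frame, and with it the SampleID row names.
kept <- one_col[order(one_col$Cluster), , drop = FALSE]
rownames(kept)
```

With more than one column in the data frame the default subsetting keeps the data frame, so the issue only shows up in the one-column case.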
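Have a look at the arrange function of the plyr package. Its result has to be assigned before it is written out:
library(plyr)
data <- arrange(data, Cluster)  # arrange() discards row names, so keep SampleID as a regular column
write.table(data, "ordered_data.txt")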