Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Reading a "flipped" table in to a data.frame correctly

Tags:

dataframe

r

I have a tab-delimited file that looks like this:

AG-AG   AG-CA   AT-AA   AT-AC   AT-AG   ...
0.0142180094786 0.009478672985781   0.0142180094786 0.4218009478672 ...

When I read this into R using read.table i get:

nc.tab <- read.table("./percent_splice_pair.tab", sep="\t", header=TRUE)
     AG.AG       AG.CA      AT.AA     AT.AC      AT.AG   ... 
1 0.01421801 0.009478673 0.01421801 0.4218009 0.03317536 ...

This feels somewhat awkward for me, because I am much more used to working with data if its like this:

splice.pair    counts
AG.AG          0.01421801
AG.CA          0.009478673
AT.AA          0.01421801
AT.AG          0.03317536
...            ...

so far, my attempts at trying to coerce the table into a data frame like this (using data.frame()) have caused very odd results. I can't work out how to get each row of the table I have as a simple list, which I can then use as columns for the data frame. colnames(nc.tab) works for the headers but things like nc.tab[1,] just give me the table + headers again. Am I missing something obvious?

--edit--

Whilst @Andrie's answer gave me the data.frame I needed, I had to do a bit of extra work to coerse the counts values into numeric values so that they would work correctly in ggplot:

nc.tab <- read.table("./percent_splice_pair.tab", header=FALSE, sep="\t")
nc.mat <- t(as.matrix(nc.tab))
sp <- as.character(nc.tab[,2])
c <- as.numeric(as.character(nc.tab[,2]))
nc.dat <- data.frame(Splice.Pair=sp, count=c)

   Splice.Pair       count
1        AG-AG 0.014218009
2        AG-CA 0.009478673
3        AT-AA 0.014218009
4        AT-AC 0.421800948
5        AT-AG 0.033175355
like image 464
MattLBeck Avatar asked Aug 12 '11 11:08

MattLBeck


1 Answers

You need the following to read and reshape your data in the way you want:

  • use read.table with the parameter header=FALSE
  • then transpose the data with the function t()
  • rename the columns

Here is the code:

x <- read.table(..., header=FALSE)
df <- as.data.frame(t(x))
names(df) <- c("splice.pair", "counts")
df

   splice.pair            counts
V1       AG-AG   0.0142180094786
V2       AG-CA 0.009478672985781
V3       AT-AA   0.0142180094786
V4       AT-AC   0.4218009478672
like image 115
Andrie Avatar answered Oct 12 '22 10:10

Andrie