Perl (or R, or SQL): Count how often string appears across columns

Question

I have a text file that looks like this:

gene1   gene2   gene3
a       d       c
b       e       d
c       f       g
d       g       
        h
        i

(Each column is a human gene, and each contains a variable number of proteins (strings, shown as letters here) that can bind to that gene.)

I want to count how many columns each string appears in, then output that count along with the headers of those columns, like this:

a   1   gene1
b   1   gene1
c   2   gene1 gene3
d   3   gene1 gene2 gene3
e   1   gene2
f   1   gene2
g   2   gene2 gene3
h   1   gene2
i   1   gene2

I have been trying to figure out how to do this in Perl and R, but without success so far. Thanks for any help.

Chase · Accepted Answer

This solution seems like a bit of a hack, but it gives the desired output. It relies on both the plyr and reshape packages, though I'm sure you could find base R alternatives. The trick is that the melt function flattens the data into long format (one row per column/value pair), which allows for easy(ish) manipulation from that point forward.

library(reshape)
library(plyr)

#Recreate your data
dat <- data.frame(gene1 = c(letters[1:4], NA, NA),
                  gene2 = letters[4:9],
                  gene3 = c("c", "d", "g", NA, NA, NA)
                  )

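#Melt the data into long format. You'll need to update measure.vars if you have more columns
dat.m <- melt(dat, measure.vars = 1:3)

#Tabulate counts. table() drops NA and counts every occurrence,
#so this assumes a string appears at most once per column
counts <- as.data.frame(table(dat.m$value))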

#I'm not sure what to call this column since it's a smooshing of column names
otherColumn <- ddply(dat.m, "value", function(x) paste(x$variable, collapse = " "))

#Merge the two together. You could fix the column names above, or just deal with it here
merge(counts, otherColumn, by.x = "Var1", by.y = "value")

Gives:

> merge(counts, otherColumn, by.x = "Var1", by.y = "value")
  Var1 Freq                V1
1    a    1             gene1
2    b    1             gene1
3    c    2       gene1 gene3
4    d    3 gene1 gene2 gene3
....
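
For comparison, here is a minimal base R sketch of the same idea, without plyr or reshape. It is only one possible approach, not a tested drop-in replacement: the names long, by_string, and result are made up for illustration, and it assumes the data is already in a data frame with NA in the empty cells, as in the recreated dat above. Unlike the table() step, it drops duplicate column/string pairs first, so a string repeated within one column would still count that column once.

```r
# Recreate the example data (NA marks the empty cells in shorter columns)
dat <- data.frame(gene1 = c(letters[1:4], NA, NA),
                  gene2 = letters[4:9],
                  gene3 = c("c", "d", "g", NA, NA, NA),
                  stringsAsFactors = FALSE)

# Flatten to long format: one row per (string, column) pair
long <- data.frame(string = unlist(dat, use.names = FALSE),
                   column = rep(names(dat), each = nrow(dat)),
                   stringsAsFactors = FALSE)

# Drop the empty cells, then drop repeated (string, column) pairs
long <- long[!is.na(long$string), ]
long <- unique(long)

# Group the column names by string
by_string <- split(long$column, long$string)

# One row per string: number of columns and the column names
result <- data.frame(
  string    = names(by_string),
  n_columns = unname(lengths(by_string)),
  columns   = unname(vapply(by_string, paste, character(1), collapse = " ")),
  stringsAsFactors = FALSE
)

# Print as tab-separated text without headers or quotes
write.table(result, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

Because split() sorts groups by string and keeps the original row order within each group, the column names come out in the same left-to-right order as the columns of dat.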

Tags: string, mysql, r, perl

Stephen Turner
