convert a data frame into a specifically formatted frequency table

Tags: r

I have a data.frame and I'm trying to create a frequency table that shows the frequency of values for each row. So I'm starting with something like this:

d <- data.frame(a=c(1,2,3), b=c(3,4,5), c=c(1,2,5))

which looks like this:

  a b c
  1 3 1
  2 4 2
  3 5 5

What I'd really like to create is a contingency data.frame or matrix that looks like this:

1, 2, 3, 4, 5, 6, 7, 8, 9
2, 0, 1, 0, 0, 0, 0, 0, 0
0, 2, 0, 1, 0, 0, 0, 0, 0
0, 0, 1, 0, 2, 0, 0, 0, 0

The top row is just a label row, added for illustration; it need not be in the final result. Each row below it counts how many times each of the digits 1:9 appears in the corresponding row of the starting data.

I can't wrap my head around an easy way to create this. Although it seems like the table() function should be helpful, I can't get it to give me any love. Any help or ideas are appreciated.

JD Long asked Mar 15 '12 20:03

2 Answers

Here you go:

t(apply(d, 1, tabulate, nbins=9))
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,]    2    0    1    0    0    0    0    0    0
[2,]    0    2    0    1    0    0    0    0    0
[3,]    0    0    1    0    2    0    0    0    0

(Though it probably doesn't matter in this application, tabulate(), which table() calls internally, is also impressively fast.)
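As a minimal sketch of the same approach, the one-liner can be wrapped in a small function that also labels the columns. The name row_digit_counts and its nbins default are assumptions made for illustration, not part of the original answer.

```r
d <- data.frame(a = c(1, 2, 3), b = c(3, 4, 5), c = c(1, 2, 5))

# Hypothetical helper: count how often each of 1..nbins occurs in every row.
row_digit_counts <- function(df, nbins = 9) {
  # apply() returns one column per input row, so transpose back to rows
  counts <- t(apply(df, 1, tabulate, nbins = nbins))
  colnames(counts) <- seq_len(nbins)  # label each column with its digit
  counts
}

counts <- row_digit_counts(d)
counts
```

Each row of the result should sum to ncol(d), which is a quick sanity check that no value fell outside 1..nbins.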


EDIT: tabulate() silently ignores 0s and negative integers. If you want another one-liner that handles them, you could use table() instead, doing something like this:

d <- data.frame(a=c(0,-1,-2), b=c(3,4,5), c=c(1,2,5))

# Pad each row with -9:9 so every value gets a column, then subtract the padding
t(apply(d, 1, function(X) table(c(X, -9:9)) - 1))
     -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9
[1,]  0  0  0  0  0  0  0  0  0 1 1 0 1 0 0 0 0 0 0
[2,]  0  0  0  0  0  0  0  0  1 0 0 1 0 1 0 0 0 0 0
[3,]  0  0  0  0  0  0  0  1  0 0 0 0 0 0 2 0 0 0 0
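A variant of the same idea, offered as a sketch rather than as the answer's own method: instead of padding each row with -9:9 and subtracting 1, convert the row to a factor with explicit levels, so that table() reports a zero for every level that does not occur. The first call illustrates how tabulate() drops non-positive values; the variable names lv and counts are chosen here for illustration.

```r
# tabulate() only counts the bins 1..nbins; the 0 and the -1 are dropped silently
dropped <- tabulate(c(0, -1, 2, 2), nbins = 3)
dropped

d <- data.frame(a = c(0, -1, -2), b = c(3, 4, 5), c = c(1, 2, 5))

lv <- -9:9  # every value that should get its own column
# factor() with fixed levels makes table() emit a count (possibly 0) per level
counts <- t(apply(d, 1, function(x) table(factor(x, levels = lv))))
counts
```

Fixing the levels up front keeps the set of columns independent of the data, so the range only has to be stated once.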
Josh O'Brien answered Nov 05 '22 08:11


Another solution, using table():

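library(reshape)
d <- data.frame(a=c(1,2,3), b=c(3,4,5), c=c(1,2,5))
d2 <- melt(d)                        # long format: one row per cell (variable, value)
d2$rows <- rep(1:nrow(d), ncol(d))   # original row index of each cell
table(d2$rows, d2$value)             # rows by observed values (here only 1 to 5)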
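The cross-tabulation above only has columns for values that actually occur. A base-R sketch of the same idea, assuming the full range 1:9 is wanted and that avoiding the reshape dependency is acceptable, uses row() for the row index and a factor with fixed levels for the values:

```r
d <- data.frame(a = c(1, 2, 3), b = c(3, 4, 5), c = c(1, 2, 5))

m <- as.matrix(d)
# row(m) gives the row index of every cell; both arguments are flattened
# column by column, so indices and values stay aligned
counts <- table(row = as.vector(row(m)),
                value = factor(as.vector(m), levels = 1:9))
counts
```

The result is a 3 x 9 table; as.data.frame.matrix(counts) would turn it into a data.frame if that shape is preferred.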
ilya answered Nov 05 '22 06:11