Aggregating table() over multiple columns in R without a "by" breakdown

I have a 2-column data frame of x- and y-coordinates of points. I want to generate a table of the number of occurrences of each point. Using the table() command produces a table for all possible x-y pairs. I can eliminate the extras with

fullTable <- table(coords)
smallTable <- subset(fullTable, fullTable > 0)

And then I'm sure I could do a little something with dimnames(fullTable) to get the appropriate coordinates, but is there a better way? Something built in? Something that with

coords <- data.frame(x = c(1, 1, 2, 2, 3, 3), y = c(1, 1, 2, 1, 1, 1))

would return

x y count
1 1 2
2 1 1
2 2 1
3 1 2
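
For reference, the table() route described above can be finished without handling dimnames() by hand, because as.data.frame() on a table returns one row per cell with the coordinates as columns. The following is only a minimal sketch under that assumption; the column name "count" is chosen here to match the desired output and is not something table() produces on its own.

```r
# Sample data from the question
coords <- data.frame(x = c(1, 1, 2, 2, 3, 3), y = c(1, 1, 2, 1, 1, 1))

# Long format: one row per cell of the full x-by-y table, zeros included
counts <- as.data.frame(table(coords), responseName = "count",
                        stringsAsFactors = FALSE)

# Keep only the pairs that actually occur
counts <- counts[counts$count > 0, ]

# table() stores the coordinates as labels, so convert them back to numbers
counts$x <- as.numeric(counts$x)
counts$y <- as.numeric(counts$y)

# Sort by x, then y, and reset the row names
counts <- counts[order(counts$x, counts$y), ]
rownames(counts) <- NULL
print(counts)
```

This still builds the full table first, so it inherits the cost of every empty x-y cell.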
Gregor Thomas asked Sep 11 '11 16:09


4 Answers

Using just base R, you can do

aggregate(rep(1, nrow(coords)), by = list(x = coords$x, y = coords$y), sum)
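
If an explicitly named count column is wanted, the same idea can be written with the formula interface of aggregate(). This is a sketch of one possible variant, not the only way to do it; the helper column "count" is a name introduced here for illustration.

```r
# Sample data from the question
coords <- data.frame(x = c(1, 1, 2, 2, 3, 3), y = c(1, 1, 2, 1, 1, 1))

# Add a helper column of ones, then sum it within each (x, y) group.
# The formula interface carries the column names through to the result.
counts <- aggregate(count ~ x + y,
                    data = transform(coords, count = 1),
                    FUN = sum)
print(counts)
```

The result has one row per observed pair, with columns x, y and count.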
adamleerich answered Oct 19 '22 19:10


Better than ddply is count:

library(plyr)
count(coords)

It's a lot faster than table for sparse 2d results too.
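
To see why sparse data is a poor fit for table(), it helps to compare the number of cells table() has to allocate with the number of pairs that actually occur. The sketch below uses only base R and a made-up data set ("sparse" is a hypothetical example, not data from the question); it measures sizes, not run time.

```r
# Hypothetical sparse data: 1000 points on a diagonal, each occurring once
sparse <- data.frame(x = 1:1000, y = 1:1000)

# table() allocates one cell for every possible x-y combination
full_cells <- length(table(sparse))

# Only this many distinct pairs are actually present
observed_pairs <- nrow(unique(sparse))

cat("cells in table():", full_cells, "\n")
cat("observed pairs:  ", observed_pairs, "\n")
```

Here table() holds 1,000,000 cells for 1,000 observed pairs, so almost all of the work goes into cells that are zero.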

hadley answered Oct 19 '22 20:10


You can use ddply from the plyr library

library(plyr)  # attaches .() and summarize, which ddply() needs here
ddply(coords, .(x, y), summarize, count = length(x))
Ramnath answered Oct 19 '22 21:10


You could also use data.table

library(data.table)
DT <- data.table(coords)
DT[, .N, by = list(x, y)]
##    x y N
## 1: 1 1 2
## 2: 2 2 1
## 3: 2 1 1
## 4: 3 1 2

See this answer for more details on the use of .N and creating frequency tables with data.table

mnel answered Oct 19 '22 21:10