
Find the number of unique combinations in a data frame and the number of observations in each combination

Tags: r, combinations

This question follows from a previous question. Instead of having two columns, what if we have three or more columns? Consider the following data.

x <- c(600, 600, 600, 600, 600, 600, 600, 600, 600, 800, 800, 800, 800, 800, 800, 800, 800, 800,
       600, 600, 600, 600, 600, 600, 600, 600, 600, 800, 800, 800, 800, 800, 800, 800, 800, 800,
       600, 600, 600, 600, 600, 600, 600, 600, 600, 800, 800, 800, 800, 800, 800, 800, 800, 800)

y <- c(1,  1,  1,  1,  1,  1,  1, 1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
       80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80,
       3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3)

z <- c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3,
       1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3,
       1, 2, 3, 1, 2, 3)

xyz <- data.frame(cbind(x, y, z))

Suppose we treat every column as a factor with a finite number of levels. I want the number of observations in each unique combination of x, y and z. For this data the answer is 18 unique combinations with 3 observations in each. How can I do this in R, please? Thank you!

LaTeXFan asked Jan 19 '26 04:01


2 Answers

Use table or tabulate with interaction (tabulate returns a bare vector of counts, table keeps the combination labels):

tabulate(with(xyz, interaction(x,y,z)))

table(with(xyz, interaction(x,y,z)))

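or split the row indices by the interaction and use lengths (splitting xyz itself would not work here, because lengths() on a list of data frames returns the number of columns of each piece, not the number of rows),

lengths(split(seq_len(nrow(xyz)), with(xyz, interaction(x,y,z))))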

or

aggregate(seq_along(x)~ x+y+z, data=xyz, FUN=length)
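
As a complement to the one-liners above, here is a minimal base R sketch that returns both numbers the question asks for: the count of distinct combinations and the observations per combination. It rebuilds data of the same shape as the question's xyz with rep(); the names counts and n_combinations are only illustrative. Note that table() and interaction() also report level combinations that never occur (with a count of 0), so the sketch filters on Freq > 0; for this data every combination occurs, so the filter changes nothing.

```r
# Same shape as the question's xyz: 2 x-values, 3 y-values, 3 z-values,
# 54 rows in total.
xyz <- data.frame(
  x = rep(rep(c(600, 800), each = 9), times = 3),
  y = rep(c(1, 80, 3), each = 18),
  z = rep(1:3, times = 18)
)

# Cross-tabulate all columns, then keep one row per combination that
# actually occurs; Freq holds the number of observations.
counts <- as.data.frame(table(xyz))
counts <- counts[counts$Freq > 0, ]

# Number of distinct (x, y, z) rows.
n_combinations <- nrow(unique(xyz))
```

Here counts has the columns x, y, z and Freq, so each count stays attached to the combination it belongs to.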

Rorschach answered Jan 20 '26 19:01


An option using data.table. We convert the 'data.frame' to a 'data.table' (setDT(xyz)), group by the columns of 'xyz', and get the number of elements in each group (.N).

library(data.table)
setDT(xyz)[, .N, names(xyz)]$N
#[1] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
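
If the combination labels are needed alongside the counts, a small variation keeps the whole grouped result instead of extracting $N. This is a sketch that assumes the data.table package is installed; xyz_dt, counts_dt and n_combinations_dt are illustrative names, and the data is rebuilt with rep() to match the question.

```r
library(data.table)

# Same shape as the question's xyz, built directly as a data.table.
xyz_dt <- data.table(
  x = rep(rep(c(600, 800), each = 9), times = 3),
  y = rep(c(1, 80, 3), each = 18),
  z = rep(1:3, times = 18)
)

# .N is the row count per group; grouping on every column gives one
# row per (x, y, z) combination with its count in column N.
counts_dt <- xyz_dt[, .N, by = names(xyz_dt)]

# uniqueN() on a data.table counts distinct rows, i.e. combinations.
n_combinations_dt <- uniqueN(xyz_dt)
```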

Or with dplyr, we group by the columns, get the number of elements (n()) using summarise.

library(dplyr)
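# group_by_() is deprecated; across(everything()) groups by every column
xyz %>%
    group_by(across(everything())) %>%
    summarise(n = n(), .groups = "drop") %>%
    pull(n)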
#[1] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
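
A shorter route to the same counts in dplyr is count(), which groups and tallies in one step. The sketch below assumes dplyr is installed; xyz_df, counts_tbl and n_combinations_tbl are illustrative names, and the data is rebuilt with rep() to match the question.

```r
library(dplyr)

# Same shape as the question's xyz.
xyz_df <- data.frame(
  x = rep(rep(c(600, 800), each = 9), times = 3),
  y = rep(c(1, 80, 3), each = 18),
  z = rep(1:3, times = 18)
)

# count() returns one row per (x, y, z) combination that occurs,
# with the number of observations in column n.
counts_tbl <- count(xyz_df, x, y, z)

# Each row is one combination, so the row count is the number of
# unique combinations.
n_combinations_tbl <- nrow(counts_tbl)
```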

akrun answered Jan 20 '26 18:01