Simple frequency tables using data.table

Tags:

r

data.table

I'm looking for a way to do simple aggregates / counts via data.table.

Consider the iris data, which has 50 observations per species. To count the observations per species, I have to summarize over a column other than Species, for example Sepal.Length:

library(data.table)
dt = as.data.table(iris)
dt[,length(Sepal.Length), Species]

I find this confusing because it looks like I'm doing something on Sepal.Length at first glance, when really it's only Species that matters.

This is what I would prefer to say, but I don't get valid output:

dt[,length(Species), Species] 

Correct input and output, but clunky code:

> dt[,length(Sepal.Length), Species]
      Species V1
1:     setosa 50
2: versicolor 50
3:  virginica 50

Incorrect input and output, but nicer code:

> dt[,length(Species), Species]
      Species V1
1:     setosa  1
2: versicolor  1
3:  virginica  1
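The 1s appear because, inside j, a grouping column refers to the current group's value (a vector of length 1), not to the rows of the group, while any other column still has one element per row. A small check, assuming only data.table and the built-in iris data (the column names len_by_col and len_other are illustrative):

```r
library(data.table)

dt <- as.data.table(iris)

# Inside j, `Species` is the single grouping value for the current group,
# so length(Species) is 1, whereas a non-grouping column such as
# Sepal.Length still holds one element per row in the group.
res <- dt[, .(len_by_col = length(Species),
              len_other  = length(Sepal.Length)),
          by = Species]
print(res)
```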

Is there an elegant way around this?

asked Aug 31 '12 at 04:08 by geneorama





1 Answer

data.table provides a few special symbols that can be used within the j expression, notably:

  • .N gives the number of rows in each group.

See ?data.table, under the Details for by:

Advanced: When grouping by by or by i, symbols .SD, .BY and .N may be used in the j expression, defined as follows.

....

.N is an integer, length 1, containing the number of rows in the group.
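Since .SD holds the rows of the current group, .N should match nrow(.SD); .N is the more direct way to ask for the count because j then does not need to reference .SD at all. A quick check, assuming the same table built from iris (n_dot and n_sd are illustrative names):

```r
library(data.table)

dt <- as.data.table(iris)

# .N is the row count of the current group; nrow(.SD) gives the same number
# because .SD is the subset of rows belonging to that group.
cmp <- dt[, .(n_dot = .N, n_sd = nrow(.SD)), by = Species]
print(cmp)
```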

For example:

dt[, .N, by = Species]
      Species  N
1:     setosa 50
2: versicolor 50
3:  virginica 50
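The same .N idiom extends to frequency tables over several grouping expressions, and the result can be chained with a second [ ] to sort by count. A sketch assuming only data.table and iris; the computed column long_sepal is a hypothetical name chosen for illustration, and base R's table() is used only as a cross-check:

```r
library(data.table)

dt <- as.data.table(iris)

# Count rows for each combination of Species and a computed logical column.
# `long_sepal` is an illustrative name, not anything defined by data.table.
counts <- dt[, .N, by = .(Species, long_sepal = Sepal.Length > 5)]

# Chain a second [ ] to sort the frequency table by descending count.
counts <- counts[order(-N)]
print(counts)

# Cross-check the simple per-Species counts against base R's table().
by_species <- dt[, .N, by = Species]
base_tab <- table(iris$Species)
same_counts <- all(by_species$N ==
                   as.integer(base_tab[as.character(by_species$Species)]))
print(same_counts)
```

Using keyby = Species instead of by = Species returns the groups sorted by the grouping column rather than in order of first appearance.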
answered Sep 26 '22 at 11:09 by mnel
