Count factors occurring in group in R

Question

This is my data:

> head(Kandula_for_n)
                date      dist  date_only
1 2005-05-08 12:00:00  138.5861 2005-05-08
2 2005-05-08 16:00:00 1166.9265 2005-05-08
3 2005-05-08 20:00:00 1270.7149 2005-05-08
6 2005-05-09 08:00:00  233.1971 2005-05-09
7 2005-05-09 12:00:00 1899.9530 2005-05-09
8 2005-05-09 16:00:00  726.8363 2005-05-09

I would now like to have an additional column with the count (n) of the data entries (dist) per day. For 2005-05-08, this would be n=3, as there are 3 data entries at 12, 16 and 20 o'clock. I have applied the following code, which actually gave me what I wanted:

ndist <-tapply(1:NROW(Kandula_for_n), Kandula_for_n$date_only, function(x) length(unique(x)))

After ndist<-as.data.frame(ndist), I got this:

> head(ndist)
           ndist
2005-05-08     3
2005-05-09     4
2005-05-10     6
2005-05-11     4
2005-05-12     6
2005-05-13     6

The problem is that the count is together with date_only in one column that is called ndist. But I need them in two separate columns, one with the count and one with date_only. How can this be done? I guess it's rather simple, but I just don't get it. I would appreciate any thoughts on that.

Thanks for your efforts.

JD Long · Accepted Answer

Simply because I find tapply() hard to wrap my brain around, I like using plyr for these types of things:

## make up some data
## you get better/faster/more answers if you do this bit for us :)
dates <- seq(Sys.Date(), Sys.Date() + 5, by = 1)
Kandula_for_n <- data.frame(date_only = sample( dates + 5, 10, replace=TRUE ) , dist=rnorm(10) )

library(plyr)  # stops with an error if plyr is not installed
ddply(Kandula_for_n, "date_only", function(x) data.frame(x, ndist=nrow(x)) )

This will give you something like:

    date_only       dist ndist
1  2011-10-30  0.2434168     5
2  2011-10-30 -0.9361780     5
3  2011-10-30  1.4593197     5
4  2011-10-30 -0.1851402     5
5  2011-10-30  0.6652419     5
6  2011-10-31  0.8876420     1
7  2011-11-03  0.5087175     2
8  2011-11-03 -1.0065152     2
9  2011-11-04  0.4236352     2
10 2011-11-04  0.4535686     2

The ddply line:

ddply(Kandula_for_n, "date_only", function(x) data.frame(x, ndist=nrow(x)) )

takes the input data, groups it by the date_only field, and for every unique value applies the anonymous function to the data frame made up of only the records with that value of date_only. My anonymous function simply takes the data frame x and appends a column named ndist, which is the number of rows in x.
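
If you would rather stay in base R, here is a minimal sketch of the same idea. It is not part of the plyr approach above, just an alternative under a few assumptions: the toy data below copies the six rows shown in the question, and the names counts and n are made up for illustration. Part (a) turns the named tapply() result into two separate columns, which is what the question literally asks for; part (b) appends the per-day count to every row, like the ddply call does.

```r
# Toy data copied from the six rows shown in the question (illustration only)
Kandula_for_n <- data.frame(
  date_only = as.Date(c("2005-05-08", "2005-05-08", "2005-05-08",
                        "2005-05-09", "2005-05-09", "2005-05-09")),
  dist = c(138.5861, 1166.9265, 1270.7149, 233.1971, 1899.9530, 726.8363)
)

# (a) One row per day, with the date and the count in separate columns.
# tapply() returns a vector whose names are the dates, so pull the names
# out into their own column instead of leaving them as row names.
ndist <- tapply(Kandula_for_n$dist, Kandula_for_n$date_only, length)
counts <- data.frame(
  date_only = as.Date(names(ndist)),  # hypothetical column layout
  n = as.vector(ndist)
)

# (b) Keep every original row and append the count for its day.
# ave() applies FUN within each group and returns a vector as long as the input.
Kandula_for_n$ndist <- ave(seq_len(nrow(Kandula_for_n)),
                           Kandula_for_n$date_only,
                           FUN = length)

print(counts)
print(Kandula_for_n)
```

Use (a) if you want a summary table, and (b) if you want the count repeated next to each observation.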

Tags:

dataframe

r

Jan Blanke
