This is my data:
> head(Kandula_for_n)
date dist date_only
1 2005-05-08 12:00:00 138.5861 2005-05-08
2 2005-05-08 16:00:00 1166.9265 2005-05-08
3 2005-05-08 20:00:00 1270.7149 2005-05-08
6 2005-05-09 08:00:00 233.1971 2005-05-09
7 2005-05-09 12:00:00 1899.9530 2005-05-09
8 2005-05-09 16:00:00 726.8363 2005-05-09
I would now like to have an additional column with the count (n) of the data entries (dist) per day. For 2005-05-08, this would be n=3, as there are 3 data entries at 12, 16 and 20 o'clock. I applied the following code, which gave me what I wanted:
ndist <- tapply(1:NROW(Kandula_for_n), Kandula_for_n$date_only, function(x) length(unique(x)))
After converting it with ndist <- as.data.frame(ndist), I got this:
> head(ndist)
ndist
2005-05-08 3
2005-05-09 4
2005-05-10 6
2005-05-11 4
2005-05-12 6
2005-05-13 6
The problem is that the dates and the counts seem to be combined in a single column called ndist (the dates are actually the row names of the data frame, not a column). I need them in two separate columns, one with date_only and one with the count. How can this be done? I guess it's rather simple, but I just don't get it. I would appreciate any thoughts on that.
Thanks for your efforts.
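For the two-column layout asked about here, a minimal base-R sketch (assuming, as in the question, that date_only is a Date column; the toy data below is made up) is to take the dates from the names of the tapply() result instead of leaving them as row names. Since the tapply() call just counts rows per day, table() is shown as an alternative that yields two columns directly.

```r
# Made-up data with the same columns as Kandula_for_n (only date_only matters)
Kandula_for_n <- data.frame(
  date_only = as.Date(c("2005-05-08", "2005-05-08", "2005-05-08",
                        "2005-05-09", "2005-05-09")),
  dist = c(138.5861, 1166.9265, 1270.7149, 233.1971, 1899.9530)
)

# Same call as in the question: number of rows per day, named by date
ndist <- tapply(1:NROW(Kandula_for_n), Kandula_for_n$date_only,
                function(x) length(unique(x)))

# The dates live in the names of the tapply() result; move them into a column
counts <- data.frame(date_only = as.Date(names(ndist)),
                     n = as.vector(ndist))

# Alternative: table() counts rows per day; as.data.frame() turns the
# table into two columns, here named date_only and n
counts2 <- as.data.frame(table(date_only = Kandula_for_n$date_only),
                         responseName = "n")

print(counts)
print(counts2)
```

In counts2 the date_only column is a factor; as.Date(counts2$date_only) converts it back to dates if needed.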
Simply because I find tapply() hard to wrap my brain around, I like using plyr for these types of things:
## make up some data
## you get better/faster/more answers if you do this bit for us :)
dates <- seq(Sys.Date(), Sys.Date() + 5, by = 1)
Kandula_for_n <- data.frame(date_only = sample(dates + 5, 10, replace = TRUE), dist = rnorm(10))
library(plyr)
ddply(Kandula_for_n, "date_only", function(x) data.frame(x, ndist=nrow(x)) )
This will give you something like:
date_only dist ndist
1 2011-10-30 0.2434168 5
2 2011-10-30 -0.9361780 5
3 2011-10-30 1.4593197 5
4 2011-10-30 -0.1851402 5
5 2011-10-30 0.6652419 5
6 2011-10-31 0.8876420 1
7 2011-11-03 0.5087175 2
8 2011-11-03 -1.0065152 2
9 2011-11-04 0.4236352 2
10 2011-11-04 0.4535686 2
The ddply line:
ddply(Kandula_for_n, "date_only", function(x) data.frame(x, ndist=nrow(x)) )
takes the input data, groups it by the date_only field, and, for every unique value, applies the anonymous function to the data frame made up of only the records with that value of date_only. The anonymous function simply takes that data frame x and appends a column named ndist containing the number of rows in x, i.e. nrow(x). ddply then stacks the per-date results back into a single data frame.
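If you would rather not depend on plyr, base R's ave() can add the same per-date count as a new column. This is a sketch on made-up data, not the code from the answer above; one difference is that ave() keeps the original row order, whereas ddply() returns the rows sorted by date_only.

```r
# Made-up data in the same shape as the answer's example
Kandula_for_n <- data.frame(
  date_only = as.Date(c("2011-10-30", "2011-10-30", "2011-10-31",
                        "2011-11-03", "2011-11-03", "2011-11-03")),
  dist = c(0.24, -0.94, 0.89, 0.51, -1.01, 0.42)
)

# ave() splits the first argument by date_only, applies FUN to each group and
# writes the result back to every row of that group, so with FUN = length
# each row receives the number of rows sharing its date.
Kandula_for_n$ndist <- ave(seq_len(nrow(Kandula_for_n)),
                           Kandula_for_n$date_only,
                           FUN = length)

print(Kandula_for_n)
```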