In R, it's always the small things that confound me.
Say I have a data frame like this:
  location species
1  seattle       A
2  buffalo       C
3  seattle       D
4   newark       J
5   boston       Q
I would like to append a column to this frame that shows the number of times a location appears in the data set, with a result like this:
  location species freq-loc
1  seattle       A        2  # there are 2 entries with location=seattle
2  buffalo       C        1  # there is 1 entry with location=buffalo
3  seattle       D        2
4   newark       J        1
5   boston       Q        1
I know that table(data$location) gives me a contingency table, but I don't know how to map each count in the table back to the corresponding row of the data frame. Can somebody help?
Update
Thank you so much for all the help! Out of interest, I ran a benchmark to see how the merge, plyr and ave solutions compare with each other. The test set is a 10,000-row subset of my original 10 by ~7 million data set:
Unit: milliseconds
  expr        min         lq     median        uq       max neval
 MERGE 110.877337 111.989406 112.585420 113.51679 120.23588   100
  PLYR  26.305645  27.080403  27.576580  27.87157  68.40763   100
   AVE   2.994528   3.117255   3.179898   3.35834  10.02955   100
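The merge and plyr code that was timed isn't reproduced here. For orientation only, this is a rough sketch of what a merge-based approach could look like in base R. It is not the benchmarked code; the names d, counts and merged, and the column name freq.loc, are my own assumptions.

```r
# Sample data from the question (the name `d` is an assumption)
d <- data.frame(
  location = c("seattle", "buffalo", "seattle", "newark", "boston"),
  species  = c("A", "C", "D", "J", "Q"),
  stringsAsFactors = FALSE
)

# Turn the contingency table into a two-column data frame:
# one row per distinct location, with its count in `freq.loc`
counts <- as.data.frame(
  table(location = d$location),
  responseName = "freq.loc",
  stringsAsFactors = FALSE
)

# Join the counts back onto the original rows by location.
# Note: merge() sorts the result by the key, so row order changes.
merged <- merge(d, counts, by = "location")
print(merged)
```

Because merge() builds and joins an intermediate data frame, it is plausible that it does more work than a grouped vector operation, which would fit the timings above.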
Here's a base R way with ave:
# ave() applies length() within each location group and returns one value per row
transform(d, freq.loc = ave(seq_len(nrow(d)), location, FUN = length))
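To try this on the sample data from the question, here is a minimal self-contained sketch. The data frame is named d to match the call above. The table-lookup line at the end is an alternative I'm adding for comparison, not part of the original answer, and the names res and by_table are just illustrative.

```r
# Sample data from the question
d <- data.frame(
  location = c("seattle", "buffalo", "seattle", "newark", "boston"),
  species  = c("A", "C", "D", "J", "Q"),
  stringsAsFactors = FALSE
)

# Group the row indices by location, replace each with its group's size,
# and keep the result aligned with the original row order
res <- transform(d, freq.loc = ave(seq_len(nrow(d)), location, FUN = length))
print(res)

# Alternative (assumption, not from the answer): index the contingency
# table by location name so each row picks up its own count.
# as.character() guards against `location` being a factor.
by_table <- as.vector(table(d$location)[as.character(d$location)])
```

Unlike a merge, both variants leave the rows in their original order, so the new column can be attached directly.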