 

Add column to data frame that shows frequency of variable

Tags:

r

In R, it's always the small things that confound me.

Say I have a data frame like this:

  location   species
1  seattle   A
2  buffalo   C
3  seattle   D
4  newark    J
5  boston    Q

I would like to append a column to this frame that shows the number of times a location appears in the data set, with a result like this:

  location   species    freq.loc
1  seattle   A          2           #there are 2 entries with location=seattle
2  buffalo   C          1           #there is 1 entry with location=buffalo
3  seattle   D          2
4  newark    J          1
5  boston    Q          1

I know that table(data$location) gives me a contingency table, but I don't know how to map each value in the table back to the corresponding rows of the data frame. Can somebody help?
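One way to do the mapping the question asks about: a table is a named integer vector with one entry per distinct location, so it can be indexed by the location values themselves. A minimal sketch, assuming the sample data is stored in a data frame called `d` (the rebuilt data below is just the five rows shown above):

```r
# Rebuild the sample data from the question (assumed name: d)
d <- data.frame(
  location = c("seattle", "buffalo", "seattle", "newark", "boston"),
  species  = c("A", "C", "D", "J", "Q"),
  stringsAsFactors = FALSE
)

# Count how often each location occurs; names(counts) are the locations
counts <- table(d$location)

# Look up each row's count by name. as.character() makes the lookup by name
# explicit even if location is a factor; as.vector() drops the table class
d$freq.loc <- as.vector(counts[as.character(d$location)])

print(d)
```

Because the lookup is done row by row, the original row order is preserved.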

Update

Thank you so much for all the help! Out of interest, I ran a benchmark to compare the merge, plyr and ave solutions. The test set is a 10,000-row subset of my original data set (10 by ~7 million):

Unit: milliseconds
  expr        min         lq     median        uq       max neval
 MERGE 110.877337 111.989406 112.585420 113.51679 120.23588   100
  PLYR  26.305645  27.080403  27.576580  27.87157  68.40763   100
   AVE   2.994528   3.117255   3.179898   3.35834  10.02955   100
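The merge solution mentioned in the update isn't reproduced on this page. A minimal sketch of how such an approach could look (a reconstruction for illustration, not necessarily the code that was benchmarked), again assuming the data frame is called `d`: convert the contingency table into a lookup data frame and join it back by location.

```r
# Sample data from the question (assumed name: d)
d <- data.frame(
  location = c("seattle", "buffalo", "seattle", "newark", "boston"),
  species  = c("A", "C", "D", "J", "Q"),
  stringsAsFactors = FALSE
)

# Turn the contingency table into a two-column lookup data frame
counts <- as.data.frame(table(d$location), stringsAsFactors = FALSE)
names(counts) <- c("location", "freq.loc")

# Join the counts onto every row by location.
# Note: merge() sorts the result by the key column, so the original row order is lost.
d_merged <- merge(d, counts, by = "location")
print(d_merged)
```

If the original row order matters, one option is to add a row-index column before merging and order by it afterwards.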
asked Jun 10 '13 at 18:06 by thesnorlax


1 Answer

Here's a base R approach using ave():

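# ave() applies length() within each location group and repeats the count on every row of that group
transform(d, freq.loc = ave(seq_len(nrow(d)), location, FUN = length))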
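To see what ave() does here: it splits its first argument (just a vector of row indices) by location, applies length() to each group, and writes each group's result back into the positions of that group's rows, so every row receives its own group's size. A sketch on the sample data, assuming the data frame is called `d`:

```r
# Sample data from the question (assumed name: d)
d <- data.frame(
  location = c("seattle", "buffalo", "seattle", "newark", "boston"),
  species  = c("A", "C", "D", "J", "Q"),
  stringsAsFactors = FALSE
)

# ave() runs FUN within each location group and recycles the result
# back to every row of that group, so the output has one value per row
d2 <- transform(d, freq.loc = ave(seq_len(nrow(d)), location, FUN = length))
print(d2)
```

Unlike a merge, this keeps the rows in their original order and needs no intermediate lookup table.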
answered Oct 07 '22 at 10:10 by Matthew Plourde