I am using R and I have a data frame containing info about the applications made by individuals for a grant. Individuals can apply for a grant as many times as they like. I want to derive a new variable that tells me how many applications each individual has made up to and including the date of the application represented by each record.
At the moment my data looks like this:
app number   date app made   applicant
1            2012-08-01      John
2            2012-08-02      John
3            2012-08-02      Jane
4            2012-08-04      John
5            2012-08-08      Alice
6            2012-08-09      Alice
7            2012-08-09      Jane
And I would like to add a further variable so my data frame looks like this:
app number   date app made   applicant   applications by applicant to date
1            2012-08-01      John        1
2            2012-08-02      John        2
3            2012-08-02      Jane        1
4            2012-08-04      John        3
5            2012-08-08      Alice       1
6            2012-08-09      Alice       2
7            2012-08-09      Jane        2
I'm new to R and I'm really struggling to work out how to do this. The closest I am able to get is something like the answer in this question: How do I count the number of observations at given intervals in R?
But I can't work out how to do this based on the date in each record rather than on pre-set intervals.
Here's a less elegant way than @Justin's:
A <- read.table(text='"app number" "date app made" "applicant"
1 2012-08-01 John
2 2012-08-02 John
3 2012-08-02 Jane
4 2012-08-04 John
5 2012-08-08 Alice
6 2012-08-09 Alice
7 2012-08-09 Jane',header=TRUE)
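# order by applicant name, then by date within each applicant
A <- A[order(A$applicant, A$date.app.made), ]
# running count of applications within each applicant;
# lapply always returns a list, so unlist gives a plain vector
A$app2date <- unlist(lapply(unique(A$applicant), function(x, appl) {
  seq_len(sum(appl == x))
}, appl = A$applicant))
# back in original order:
A <- A[order(A$app.number), ]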
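For comparison, base R's ave() can give the same running count without reordering the data frame and sorting it back. This is only a minimal sketch, assuming the rows are already in date order within each applicant; the data frame name apps and its column names are made up for illustration.

```r
# Hypothetical data frame mirroring the question's example
apps <- data.frame(
  app_number = 1:7,
  date = c("2012-08-01", "2012-08-02", "2012-08-02", "2012-08-04",
           "2012-08-08", "2012-08-09", "2012-08-09"),
  applicant = c("John", "John", "Jane", "John", "Alice", "Alice", "Jane")
)

# ave() applies FUN within each applicant group and puts the results
# back in the original row order, so no re-sorting is needed
apps$app2date <- ave(seq_along(apps$applicant), apps$applicant,
                     FUN = seq_along)

print(apps)
```

If the rows might not be in date order, sorting with order(apps$date) first keeps the count meaningful as "applications to date".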
You can use plyr for this. If your data is in a data.frame dat, I would add a column called count, then use cumsum:
library(plyr)
dat <- structure(list(number = 1:7, date = c("2012-08-01", "2012-08-02",
"2012-08-02", "2012-08-04", "2012-08-08", "2012-08-09", "2012-08-09"
), name = c("John", "John", "Jane", "John", "Alice", "Alice",
"Jane")), .Names = c("number", "date", "name"), row.names = c(NA,
-7L), class = "data.frame")
dat$count <- 1
ddply(dat, .(name), transform, count=cumsum(count))
number date name count
1 5 2012-08-08 Alice 1
2 6 2012-08-09 Alice 2
3 3 2012-08-02 Jane 1
4 7 2012-08-09 Jane 2
5 1 2012-08-01 John 1
6 2 2012-08-02 John 2
7 4 2012-08-04 John 3
I assumed your dates were already sorted; however, you might want to sort them explicitly before you do your "counting":
dat <- dat[order(dat$date),]
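As per the comment, this can be simplified once you understand (which I didn't!) how transform works. Note that order() returns a sorting permutation rather than ranks, so it matches the running count only when the rows are already sorted by date within each name: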
ddply(dat, .(name), transform, count=order(date))
number date name count
1 5 2012-08-08 Alice 1
2 6 2012-08-09 Alice 2
3 3 2012-08-02 Jane 1
4 7 2012-08-09 Jane 2
5 1 2012-08-01 John 1
6 2 2012-08-02 John 2
7 4 2012-08-04 John 3
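To see why the sort matters for the order() shortcut, here is a small sketch with made-up dates for a single applicant, deliberately out of order. The variable names are hypothetical; rank() with ties.method = "first" is shown as one possible alternative, not as what the answer above uses.

```r
# Hypothetical dates for one applicant, deliberately not in date order
dates <- c("2012-08-04", "2012-08-01", "2012-08-02")

# order() returns the row indices that would sort the dates
perm <- order(dates)

# rank() returns each date's position in the sorted sequence,
# which is the "applications to date" count for that row
cnt <- rank(dates, ties.method = "first")

print(perm)  # 2 3 1
print(cnt)   # 3 1 2
```

On data that is already sorted, both calls return 1, 2, 3, which is why the two versions above print identical tables.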