 

How can I derive a variable in R showing the number of observations that have the same value recorded at earlier dates?

Tags: r

I am using R and I have a data frame containing info about the applications made by individuals for a grant. Individuals can apply for a grant as many times as they like. I want to derive a new variable that tells me how many applications each individual has made up to and including the date of the application represented by each record.

At the moment my data looks like this:

app number  date app made     applicant
1           2012-08-01        John
2           2012-08-02        John
3           2012-08-02        Jane
4           2012-08-04        John
5           2012-08-08        Alice
6           2012-08-09        Alice
7           2012-08-09        Jane

And I would like to add a further variable so my data frame looks like this:

app number  date app made    applicant  applications by applicant to date
1           2012-08-01       John       1
2           2012-08-02       John       2
3           2012-08-02       Jane       1
4           2012-08-04       John       3
5           2012-08-08       Alice      1
6           2012-08-09       Alice      2
7           2012-08-09       Jane       2

I'm new to R and I'm really struggling to work out how to do this. The closest I am able to get is something like the answer in this question: How do I count the number of observations at given intervals in R?

But I can't work out how to do this based on the date in each record rather than on pre-set intervals.
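Taken literally, the definition above ("up to and including the date") can be computed row by row: for each application, count the rows with the same applicant and a date on or before that application's date. Below is a minimal base-R sketch. The column names app_number, date and applicant are illustrative assumptions. It compares every row with every other row, so it suits small data or serves as a check on faster methods. One consequence of this literal reading: if an applicant applies twice on the same date, both rows get the same count.

```r
# Example data from the question (column names are illustrative)
apps <- data.frame(
  app_number = 1:7,
  date = as.Date(c("2012-08-01", "2012-08-02", "2012-08-02", "2012-08-04",
                   "2012-08-08", "2012-08-09", "2012-08-09")),
  applicant = c("John", "John", "Jane", "John", "Alice", "Alice", "Jane"),
  stringsAsFactors = FALSE
)

# For each row i, count the rows from the same applicant dated on or before
# row i's date (same-day rows for one applicant are all included)
apps$apps_to_date <- vapply(seq_len(nrow(apps)), function(i) {
  sum(apps$applicant == apps$applicant[i] & apps$date <= apps$date[i])
}, integer(1))

print(apps)
```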

Madeleine Thornton asked Dec 07 '22 12:12


2 Answers

Here's a less elegant way than @Justin's:

    A <- read.table(text='"app number"  "date app made"     "applicant"
    1           2012-08-01        John
    2           2012-08-02        John
    3           2012-08-02        Jane
    4           2012-08-04        John
    5           2012-08-08        Alice
    6           2012-08-09        Alice
    7           2012-08-09        Jane', header = TRUE, stringsAsFactors = FALSE)
    # read.table() renames the columns to app.number, date.app.made and applicant

    # order by applicant name, then by date within each applicant
    A <- A[order(A$applicant, A$date.app.made), ]
    # number each applicant's applications 1, 2, ..., n and join the pieces
    A$app2date <- unlist(lapply(unique(A$applicant), function(x) {
                         seq_len(sum(A$applicant == x))
                       }))
    # back in original order:
    A <- A[order(A$app.number), ]
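The sort-then-number step above can also be written with rle() and sequence() in place of the unlist() construction: once the rows are sorted by applicant, rle() gives the length of each applicant's run of rows, and sequence() expands each length n into 1, 2, ..., n. A minimal sketch on the same example data, assuming character (not factor) applicant names, since rle() does not accept factors:

```r
A <- read.table(text = '"app number"  "date app made"  "applicant"
1  2012-08-01  John
2  2012-08-02  John
3  2012-08-02  Jane
4  2012-08-04  John
5  2012-08-08  Alice
6  2012-08-09  Alice
7  2012-08-09  Jane', header = TRUE, stringsAsFactors = FALSE)

# Sort by applicant, then by date within applicant
A <- A[order(A$applicant, A$date.app.made), ]
# rle() finds each applicant's run length; sequence() turns n into 1:n
A$app2date <- sequence(rle(A$applicant)$lengths)
# Restore the original order
A <- A[order(A$app.number), ]
A
```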
tim riffe answered Jan 30 '23 23:01


You can use plyr for this. If your data is in a data.frame dat, I would add a column called count and then use cumsum within each name:

    library(plyr)
    dat <- structure(list(number = 1:7, date = c("2012-08-01", "2012-08-02", 
    "2012-08-02", "2012-08-04", "2012-08-08", "2012-08-09", "2012-08-09"
    ), name = c("John", "John", "Jane", "John", "Alice", "Alice", 
    "Jane")), .Names = c("number", "date", "name"), row.names = c(NA, 
    -7L), class = "data.frame")

    dat$count <- 1

    ddply(dat, .(name), transform, count=cumsum(count))

      number       date  name count
    1      5 2012-08-08 Alice     1
    2      6 2012-08-09 Alice     2
    3      3 2012-08-02  Jane     1
    4      7 2012-08-09  Jane     2
    5      1 2012-08-01  John     1
    6      2 2012-08-02  John     2
    7      4 2012-08-04  John     3
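If you would rather not depend on plyr, base R's ave() can apply cumsum within each name in the same way; unlike ddply(), it returns the results in the original row order, so the rows are not regrouped by name. A minimal sketch, assuming, as this answer does, that each person's rows are already in date order:

```r
dat <- data.frame(
  number = 1:7,
  date = c("2012-08-01", "2012-08-02", "2012-08-02", "2012-08-04",
           "2012-08-08", "2012-08-09", "2012-08-09"),
  name = c("John", "John", "Jane", "John", "Alice", "Alice", "Jane"),
  stringsAsFactors = FALSE
)

dat$count <- 1
# cumsum() runs separately within each name; ave() puts every result back
# at the position of the row it came from
dat$count <- ave(dat$count, dat$name, FUN = cumsum)
dat
```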

I assumed your dates were already sorted; however, you may want to sort them explicitly before you do your "counting":

    dat <- dat[order(dat$date), ]
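Because the dates are YYYY-MM-DD strings, sorting them as text already yields chronological order. Converting them with as.Date() first makes the sort chronological by construction rather than relying on the string format. A minimal sketch with the example rows deliberately shuffled:

```r
# Example rows deliberately out of date order
dat <- data.frame(
  number = c(7, 4, 1, 6, 2, 5, 3),
  date = c("2012-08-09", "2012-08-04", "2012-08-01", "2012-08-09",
           "2012-08-02", "2012-08-08", "2012-08-02"),
  name = c("Jane", "John", "John", "Alice", "John", "Alice", "Jane"),
  stringsAsFactors = FALSE
)

# Store the dates as Date objects, then sort the rows chronologically
dat$date <- as.Date(dat$date)
dat <- dat[order(dat$date), ]
dat
```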

As per the comment, this can be simplified if you understand (which I didn't!) how transform works:

    ddply(dat, .(name), transform, count=order(date))
      number       date  name count
    1      5 2012-08-08 Alice     1
    2      6 2012-08-09 Alice     2
    3      3 2012-08-02  Jane     1
    4      7 2012-08-09  Jane     2
    5      1 2012-08-01  John     1
    6      2 2012-08-02  John     2
    7      4 2012-08-04  John     3
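A caveat when using order() for this: it returns the indices that would sort its input, not each element's rank. Inside transform it therefore gives a running count only when each name's rows are already in date order, which is why sorting first matters. rank() returns positions directly. A small illustration with hypothetical out-of-order dates for one applicant:

```r
# Hypothetical dates for one applicant, deliberately out of order
d <- c("2012-08-09", "2012-08-01", "2012-08-04")

order(d)                        # 2 3 1: the indices that would sort d
rank(d, ties.method = "first")  # 3 1 2: each date's position in date order
```

With ties.method = "first", same-day entries are numbered in the order they appear.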
Justin answered Jan 31 '23 01:01