
How to assign a unique identifier to multiple data frame entries

Tags: r, reshape, plyr

I have a large data frame that has three identifiers. For example:

df <- data.frame(year=c(1999,1999,2000,2000,2000), country=c('K','K','M','M','S'), 
                 site=c('di','se','di','di','di'))

This produces a data frame like this:

    year country site
    1999    K     di
    1999    K     se
    2000    M     di
    2000    M     di
    2000    S     di

I want to add a column to the data frame that holds a unique ID for each distinct combination of 'year', 'country', and 'site'. The result would look something like this:

    year country site unique_id
    1999    K     di     1
    1999    K     se     2
    2000    M     di     3
    2000    M     di     3
    2000    S     di     4

Any suggestions on how to do this would be greatly appreciated. I'm thinking it could somehow be done using the plyr package?

Austin asked Apr 12 '12 06:04



1 Answer

This should work quite nicely. It pastes the three columns into a single key per row and converts that key to a factor. Each distinct level of a factor is stored internally as an integer code, and as.numeric() extracts those codes.

df$unique_id <- 
    as.numeric(as.factor(with(df, paste(year, country, site, sep="_"))))
df
#   year country site unique_id
# 1 1999       K   di         1
# 2 1999       K   se         2
# 3 2000       M   di         3
# 4 2000       M   di         3
# 5 2000       S   di         4
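
One thing to be aware of: factor levels are sorted, so the IDs follow the sorted order of the pasted keys, not the order in which the combinations first appear. In the example the two happen to coincide. If first-appearance numbering is wanted, a minimal sketch of a variant (not part of the original answer) is to use match() against unique(). The names key, id_sorted, and id_first_seen below are illustrative only.

```r
# Same example data as in the question.
df <- data.frame(year = c(1999, 1999, 2000, 2000, 2000),
                 country = c("K", "K", "M", "M", "S"),
                 site = c("di", "se", "di", "di", "di"))

# Build one key string per row from the three identifier columns.
key <- with(df, paste(year, country, site, sep = "_"))

# Factor approach: levels are sorted, so the integer codes follow
# the sorted order of the keys.
id_sorted <- as.integer(factor(key))

# Variant: number the keys in order of first appearance instead.
id_first_seen <- match(key, unique(key))

# With the rows reversed, the two numbering schemes differ.
rev_key <- rev(key)
as.integer(factor(rev_key))        # 4 3 3 2 1
match(rev_key, unique(rev_key))    # 1 2 2 3 4
```

Either vector can be assigned to df$unique_id. Both give 1 2 3 3 4 for the data as originally ordered.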
Josh O'Brien answered Oct 31 '22 22:10
