Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

add a new column that identifies individuals

Tags:

r

My dataset consists of three treatments (C, S, and E) per individual. Looks something like this.

    Year   Cultivar   Site   Distance   Plant   Treat    yield1   yield2
1   2011   Blue       ABR    0m         1       C        0.879    1.5
2   2011   Blue       ABR    0m         1       S        0.384    2.3
3   2011   Blue       ABR    0m         1       E        0.03     0.5
4   2011   Blue       ABR    0m         2       C        0.923    1.2
5   2011   Blue       ABR    0m         2       S        0.344    0.5
6   2011   Blue       ABR    0m         2       E        0.07     0.7
7   2011   Blue       ABR    50m        1       C        0.255    3.4
8   2011   Blue       ABR    50m        1       S        1.00     2.4
9   2011   Blue       ABR    50m        1       E        0.1      0.9
.
.
.

I have two years worth of data, 2 cultivars, 15 sites, 3 distances per site, and 10 plants per distance. Basically I have a lot of data (>1400 lines). What I want to be able to do is add a new column that assigns a new number to each of individual across the study. I want my data to end up looking like this.

    Individual  Year   Cultivar   Site   Distance   Plant   Treat    yield1   yield2
1   1           2011   Blue       ABR    0m         1       C        0.879    1.5
2   1           2011   Blue       ABR    0m         1       S        0.384    2.3
3   1           2011   Blue       ABR    0m         1       E        0.03     0.5
4   2           2011   Blue       ABR    0m         2       C        0.923    1.2
5   2           2011   Blue       ABR    0m         2       S        0.344    0.5
6   2           2011   Blue       ABR    0m         2       E        0.07     0.7
7   3           2011   Blue       ABR    50m        1       C        0.255    3.4
8   3           2011   Blue       ABR    50m        1       S        1.00     2.4
9   3           2011   Blue       ABR    50m        1       E        0.1      0.9
.
.
.

I'm relatively of new to R so I apologize if this is something that should be relatively easy to do. I know that I should be able to "find" each individual as a unique combination of plant*distance*site*cultivar*year, but I honestly have no idea how I would go about coding this, and I haven't managed to find any similar help pages.

Any suggestions would be greatly appreciated!

like image 927
melanopygus Avatar asked Mar 07 '13 00:03

melanopygus


2 Answers

Here's a solution using plyr:

library(plyr)
df$id <- id(df[c("Year","Cultivar", "Site", "Distance", "Plant")], drop=TRUE) 
#Add whichever columns contain the unique combination you require
df

 Year Cultivar Site Distance Plant Treat yield1 yield2 id
1 2011     Blue  ABR       0m     1     C  0.879    1.5  1
2 2011     Blue  ABR       0m     1     S  0.384    2.3  1
3 2011     Blue  ABR       0m     1     E  0.030    0.5  1
4 2011     Blue  ABR       0m     2     C  0.923    1.2  2
5 2011     Blue  ABR       0m     2     S  0.344    0.5  2
6 2011     Blue  ABR       0m     2     E  0.070    0.7  2
7 2011     Blue  ABR      50m     1     C  0.255    3.4  3
8 2011     Blue  ABR      50m     1     S  1.000    2.4  3
9 2011     Blue  ABR      50m     1     E  0.100    0.9  3
like image 29
alexwhan Avatar answered Sep 23 '22 00:09

alexwhan


And a data.table solution using .GRP

.GRP is an integer, length 1, containing a simple group counter. 1 for the 1st group, 2 for the 2nd, etc.

library(data.table)
DT <- data.table(df)

DT[,grp :=.GRP,by = list(Year,Cultivar, Site, Distance, Plant)]
like image 104
mnel Avatar answered Sep 27 '22 00:09

mnel