My dataset consists of three treatments (C, S, and E) per individual. It looks something like this:
Year Cultivar Site Distance Plant Treat yield1 yield2
1 2011 Blue ABR 0m 1 C 0.879 1.5
2 2011 Blue ABR 0m 1 S 0.384 2.3
3 2011 Blue ABR 0m 1 E 0.03 0.5
4 2011 Blue ABR 0m 2 C 0.923 1.2
5 2011 Blue ABR 0m 2 S 0.344 0.5
6 2011 Blue ABR 0m 2 E 0.07 0.7
7 2011 Blue ABR 50m 1 C 0.255 3.4
8 2011 Blue ABR 50m 1 S 1.00 2.4
9 2011 Blue ABR 50m 1 E 0.1 0.9
.
.
.
I have two years' worth of data, 2 cultivars, 15 sites, 3 distances per site, and 10 plants per distance, so I have a lot of data (>1400 lines). What I want to do is add a new column that assigns a unique number to each individual across the study. I want my data to end up looking like this:
Individual Year Cultivar Site Distance Plant Treat yield1 yield2
1 1 2011 Blue ABR 0m 1 C 0.879 1.5
2 1 2011 Blue ABR 0m 1 S 0.384 2.3
3 1 2011 Blue ABR 0m 1 E 0.03 0.5
4 2 2011 Blue ABR 0m 2 C 0.923 1.2
5 2 2011 Blue ABR 0m 2 S 0.344 0.5
6 2 2011 Blue ABR 0m 2 E 0.07 0.7
7 3 2011 Blue ABR 50m 1 C 0.255 3.4
8 3 2011 Blue ABR 50m 1 S 1.00 2.4
9 3 2011 Blue ABR 50m 1 E 0.1 0.9
.
.
.
I'm relatively new to R, so I apologize if this is something that should be easy to do. I know that I should be able to "find" each individual as a unique combination of plant*distance*site*cultivar*year, but I honestly have no idea how to code this, and I haven't managed to find any similar help pages.
Any suggestions would be greatly appreciated!
Here's a solution using plyr:
library(plyr)
# Add whichever columns define the unique combination you require
df$id <- id(df[c("Year", "Cultivar", "Site", "Distance", "Plant")], drop = TRUE)
df
Year Cultivar Site Distance Plant Treat yield1 yield2 id
1 2011 Blue ABR 0m 1 C 0.879 1.5 1
2 2011 Blue ABR 0m 1 S 0.384 2.3 1
3 2011 Blue ABR 0m 1 E 0.030 0.5 1
4 2011 Blue ABR 0m 2 C 0.923 1.2 2
5 2011 Blue ABR 0m 2 S 0.344 0.5 2
6 2011 Blue ABR 0m 2 E 0.070 0.7 2
7 2011 Blue ABR 50m 1 C 0.255 3.4 3
8 2011 Blue ABR 50m 1 S 1.000 2.4 3
9 2011 Blue ABR 50m 1 E 0.100 0.9 3
And a data.table solution using .GRP. From the data.table documentation:
".GRP is an integer, length 1, containing a simple group counter. 1 for the 1st group, 2 for the 2nd, etc."
library(data.table)
DT <- data.table(df)
DT[, grp := .GRP, by = list(Year, Cultivar, Site, Distance, Plant)]
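If you would rather not load a package, the same idea can be sketched in base R. This is only one possible approach, not part of either answer above: it pastes the identifying columns into a single key per row and numbers the keys in order of first appearance. The names key_cols and key are made up for illustration, the data frame is rebuilt from the nine rows shown in the question, and the "_" separator assumes none of the values contain an underscore.

```r
# Rebuild the nine example rows from the question (illustrative data only).
df <- data.frame(
  Year     = 2011,
  Cultivar = "Blue",
  Site     = "ABR",
  Distance = rep(c("0m", "50m"), times = c(6, 3)),
  Plant    = rep(c(1, 2, 1), each = 3),
  Treat    = rep(c("C", "S", "E"), times = 3),
  yield1   = c(0.879, 0.384, 0.03, 0.923, 0.344, 0.07, 0.255, 1.00, 0.1),
  yield2   = c(1.5, 2.3, 0.5, 1.2, 0.5, 0.7, 3.4, 2.4, 0.9)
)

# Columns whose combination identifies one individual (hypothetical name).
key_cols <- c("Year", "Cultivar", "Site", "Distance", "Plant")

# Paste those columns into one string key per row.
# Assumption: no value contains "_", so distinct combinations give distinct keys.
key <- do.call(paste, c(df[key_cols], sep = "_"))

# Number each key by the position of its first appearance.
df$Individual <- match(key, unique(key))

# Sanity check: every individual should have exactly three rows (C, S, E).
stopifnot(all(table(df$Individual) == 3))
```

The sanity check at the end is worth keeping on the full dataset too: if any individual ends up with more or fewer than three rows, the chosen columns probably do not identify individuals uniquely.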