I have sampling data spread over two data sets: loc describes geographical positions, and spe contains the species found. Unfortunately, the sampling stations are described by two factors (cruise and station), so I need to construct unique identifiers for both data sets.
>loc
cruise station lon lat
1 TY1 A1 53.8073 6.7836
2 TY1 3 53.7757 6.7009
3 AZ7 A1 53.7764 6.6758
and
>spe
cruise station species abundance
1 TY1 A1 Ensis ensis 100
2 TY1 A1 Magelona 5
3 TY1 A1 Nemertea 17
4 TY1 3 Magelona 8
5 TY1 3 Ophelia 1200
6 AZ7 A1 Ophelia 950
7 AZ7 A1 Ensis ensis 89
8 AZ7 A1 Spio 1
What I need is to add a unique identifier, ID, as such:
cruise station species abundance ID
1 TY1 A1 Ensis ensis 100 STA0001
2 TY1 A1 Magelona 5 STA0001
3 TY1 A1 Nemertea 17 STA0001
4 TY1 3 Magelona 8 STA0002
5 TY1 3 Ophelia 1200 STA0002
6 AZ7 A1 Ophelia 950 STA0003
7 AZ7 A1 Ensis ensis 89 STA0003
8 AZ7 A1 Spio 1 STA0003
Here's the data
loc<-data.frame(cruise=c("TY1","TY1","AZ7"),station=c("A1",3,"A1"),lon=c(53.8073, 53.7757, 53.7764),lat=c(6.7836, 6.7009, 6.6758))
spe<-data.frame(cruise=c(rep("TY1",5),rep("AZ7",3)),station=c(rep("A1",3),rep(3,2),rep("A1",3)),species=c("Ensis ensis", "Magelona", "Nemertea", "Magelona", "Ophelia", "Ophelia","Ensis ensis", "Spio"),abundance=c(100,5,17,8,1200,950,89,1))
Then I construct the ID for loc:
loc$ID<-paste("STA",formatC(1:nrow(loc),width=4,format="d",flag="0"),sep="")
But how do I map the ID to spe?
The way I found involves two nested loops, which is quite handsome for a procedural programmer like me (if nested loops can be called handsome at all). I'm sure that a two-liner in R would be more efficient and faster, but I can't figure it out. I really want more beauty in my code; this is so un-R.
Actually, I think this is a case where merge in base R just works:
merge(spe, loc, all.x=TRUE)
cruise station species abundance lon lat
1 AZ7 A1 Ophelia 950 53.7764 6.6758
2 AZ7 A1 Ensis ensis 89 53.7764 6.6758
3 AZ7 A1 Spio 1 53.7764 6.6758
4 TY1 3 Magelona 8 53.7757 6.7009
5 TY1 3 Ophelia 1200 53.7757 6.7009
6 TY1 A1 Ensis ensis 100 53.8073 6.7836
7 TY1 A1 Magelona 5 53.8073 6.7836
8 TY1 A1 Nemertea 17 53.8073 6.7836
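The output above carries lon and lat but no ID, because loc had no ID column when merge was called. A minimal sketch of the same idea, assuming ID is first built on loc exactly as in the question; spe_id is a name introduced here for illustration only:

```r
# Sample data from the question
loc <- data.frame(cruise = c("TY1", "TY1", "AZ7"),
                  station = c("A1", "3", "A1"),
                  lon = c(53.8073, 53.7757, 53.7764),
                  lat = c(6.7836, 6.7009, 6.6758))
spe <- data.frame(cruise = c(rep("TY1", 5), rep("AZ7", 3)),
                  station = c(rep("A1", 3), rep("3", 2), rep("A1", 3)),
                  species = c("Ensis ensis", "Magelona", "Nemertea", "Magelona",
                              "Ophelia", "Ophelia", "Ensis ensis", "Spio"),
                  abundance = c(100, 5, 17, 8, 1200, 950, 89, 1))

# ID built on loc as in the question
loc$ID <- paste0("STA", formatC(seq_len(nrow(loc)), width = 4,
                                format = "d", flag = "0"))

# Left join on the two key columns, bringing over only ID
# (spe_id is a hypothetical name used for this example)
spe_id <- merge(spe, loc[, c("cruise", "station", "ID")],
                by = c("cruise", "station"), all.x = TRUE)
```

Note that merge sorts the result by the key columns by default, so the rows of spe_id will not be in the original order of spe.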
To find the unique identifiers, use unique():
unique(paste(loc$cruise, loc$station, sep="-"))
[1] "TY1-A1" "TY1-3" "AZ7-A1"
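The pasted key can also be used to carry the ID from loc over to spe with match(), which keeps the rows of spe in their original order. This is a sketch under the assumption that loc already has the ID column from the question; key_loc and key_spe are illustrative names, not part of the original code:

```r
# Sample data from the question
loc <- data.frame(cruise = c("TY1", "TY1", "AZ7"),
                  station = c("A1", "3", "A1"),
                  lon = c(53.8073, 53.7757, 53.7764),
                  lat = c(6.7836, 6.7009, 6.6758))
spe <- data.frame(cruise = c(rep("TY1", 5), rep("AZ7", 3)),
                  station = c(rep("A1", 3), rep("3", 2), rep("A1", 3)),
                  species = c("Ensis ensis", "Magelona", "Nemertea", "Magelona",
                              "Ophelia", "Ophelia", "Ensis ensis", "Spio"),
                  abundance = c(100, 5, 17, 8, 1200, 950, 89, 1))
loc$ID <- paste0("STA", formatC(seq_len(nrow(loc)), width = 4,
                                format = "d", flag = "0"))

# One combined key per row in each data set (hypothetical helper names)
key_loc <- paste(loc$cruise, loc$station, sep = "-")
key_spe <- paste(spe$cruise, spe$station, sep = "-")

# For each row of spe, find the matching row of loc and take its ID
spe$ID <- loc$ID[match(key_spe, key_loc)]
```

A station that appears in spe but not in loc would get NA here, which makes missing positions easy to spot.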
You can combine factors with interaction. If you aren't bothered about the labels for the ID column, the solution is really easy:
loc <- within(loc, id <- interaction(cruise, station))
spe <- within(spe, id <- interaction(cruise, station))
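If the STA0001-style labels from the question do matter, the interaction factor can be turned into them afterwards. The following is only a sketch: it numbers stations by their order of first appearance in loc, and station_order is a name made up for this example. The as.character() conversion is used because the internal level order of an interaction factor does not follow the row order.

```r
# Sample data from the question
loc <- data.frame(cruise = c("TY1", "TY1", "AZ7"),
                  station = c("A1", "3", "A1"),
                  lon = c(53.8073, 53.7757, 53.7764),
                  lat = c(6.7836, 6.7009, 6.6758))
spe <- data.frame(cruise = c(rep("TY1", 5), rep("AZ7", 3)),
                  station = c(rep("A1", 3), rep("3", 2), rep("A1", 3)),
                  species = c("Ensis ensis", "Magelona", "Nemertea", "Magelona",
                              "Ophelia", "Ophelia", "Ensis ensis", "Spio"),
                  abundance = c(100, 5, 17, 8, 1200, 950, 89, 1))

# Combined factor in both data sets; drop = TRUE removes unused combinations
loc$id <- interaction(loc$cruise, loc$station, drop = TRUE)
spe$id <- interaction(spe$cruise, spe$station, drop = TRUE)

# Number stations by first appearance in loc (station_order is hypothetical)
station_order <- unique(as.character(loc$id))
loc$ID <- sprintf("STA%04d", match(as.character(loc$id), station_order))
spe$ID <- sprintf("STA%04d", match(as.character(spe$id), station_order))
```

Stations present in spe but missing from loc are not handled by this sketch and would need a separate check.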