 

mapping factors to data frame

Tags:

r

I have sampling data spread over two data sets: loc describes geographical positions, and spe contains the species found. Unfortunately, the sampling stations are identified by two factors (cruise and station), so I need to construct unique identifiers for both data sets.

>loc
  cruise station     lon    lat
1    TY1      A1 53.8073 6.7836
2    TY1       3 53.7757 6.7009
3    AZ7      A1 53.7764 6.6758

and

>spe
  cruise station     species abundance
1    TY1      A1 Ensis ensis       100
2    TY1      A1    Magelona         5
3    TY1      A1    Nemertea        17
4    TY1       3    Magelona         8
5    TY1       3     Ophelia      1200
6    AZ7      A1     Ophelia       950
7    AZ7      A1 Ensis ensis        89
8    AZ7      A1        Spio         1

What I need is to add a unique identifier ID to spe, like this:

  cruise station     species abundance     ID
1    TY1      A1 Ensis ensis       100 STA0001
2    TY1      A1    Magelona         5 STA0001
3    TY1      A1    Nemertea        17 STA0001
4    TY1       3    Magelona         8 STA0002
5    TY1       3     Ophelia      1200 STA0002
6    AZ7      A1     Ophelia       950 STA0003
7    AZ7      A1 Ensis ensis        89 STA0003
8    AZ7      A1        Spio         1 STA0003

Here's the data

loc<-data.frame(cruise=c("TY1","TY1","AZ7"),station=c("A1",3,"A1"),lon=c(53.8073, 53.7757, 53.7764),lat=c(6.7836, 6.7009, 6.6758))

spe<-data.frame(cruise=c(rep("TY1",5),rep("AZ7",3)),station=c(rep("A1",3),rep(3,2),rep("A1",3)),species=c("Ensis ensis", "Magelona", "Nemertea", "Magelona", "Ophelia", "Ophelia","Ensis ensis", "Spio"),abundance=c(100,5,17,8,1200,950,89,1))

Then, I construct the ID for loc

 loc$ID<-paste("STA",formatC(1:nrow(loc),width=4,format="d",flag="0"),sep="")

But how do I map the ID to spe?

The way I found involves two nested loops, which is quite handsome for a procedural programmer like me (if nested loops can be called handsome at all). I'm sure a two-liner in R would be more efficient and faster, but I can't figure it out. I really want more beauty in my code; this is so un-R.

Janhoo asked Jul 13 '12 15:07


2 Answers

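Actually, I think this is a case where merge in base R just works. By default it joins on the columns the two data frames have in common (here cruise and station), so any other column in loc, including an ID column if you have already added one, is carried over to the matching rows of spe: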

merge(spe, loc, all.x=TRUE)

  cruise station     species abundance     lon    lat
1    AZ7      A1     Ophelia       950 53.7764 6.6758
2    AZ7      A1 Ensis ensis        89 53.7764 6.6758
3    AZ7      A1        Spio         1 53.7764 6.6758
4    TY1       3    Magelona         8 53.7757 6.7009
5    TY1       3     Ophelia      1200 53.7757 6.7009
6    TY1      A1 Ensis ensis       100 53.8073 6.7836
7    TY1      A1    Magelona         5 53.8073 6.7836
8    TY1      A1    Nemertea        17 53.8073 6.7836
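The output above was produced before loc had an ID column. As a minimal sketch using the data from the question, and assuming the ID is built first, the same merge can be limited to the key columns plus ID. The name spe_id, and the use of sprintf in place of the question's formatC call, are illustrative choices rather than part of the original answer:

```r
# Data from the question
loc <- data.frame(cruise = c("TY1", "TY1", "AZ7"),
                  station = c("A1", 3, "A1"),
                  lon = c(53.8073, 53.7757, 53.7764),
                  lat = c(6.7836, 6.7009, 6.6758))
spe <- data.frame(cruise = c(rep("TY1", 5), rep("AZ7", 3)),
                  station = c(rep("A1", 3), rep(3, 2), rep("A1", 3)),
                  species = c("Ensis ensis", "Magelona", "Nemertea", "Magelona",
                              "Ophelia", "Ophelia", "Ensis ensis", "Spio"),
                  abundance = c(100, 5, 17, 8, 1200, 950, 89, 1))

# One ID per row of loc, zero-padded to four digits
loc$ID <- sprintf("STA%04d", seq_len(nrow(loc)))

# Join on both key columns; only ID is brought over from loc
spe_id <- merge(spe, loc[, c("cruise", "station", "ID")],
                by = c("cruise", "station"), all.x = TRUE)
```

Note that merge sorts the result by the key columns, so the rows of spe_id come back in a different order than in spe.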

To find the unique identifiers, use unique():

unique(paste(loc$cruise, loc$station, sep="-"))
[1] "TY1-A1" "TY1-3"  "AZ7-A1"
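The pasted key can also be used to look up the ID directly, which keeps spe in its original row order. This is a sketch under the assumption that each cruise/station pair occurs exactly once in loc; key_loc and key_spe are hypothetical helper names, and match() is a swapped-in base R lookup rather than something the answer itself proposes:

```r
# Data from the question
loc <- data.frame(cruise = c("TY1", "TY1", "AZ7"),
                  station = c("A1", 3, "A1"),
                  lon = c(53.8073, 53.7757, 53.7764),
                  lat = c(6.7836, 6.7009, 6.6758))
spe <- data.frame(cruise = c(rep("TY1", 5), rep("AZ7", 3)),
                  station = c(rep("A1", 3), rep(3, 2), rep("A1", 3)),
                  species = c("Ensis ensis", "Magelona", "Nemertea", "Magelona",
                              "Ophelia", "Ophelia", "Ensis ensis", "Spio"),
                  abundance = c(100, 5, 17, 8, 1200, 950, 89, 1))

# Combined cruise-station key for every row of each data frame
key_loc <- paste(loc$cruise, loc$station, sep = "-")
key_spe <- paste(spe$cruise, spe$station, sep = "-")

# One ID per station, then look up each spe row's key in loc
loc$ID <- sprintf("STA%04d", seq_along(key_loc))
spe$ID <- loc$ID[match(key_spe, key_loc)]
```

Any row of spe whose key is missing from loc would get NA as its ID, which makes unmatched stations easy to spot.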
Andrie answered Nov 03 '22 14:11


You can combine factors with interaction.

If you aren't bothered about the labels for the ID column, the solution is really easy:

loc <- within(loc, id <- interaction(cruise, station))
spe <- within(spe, id <- interaction(cruise, station))
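If the STA0001-style labels do matter, one possible extension is to relabel the combined factor. The sketch below assumes each cruise/station pair appears once in loc; the column name key and the vector station_labels are hypothetical names introduced for illustration:

```r
# Data from the question
loc <- data.frame(cruise = c("TY1", "TY1", "AZ7"),
                  station = c("A1", 3, "A1"),
                  lon = c(53.8073, 53.7757, 53.7764),
                  lat = c(6.7836, 6.7009, 6.6758))
spe <- data.frame(cruise = c(rep("TY1", 5), rep("AZ7", 3)),
                  station = c(rep("A1", 3), rep(3, 2), rep("A1", 3)),
                  species = c("Ensis ensis", "Magelona", "Nemertea", "Magelona",
                              "Ophelia", "Ophelia", "Ensis ensis", "Spio"),
                  abundance = c(100, 5, 17, 8, 1200, 950, 89, 1))

# Combine the two factors; drop = TRUE discards unused combinations
loc$key <- interaction(loc$cruise, loc$station, drop = TRUE)
spe$key <- interaction(spe$cruise, spe$station, drop = TRUE)

# Relabel using loc's row order as the level order,
# so the first station in loc becomes STA0001, and so on
station_labels <- sprintf("STA%04d", seq_len(nrow(loc)))
loc$ID <- factor(as.character(loc$key),
                 levels = as.character(loc$key), labels = station_labels)
spe$ID <- factor(as.character(spe$key),
                 levels = as.character(loc$key), labels = station_labels)
```

Here ID is a factor in both data frames; wrap it in as.character() if plain strings are preferred.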
Richie Cotton answered Nov 03 '22 14:11