Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Measure distance between the first and last location record per day and animal in R

I have a database with >300,000 records of animal observations. Each row represents an animal location. Each animal has a unique id (id1) and several columns with attributes related to that animal location, including date of observation and x and y coordinates.

Could someone help me create a code that will allow me to the following:

1) Subset data by BOTH date and id1

2) Measure the distance (coordinates are in UTM so distance will be in meters) between the FIRST and LAST location record for each date for each different id1

An example of the data is the following:

mydata<-read.table(text = 
"id1 date x y
1 11/02/2014 478776.4332 7922167.59 
1 11/02/2014 478776.4333 7922170.59
1 11/02/2014 478776.4334 7922180.59
1 12/02/2014 478776.4335 7922190.59 
1 12/02/2014 478776.4350 7922192.59
1 12/02/2014 478776.4360 7922195.59
2 11/02/2014 478776.4338 7922167.59
2 11/02/2014 478776.4339 7922183.59
2 11/02/2014 478776.4340 7922185.59
2 12/02/2014 478776.4350 7922188.30 
2 12/02/2014 478776.4360 7922190.59
2 12/02/2014 478776.4390 7922198.59
3 11/02/2014 478776.4338 7922167.59
3 11/02/2014 478776.4345 7922175.59
3 11/02/2014 478776.4355 7922178.85
3 12/02/2014 478776.4368 7922180.59 
3 12/02/2014 478776.4398 7922183.59
3 12/02/2014 478776.4399 7922185.59
4 11/02/2014 478776.4338 7922167.59
4 11/02/2014 478776.4340 7922172.59
4 11/02/2014 478776.4345 7922178.59
3 11/02/2014 478776.4350 7922179.59
3 12/02/2014 478776.4355 7922184.59 
3 12/02/2014 478776.4360 7922187.59
3 12/02/2014 478776.4365 7922198.59",
header = TRUE)

A much less efficient alternative would be to select the first and last records for each date and id1 and then measure distance between all pairs of points. I found a code to select the LAST record for each individual animal but I will still have to incorporate the subset by date as well:

myid.uni <- unique(mydata$id1)
a<-length(myid.uni)

last <- c()

for (i in 1:a) {
  temp<-subset(mydata, id1==myid.uni[i])
  if (dim(temp)[1] > 1) {
    last.temp<-temp[dim(temp)[1],]
  }
  else {
    last.temp<-temp
  }
  last<-rbind(last, last.temp)
}

Could someone help me with either strategy, preferentially with the most straightforward way to do this?

Thank you!

like image 318
AnnK Avatar asked May 24 '16 13:05

AnnK


2 Answers

using dplyr:

library(dplyr)
mydata %>%
  group_by(id1, date) %>%
  summarize(x1 = first(x), y1 = first(y), x2 = last(x), y2 = last(y)) %>%
  mutate(distance = sqrt((x2-x1)^2+(y2-y1)^2))

Results:

Source: local data frame [7 x 7]
Groups: id1 [4]

    id1       date       x1      y1       x2      y2  distance
  (int)     (fctr)    (dbl)   (dbl)    (dbl)   (dbl)     (dbl)
1     1 11/02/2014 478776.4 7922168 478776.4 7922181 13.000000
2     1 12/02/2014 478776.4 7922191 478776.4 7922196  5.000001
3     2 11/02/2014 478776.4 7922168 478776.4 7922186 18.000000
4     2 12/02/2014 478776.4 7922188 478776.4 7922199 10.290001
5     3 11/02/2014 478776.4 7922168 478776.4 7922180 12.000000
6     3 12/02/2014 478776.4 7922181 478776.4 7922199 18.000000
7     4 11/02/2014 478776.4 7922168 478776.4 7922179 11.000000
like image 73
Edward R. Mazurek Avatar answered Oct 19 '22 08:10

Edward R. Mazurek


You can construct the data using data.table

# load data.table and make mytable a data.table
library(data.table)
setDT(mydata)

# contstruct new dataset
newData <- mydata[, .(startx=head(x, 1), endx=tail(x, 1), 
                      starty=head(y, 1), endy=tail(y, 1)),
                  by=.(id1, date)]
# calculate distance
newData[, distance := sqrt((endx-startx)^2+(endy-starty)^2)]

I use the same (Euclidean) distance formula as @edward-r-mazurek.

like image 5
lmo Avatar answered Oct 19 '22 06:10

lmo