Calculating distance between two rows in a data.table

Tags: r, data.table

Summary of problem: I am cleaning up a fish telemetry dataset (i.e., spatial coordinates through time) using the data.table package (version 1.9.5) in R (version) on a Windows 7 PC. Some of the data points are wrong (e.g., the telemetry equipment picked up echoes). We can tell these points are wrong because the fish would have moved farther than is biologically possible, so the points stand out as outliers. The actual dataset contains over 2,000,000 rows of data from 30 individual fish, hence the use of the data.table package.

I am removing points that are too far apart (i.e., the distance traveled exceeds a maximum distance). However, I need to recalculate the distance traveled between points after removing a point, because sometimes 2-3 consecutive data points were misrecorded in a cluster. Currently, I have a for loop that gets the job done, but it is likely far from optimal, and I suspect I am missing some of the powerful tools in the data.table package.

As technical notes, my spatial scale is small enough that a Euclidean distance works, and my maximum distance criterion is biologically reasonable.

Where I have looked for help: I have looked through SO and found several helpful answers, but none exactly matches my problem. Specifically, all of the other answers compare only one column of data among rows.

This answer compares two rows using data.table, but only looks at one variable.
This answer looks promising and uses Reduce, but I could not figure out how to use Reduce with two columns.
This answer uses an indexing feature from data.table, but I could not figure out how to use it with a distance function.
Last, this answer demonstrates the roll function of data.table. However, I could not figure out how to use two variables with this function either.

Here is my MCVE:

library(data.table)
## Create dummy data.table
dt <- data.table(fish = 1,
                 time = 1:6,
                 easting = c(1, 2, 10, 11, 3, 4),
                 northing = c(1, 2, 10, 11, 3, 4))
dt[ , dist := 0]

maxDist = 5

## First pass of calculating distances 
for(index in 2:dim(dt)[1]){
    dt[ index,
       dist := as.numeric(dist(dt[c(index -1, index),
                list(easting, northing)]))]
}

## Loop through and remove points until all of the outliers have been
## removed for the data.table. 
while(dt[ , any(dist > maxDist)]){
    dt <- copy(dt[ - dt[ , min(which(dist > maxDist))], ])
    ## Loops through and recalculates distance after removing outlier  
    for(index in 2:dim(dt)[1]){
        dt[ index,
           dist := as.numeric(dist(dt[c(index -1, index),
                    list(easting, northing)]))]
    }
}


asked Sep 14 '15 19:09

Richard Erickson

1 Answer

I'm a little confused why you keep recomputing the distance (and needlessly copying data) instead of just doing a single pass:

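last = 1                  # index of the most recent point that was kept
idx = rep(0, nrow(dt))    # 0 marks a dropped row; a 0 index selects nothing
for (curr in seq_len(nrow(dt))) {
  # keep the current point only if it is within maxDist of the last kept point
  if (dist(dt[c(curr, last), .(easting, northing)]) <= maxDist) {
    idx[curr] = curr
    last = curr
  }
}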

dt[idx]
#   fish time easting northing
#1:    1    1       1        1
#2:    1    2       2        2
#3:    1    5       3        3
#4:    1    6       4        4
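
The same single pass should extend to a dataset with several fish by running it once per group. The following is only a minimal sketch under stated assumptions: `keep_within` is a hypothetical helper name (not part of data.table), the first point of each track is assumed to be valid (as in the loop above), and the coordinates for the second fish are made up for illustration. Working on plain numeric vectors rather than subsetting the data.table on every iteration is a design choice that may help on millions of rows, though it has not been benchmarked here.

```r
library(data.table)

# Hypothetical helper: returns TRUE for every point that lies within
# max_dist (Euclidean) of the most recently kept point.
keep_within <- function(x, y, max_dist) {
  n <- length(x)
  keep <- logical(n)
  if (n == 0L) return(keep)
  keep[1L] <- TRUE   # assumption: the first fix of each track is valid
  last <- 1L
  for (curr in seq_len(n)[-1L]) {
    d <- sqrt((x[curr] - x[last])^2 + (y[curr] - y[last])^2)
    if (d <= max_dist) {
      keep[curr] <- TRUE
      last <- curr
    }
  }
  keep
}

# Illustrative data: fish 1 matches the question; fish 2 is invented.
dt <- data.table(fish = rep(1:2, each = 6),
                 time = rep(1:6, 2),
                 easting  = c(1, 2, 10, 11, 3, 4, 1, 9, 2, 3, 12, 4),
                 northing = c(1, 2, 10, 11, 3, 4, 1, 9, 2, 3, 12, 4))
maxDist <- 5

# Run the single pass separately for each fish, then keep the flagged rows.
dt[, keep := keep_within(easting, northing, maxDist), by = fish]
cleaned <- dt[keep == TRUE]
```

Rows must already be ordered by time within each fish for this to be meaningful; something like `setkey(dt, fish, time)` before the call would take care of that.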


answered Oct 13 '22 23:10

eddi
