Below is a subset of my data:
> head(dt)
name start end
1: 1 3195984 3197398
2: 1 3203519 3205713
3: 2 3204562 3207049
4: 2 3411782 3411982
5: 2 3660632 3661579
6: 3 3638391 3640590
dt <- data.frame(name = c(1, 1, 2, 2, 2, 3), start = c(3195984,
3203519, 3204562, 3411782, 3660632, 3638391), end = c(3197398,
3205713, 3207049, 3411982, 3661579, 3640590))
I want to calculate another value: the difference between the end coordinate of line n and the start coordinate of line n+1, but only if both lines share a name. To elaborate, this is what I want the resulting data frame to look like:
name start end dist
1: 1 3195984 3197398
2: 1 3203519 3205713 -6121
3: 2 3204562 3207049
4: 2 3411782 3411982 -204733
5: 2 3660632 3661579 -248650
6: 3 3638391 3640590
The reason I want to do this is that I'm looking for dist values that are positive. One thing I've tried is offsetting the start and end coordinates by one row, but then I run into the problem of comparing rows with different names.
How does one do this in R?
A data.table solution may be good here:
library(data.table)
dt <- as.data.table(dt)
dt[, dist := c(NA, end[-(length(end))] - start[-1]) , by=name]
dt
# name start end dist
#1: 1 3195984 3197398 NA
#2: 1 3203519 3205713 -6121
#3: 2 3204562 3207049 NA
#4: 2 3411782 3411982 -204733
#5: 2 3660632 3661579 -248650
#6: 3 3638391 3640590 NA
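The same grouped calculation can probably be written more compactly with data.table's `shift()`, which lags a vector by one position by default. This is only a sketch, assuming a data.table version that provides `shift()` and that the rows are already in the intended order within each name:

```r
library(data.table)

# Sample data from the question
dt <- data.table(
  name  = c(1, 1, 2, 2, 2, 3),
  start = c(3195984, 3203519, 3204562, 3411782, 3660632, 3638391),
  end   = c(3197398, 3205713, 3207049, 3411982, 3661579, 3640590)
)

# shift(end) is the previous row's end within each name group;
# the first row of every group gets NA.
dt[, dist := shift(end) - start, by = name]

# Rows with a positive dist
positive <- dt[!is.na(dist) & dist > 0]
```

With the sample data every non-missing dist is negative, so `positive` has no rows.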
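Assuming your data is sorted by name, you can also do it with base R functions (by() returns the groups in sorted order of name, so the unlisted result only lines up with the rows if the data is already in that order):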
dt$dist <- unlist(
by(dt, dt$name, function(x) c(NA, x$end[-(length(x$end))] - x$start[-1]) )
)
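The row-offset idea from the question can also be made to work in base R without grouping, by masking out the rows whose name differs from the row above. The following is a minimal sketch rather than a definitive implementation; the helper names `prev_end`, `same_name` and `positive` are made up for illustration, and it assumes rows with the same name are adjacent:

```r
# Sample data from the question
dt <- data.frame(
  name  = c(1, 1, 2, 2, 2, 3),
  start = c(3195984, 3203519, 3204562, 3411782, 3660632, 3638391),
  end   = c(3197398, 3205713, 3207049, 3411982, 3661579, 3640590)
)

n <- nrow(dt)

# End coordinate of the previous row (the first row has none)
prev_end <- c(NA, dt$end[-n])

# TRUE where a row has the same name as the row directly above it
same_name <- c(FALSE, dt$name[-1] == dt$name[-n])

# Keep the difference only where the names match; NA elsewhere
dt$dist <- ifelse(same_name, prev_end - dt$start, NA)

# Rows with a positive dist
positive <- dt[!is.na(dt$dist) & dt$dist > 0, ]
```

Unlike the by() version, this does not reorder anything, so it does not need the names to be sorted, only grouped together.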
Using dplyr (with credit to @thelatemail for the calculation of dist):
library(dplyr)
# %>% replaces the old %.% chaining operator, which is no longer part of dplyr
dat.new <- dt %>%
  group_by(name) %>%
  mutate(dist = c(NA, end[-(length(end))] - start[-1]))