I am working with bluetooth sensor data and need to identify possible duplicate readings for each unique ID. The bluetooth sensor made a scan every five seconds, and may pick up the same device in subsequent readings if the device wasn't moving quickly (i.e. sitting in traffic). There may be multiple readings from the same device if that device made a round trip, but those should be separated by several minutes. I can't wrap my head around how to get rid of the duplicate data. I need to calculate a time difference column if the macid's match.
The data has the format:
macid time
00:03:7A:4D:F3:59 82333
00:03:7A:EF:58:6F 223556
00:03:7A:EF:58:6F 223601
00:03:7A:EF:58:6F 232731
00:03:7A:EF:58:6F 232736
00:05:4F:0B:45:F7 164141
And I need to create:
macid time timediff
00:03:7A:4D:F3:59 82333 NA
00:03:7A:EF:58:6F 223556 NA
00:03:7A:EF:58:6F 223601 45
00:03:7A:EF:58:6F 232731 9310
00:03:7A:EF:58:6F 232736 5
00:05:4F:0B:45:F7 164141 NA
My first attempt at this is extremely slow and not really usable:
dedupeIDs <- function (zz) {
#Order by macid and then time
zz <- zz[order(zz$macid, zz$time) ,]
zz$timediff <- c(999999, diff(zz$time))
for (i in 2:nrow(zz)) {
if (zz[i, "macid"] == zz[i - 1, "macid"]) {
print("Different IDs")
} else {
zz[i, "timediff"] <- 999999
}
}
return(zz)
}
I'll then be able to filter the data.frame based on the time difference column.
Sample data:
structure(list(macid = structure(c(1L, 2L, 2L, 2L, 2L, 3L),
.Label = c("00:03:7A:4D:F3:59", "00:03:7A:EF:58:6F",
"00:05:4F:0B:45:F7"), class = "factor"),
time = c(82333, 223556, 223601, 232731, 232736, 164141)),
.Names = c("macid", "time"), row.names = c(NA, -6L),
class = "data.frame")
How about:
x <- structure(list(macid= structure(c(1L, 2L, 2L, 2L, 2L, 3L),
.Label = c("00:03:7A:4D:F3:59", "00:03:7A:EF:58:6F", "00:05:4F:0B:45:F7"),
class = "factor"), time = c(82333, 223556, 223601, 232731, 232736, 164141)),
.Names = c("macid", "time"), row.names = c(NA, -6L), class = "data.frame")
# ensure 'x' is ordered properly
x <- x[order(x$macid,x$time),]
# add timediff column by macid
x$timediff <- ave(x$time, x$macid, FUN=function(x) c(NA,diff(x)))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With