I'm working with eye-tracking data, so I have a huge dataset (millions of rows) and need a fast way to do this task. Here's a simplified version of it.
The data tells you where the eye is looking at each time point, for each file being viewed. X1, Y1 are the coordinates of the point being looked at. There are multiple time points for each file (representing the eye looking at different locations in the file over time).
Filename Time X1 Y1
1 1 10 10
1 2 12 10
I also have a file of where items are located for each filename. Each file contains (in this simplified case) two objects. X1,Y1 are the lower left coordinates and X2, Y2 are the upper right. You can imagine this as giving the bounding box where the item is located in each file. E.g.
Filename Item X1 Y1 X2 Y2
1 Dog 11 10 20 20
What I'd like to do is add another column to the first data frame that tells me which object the person is looking at at each time point for each file. If they are not looking at any of the objects, I'd like the column to say "none". Points on the border count as being looked at. E.g.
Filename Time X1 Y1 LookingAt
1 1 10 10 none
1 2 12 10 Dog
I know how to do this the for loop way, but it takes forever (and crashed my RStudio). I'm wondering if there might be a faster, more efficient way I'm missing.
Here's the dput for the first data frame (these contain more rows than the example I showed above):
structure(list(Filename = structure(c(1L, 1L, 1L, 2L, 2L, 3L,
3L, 3L, 3L), .Label = c("1", "2", "3"), class = "factor"), Time = structure(c(1L,
2L, 3L, 1L, 2L, 1L, 2L, 4L, 5L), .Label = c("1", "2", "3", "5",
"6"), class = "factor"), X1 = structure(c(1L, 4L, 3L, 2L, 1L,
4L, 6L, 5L, 1L), .Label = c("10", "11", "12", "15", "20", "25"
), class = "factor"), Y1 = structure(c(1L, 5L, 6L, 4L, 1L, 2L,
3L, 4L, 1L), .Label = c("10", "11", "12", "15", "20", "25"), class = "factor")), .Names = c("Filename",
"Time", "X1", "Y1"), row.names = c(NA, -9L), class = "data.frame")
And here's the dput for the second:
structure(list(Filename = structure(c(1L, 1L, 2L, 2L), .Label = c("1",
"3"), class = "factor"), Item = structure(1:4, .Label = c("Cat",
"Dog", "House", "Mouse"), class = "factor"), X1 = structure(c(2L,
4L, 3L, 1L), .Label = c("10", "11", "20", "35"), class = "factor"),
Y1 = structure(c(2L, 4L, 3L, 1L), .Label = c("10", "11",
"13", "35"), class = "factor"), X2 = structure(c(1L, 3L,
4L, 2L), .Label = c("10", "11", "20", "35"), class = "factor"),
Y2 = structure(c(1L, 3L, 4L, 2L), .Label = c("10", "11",
"13", "35"), class = "factor")), .Names = c("Filename", "Item",
"X1", "Y1", "X2", "Y2"), row.names = c(NA, -4L), class = "data.frame")
Using data.table and the sample data you provided, I would approach it as follows:
library(data.table)

# getting the data in the right format: the sample columns are factors,
# so convert via as.character() to get the values rather than the level codes
datcols <- c("X", "Y")
lucols <- c("X1", "X2", "Y1", "Y2")
setDT(dat)[, (datcols) := lapply(.SD, function(x) as.numeric(as.character(x))), .SDcols = datcols
           ][, Filename := as.character(Filename)]
setDT(lu)[, (lucols) := lapply(.SD, function(x) as.numeric(as.character(x))), .SDcols = lucols
          ][, `:=` (Filename = as.character(Filename),
                    X1 = pmin(X1, X2), X2 = pmax(X1, X2),   # make sure that 'X1' is always the lowest value
                    Y1 = pmin(Y1, Y2), Y2 = pmax(Y1, Y2))]  # make sure that 'Y1' is always the lowest value

# matching the 'Items' to the correct rows
# (between() includes the bounds, so points on the border count as matches;
#  this assumes at most one box per file contains each point)
dat[, looked_at := lu$Item[Filename == lu$Filename &
                             between(X, lu$X1, lu$X2) &
                             between(Y, lu$Y1, lu$Y2)],
    by = .(Filename, Time)]
which gives:
> dat
   Filename Time  X  Y looked_at
1:        1    1 10 10       Cat
2:        1    2 15 20        NA
3:        1    3 12 25        NA
4:        2    1 11 15        NA
5:        2    2 10 10        NA
6:        3    1 15 11        NA
7:        3    2 25 12        NA
8:        3    5 20 15     House
9:        3    6 10 10     Mouse
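Grouping by every (Filename, Time) pair may get slow with millions of rows. One alternative, not part of the approach above, is a non-equi update join, which also makes it easy to write "none" instead of NA. The sketch below is a minimal example on hypothetical toy tables (`gaze` and `boxes` are illustrative names), assuming the coordinates are already numeric and each box already has X1 <= X2 and Y1 <= Y2:

```r
library(data.table)

# Hypothetical toy data using the same column layout as 'dat' and 'lu'
gaze <- data.table(Filename = c("1", "1", "3", "3"),
                   Time     = c(1, 2, 5, 6),
                   X        = c(10, 15, 20, 10),
                   Y        = c(10, 20, 15, 10))
boxes <- data.table(Filename = c("1", "1", "3", "3"),
                    Item     = c("Cat", "Dog", "House", "Mouse"),
                    X1 = c(10, 20, 20, 10), X2 = c(11, 35, 35, 11),
                    Y1 = c(10, 13, 13, 10), Y2 = c(11, 35, 35, 11))

# Non-equi update join: for every box, find the gaze rows in the same file
# whose point lies inside the box (bounds included), and write the item name.
# If boxes overlap, the last matching box wins.
gaze[boxes,
     on = .(Filename, X >= X1, X <= X2, Y >= Y1, Y <= Y2),
     looked_at := i.Item]

# Rows that matched no box are still NA; label them "none"
gaze[is.na(looked_at), looked_at := "none"]

print(gaze)
```

Because the join is done in one pass rather than once per group, it should scale better than the grouped lookup, though that is worth benchmarking on the real data.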
Used data:
dat <- structure(list(Filename = structure(c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L), .Label = c("1", "2", "3"), class = "factor"),
Time = structure(c(1L, 2L, 3L, 1L, 2L, 1L, 2L, 4L, 5L), .Label = c("1", "2", "3", "5", "6"), class = "factor"),
X = structure(c(1L, 4L, 3L, 2L, 1L, 4L, 6L, 5L, 1L), .Label = c("10", "11", "12", "15", "20", "25"), class = "factor"),
Y = structure(c(1L, 5L, 6L, 4L, 1L, 2L, 3L, 4L, 1L), .Label = c("10", "11", "12", "15", "20", "25"), class = "factor")),
.Names = c("Filename", "Time", "X", "Y"), row.names = c(NA, -9L), class = "data.frame")
lu <- structure(list(Filename = structure(c(1L, 1L, 2L, 2L), .Label = c("1", "3"), class = "factor"),
Item = structure(1:4, .Label = c("Cat", "Dog", "House", "Mouse"), class = "factor"),
X1 = structure(c(2L, 4L, 3L, 1L), .Label = c("10", "11", "20", "35"), class = "factor"),
X2 = structure(c(1L, 3L, 4L, 2L), .Label = c("10", "11", "20", "35"), class = "factor"),
Y1 = structure(c(2L, 4L, 3L, 1L), .Label = c("10", "11", "13", "35"), class = "factor"),
Y2 = structure(c(1L, 3L, 4L, 2L), .Label = c("10", "11", "13", "35"), class = "factor")),
.Names = c("Filename", "Item", "X1", "X2", "Y1", "Y2"), row.names = c(NA, -4L), class = "data.frame")
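Note that every column in the sample data is stored as a factor, which is why the conversion step goes through as.character() before as.numeric(). A small illustration on a made-up factor (`f` is a hypothetical name):

```r
# A factor whose labels look like numbers
f <- factor(c("10", "25", "12"))

# as.numeric() on a factor returns the internal level codes, not the labels
codes <- as.numeric(f)

# converting to character first recovers the actual values
values <- as.numeric(as.character(f))

print(codes)
print(values)
```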