
How to determine similar values around a particular row in R?

Tags:

r

Goal

I want to find the duration of a vehicle's lane change, as shown in the plot, using lateral position data.

Data

The following is the data for a single vehicle:

> dput(a)
structure(list(Frame.ID = 526:1058, xcoord = c(14.346, 14.367, 
14.388, 14.419, 14.458, 14.503, 14.55, 14.6, 14.65, 14.702, 14.754, 
14.807, 14.86, 14.913, 14.966, 15.02, 15.072, 15.125, 15.178, 
15.23, 15.282, 15.333, 15.384, 15.434, 15.482, 15.529, 15.574, 
15.617, 15.657, 15.694, 15.727, 15.755, 15.78, 15.802, 15.823, 
15.841, 15.858, 15.874, 15.889, 15.903, 15.917, 15.93, 15.942, 
15.955, 15.967, 15.978, 15.989, 16, 16.011, 16.022, 16.033, 16.044, 
16.055, 16.065, 16.075, 16.085, 16.095, 16.104, 16.112, 16.12, 
16.129, 16.139, 16.151, 16.164, 16.178, 16.195, 16.212, 16.231, 
16.25, 16.27, 16.291, 16.312, 16.333, 16.356, 16.379, 16.403, 
16.428, 16.455, 16.482, 16.511, 16.542, 16.574, 16.609, 16.646, 
16.687, 16.732, 16.783, 16.839, 16.902, 16.967, 17.033, 17.1, 
17.168, 17.232, 17.294, 17.354, 17.41, 17.464, 17.513, 17.559, 
17.6, 17.636, 17.665, 17.685, 17.694, 17.7, 17.708, 17.725, 17.751, 
17.782, 17.817, 17.856, 17.897, 17.939, 17.982, 18.025, 18.067, 
18.108, 18.145, 18.178, 18.207, 18.232, 18.255, 18.274, 18.292, 
18.308, 18.323, 18.336, 18.349, 18.361, 18.372, 18.383, 18.393, 
18.403, 18.413, 18.422, 18.432, 18.441, 18.451, 18.46, 18.469, 
18.479, 18.488, 18.496, 18.505, 18.513, 18.521, 18.529, 18.537, 
18.544, 18.55, 18.556, 18.562, 18.567, 18.574, 18.58, 18.588, 
18.597, 18.609, 18.623, 18.64, 18.662, 18.69, 18.722, 18.76, 
18.802, 18.849, 18.899, 18.953, 19.012, 19.076, 19.144, 19.218, 
19.299, 19.386, 19.479, 19.574, 19.669, 19.763, 19.855, 19.945, 
20.031, 20.112, 20.187, 20.254, 20.31, 20.352, 20.385, 20.412, 
20.435, 20.45, 20.455, 20.449, 20.436, 20.416, 20.39, 20.361, 
20.328, 20.293, 20.256, 20.217, 20.178, 20.139, 20.1, 20.063, 
20.026, 19.99, 19.957, 19.925, 19.895, 19.867, 19.842, 19.819, 
19.796, 19.774, 19.751, 19.729, 19.707, 19.685, 19.662, 19.64, 
19.617, 19.594, 19.571, 19.547, 19.523, 19.499, 19.473, 19.449, 
19.426, 19.404, 19.382, 19.359, 19.336, 19.312, 19.288, 19.263, 
19.237, 19.211, 19.184, 19.156, 19.127, 19.097, 19.066, 19.033, 
18.998, 18.961, 18.921, 18.878, 18.831, 18.781, 18.727, 18.67, 
18.612, 18.554, 18.498, 18.446, 18.397, 18.349, 18.304, 18.264, 
18.233, 18.21, 18.194, 18.182, 18.175, 18.171, 18.17, 18.172, 
18.177, 18.183, 18.192, 18.202, 18.213, 18.226, 18.241, 18.258, 
18.277, 18.298, 18.321, 18.346, 18.371, 18.396, 18.422, 18.447, 
18.471, 18.495, 18.518, 18.54, 18.559, 18.577, 18.591, 18.601, 
18.606, 18.605, 18.6, 18.593, 18.584, 18.579, 18.58, 18.59, 18.607, 
18.629, 18.655, 18.682, 18.711, 18.739, 18.766, 18.792, 18.818, 
18.842, 18.864, 18.885, 18.905, 18.924, 18.943, 18.961, 18.98, 
19, 19.02, 19.038, 19.054, 19.068, 19.081, 19.092, 19.103, 19.112, 
19.121, 19.129, 19.137, 19.144, 19.15, 19.156, 19.161, 19.166, 
19.169, 19.172, 19.173, 19.173, 19.171, 19.168, 19.163, 19.156, 
19.147, 19.136, 19.123, 19.109, 19.093, 19.078, 19.061, 19.041, 
19.017, 18.988, 18.954, 18.918, 18.878, 18.836, 18.795, 18.756, 
18.722, 18.693, 18.671, 18.655, 18.642, 18.633, 18.625, 18.619, 
18.613, 18.608, 18.602, 18.593, 18.58, 18.562, 18.537, 18.504, 
18.46, 18.403, 18.33, 18.234, 18.115, 17.972, 17.806, 17.623, 
17.427, 17.223, 17.013, 16.802, 16.592, 16.389, 16.191, 15.998, 
15.806, 15.604, 15.386, 15.149, 14.891, 14.617, 14.328, 14.029, 
13.722, 13.412, 13.097, 12.773, 12.436, 12.084, 11.723, 11.361, 
11.006, 10.663, 10.334, 10.02, 9.723, 9.453, 9.219, 9.027, 8.874, 
8.753, 8.657, 8.583, 8.525, 8.481, 8.448, 8.421, 8.4, 8.384, 
8.371, 8.36, 8.351, 8.345, 8.338, 8.33, 8.319, 8.304, 8.284, 
8.258, 8.224, 8.183, 8.136, 8.084, 8.029, 7.971, 7.912, 7.853, 
7.794, 7.736, 7.681, 7.629, 7.581, 7.54, 7.506, 7.482, 7.468, 
7.46, 7.459, 7.462, 7.468, 7.477, 7.489, 7.501, 7.514, 7.526, 
7.539, 7.55, 7.562, 7.573, 7.584, 7.595, 7.607, 7.62, 7.636, 
7.654, 7.675, 7.702, 7.734, 7.773, 7.823, 7.885, 7.96, 8.046, 
8.134, 8.213, 8.278, 8.322, 8.342, 8.338, 8.308, 8.258, 8.192, 
8.112, 8.023, 7.927, 7.827, 7.725, 7.623, 7.522, 7.424, 7.334, 
7.252, 7.183, 7.128, 7.093, 7.078, 7.085, 7.117, 7.177, 7.267, 
7.385, 7.525, 7.679, 7.839, 8, 8.155, 8.296, 8.418, 8.519, 8.606, 
8.682, 8.749, 8.82, 8.891, 8.956, 9.012, 9.057, 9.09, 9.126, 
9.162, 9.197, 9.227, 9.249, 9.257, 9.254, 9.251, 9.247), Lane = c(2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L)), row.names = c(NA, -533L), .Names = c("Frame.ID", 
"xcoord", "Lane"), class = c("tbl_df", "tbl", "data.frame"))

Method

To estimate the lane change duration, I want to start from the row where the target lane number is first reported. In this example the target lane is 1, and 1 is first reported when the vehicle touches the pavement marking, as shown in the plot. This point can be added to the data as:

a$rows <- as.numeric(row.names(a)) # Row numbers
a$lch <- a$xcoord[match(head(which(a$Lane==1),1), a$rows)] # xcoord at the first row reported in lane 1

I then want the absolute difference between this point and every other point:

a$difference <- abs(a$lch - a$xcoord)

Then, moving away from this row in both directions, there will be some rows where consecutive differences are nearly the same, indicating that the lateral position was almost constant. The first such row in each direction will be treated as a limit of the lane change maneuver.
How can I do this comparison in R? I don't know how to 'move' upwards and downwards from Frame.ID 929 (in this example) to find where the differences become similar.

umair durrani asked Sep 06 '15 06:09



1 Answer

I'm not sure this is the best or fastest solution, but the idea is to calculate the change between consecutive values of the "difference" column you created and pick a small threshold to flag where those values start becoming similar. Then, for each lane, take the flagged row closest to the frame of the lane change.

Note that I renamed your dataset to dt. Run the code step by step to see how it works. If you are happy with it, you can shorten the script by combining some commands.

dt$rows <- as.numeric(row.names(dt)) # Row numbers
dt$lch <- dt$xcoord[match(head(which(dt$Lane==1),1), dt$rows)]
dt$difference <- abs(dt$lch - dt$xcoord)

library(dplyr)

dt %>%
  mutate(frameIDchange = Frame.ID[difference==0],                # frame ID of the change (assumes exactly one row has difference 0)
         diff_diff = difference - lag(difference, default=difference[1]),   # change between two consecutive differences
         flag =  ifelse(abs(diff_diff) <= 0.01,1,0)) %>%         # flag rows where that change is at or below the threshold (0.01)
  filter(flag==1) %>%                                            # keep only the flagged rows
  mutate(frameIDdiff = abs(Frame.ID-frameIDchange)) %>%          # distance in frames from the change frame
  group_by(Lane) %>%                                             # for each lane
  filter(frameIDdiff== min(abs(frameIDdiff)))                    # keep the flagged frame closest to the change frame


#   Frame.ID xcoord Lane rows    lch difference frameIDchange diff_diff flag frameIDdiff
# 1      896 18.593    2  371 12.436      6.157           929    -0.009    1          33
# 2      953  8.351    1  428 12.436      4.085           929     0.009    1          24

This tells you that the frame IDs where the differences start to become similar (at the 0.01 threshold) are 896 in lane 2 and 953 in lane 1. As expected, the frame ID of the change (929) lies between those two frames.
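Since the goal is the duration of the lane change, the two boundary frames can be turned into a duration. A minimal sketch: the frame numbers come from the output above, while frame_rate is a hypothetical value (the question does not state the recording rate), so replace it with the rate of your data.

```r
# Boundary frames taken from the output above
start_frame <- 896   # closest flagged frame before the change (lane 2)
end_frame   <- 953   # closest flagged frame after the change (lane 1)

# Duration of the maneuver in frames
duration_frames <- end_frame - start_frame

# ASSUMPTION: 10 frames per second; this rate is not given in the question
frame_rate <- 10
duration_seconds <- duration_frames / frame_rate

duration_frames
duration_seconds
```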

You can also experiment with slightly higher or lower threshold values to see how the results change.
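One way to try several thresholds is to wrap the search in a function. The sketch below is a base R variant rather than the dplyr pipeline above: lane_change_limits is a hypothetical helper, toy is made-up data, and it flags rows using the frame-to-frame change in xcoord directly and searches strictly before and after the change row instead of grouping by Lane.

```r
# Hypothetical helper (not part of the pipeline above): starting from the first
# row in the target lane, return the nearest "flat" frame on each side.
lane_change_limits <- function(frame, x, lane, target_lane, threshold = 0.01) {
  change <- which(lane == target_lane)[1]   # first row reported in the target lane
  step   <- abs(c(0, diff(x)))              # absolute frame-to-frame change (first row gets 0)
  flat   <- step <= threshold               # TRUE where the position barely moves
  rows   <- seq_along(x)
  before <- rows[flat & rows < change]      # flat rows before the change
  after  <- rows[flat & rows > change]      # flat rows after the change
  c(start  = if (length(before) > 0) frame[max(before)] else NA,
    change = frame[change],
    end    = if (length(after) > 0) frame[min(after)] else NA)
}

# Made-up toy data: a vehicle drifting from lane 2 to lane 1
toy <- data.frame(
  Frame.ID = 101:112,
  xcoord   = c(7.15, 7.145, 7.14, 7.10, 6.80, 6.20,
               5.60, 5.20, 5.05, 5.01, 5.005, 5.00),
  Lane     = c(2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1)
)

# Compare a strict and a looser threshold
limits_strict <- lane_change_limits(toy$Frame.ID, toy$xcoord, toy$Lane,
                                    target_lane = 1, threshold = 0.01)
limits_loose  <- lane_change_limits(toy$Frame.ID, toy$xcoord, toy$Lane,
                                    target_lane = 1, threshold = 0.05)

limits_strict   # start 103, change 107, end 111
limits_loose    # start 104, change 107, end 110
```

On this toy data the looser threshold moves both limits closer to the change frame, so the estimated maneuver is shorter (6 frames instead of 8).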

AntoniosK answered Nov 04 '22 21:11