Here is my small example:
Mark <- paste("SN", 1:400, sep = "")
highway <- rep(1:4, each = 100)
set.seed(1234)
MAF <- rnorm(400, 0.3, 0.1)
PPC <- abs(ceiling(rnorm(400, 5, 5)))
set.seed(1234)
# Cumulative positions along each of the four highways
Position <- round(c(cumsum(rnorm(100, 5, 3)),
                    cumsum(rnorm(100, 10, 3)),
                    cumsum(rnorm(100, 8, 3)),
                    cumsum(rnorm(100, 6, 3))), 1)
mydf <- data.frame(Mark, highway, Position, MAF, PPC)
I want to keep only the rows where PPC is less than 10 and, at the same time, MAF is greater than 0.3.
# filter PPC < 10 & MAF > 0.3
filtered <- mydf[mydf$PPC < 10 & mydf$MAF > 0.3,]
I have a grouping variable, highway, and each Mark has a Position on its highway. For example, the first five marks on highway 1:
1.4 7.2 15.5 13.4 19.7
|-----|.......|.......|.....|.....|
"SN1" "SN2" "SN3" "SN4" "SN5"
Now I want to pick about 30 Marks in total such that they are well distributed along each highway, based on Position (taking into account that the highways have different lengths), and such that the distance between any two picks is at least 10.
Edit: The idea (rough sketch)
I have thought a little about how to solve this. Any help is appreciated.
Edit: Here is what I could figure out:
# The maximum (length) of each highway is:
out <- tapply(mydf$Position, mydf$highway, max)
out
1 2 3 4
453.0 1012.4 846.4 597.6
min(out)
[1] 453
# Total length of all highways
totallength <- sum(out)
# Thus the average distance at which marks need to be placed:
totallength / 30
[1] 96.98
For highway 1, the theoretical marks could be at:
96.98, 96.98 + 96.98, 96.98 + 96.98 + 96.98, ... for as long as the value is less than the maximum (length) of highway 1.
Thus, theoretically, we need to choose a mark at every 96.98. But the marks placed on the highway may not be found at exactly these positions.
Note: the total number of selected marks need not be exactly 30 (around 30 is fine).
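As a rough sketch of this idea, the theoretical positions could be generated with seq(). The names spacing and theoretical_marks are only illustrative, and the highway lengths are copied from the output printed above rather than recomputed.

```r
# Highway lengths as printed above (maximum Position per highway)
out <- c("1" = 453.0, "2" = 1012.4, "3" = 846.4, "4" = 597.6)

# Average spacing needed to place about 30 marks in total
spacing <- sum(out) / 30

# Theoretical mark positions: every `spacing` units, up to each highway's length
theoretical_marks <- lapply(out, function(len) seq(spacing, len, by = spacing))

# Number of theoretical marks per highway
lengths(theoretical_marks)
```

Because each highway's sequence stops before its end, the total comes out a little under 30, which is consistent with wanting "around 30" marks.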
Since we aren't bothered about any other columns, the code is a little easier if we use split to get a list of positions.
filtered$highway <- factor(filtered$highway)
positions <- with(filtered, split(Position, highway))
A suitable number of marks in each highway can be found using the relative length of each highway.
highway_lengths <- sapply(positions, max)
total_length <- sum(highway_lengths)
n_marks_per_highway <- round(30 * highway_lengths / total_length)
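To illustrate the proportional allocation, here is the same calculation applied to the unfiltered highway lengths printed in the question. This is only a sketch: the lengths of the filtered highways may differ slightly, and rounding does not guarantee that the counts add up to exactly 30 in general.

```r
# Highway lengths copied from the question (unfiltered data); these are an
# assumption for illustration, the filtered lengths may differ slightly
highway_lengths <- c("1" = 453.0, "2" = 1012.4, "3" = 846.4, "4" = 597.6)
total_length <- sum(highway_lengths)

# Each highway gets a share of the 30 marks proportional to its length
n_marks_per_highway <- round(30 * highway_lengths / total_length)
n_marks_per_highway
#  1  2  3  4
#  5 10  9  6
```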
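We can use the quantile function to get target points that are spread evenly through the marks on each highway. Quantiles are evenly spaced by rank rather than by distance, so this matches even spacing along the highway only when the marks themselves are roughly uniformly distributed.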
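target_mark_points <- mapply(
  function(pos, n)
  {
    # length.out also copes with n = 1, where 1 / (n - 1) would be Inf
    quantile(pos, seq(0, 1, length.out = n))
  },
  positions,
  n_marks_per_highway,
  SIMPLIFY = FALSE  # always return a list, one element per highway
)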
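If the marks are clustered, quantile-based targets follow the clusters rather than the distance along the highway. As a possible alternative, targets can be spaced evenly by distance with seq(). The helper even_targets below is hypothetical (it is not part of the code above), and the toy positions are made up for illustration.

```r
# Hypothetical helper: n targets evenly spaced by distance between the
# first and last mark position on one highway
even_targets <- function(pos, n) {
  if (n < 2) return(mean(range(pos)))  # a single target: use the midpoint
  seq(min(pos), max(pos), length.out = n)
}

# Toy positions, clustered near the start (not from the question's data)
pos <- c(1, 2, 3, 4, 5, 50, 100)

even_targets(pos, 3)
# 1.0 50.5 100.0

# Quantile-based targets for comparison: the middle target is the median mark
unname(quantile(pos, seq(0, 1, length.out = 3)))
# 1 4 100
```

With roughly uniform marks, as in the simulated data, the two approaches should give similar targets.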
For each target point, we find the nearest existing mark in the highway.
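actual_mark_points <- mapply(
  function(pos, target)
  {
    sapply(target, function(tgt)
    {
      # Position of the existing mark closest to this target
      d <- abs(tgt - pos)
      pos[which.min(d)]
    })
  },
  positions,
  target_mark_points,
  SIMPLIFY = FALSE  # always return a list, one element per highway
)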
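Picking the nearest mark does not by itself guarantee the minimum distance of 10 that the question asks for, and two targets can land on the same mark. One way to handle this, as a minimal sketch, is a greedy pass that drops any pick closer than the minimum gap to the previously kept pick. The helper enforce_min_gap is hypothetical, and the example values are the first five positions from the question plus one made-up value.

```r
# Hypothetical helper: keep picks (in position order) that are at least
# `min_gap` away from the previously kept pick
enforce_min_gap <- function(picks, min_gap = 10) {
  picks <- sort(unique(picks))  # drop duplicate picks, order by position
  if (length(picks) == 0) return(picks)
  kept <- picks[1]
  for (p in picks[-1]) {
    if (p - kept[length(kept)] >= min_gap) kept <- c(kept, p)
  }
  kept
}

enforce_min_gap(c(1.4, 7.2, 15.5, 13.4, 19.7, 40))
# 1.4 13.4 40.0
```

It could be applied per highway with lapply(actual_mark_points, enforce_min_gap). This can reduce the number of picks, which is acceptable if the total only needs to be around 30.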
Just to see that it works, you can visualise the marks.
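is_mark_point <- mapply(
  function(pos, mark)
  {
    # TRUE for positions that were chosen as mark points
    pos %in% mark
  },
  positions,
  actual_mark_points,
  SIMPLIFY = FALSE  # keep a list so that unsplit() works
)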
filtered$is.mark.point <- unsplit(is_mark_point, filtered$highway)
library(ggplot2)
(p <- ggplot(filtered, aes(Position, highway, colour = is.mark.point)) +
  geom_point()
)