
Sub-setting by group closest to defined value

Tags: r, dplyr, subset

I have a data frame and would like to select, within each group, the rows where y is closest to a specific value (e.g., 5).

set.seed(1234)
df <- data.frame(x = c(rep("A", 4),
                       rep("B", 4)),
                 y = c(rep(4, 2), rep(1, 2), rep(6, 2), rep(3, 2)),
                 z = rnorm(8))

df

##   x y          z
## 1 A 4 -1.2070657
## 2 A 4  0.2774292
## 3 A 1  1.0844412
## 4 A 1 -2.3456977
## 5 B 6  0.4291247
## 6 B 6  0.5060559
## 7 B 3 -0.5747400
## 8 B 3 -0.5466319

The result would be:

##   x y          z
## 1 A 4 -1.2070657
## 2 A 4  0.2774292
## 3 B 6  0.4291247
## 4 B 6  0.5060559

Thank you, Philippe

Philippe Massicotte asked Jan 22 '26 04:01


2 Answers

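library(dplyr)

df %>%
  group_by(x) %>%
  # distance of each y from the target value 5
  mutate(delta = abs(y - 5)) %>%
  # keep every row tied for the smallest distance within its group
  filter(delta == min(delta)) %>%
  ungroup() %>%
  select(-delta)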
Thierry answered Jan 24 '26 22:01
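
The same per-group filter can probably also be written with dplyr's slice_min(), which keeps tied rows when with_ties = TRUE. This is only a sketch, assuming dplyr 1.0.0 or later; the names target and closest are introduced here for illustration and are not part of the original question.

```r
library(dplyr)

# Same example data as in the question
set.seed(1234)
df <- data.frame(x = c(rep("A", 4), rep("B", 4)),
                 y = c(rep(4, 2), rep(1, 2), rep(6, 2), rep(3, 2)),
                 z = rnorm(8))

target <- 5  # assumed name for the reference value

closest <- df %>%
  group_by(x) %>%
  # rows with the smallest absolute distance to target; ties are all kept
  slice_min(order_by = abs(y - target), n = 1, with_ties = TRUE) %>%
  ungroup()

closest
```

With with_ties = FALSE only one row per group would be returned, which is not what the question asks for.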


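Alternatively, using base R. The comparison has to use the absolute distance and its per-group minimum; ave() returns that minimum aligned with the original row order, so the rows do not need to be sorted by group:

df[abs(df$y - 5) == ave(abs(df$y - 5), df$x, FUN = min), ]

##   x y          z
## 1 A 4 -1.2070657
## 2 A 4  0.2774292
## 5 B 6  0.4291247
## 6 B 6  0.5060559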
DatamineR answered Jan 24 '26 23:01
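
To reuse the idea with another target value or other column names, the base R approach could be wrapped in a small helper. The function name closest_by_group and the toy data below are hypothetical, made up for this sketch. The toy rows are deliberately unsorted and include a value far above the target, a case where a signed comparison such as y - 5 == max(y - 5) would pick 9 instead of 6 in group B.

```r
# Hypothetical helper: keep, per group, the rows whose value is closest to target
closest_by_group <- function(data, group, value, target) {
  dist <- abs(data[[value]] - target)
  # ave() gives each row the minimum distance of its own group
  data[dist == ave(dist, data[[group]], FUN = min), , drop = FALSE]
}

# Toy data (made up): groups interleaved, one value far above the target
toy <- data.frame(x = c("B", "A", "B", "A", "B"),
                  y = c(9, 1, 6, 4, 3))

res <- closest_by_group(toy, "x", "y", target = 5)
res
```

Rows keep their original order and row names, so the result can be matched back to the input if needed.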


