select repeated row observations with the least absolute difference

Question

I have a data frame like this:

df <- data.frame(id = c(1,1,1,2,2,3,3,3,3),
                  vars = c(1,2,5, 1,3, 0,2,4,-1))

> df
  id vars
1  1    1
2  1    2
3  1    5
4  2    1
5  2    3
6  3    0
7  3    2
8  3    4
9  3   -1

In this data frame each id can have several observations. I now want to select for each id the pair (2 observations) that have the least absolute difference for vars.

In the above case that would be

for id 1, values 1 and 2 have the lowest absolute difference, id 2 only has 2 observations so both are automatically selected. for the id 3 the selected vars would be 0 and -1 because the absolute difference is 1, lower than all other combinations.

IceCreamToucan · Accepted Answer

You don't need to do all the comparisons (or, you can let arrange do your comparisons for you), because once you've sorted the values each value is already beside the value for which the difference is minimized.

df %>% 
  group_by(id) %>% 
  arrange(vars) %>% 
  slice(which.min(diff(vars)) + 0:1)

# # A tibble: 6 x 2
# # Groups:   id [3]
#      id  vars
#   <dbl> <dbl>
# 1     1     1
# 2     1     2
# 3     2     1
# 4     2     3
# 5     3    -1
# 6     3     0

data.table version

library(data.table)
setDT(df)

df[df[order(vars), .I[which.min(diff(vars)) + 0:1], id]$V1]

#    id vars
# 1:  3   -1
# 2:  3    0
# 3:  1    1
# 4:  1    2
# 5:  2    1
# 6:  2    3

boski · Answer

Not the most concise but works. Probably somebody can improve the idea.

df1%>%group_by(id)%>%mutate(vars2=which.min(abs(diff(combn(num(vars),2)))))%>%
  mutate(vars1=ifelse(vars%in%combn(num(vars),2)[,vars2],vars,NA))%>%select(id,vars1)%>%.[complete.cases(.),]

# A tibble: 6 x 2
# Groups:   id [3]
     id vars1
  <dbl> <dbl>
1     1     1
2     1     2
3     2     1
4     2     3
5     3     0
6     3    -1

The main idea is to do the difference on all the possible combinations of the values of each group. vars2 keeps the column with the lowest difference. If the value is one of the two present in the vars2 column, it is kept. Else, it is set as NA. Then, only complete cases are returned.

select repeated row observations with the least absolute difference

Tags:

dataframe

r

dplyr

spore234

2 Answers

IceCreamToucan

boski

Recent Activity

Donate For Us

select repeated row observations with the least absolute difference

Tags:

dataframe

r

dplyr

spore234

2 Answers

IceCreamToucan

boski

Related questions

Recent Activity

Donate For Us