I would like to select, for each ID, the two closest values of Cq. I thought I'd figured it out, but my approach depends on row position...
Here is an example of my dataset:
df <- data.frame(ID = c("A","A","A","B","B","B","C","C","C"),
                 Cq = c(34.32,34.40,34.31,31.49,31.40,31.49,31.22,31.31,31.08))
ID Cq
1 A 34.32
2 A 34.40
3 A 34.31
4 B 31.49
5 B 31.40
6 B 31.49
7 C 31.22
8 C 31.31
9 C 31.08
And here is what I tried:
library(dplyr)
df4 <- df %>%
  group_by(ID) %>%
  arrange(Cq) %>%
  mutate(diffvals = Cq - lag(Cq)) %>%
  filter(row_number() == 1 | row_number() == 2)
#Output
ID Cq diffvals
1 A 34.31 NA
2 A 34.32 0.0100
3 B 31.40 NA
4 B 31.49 0.0900
5 C 31.08 NA
6 C 31.22 0.14
And the expected output:
ID Cq
1 A 34.32
2 A 34.31
3 B 31.49
4 B 31.49
5 C 31.22
6 C 31.31
I've tried sorting my dataset beforehand, but it doesn't change anything. I also tried using filter(diffvals=wich.min==diffvals), but I don't know how to extract the two smallest.
If you have any ideas, it would help me a lot!
Thanks in advance
Here is a base R solution, where dist is used to enumerate the distances of all pairs within each group:
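dfout <- do.call(rbind,
  lapply(split(df, df$ID),
    function(v) {
      # pairwise distances within the group; blank out the diagonal (self-distances)
      d <- `diag<-`(as.matrix(dist(v$Cq)), NA)
      # keep only the upper triangle so each pair is counted once
      d[lower.tri(d)] <- NA
      # the row/column indices of the closest pair pick the two matching rows of v
      v[which(d == min(d, na.rm = TRUE), arr.ind = TRUE), ]
    }
  ))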
such that
> dfout
ID Cq
A.1 A 34.32
A.3 A 34.31
B.4 B 31.49
B.6 B 31.49
C.7 C 31.22
C.8 C 31.31
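Because Cq is one-dimensional, the closest pair within a group is always a pair of neighbours once the values are sorted, so the full distance matrix is not strictly needed. Below is a minimal base R sketch of that idea; closest_pair is a hypothetical helper name, and returning single-row groups unchanged is an assumption rather than something the question specifies.

```r
df <- data.frame(ID = c("A","A","A","B","B","B","C","C","C"),
                 Cq = c(34.32,34.40,34.31,31.49,31.40,31.49,31.22,31.31,31.08))

# closest_pair() is a hypothetical helper, not part of base R
closest_pair <- function(v) {
  if (nrow(v) < 2) return(v)   # assumption: a group with one row is returned as-is
  v <- v[order(v$Cq), ]        # sort the group by Cq
  i <- which.min(diff(v$Cq))   # position of the smallest gap between neighbours
  v[c(i, i + 1), ]             # the two rows on either side of that gap
}

dfout2 <- do.call(rbind, lapply(split(df, df$ID), closest_pair))
```

The rows come back sorted by Cq within each ID, so the order inside a group may differ from the original row order, but the selected values are the same pairs.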
Using dplyr, one option is to do a full_join of the data with itself based on ID. Remove the rows generated by pairing a row with itself, then for each ID select the row with the minimum difference and reshape the data to long format.
library(dplyr)
df_row <- df %>% mutate(Row = row_number())
df_row %>%
  full_join(df_row, by = 'ID') %>%
  # drop self-pairs and mirrored duplicates by row number, not by value,
  # so that two distinct rows with the same Cq (as in ID B) are kept as a pair
  filter(Row.x < Row.y) %>%
  group_by(ID) %>%
  slice(which.min(abs(Cq.x - Cq.y))) %>%
  tidyr::pivot_longer(cols = starts_with('Cq')) %>%
  select(-Row.x, -Row.y, -name)
# ID value
# <fct> <dbl>
#1 A 34.3
#2 A 34.3
#3 B 31.5
#4 B 31.5
#5 C 31.2
#6 C 31.3
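The attempt in the question can also be adapted directly. It keeps the first two rows of each sorted group, whereas what is wanted is the row with the smallest gap to its next neighbour plus that neighbour. A minimal sketch, assuming dplyr is available; the column name gap and the result name closest are made up for illustration:

```r
library(dplyr)

df <- data.frame(ID = c("A","A","A","B","B","B","C","C","C"),
                 Cq = c(34.32,34.40,34.31,31.49,31.40,31.49,31.22,31.31,31.08))

closest <- df %>%
  group_by(ID) %>%
  arrange(Cq, .by_group = TRUE) %>%        # sort by Cq within each ID
  mutate(gap = lead(Cq) - Cq) %>%          # gap to the next larger value; NA on the last row
  # keep the row with the smallest gap and the row right after it
  filter(row_number() %in% (which.min(gap) + 0:1)) %>%
  ungroup() %>%
  select(-gap)
```

Note that with this sketch an ID with a single row has no gap to compare and is dropped from the result.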