Suppose I have:
x = data.table( id=c(1,1,1,2,2,2), price=c(100,110,120,200,200,220) )
> x
id price
1: 1 100
2: 1 110
3: 1 120
4: 2 200
5: 2 200
6: 2 220
and want to find for each row the cheapest price in the group (by=id) once the current row is omitted. so the results should look like:
> x
id price cheapest_in_this_id_omitting_current_row
1: 1 100 110 # if I take this row out the cheapest is the next row
2: 1 110 100 # row 1
3: 1 120 100 # row 1
4: 2 200 200 # row 5
5: 2 200 200 # row 4 (or 5)
6: 2 220 200 # row 4 (or 5)
So it is like using:
x[, cheapest_by_id := min(price), id]
but removing the current row for each calculation.
If I could have a variable that refers to the current row inside the group like a .row_nb, I would use:
x[, min(price[-.row_nb]), id]
but this .row_nb does not seem to exist...?
We group by 'id', use combn
on the sequence of row, specify the number of elements to choose i.e. 'm' as 1 less than the number of rows (.N-1
), use the output from combn
as numeric index to subset the 'price', get the min
and assign (:=
) the output as the new column.
x[, cheapest_in_this_id_omitting_current_row:=
combn(.N:1, .N-1, FUN=function(i) min(price[i])), by = id]
x
# id price cheapest_in_this_id_omitting_current_row
#1: 1 100 110
#2: 1 110 100
#3: 1 120 100
#4: 2 200 200
#5: 2 200 200
#6: 2 220 200
Or instead of using combn
, we can loop over the sequence, use that to index the 'price', get the mean
. I guess this would be fast.
x[,cheapest_in_this_id_omitting_current_row:=
unlist(lapply(1:.N, function(i) min(price[-i]))) , id]
Here's another way:
x[order(price), min_other_p := c(price[2], rep(price[1], .N-1)), by = id]
# or
x[order(price), min_other_p := replace( rep(price[1], .N), 1, price[2] ), by = id]
id price min_other_p
1: 1 100 110
2: 1 110 100
3: 1 120 100
4: 2 200 200
5: 2 200 200
6: 2 220 200
The order
in i
is not necessary in the OP's example but needed in general.
How it works. We sort the price vector in increasing order with order
, so that price[1]
and price[2]
are the lowest two prices observed in each group. In the result, we want price[1]
-- the lowest price overall -- everywhere except in position 1, where we want the next lowest price.
To be even more explicit: Suppose we have sorted so that we have sorted so that i==1
is the row with the lowest price within a group; i==2
, the second lowest and so on. Then price[1]
is the 1st order statistic of the vector of prices in a group and price[2]
is the second order statistic of the vector of prices. It is clear that
# pseudocode
min(price[-i]) == price[2] if i==1, since price[2] == min(price[2:.N])
min(price[-i]) == price[1] otherwise, since price[1] belongs to price[-i] and is smallest
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With