Below is a table of products customer 1 has bought.
df <- data.table(customer_id = rep(1,3)
, product_1 = letters[1:3]
)
customer_id product_1
1: 1 a
2: 1 b
3: 1 c
Assume the real dataset has multiple customers, I'd like to, for each customer, create a permutation of products each has bought (without replacement). In combinatorics term:
nPk
where
n = number of (distinct) products each customer has bought
k = 2
Results:
customer_id product_1 product_2
1: 1 a b
2: 1 a c
3: 1 b c
4: 1 b a
5: 1 c a
6: 1 c b
The SQL join conditions would be:
where customer_id = customer_id
and product_1 != product_1
However, I understand data.table currently has limited support for non equi joins. Therefore, is there an alternative way of achieving this?
You can eliminate the cases where product_1 and product_2 are equal after joining
df[df, on = .(customer_id = customer_id), allow.cartesian = T
][product_1 != i.product_1
][order(product_1)]
customer_id product_1 i.product_1
1: 1 a b
2: 1 a c
3: 1 b a
4: 1 b c
5: 1 c a
6: 1 c b
Another option using by=.EACHI:
df[df, on=.(customer_id),
.(p1=i.product_1, p2=x.product_1[x.product_1!=i.product_1]), by=.EACHI]
output:
customer_id p1 p2
1: 1 a b
2: 1 a c
3: 1 b a
4: 1 b c
5: 1 c a
6: 1 c b
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With