R: unequi join with merge function

Tags: r, data.table, non-equi-join

I am working with data.table and I want to do a non-equi left join/merge.

I have one table with car prices and another table to identify which car class each car belongs to:

library(data.table)
# Lookup table: each row maps a price range to a price class
data_priceclass <- data.table(
  price_from  = c(0, 0, 200000, 250000, 300000, 350000, 425000, 500000, 600000, 700000, 800000, 900000, 1000000, 1100000, 1200000, 1300000, 1400000, 1500000, 1600000, 1700000, 1800000),
  price_to    = c(199999, 199999, 249999, 299999, 349999, 424999, 499999, 599999, 699999, 799999, 899999, 999999, 1099999, 1199999, 1299999, 1399999, 1499999, 1599999, 1699999, 1799999, 1899999),
  price_class = c(1:20, 99)
)

I use a non-equi join to merge the two tables, but the x[y] join syntax of data.table appears to drop duplicate matches:

cars <- data.table(car_price = c(190000, 500000))
cars[data_priceclass,
     on = c("car_price >= price_from", "car_price < price_to"),
     price_class := i.price_class]
cars

Notice that the car priced at 190000 should match two rows in the data_priceclass table, but because x[y] removes duplicates, this is not visible in the output. Normally I always join with the merge function instead of x[y], because I feel I lose control with x[y].

But the following does not work with non-equi joins:

# Errors: merge() expects column names in `by`, not join conditions
merge(cars, data_priceclass,
      by = c("car_price >= price_from",
             "car_price < price_to"),
      all.x = TRUE, all.y = FALSE)

Any tips on how to do a non-equi join with data.table that does not remove duplicates?

asked Apr 20 '21 07:04

Helen

1 Answer

As noted in the comments, a left join that keeps every row of cars is written with cars as the subsetting argument i in the DT[i, j, by] syntax, i.e. data_priceclass[cars, ...].
This puts cars on the right, which can feel counter-intuitive compared to SQL; I found this tutorial useful for comparing the two syntaxes.

cars <- data.table(car_price = c(190000, 500000))
data_priceclass[cars,
                .(car_price, x.price_from, x.price_to, price_class),
                on = .(price_from <= car_price, price_to > car_price)]

   car_price x.price_from x.price_to price_class
1:    190000        0e+00     199999           1
2:    190000        0e+00     199999           2
3:    500000        5e+05     599999           8
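
To check how many price classes each car matches without printing every joined row, one option is to count the matches per row of cars with by = .EACHI. This is only a minimal sketch that rebuilds the tables from the question; match_counts is an illustrative name, not something from the original post.

```r
library(data.table)

# Price table and cars as defined in the question
data_priceclass <- data.table(
  price_from = c(0, 0, 200000, 250000, 300000, 350000, 425000, 500000,
                 600000, 700000, 800000, 900000, 1000000, 1100000, 1200000,
                 1300000, 1400000, 1500000, 1600000, 1700000, 1800000),
  price_to = c(199999, 199999, 249999, 299999, 349999, 424999, 499999, 599999,
               699999, 799999, 899999, 999999, 1099999, 1199999, 1299999,
               1399999, 1499999, 1599999, 1699999, 1799999, 1899999),
  price_class = c(1:20, 99)
)
cars <- data.table(car_price = c(190000, 500000))

# One result row per row of cars; N is the number of matching price classes
match_counts <- data_priceclass[cars,
                                .N,
                                on = .(price_from <= car_price, price_to > car_price),
                                by = .EACHI]
match_counts
```

A count above 1 flags a car that falls into overlapping classes, here the car priced at 190000.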

If you increase the car prices beyond every class range, the cars are still kept, with NA in the class columns:

cars <- cars * 10
data_priceclass[cars,
                .(car_price, x.price_from, x.price_to, price_class),
                on = .(price_from <= car_price, price_to > car_price)]

   car_price x.price_from x.price_to price_class
1:   1900000           NA         NA          NA
2:   5000000           NA         NA          NA
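
The NA rows come from the default nomatch = NA, which keeps every row of cars. If unmatched cars should be dropped instead (an inner join), nomatch = NULL is one way to do it. The sketch below assumes a data.table version that accepts nomatch = NULL (older versions used nomatch = 0L) and uses a shortened, hypothetical price table (price_classes) rather than the full one from the question.

```r
library(data.table)

# Shortened, illustrative price table (not the full one from the question)
price_classes <- data.table(
  price_from  = c(0, 200000, 250000),
  price_to    = c(199999, 249999, 299999),
  price_class = 1:3
)
# One car inside a class range, one far above every range
cars <- data.table(car_price = c(190000, 5000000))

# Default (nomatch = NA): left join, the unmatched car is kept with NA
left_join <- price_classes[cars,
                           .(car_price, price_class),
                           on = .(price_from <= car_price, price_to > car_price)]

# nomatch = NULL: inner join, the unmatched car is dropped
inner_join <- price_classes[cars,
                            .(car_price, price_class),
                            on = .(price_from <= car_price, price_to > car_price),
                            nomatch = NULL]

left_join
inner_join
```

Keeping the default is usually what you want for a left join, since the NA rows make it easy to spot cars that no price class covers.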

answered Oct 01 '22 01:10

Waldi
