
How to implement optimization of INNER JOINS (push down) for Mongo Storage Plugin in Apache Drill?

I would like to extend the Apache Drill Mongo Storage Plugin to push down INNER JOINs. To do that, I would like to rewrite the INNER JOIN into a MongoDB aggregation pipeline.

Where should we start when implementing this rewrite in Apache Drill?

Here is a SQL example:

SELECT *
FROM `mymongo.db`.`test` `test`
  INNER JOIN `mymongo.db`.`test2` `test2`
  ON (`test`.`id` = `test2`.`fk`)
WHERE `test2`.`date` = '09.05.2017'

I have found the push-down of WHERE clauses in the Mongo Storage Plugin, but I am still struggling to do the same for INNER JOINs. What would the constructor of public class MongoPushDownInnerJoinScan extends StoragePluginOptimizerRule look like? Which equivalent of MongoGroupScan (AbstractGroupScan) would I have to implement? Any help would be very much appreciated.

Dennis Münkle asked Jan 03 '18 17:01



1 Answer

If you want to perform an SQL-style inner join with the aggregation framework, you can do it with the $lookup pipeline stage.

{
    $lookup: {
        from: <collection to join>,
        localField: <field from the input documents>,
        foreignField: <field from the documents of the "from" collection>,
        as: <output array field>
    }
}
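
As a rough sketch of how the SQL query from the question could map onto this stage: $lookup on its own behaves like a left outer join, so an inner join is usually obtained by following it with $unwind, which by default drops documents whose joined array is empty. The helper name build_inner_join_pipeline and its parameters are hypothetical, chosen only for illustration. The code just builds the pipeline as plain Python data and does not touch Drill or a MongoDB driver. It also assumes the date field is stored as a string, as in the question's WHERE clause.

```python
from typing import Any, Dict, List, Optional


def build_inner_join_pipeline(
    from_collection: str,
    local_field: str,
    foreign_field: str,
    as_field: str,
    joined_filter: Optional[Dict[str, Any]] = None,
) -> List[Dict[str, Any]]:
    """Build aggregation stages that emulate an SQL INNER JOIN.

    Hypothetical helper for illustration, not part of Drill or any driver.
    """
    pipeline: List[Dict[str, Any]] = [
        # Left outer join: matching documents land in an array under `as_field`.
        {
            "$lookup": {
                "from": from_collection,
                "localField": local_field,
                "foreignField": foreign_field,
                "as": as_field,
            }
        },
        # One output document per match; inputs with no match are dropped,
        # which turns the left outer join into an inner join.
        {"$unwind": "$" + as_field},
    ]
    if joined_filter:
        # WHERE conditions on the joined side become a $match on the nested fields.
        pipeline.append(
            {"$match": {f"{as_field}.{key}": value for key, value in joined_filter.items()}}
        )
    return pipeline


# Pipeline for the query in the question, to be run against the `test` collection.
pipeline = build_inner_join_pipeline(
    "test2", "id", "fk", "test2", {"date": "09.05.2017"}
)
```

With a driver such as PyMongo, a list like this could be passed to the aggregate method of the test collection. Note that the result shape differs from SELECT *: the joined test2 document ends up nested under the test2 field rather than flattened into columns.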
Raul Sanchez Reyes answered Nov 16 '22 02:11
