Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MongoDB $lookup: Limitations & Usage

With the new aggregation pipeline stage $lookup we are now able to perform 'left outer joins'.

At first glance, I want to immediately replace one of our denormalised collections with two separate collections and use the $lookup to join them upon querying. This will solve the problem of having, when necessary, to update a huge number of documents. Now we can update just one document.

But surely this is too good to be true? This is a NoSQL, document database after all!

MongoDB's CTO also highlights his concerns:

We’re still concerned that $lookup can be misused to treat MongoDB like a relational database. But instead of limiting its availability, we’re going to help developers know when its use is appropriate, and when it’s an anti-pattern. In the coming months, we will go beyond the existing documentation to provide clear, strong guidance in this area.

What are the limitations of $lookup? Can I use them in real-time, operational querying of our data or should they be left for reporting, offline situations?

like image 895
Dave New Avatar asked Feb 08 '16 06:02

Dave New


People also ask

Does MongoDB have a limit?

The maximum size an individual document can be in MongoDB is 16MB with a nested depth of 100 levels. Edit: There is no max size for an individual MongoDB database.

What is $lookup in MongoDB?

$lookup performs an equality match on the localField to the foreignField from the documents of the from collection. If an input document does not contain the localField , the $lookup treats the field as having a value of null for matching purposes.

What is MongoDB Index Key limit?

A collection cannot have more than 64 indexes. The length of the index name cannot be longer than 125 characters. A compound index can have maximum 31 fields indexed.

How many records can MongoDB handle?

Mongo can easily handle billions of documents and can have billions of documents in the one collection but remember that the maximum document size is 16mb. There are many folk with billions of documents in MongoDB and there's lots of discussions about it on the MongoDB Google User Group.


2 Answers

I share your same enthusiasm for $lookup.

I think there are trade-offs. One of the major concerns of SQL databases (and which is one of the reasons for the genesis of NoSQL) is that at large scale, joins can take a lot of time (well, relatively speaking).

It definitely helps in giving you a declarative model for your data, but then if you start to model your entire NoSQL database as though its a database of rows and tables (just using refs, for example), then you begin modeling it as though it's simply a SQL database (to a degree). Even MongoDB mentioned it (like you put in your question):

We’re still concerned that $lookup can be misused to treat MongoDB like a relational database.

You mentioned:

This will solve the problem of having, when necessary, to update a huge number of documents. Now we can update just one document.

I'm not sure what your collections look like exactly, but that definitely sounds like it could be a good use for $lookup.

Can I use them in real-time, operational querying

I would say, again, it depends on your use-case. You'll have to compare:

  • Desired semantics of your queries (declarative vs imperative)
  • Whether modeling your data as more relational (and thus using $lookup) in certain circumstances is worth the potential trade-off in computational time (that's assuming that querying across collections is even something to be concerned about, computationally speaking)

etc...

I'm sure in the coming months we'll see perf tests of the "left outer joins" and perhaps MongoDB will start writing some posts about when $lookup is an antipattern.

Hope this answer helps add to the discussion.

like image 99
Josh Beam Avatar answered Oct 23 '22 03:10

Josh Beam


First of all MongoDB is a document-based database and will always be. So the $lookup aggregation pipeline stage new in version 3.2 didn't change MongoDB to relational database (RDBMS) as MongoDB's CTO mentioned:

We’re still concerned that $lookup can be misused to treat MongoDB like a relational database.

The first limitation of $lookup as mentioned in the documentation is that it:

Performs a left outer join to an unsharded collection in the same database to filter in documents from the “joined” collection for processing.

Which means that you can't use it with a sharded collection.

Also the $lookup operator doesn't work directly with an array as mentioned in post therefore you will need a preliminary $unwind stage to denormalize the localField if it is an array.

Now you said:

This will solve the problem of having, when necessary, to update a huge number of documents.

This is a good idea if your data are updated often than they are read. as mentioned in 6 Rules of Thumb for MongoDB Schema Design: Part 3 especially if you have a large hierarchical data sets.

Denormalizing one or more fields makes sense if those fields are read much more often than they are updated.

I believe that with careful schema design you probably will not need the $lookup operator.

like image 41
styvane Avatar answered Oct 23 '22 03:10

styvane