
How efficient are MongoDB projections?

Is there a lot of overhead in excluding nearly all of the data in a document when querying a mongo database?

For example, in the case where I only want field1 and field2, for a collection with a document structure of:

{
    "field1" : 1,
    "field2" : true,
    "field3" : ["big","array",...],
    "field4" : ["another","big","array",...]
}

would I benefit more from:

  1. Creating a separate collection alongside this collection containing only field1 and field2, or
  2. Using .find() on the original documents with inclusion/exclusion parameters

Note: The inefficiency of storing the same data twice concerns me less than the efficiency of querying it.

Many thanks!

Ash asked Dec 15 '12 23:12



1 Answer

Projection is roughly analogous to naming columns explicitly in a SQL SELECT, so it seems a little counter-intuitive to ask whether returning a smaller amount of data would incur more overhead than returning the full document.

The server still has to find the document (depending on how you .find() it, for example whether the query can use an index, that may be fast or slow), but returning only the two fields you need rather than the complete document will make the query faster, not slower, because less data has to be sent back to the client.
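To make the size difference concrete, here is a minimal sketch in plain JavaScript that emulates a top-level inclusion projection on a document shaped like the one in the question. It does not talk to MongoDB: `projectInclude` is a hypothetical helper written only for illustration, the array contents are made up, and JSON length is used as a rough stand-in for the BSON size that would actually travel over the wire.

```javascript
// Hypothetical helper: emulates a top-level inclusion projection in plain JavaScript.
// This is NOT MongoDB's implementation; it only mirrors the basic semantics.
function projectInclude(doc, spec) {
  const out = {};
  // MongoDB returns _id by default unless the projection sets _id: 0.
  if (spec._id !== 0 && "_id" in doc) out._id = doc._id;
  for (const [field, flag] of Object.entries(spec)) {
    if (field !== "_id" && flag === 1 && field in doc) out[field] = doc[field];
  }
  return out;
}

// Sample document shaped like the one in the question (array contents are invented).
const doc = {
  _id: 1,
  field1: 1,
  field2: true,
  field3: Array.from({ length: 1000 }, (_, i) => "item" + i),
  field4: Array.from({ length: 1000 }, (_, i) => "other" + i),
};

// Keep only field1 and field2, and suppress _id.
const projected = projectInclude(doc, { field1: 1, field2: 1, _id: 0 });

// JSON length is only a rough proxy for the size of the returned data.
const fullSize = JSON.stringify(doc).length;
const projectedSize = JSON.stringify(projected).length;

console.log(projected);
console.log({ fullSize, projectedSize });
```

Against a real database, the equivalent call in the mongo shell would be something like `db.collection.find({}, { field1: 1, field2: 1, _id: 0 })`, where `collection` is a placeholder for your collection name.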

A second collection may only help if you are concerned about your collection fitting into RAM. If the documents in the duplicate collection are much smaller, they can presumably fit into less RAM, decreasing the chance that a page will need to be read in from disk. However, if you are writing to this collection as well as to the original one, you need a lot more data in RAM overall than if you only had the original collection.

So while the finer details depend on your individual set-up, the general answer is probably option 2: you will benefit more from using projection and returning only the two fields you need.

Asya Kamsky answered Oct 07 '22 09:10