Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How efficient is searching by tags?

Tags:

mongoose

Situation:

I have 1 million users and I want to search them by tags.

Here's the schema:

const Account = new Schema({
  tags: { type: [String], default: ['cats'] }
})

Here's the first query:

const query = { tags: 'cats'}
const fields = { _id: 0 }
const options = { skip: 0, limit: 5 }

Account.find(query, fields, options)

After calling find method Mongoose will start searching and return the first 5 entries it matches. The performance in this case is optimal.

Here's the second query:

const query = { tags: 'cats'}
const fields = { _id: 0 }
const options = { skip: 500 000, limit: 5 }

Account.find(query, fields, options)

What going on in this case is of interest to me.

Does Mongoose match 500 000 entries first and then returns the next 5 entries?

Or does it somehow 'jump' to the 500 000 element without having to match them all beforehand?

I'm trying to understand how efficient this proccess is and whether I can improve it somehow. Do you have any advice?

like image 658
manidos Avatar asked Aug 14 '17 06:08

manidos


People also ask

How do I store post tags in a database?

The easiest way: create a tag table, create an entity table, create a many-to-many table like { tag_id, entity_id }. For every tag/entity insert that pair into link table. The best way: do not store tag data in relational DB, use mongo, arangodb or other nosql db for this.

What are tags in a database?

In information systems, a tag is a keyword or term assigned to a piece of information (such as an Internet bookmark, multimedia, database record, or computer file). This kind of metadata helps describe an item and allows it to be found again by browsing or searching.


2 Answers

Interesting. My understanding is that Findhere is very similar in query functionalities to Select, wherein it will work inefficiently by first going through 500,000 values before skipping them.

Hence, what I suggested is implementing something known as a Multilevel index. Basically create a nested indexing system, which although taking up data space, is very efficient, especially for sequential search practices where you do not wish to implement a tree outright. Create a set of outer indices which are more general, then link them to inner indices respectively, before moving onto stored data blocks.

Implementing this should not be too difficult and should drastically improve the amount of traversing being done. Hope this helps.

like image 193
Soumyajyoti Bhattacharya Avatar answered Oct 07 '22 03:10

Soumyajyoti Bhattacharya


Yes, it is matching 500,000 entries first.

From the MongoDB Definitive Guide 2nd Edition:

$skip takes a number, n, and discards the first n documents from the result set. As with “normal” querying, it isn’t efficient for large skips, as it must find all of the matches that must be skipped and then discard them.

You can examine your query in detail from the Mongo shell using explain():

https://docs.mongodb.com/manual/reference/method/cursor.explain/

Ex:

// Create a dummy collection
var tagTypes = ['cats', 'dogs', 'chickens']
    for (i=0; i < 1000; i++) {
        let tag = tagTypes[Math.floor(Math.random()*tagTypes.length)]
        db.people.insert({"tags" : [tag]})
    }

then explain it

db.people.find({"tags" : "cats"}).skip(100).limit(10).explain("executionStats”)

totalDocsExamined is showing you that it is matching everything that it's skipping

"executionStats" : {
...
    "totalDocsExamined" : 420

If you were to create an index on tags:

db.people.ensureIndex({ "tags" : 1 })

Running it again you'd get:

"executionStats" : {
...
    "totalDocsExamined" : 110
like image 2
drew Avatar answered Oct 07 '22 02:10

drew