
Datastore solution for tag search

I've got millions of items ordered by a precomputed score. Each item has many boolean attributes. Let's say there are about ten thousand possible attributes in total, with each item having a dozen or so of them.

I'd like to be able to request, in real time (a few milliseconds), the top n items for more or less any combination of attributes.

What solution would you recommend? I am looking for something extremely scalable.

--
- We are currently looking at MongoDB with an array index; do you see any limitations?
- Solr is a possible solution, but we do not need text search capabilities.

log0 asked Apr 30 '12 21:04



2 Answers

MongoDB can handle what you want if you store your objects like this:

{ score:2131, attributes: ["attr1", "attr2", "attr3"], ... }

Then the following query will match all the items that have both attr1 and attr2:

c = db.mycol.find({ attributes: { $all: [ "attr1", "attr2" ] } })

but this one won't match the item above, since it has no attr4:

c = db.mycol.find({ attributes: { $all: [ "attr1", "attr4" ] } })

The query returns a cursor. If you want the results sorted, add the sort parameters to the query like so (1 sorts ascending; use -1 to get the highest scores first, and chain limit(n) to keep only the top n):

c = db.mycol.find({ attributes: { $all: [ "attr1", "attr2" ] }}).sort({score:1})
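To make the expected result concrete, here is a small in-memory sketch in plain JavaScript of what the $all filter followed by a descending sort and a limit should return. It only illustrates the result set and is not how MongoDB executes the query. The function name topN and the sample items are made up for the example.

```javascript
// Hypothetical sample data in the same shape as the document above.
const items = [
  { score: 2131, attributes: ["attr1", "attr2", "attr3"] },
  { score: 900,  attributes: ["attr1", "attr2"] },
  { score: 3500, attributes: ["attr2", "attr4"] },
  { score: 1200, attributes: ["attr1", "attr2", "attr4"] },
];

// Keep the items that carry every requested attribute (the $all semantics),
// order them by score with the highest first, and return the first n.
function topN(docs, wanted, n) {
  return docs
    .filter(d => wanted.every(a => d.attributes.includes(a)))
    .sort((a, b) => b.score - a.score)
    .slice(0, n);
}

const best = topN(items, ["attr1", "attr2"], 2);
console.log(best.map(d => d.score));
```

In the shell, the same result should come from chaining .sort({score: -1}).limit(n) onto the find shown above.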

Have a look at Advanced Queries to see what's possible.

Appropriate indexes can be set up as follows (ensureIndex is the older helper name; current MongoDB versions call it createIndex):

db.mycol.ensureIndex({attributes:1, score:1})
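An index on an array field is a multikey index, which holds one entry per array element. The sketch below is a rough conceptual model of the entries such a compound index would contain, written in plain JavaScript. It is not MongoDB's actual storage format, and buildEntries and the sample documents are hypothetical.

```javascript
// Conceptual model only: emit one (attr, score, id) entry per array element,
// then keep the entries sorted by attribute and, within it, by score.
function buildEntries(docs) {
  const entries = [];
  docs.forEach((d, id) => {
    for (const attr of d.attributes) {
      entries.push({ attr, score: d.score, id });
    }
  });
  entries.sort((a, b) =>
    a.attr < b.attr ? -1 : a.attr > b.attr ? 1 : a.score - b.score);
  return entries;
}

// Hypothetical sample documents.
const sampleDocs = [
  { score: 2131, attributes: ["attr1", "attr2", "attr3"] },
  { score: 900,  attributes: ["attr1", "attr4"] },
];
const indexEntries = buildEntries(sampleDocs);
console.log(indexEntries.length); // one entry per attribute per document
```

With a dozen attributes per item, the index would hold roughly a dozen entries per item, which is worth keeping in mind when sizing it for millions of items.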

And you can get performance information using

db.mycol.find({ attributes: { $all: [ "attr1" ] }}).explain()

explain() reports how many objects were scanned, how long the operation took, and various other statistics.

Ivo Bosticky answered Sep 18 '22 22:09


This is exactly the kind of problem MongoDB can deal with. The fact that your attributes are booleans helps here. A possible schema is listed below:

[
    {
        true_tags: ["attr1", "attr2", "attr3", ...],
        false_tags: ["attr4", "attr5", "attr6", ...]
    },
]

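Then we can index true_tags and false_tags, using a separate index for each, since MongoDB does not allow a compound index over two array fields. Searching with query operators such as $in and $all should then be efficient.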
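As a rough illustration of how a query against this schema could be put together, the helper below builds a filter document from a list of attributes that must be true and a list that must be false. The name buildFilter is hypothetical, and the sketch assumes each item lists every relevant attribute in one of the two arrays.

```javascript
// Build a MongoDB-style filter: every attribute in mustBeTrue has to appear
// in true_tags, and every attribute in mustBeFalse has to appear in false_tags.
// Empty lists add no condition for that field.
function buildFilter(mustBeTrue, mustBeFalse) {
  const filter = {};
  if (mustBeTrue.length > 0) {
    filter.true_tags = { $all: mustBeTrue };
  }
  if (mustBeFalse.length > 0) {
    filter.false_tags = { $all: mustBeFalse };
  }
  return filter;
}

const filter = buildFilter(["attr1", "attr2"], ["attr4"]);
console.log(JSON.stringify(filter));
```

The resulting object could then be passed to find() in the shell, for example db.mycol.find(filter), assuming the collection name used in the other answer.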

James Gan answered Sep 19 '22 22:09