
Caching repeating query results in MongoDB

I am going to build a page that is designed to be "viewed" a lot, but far fewer users will "write" to the database. For example, only 1 in 100 users may post news on my site; the rest will just read it.

In the above case, the SAME QUERY will be performed 100 times when those users visit my homepage, while the actual database content changes very little. Actually, 99 of those queries are a waste of computing power. Is there any way to cache the result of the first query and, when the same query is detected again shortly afterwards, deliver the cached result?

I use MongoDB and Tornado. However, some posts say that MongoDB does not do caching.

Serving a static, cached HTML page with something like Nginx is not an option, because I want Tornado to render a personalized page on each request.

MK Yung asked Jan 09 '13


2 Answers

I use MongoDB and Tornado. However, some posts say that MongoDB does not do caching.

I dunno who said that, but MongoDB does have a way to cache queries: it relies on the OS's LRU page cache, since it does not do memory management itself.

So long as your working set fits into RAM without the OS having to page it out or swap constantly, you should be reading this query from memory most of the time. So yes, MongoDB can cache, but technically it doesn't; the OS does.

Actually, 99 of those queries are a waste of computing power.

The caching mechanisms for this kind of problem are the same across most technologies, whether MongoDB or SQL. Of course, this only matters if it actually is a problem; if you ask me, you are probably micro-optimising unless you get Facebook-, Google-, or YouTube-scale traffic.

Caching is a huge subject in its own right, ranging from caching pre-aggregated query results in MongoDB/Memcached/Redis etc. to caching HTML and other web resources so the server does as little work as possible.
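To make the "caching query results" end of that range concrete, here is a minimal in-process sketch: a TTL decorator that holds on to a query result for a fixed number of seconds. This is an illustration, not Sammaye's method; the `latest_news` function, its field names, and the 30-second TTL are all assumptions, and a shared cache like Memcached or Redis would play the same role across multiple server processes.

```python
import time
from functools import wraps

def ttl_cache(seconds):
    """Cache a function's last result for `seconds` (per argument tuple)."""
    def decorator(fn):
        state = {"value": None, "args": None, "expires": 0.0}
        @wraps(fn)
        def wrapper(*args):
            now = time.monotonic()
            # serve the cached result while it is still fresh
            if state["expires"] > now and args == state["args"]:
                return state["value"]
            state["value"] = fn(*args)
            state["args"] = args
            state["expires"] = now + seconds
            return state["value"]
        return wrapper
    return decorator

calls = {"count": 0}  # counts how often the "database" is actually hit

@ttl_cache(seconds=30)
def latest_news():
    # stand-in for a real MongoDB query, e.g.
    #   db.news.find().sort("posted", -1).limit(20)
    calls["count"] += 1
    return ["story 1", "story 2"]

latest_news()  # first call hits the "database"
latest_news()  # second call within 30s is served from the cache
```

The trade-off is staleness: with a 30-second TTL, a newly posted story can take up to 30 seconds to appear, which matches the question's read-heavy, rarely-written scenario.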

In your scenario, personally, as I said, it sounds as though you are thinking about the wasted computing power the wrong way. Even if you cached this query in another collection or technology, you would probably spend about as much power and as many resources retrieving the result from that cache as if you just didn't bother. That assumption, however, depends on you having the right indexes, schema, set-up, etc.

I recommend you read some links on good schema design and index creation:

  • http://docs.mongodb.org/manual/core/indexes/
  • https://docs.mongodb.com/manual/core/data-model-operations/#large-number-of-collections
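As a small illustration of the index side of that advice, here is a hypothetical compound index for a "latest news" query. The collection and field names (`news`, `author`, `posted`) are assumptions, not from the question; the actual PyMongo calls are shown only in comments since they require a running server.

```python
# The query we want the index to cover (hypothetical field names):
#   db.news.find({"author": name}).sort("posted", -1).limit(20)

# A matching compound index: the equality field first, then the sort field,
# with the sort field's direction matching the query's sort.
NEWS_INDEX = [("author", 1), ("posted", -1)]

# With pymongo this would be applied as:
#   db.news.create_index(NEWS_INDEX)
# and checked with:
#   db.news.find({"author": name}).sort("posted", -1).explain()
```

If the query and index line up like this, the read the question worries about is served from the index in RAM, which is exactly the OS-level caching described above.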

Making a static, cached HTML with something like Nginx is not preferred, because I want to render a personalized page by Tornado each time.

Yeah, I think that by worrying about query caching you are prematurely optimising, especially since you don't want to take off what would be 90% of the load on your server each time: serving the page itself.

I would focus on your schema and indexes and then worry about caching if you really need it.

Sammaye answered Oct 04 '22

The author of the Motor (MOngo + TORnado) package gives an example of caching his list of categories here: http://emptysquare.net/blog/refactoring-tornado-code-with-gen-engine/

Basically, he defines a global list of categories and queries the database to fill it in; then, whenever he needs the categories in his pages, he checks the list: if it is populated, he uses it; if not, he queries again and fills it in. He has it set up to invalidate the list whenever he inserts into the database, but depending on your usage you could instead keep a global timeout variable to track when you need to re-query. If you're doing something complicated this could get out of hand, but for something simple like a list of the most recent posts, I think it would be fine.

Josh Buell answered Oct 04 '22