
Using MongoDB, any easy way to re-use Map/Reduce results?

For example, when doing analytics, a map/reduce run can take 10 seconds. If other webpages can reuse the result once it has been computed, each of those pages saves 10 seconds.

It would be good to have the map/reduce result cached somehow.

It is possible to record a successful map/reduce run as map_reduce_result_[timestamp] in the db, and then keep this timestamp in db.run_log in MongoDB. The timestamp could be the UNIX epoch time, for example. When other pages need the result, they can get the max timestamp and then look up the result stored under it in MongoDB. But this feels a little like a hack, and I wonder if there are better ways to do it.
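The timestamp scheme described above could be sketched roughly as follows. The helper names and the shape of the run_log documents ({ timestamp: ... }) are assumptions made for illustration; only the map_reduce_result_ prefix and the run_log collection come from the description.

```javascript
// Hypothetical helpers; not part of MongoDB itself.

// Build the name of the collection that holds one run's output.
function resultCollectionName(timestamp) {
  return "map_reduce_result_" + timestamp;
}

// Given run_log documents shaped like { timestamp: <UNIX epoch seconds> }
// (an assumed shape), return the newest timestamp, or null if there are none.
function latestTimestamp(runLogDocs) {
  if (runLogDocs.length === 0) return null;
  return runLogDocs.reduce(function (max, doc) {
    return Math.max(max, doc.timestamp);
  }, -Infinity);
}

// In the mongo shell, the lookup could then look like this (not run here):
//   var last = db.run_log.find().sort({ timestamp: -1 }).limit(1).next();
//   var results = db.getCollection(resultCollectionName(last.timestamp)).find();
```

Letting the database sort run_log, as in the commented shell lines, avoids loading every log entry into the page just to find the newest one.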

nonopolarity asked Dec 28 '22 07:12


1 Answer

Your approach will save each map-reduce result in a separate collection. This is fine if you need to access these 'historical' results.

If you're only interested in the last result, you can use a single collection to act as a cache. You can specify the output collection of a map-reduce job using the out option.

db.collection.mapReduce(map, reduce, { out: "cachedResult" });

The permanent cachedResult collection will then contain the result.

As you can read in the documentation, the map-reduce job will still use a temporary collection while executing. This temporary collection is atomically renamed to the output collection on completion. This means that you can safely rerun the map-reduce job using the same output collection: pages reading the cache keep seeing the previous result until the new one is complete, so they never see a half-written cache.
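As a rough sketch of how the single-collection cache might be used end to end: the pageviews collection, its { page: ... } document shape, and all function names below are assumptions for illustration. Only the out option and the cachedResult collection name come from the answer.

```javascript
// Assumed input: documents in a hypothetical "pageviews" collection shaped
// like { page: "/home" }. The job counts views per page.
function mapPageview() {
  emit(this.page, 1); // emit is provided by MongoDB while the job runs
}

function reduceCounts(key, values) {
  // Sum partial counts; MongoDB may call this repeatedly on its own output.
  return values.reduce(function (sum, n) { return sum + n; }, 0);
}

// Re-run the job and replace the cache. Intended to be called from a
// scheduled task, not from every page request.
function refreshCache(db) {
  return db.pageviews.mapReduce(mapPageview, reduceCounts, { out: "cachedResult" });
}

// Page code reads the cached rows. Map-reduce output documents have the
// form { _id: <key>, value: <reduced value> }.
function readCachedCount(db, page) {
  var doc = db.cachedResult.findOne({ _id: page });
  return doc ? doc.value : 0;
}
```

With this split, only refreshCache pays the cost of the map-reduce run, while pages call readCachedCount and get an ordinary indexed lookup on _id.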

Niels van der Rest answered Jan 09 '23 23:01