
Using map/reduce for mapping the properties in a collection

Update: follow-up to MongoDB Get names of all keys in collection.

As pointed out by Kristina, one can use MongoDB's map/reduce to list the keys in a collection:

db.things.insert( { type : ['dog', 'cat'] } );
db.things.insert( { egg : ['cat'] } );
db.things.insert( { type : [] } );
db.things.insert( { hello : [] } );

mr = db.runCommand({
  "mapreduce" : "things",
  "map" : function() {
    for (var key in this) { emit(key, null); }
  },
  "reduce" : function(key, stuff) {
    return null;
  }
})

db[mr.result].distinct("_id")
// output: [ "_id", "egg", "hello", "type" ]

As long as we only want the keys at the first level of depth, this works fine. However, it fails to retrieve keys located at deeper levels. If we add a new record:

db.things.insert({foo: {bar: {baaar: true}}}) 

and run the map-reduce + distinct snippet above again, we get:

[ "_id", "egg", "foo", "hello", "type" ]  

But we will not get the bar and the baaar keys, which are nested deeper in the data structure. The question is: how do I retrieve all keys, regardless of their depth? Ideally, I would like the script to walk down every level, producing output such as:

["_id","egg","foo","foo.bar","foo.bar.baaar","hello","type"]       
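For reference, the first-level-only behavior can be reproduced in plain JavaScript, outside MongoDB. This is a minimal sketch that assumes the document is an ordinary object; the names `doc` and `topLevelKeys` are illustrative only.

```javascript
// Hypothetical stand-in for the nested document inserted above.
const doc = { foo: { bar: { baaar: true } } };

// for...in visits only the enumerable keys of the object itself,
// not the keys of the objects nested inside it.
const topLevelKeys = [];
for (const key in doc) {
  topLevelKeys.push(key);
}

console.log(topLevelKeys); // [ 'foo' ]
```

Reaching `bar` and `baaar` therefore needs an explicit descent into each value that is itself an object.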

Thank you in advance!

Andrea Fiore asked Jun 08 '10 11:06



1 Answer

OK, this is a little more complex because you'll need to use some recursion.

To make the recursion happen, you'll need to be able to store some functions on the server.

Step 1: define some functions and put them server-side

isArray = function (v) {
  return v && typeof v === 'object' && typeof v.length === 'number' && !(v.propertyIsEnumerable('length'));
}

m_sub = function(base, value){
  for(var key in value) {
    emit(base + "." + key, null);
    if( isArray(value[key]) || typeof value[key] == 'object'){
      m_sub(base + "." + key, value[key]);
    }
  }
}

db.system.js.save( { _id : "isArray", value : isArray } );
db.system.js.save( { _id : "m_sub", value : m_sub } );
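As a quick sanity check, the duck-typing `isArray` helper can be compared against the built-in `Array.isArray` in plain JavaScript. This is only a sketch: the `samples` list is made up for illustration, and whether `Array.isArray` is available inside server-side functions depends on the JavaScript engine of the MongoDB version in use.

```javascript
// Same duck-typing check as in the answer.
const isArray = function (v) {
  return v && typeof v === 'object' && typeof v.length === 'number' &&
    !(v.propertyIsEnumerable('length'));
};

// Hypothetical sample values: arrays, array-like objects, and non-objects.
const samples = [['dog', 'cat'], [], { length: 2 }, { bar: true }, 'cat', null];

// isArray returns the falsy input itself (e.g. null) rather than false,
// so coerce to a boolean before comparing with Array.isArray.
const agreement = samples.map(v => Boolean(isArray(v)) === Array.isArray(v));

console.log(agreement.every(Boolean)); // true
```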

Step 2: define the map and reduce functions

map = function(){
  for(var key in this) {
    emit(key, null);
    if( isArray(this[key]) || typeof this[key] == 'object'){
      m_sub(key, this[key]);
    }
  }
}

reduce = function(key, stuff){ return null; }
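To try the recursion without a server, the same map logic can be exercised in plain JavaScript with a stubbed `emit`. This is a sketch under assumptions: `emit` here is a local stand-in rather than MongoDB's, the `docs` array mimics the collection without `_id` fields, and the `isArray` test is dropped because `typeof` already reports `'object'` for arrays.

```javascript
// Local stand-in for MongoDB's emit(): collect each emitted key in a Set.
const emitted = new Set();
const emit = (key, value) => emitted.add(key);

// Recursive helper with the same shape as m_sub in the answer.
function m_sub(base, value) {
  for (const key in value) {
    emit(base + '.' + key, null);
    if (typeof value[key] === 'object') {
      m_sub(base + '.' + key, value[key]);
    }
  }
}

// Like the server-side map function, this reads the document from `this`.
function map() {
  for (const key in this) {
    emit(key, null);
    if (typeof this[key] === 'object') {
      m_sub(key, this[key]);
    }
  }
}

// Hypothetical in-memory copy of the documents from the question (no _id).
const docs = [
  { type: ['dog', 'cat'] },
  { egg: ['cat'] },
  { type: [] },
  { hello: [] },
  { foo: { bar: { baaar: true } } }
];

docs.forEach(doc => map.call(doc));
const keys = [...emitted].sort();
console.log(keys);
// [ 'egg', 'egg.0', 'foo', 'foo.bar', 'foo.bar.baaar',
//   'hello', 'type', 'type.0', 'type.1' ]
```

The Set plays the role that the reduce step plus `distinct("_id")` play on the server: each key is kept once no matter how many documents emit it.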

Step 3: run the map reduce and look at results

mr = db.runCommand({"mapreduce" : "things", "map" : map, "reduce" : reduce, "out": "things" + "_keys"});
db[mr.result].distinct("_id");

The results you'll get are:

["_id", "_id.isObjectId", "_id.str", "_id.tojson", "egg", "egg.0", "foo", "foo.bar", "foo.bar.baaar", "hello", "type", "type.0", "type.1"]

There are two obvious problems here, both involving unexpected fields:
1. the _id data
2. the .0 (on egg and type)

Step 4: Some possible fixes

For problem #1 the fix is relatively easy. Just modify the map function. Change this:

emit(base + "." + key, null);
if( isArray...

to this:

if(key != "_id") {
  emit(base + "." + key, null);
  if( isArray...
}
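One way to picture the intent of this fix is a plain JavaScript walker that still reports `_id` itself but never descends into it. This is a sketch rather than the answer's exact change: `collectKeys`, `skipDescent`, and the sample document (including its fake `_id` object) are hypothetical names and data chosen for illustration.

```javascript
// Collect dotted key paths, but do not descend into the listed top-level keys.
function collectKeys(doc, skipDescent = ['_id']) {
  const keys = new Set();

  function walk(base, value) {
    for (const key in value) {
      const path = base ? base + '.' + key : key;
      keys.add(path);
      // Only top-level keys (empty base) are checked against skipDescent.
      const skip = !base && skipDescent.includes(key);
      if (!skip && value[key] !== null && typeof value[key] === 'object') {
        walk(path, value[key]);
      }
    }
  }

  walk('', doc);
  return [...keys].sort();
}

// Hypothetical document whose _id is an object with its own properties.
const sampleDoc = { _id: { str: 'abc123' }, foo: { bar: { baaar: true } } };

console.log(collectKeys(sampleDoc));
// [ '_id', 'foo', 'foo.bar', 'foo.bar.baaar' ]
```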

Problem #2 is a little more dicey. You wanted all keys, and technically "egg.0" is a valid key. You can modify m_sub to ignore such numeric keys, but it's easy to see a situation where this backfires: if you have an associative array inside of a regular array, you do want that "0" to appear. I'll leave the rest of that solution up to you.
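One possible policy, sketched here in plain JavaScript, is to leave array indexes out of the path while still descending into array elements, so keys of objects stored inside arrays are reported without the "0". This is an assumption about what is wanted, not the only reasonable choice; `collectKeysNoIndexes` and the `sample` document are hypothetical.

```javascript
// Collect dotted key paths, omitting array indexes from the paths.
function collectKeysNoIndexes(doc) {
  const keys = new Set();

  function walk(base, value) {
    if (Array.isArray(value)) {
      // Descend into each element but keep the parent's path unchanged.
      value.forEach(item => walk(base, item));
      return;
    }
    if (value === null || typeof value !== 'object') {
      return; // scalars have no keys of their own
    }
    for (const key in value) {
      const path = base ? base + '.' + key : key;
      keys.add(path);
      walk(path, value[key]);
    }
  }

  walk('', doc);
  return [...keys].sort();
}

// Hypothetical document: one plain array, one array holding an object.
const sample = { egg: ['cat'], pets: [{ name: 'rex', tags: ['a'] }] };

console.log(collectKeysNoIndexes(sample));
// [ 'egg', 'pets', 'pets.name', 'pets.tags' ]
```

The trade-off is that `pets.name` no longer says which element it came from, which is exactly the information the numeric keys carried.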

Gates VP answered Sep 22 '22 05:09