 

How to find the longest and shortest length of a value for a field in MongoDB?

The field's data type is String. I would like to find the lengths of the longest and shortest values of that field in MongoDB.

I have 500,000 documents in my collection in total.

Asked by sofs1 on Oct 16 '14




1 Answer

Since MongoDB 3.4, the $strLenBytes and $strLenCP aggregation operators allow you to simply do:

Class.collection.aggregate([
  { "$group" => {
    "_id" => nil,
    "max" => { "$max" => { "$strLenCP" => "$a" } },
    "min" => { "$min" => { "$strLenCP" => "$a" } }
  }}
]) 

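Here "a" is the string field whose minimum and maximum lengths you want, and Class stands for your Mongoid model. $strLenCP counts UTF-8 code points, while $strLenBytes counts UTF-8 bytes. Both return an error if any document reaching the $group stage has the field missing or set to a non-string value.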
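Because a single missing or non-string value makes the whole aggregation fail, it can help to filter first. The sketch below is a hypothetical helper (`length_stats_pipeline` is not part of Mongoid or the Ruby driver) that builds the same `$group` as plain data, adds a `$match` on `$type: "string"` in front of it, and lets you choose between code points and bytes. It only builds the pipeline; nothing here talks to a server.

```ruby
# Hypothetical helper: builds the min/max length pipeline as plain Ruby data.
# The leading $match skips documents whose field is missing or not a string,
# so the length operator never sees a value it would reject.
def length_stats_pipeline(field, bytes: false)
  # $strLenCP counts UTF-8 code points; $strLenBytes counts UTF-8 bytes.
  op = bytes ? "$strLenBytes" : "$strLenCP"
  len = { op => "$#{field}" }

  [
    { "$match" => { field => { "$type" => "string" } } },
    { "$group" => {
      "_id" => nil,
      "max" => { "$max" => len },
      "min" => { "$min" => len }
    } }
  ]
end

pipeline = length_stats_pipeline("a")
p pipeline
```

Assuming the same Mongoid model as above, you would run it with `Class.collection.aggregate(pipeline).first`, which yields one document with "min" and "max" keys, or nil when no document matched. Ruby draws the same distinction as the two operators: `"café".length` is 4 while `"café".bytesize` is 5, so use `bytes: true` only when byte size is what you care about.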


On releases older than 3.4, which lack these operators, the best approach available was mapReduce, with a few tricks so that only the two values are kept.

First, define a map function that emits only a single item for the whole collection, so almost nothing is passed on to later stages:

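map = %Q{
    function () {
      var len = this.a.length;

      // First document: seed both bounds (an empty store has nothing
      // to compare against) and emit the one key finalize will run on.
      if ( count == 0 ) {
        store[0] = len;
        store[1] = len;
        emit( null, 0 );
      }

      if ( len < store[0] )
        store[0] = len;

      if ( len > store[1] )
        store[1] = len;

      count++;
    }
}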

Because the work happens in globally scoped variables that track the minimum and maximum lengths, all that remains is to return them from a finalize function on the single emitted document. The reduce function is never called, since MongoDB does not call reduce for a key that has only a single value, but mapReduce still requires one, so define a "blank" function:

reduce = %Q{ function() {} }

finalize = %Q{
    function(key, value) {
        // Ignore the emitted value and return the bounds kept in scope
        return {
            min: store[0],
            max: store[1]
        };
    }
}

Then call the mapReduce operation:

Class.map_reduce(map,reduce).out(inline: 1).finalize(finalize).scope(store: [], count: 0)

All the work is done on the server rather than by iterating over results sent to the client application. Given a small sample collection like this:

{ "_id" : ObjectId("543e8ee7ddd272814f919472"), "a" : "this" }
{ "_id" : ObjectId("543e8eedddd272814f919473"), "a" : "something" }
{ "_id" : ObjectId("543e8ef6ddd272814f919474"), "a" : "other" }

You get a result like this (shell output, but much the same for the driver):

{
    "results" : [
            {
                    "_id" : null,
                    "value" : {
                            "min" : 4,
                            "max" : 9
                    }
            }
    ],
    "timeMillis" : 1,
    "counts" : {
            "input" : 3,
            "emit" : 1,
            "reduce" : 0,
            "output" : 1
    },
    "ok" : 1
}
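The trick depends on map and finalize sharing the same scope variables. The following plain-Ruby model of those mechanics may make the flow easier to follow: lambdas stand in for the server-side JavaScript functions, and the locals `store` and `count` play the role of the scope passed to mapReduce. It is an illustration only, not MongoDB's implementation, and `simulate_length_map_reduce` is a hypothetical name.

```ruby
# Plain-Ruby model of the map/finalize trick (no MongoDB involved).
# `store` and `count` mimic the mapReduce scope variables; the lambdas
# close over them the way the server-side functions read shared globals.
def simulate_length_map_reduce(docs)
  store = []
  count = 0
  emitted = []

  map = lambda do |doc|
    len = doc["a"].length
    if count.zero?
      # Seed both bounds from the first document and emit the single key.
      store[0] = len
      store[1] = len
      emitted << [nil, 0]
    end
    store[0] = len if len < store[0]
    store[1] = len if len > store[1]
    count += 1
  end

  # finalize ignores the emitted value and reads the shared bounds instead.
  finalize = ->(_key, _value) { { "min" => store[0], "max" => store[1] } }

  docs.each { |doc| map.call(doc) }
  # No reduce step: each key was emitted once, so finalize runs directly.
  emitted.map { |key, value| { "_id" => key, "value" => finalize.call(key, value) } }
end

docs = [
  { "a" => "this" },
  { "a" => "something" },
  { "a" => "other" }
]

p simulate_length_map_reduce(docs)
```

On the three sample documents this returns a single entry with `_id` nil and a value of min 4 and max 9, matching the shell output. On an empty array it returns an empty array, since map never runs and nothing is emitted.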

So mapReduce lets server-side JavaScript do this fairly quickly while reducing network traffic. Before MongoDB 3.4 there was no native operator for returning a string's length, so JavaScript processing on the server was necessary. On 3.4 and later, prefer the $strLenCP (or $strLenBytes) aggregation shown at the top of this answer; map-reduce is deprecated as of MongoDB 5.0.

Answered by Neil Lunn on Oct 15 '22