Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Mongodb sort by array of strings and use index

Tags:

mongodb

How can I make query with sorting by an array of string which will be execute without "stage" : "SORT" in its plan?

I'm using mongo 3.6
"mycoll" collection contains about 500.000 documents like these:

{
    someobject:{
        arrayfield:["asd","qwe"]
    }
}

{
    someobject:{
        arrayfield:["zxc"]
    }
}

this query

db.mycoll.find().sort({ "someobject.arrayfield": 1 }).skip(125340).limit(20)

produces an error

Sort operation used more than the maximum 33554432 bytes of RAM

I have and index on "someobject.arrayfield",but explain() gives me:

 "winningPlan" : {
            "stage" : "SKIP",
            "skipAmount" : 125340,
            "inputStage" : {
                    "stage" : "SORT",
                    "sortPattern" : {
                            "someobject.arrayfield" : 1
                    },
                    "limitAmount" : 125360,
                    "inputStage" : {
                            "stage" : "SORT_KEY_GENERATOR",
                            "inputStage" : {
                                    "stage" : "FETCH",
                                    "inputStage" : {
                                            "stage" : "IXSCAN",
                                            "keyPattern" : {
                                                    "someobject.arrayfield" : 1
                                            },
                                            "indexName" : "arrayfield_indexname",

                                            "isMultiKey" : true,
                                            "multiKeyPaths" : {
                                                    "someobject.arrayfield" : [
                                                            "someobject.arrayfield"
                                                    ]
                                            },
                                            "isUnique" : false,
                                            "isSparse" : false,
                                            "isPartial" : false,
                                            "indexVersion" : 2,
                                            "direction" : "forward",
                                            "indexBounds" : {
                                                    "someobject.arrayfield" : [
                                                            "[MinKey, MaxKey]"
                                                    ]
                                            }
                                    }
                            }
                    }
            }
    }

I know, I can increase the limits, use aggregation with 'allowdiskusage' or query

db.mycoll.find().sort({ "someobject.arrayfield.1": 1 }).skip(125340).limit(20)

with index on "someobject.arrayfield.1"

like image 793
Vlad Mamaev Avatar asked Oct 17 '18 14:10

Vlad Mamaev


People also ask

Does MongoDB sort use index?

MongoDB may use multiple indexes to support a sort operation if the sort uses the same indexes as the query predicate. If MongoDB cannot use an index or indexes to obtain the sort order, MongoDB must perform a blocking sort operation on the data.

How do I sort an array in MongoDB aggregate?

To sort the whole array by value, or to sort by array elements that are not documents, identify the input array and specify 1 for an ascending sort or -1 for descending sort in the sortBy parameter.

How do you sort an index array?

sort(long[], int, int) method sorts the specified range of the specified array of longs into ascending numerical order. The range to be sorted extends from index fromIndex, inclusive, to index toIndex, exclusive.


1 Answers

I have a potential solution, depending on what the values in your array actually are and if you just need a stable sort, or if you require a sort based on the array-comparison logic that mongodb uses.

Skip down to the proposed solution section if you don't want to read some details about how mongodb compares arrays.


At first, I was curious exactly how a .sort() on an array field would order the results. Would it use the first array value to do the comparison? Or some combination of the values?

After some testing, it looks like mongodb uses all the values in the array to compare and order them. This was my test data (_id field omitted for brevity):

db.mycoll.find().sort({"someobject.arrayfield":1})
{ "someobject" : { "arrayfield" : [ "rty", "aaa" ] } }
{ "someobject" : { "arrayfield" : [ "xcv", "aaa", "bcd" ] } }
{ "someobject" : { "arrayfield" : [ "aaa", "xcv", "bcd" ] } }
{ "someobject" : { "arrayfield" : [ "asd", "qwe" ] } }
{ "someobject" : { "arrayfield" : [ "bnm" ] } }
{ "someobject" : { "arrayfield" : [ "dfg", "sdf" ] } }
{ "someobject" : { "arrayfield" : [ "qwe" ] } }

As you can see, it's not sorting based on the first value of the array, but instead comparing the entire array using some internal logic. How does it determine that [ "rty", "aaa" ] should come before [ "xcv", "aaa", "bcd" ] exactly? And why does [ "xcv", "aaa", "bcd" ] come before [ "aaa", "xcv", "bcd" ]? Or are they equal and it's using the _id as a tie breaker? I really don't know.

I thought maybe it was using the standard javascript comparison operators, but that doesn't appear to be the case either. I made an array of each of those arrays and called .sort() on it in and got this:

x.sort()
[ [ 'aaa', 'xcv', 'bcd' ],
  [ 'asd', 'qwe' ],
  [ 'bnm' ],
  [ 'dfg', 'sdf' ],
  [ 'qwe' ],
  [ 'rty', 'aaa' ],
  [ 'xcv', 'aaa', 'bcd' ] ]

Which makes sense, because apparently javascript array comparison joins the elements with a comma delimiter and then does a string comparison.

PROPOSED SOLUTION

The array comparison logic in mongodb is a mystery to me. But, that opens up a possibility where you might not care about mongodb's mysterious array comparison logic. If all you want is a stable sort so that you can skip and limit for pagination, then I think I have a solution for you.

If we create an index on the first value of the array, like so (using background:1 to avoid locking the database):

db.mycoll.createIndex( { "someobject.arrayfield.0":1 }, {background:1} )

Then we can perform the find query and sort on the first object in the array, which will avoid the SORT stage:

mongos> db.mycoll.find().sort({"someobject.arrayfield.0":1}).explain()

"winningPlan" : {
   "stage" : "LIMIT",
   "limitAmount" : 1,
   "inputStage" : {
      "stage" : "SKIP",
      "skipAmount" : 1,
      "inputStage" : {
         "stage" : "FETCH",
         "inputStage" : {
            "stage" : "IXSCAN",
            "keyPattern" : {
               "someobject.arrayfield.0" : 1
            },
            "indexName" : "someobject.arrayfield.0_1",
            "isMultiKey" : false,
            "multiKeyPaths" : {
               "someobject.arrayfield.0" : [ ]
            },
            "isUnique" : false,
            "isSparse" : false,
            "isPartial" : false,
            "indexVersion" : 2,
            "direction" : "forward",
            "indexBounds" : {
               "someobject.arrayfield.0" : [
                  "[MinKey, MaxKey]"
               ]
            }
         }
      }
   }
}

No more SORT stage!


This proposed solution is based on a big assumption that you are willing to accept a different sort order than your original query was providing. I hope that this solution will work and you are able to implement it this way. If not, maybe someone else can expand on this idea.

like image 62
RovingBlade Avatar answered Oct 13 '22 03:10

RovingBlade