Get all elements with matching id in array of id

I am using Mongoose in my Express / React web application and I'm storing data in a Mongo Database.

I store 'songs' in a songs collection, and each user has an array containing the ids of the songs they have listened to.

To render what the user has been listening to, I have to match the ids in that array against the documents in the songs collection.

I am currently using

song.find({_id: {$in: ids}}).exec(callback)

to fetch all the songs matching the ids in the 'ids' array. The 'ids' array may contain the same id several times if the user listened to the song again and again.

The problem is that Mongoose returns the song for a given id only once, so the song is not displayed multiple times. Is there a way I can tell Mongoose to pass the callback as many objects as there are repetitions of the id?

To sum up:

ids: ['a', 'a', 'a', 'b', 'c']
song.find({_id: {$in: ids}}).exec(callback)
dataPassedToCallback: [songA, songB, songC]

Expecting

dataPassedToCallback: [songA, songA, songA, songB, songC]
asked May 23 '18 19:05 by Yooooomi



1 Answer

There are a couple of possible cases here, depending on what you are actually asking.

From the perspective of $in, MongoDB really looks at this as "shorthand" for an $or condition so effectively these two statements are the same:

"field": { "$in": ["a", "a", "a", "b", "c"] }

and

"$or": [
  { "field": "a" },
  { "field": "a" },
  { "field": "a" },
  { "field": "b" },
  { "field": "c" }
]

They are the same at least in terms of the "documents they select", which are simply the "individual" documents the database actually contains. The $in is actually a little more optimal here because the query engine can see that the OR is on the "same key", and this saves some cost in the query plan execution.

Also note that the actual "query plan", which can be viewed with explain(), will show that the "duplicate" entries are removed anyway:

        "filter" : {
                "_id" : {
                        "$in" : [
                                "a",
                                "b",
                                "c"
                        ]
                }
        },

Notably, the $or would not actually remove the duplicate conditions, which is just another reason why $in is the more efficient query here. Even so, the $or is still not going to return the same matching document more than once.

From the perspective of "selection", asking for the same criteria "multiple times" does not result in retrieving "multiple times". The same is true of the order of the arguments: it has no effect on the order in which documents are returned from the database. Nor would it make any sense to retrieve "multiple copies" from a "database" perspective, since this is basically redundant.
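As a rough illustration of that "selection" behaviour, here is an in-memory analogy. It only models the idea of "pick each stored document whose _id is in the list"; the `selectIn` helper and the sample `collection` are made up for this sketch and say nothing about how the MongoDB query engine is actually implemented:

```javascript
// In-memory analogy only: "select each stored document whose _id is in the list".
// This is NOT how the MongoDB query engine works internally.
function selectIn(collection, ids) {
  const wanted = new Set(ids);                 // repeated ids collapse into one entry
  return collection.filter(doc => wanted.has(doc._id));
}

// Stand-in for the documents stored in the songs collection
const collection = [{ _id: "a" }, { _id: "b" }, { _id: "c" }];

// Repeats and argument order in the id list make no difference to what is selected
const result = selectIn(collection, ["c", "a", "a", "a", "b"]);
console.log(result.map(d => d._id));           // [ 'a', 'b', 'c' ]
```

Each stored document is either selected or not, so it can only ever appear once, and in this model it comes back in storage order rather than list order.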

Instead what you are really asking is "I have a list, now I want to substitute those values with documents from the database". That is actually a reasonable ask, and relatively easy to achieve. Your actual implementation really just depends on where you get the data from.

Mapping the Results

In the case you have a "list" from an external source and want the database objects, then the logical thing to do is return the matching documents and then substitute into your ordered list with the returned documents.

In modern NodeJS environments this is as simple as:

let list = ["a", "a", "a", "b", "c"];
let songs = await Song.find({ "_id": { "$in": list } });

songs = list.map(e => songs.find(s => s._id === e));

And now the songs list has an entry for each item in list in the same order but actually with the real database document as returned.

If you are dealing with actual ObjectId values within _id, then it's better to "cast" the values in the list and use the ObjectId.equals() function to compare the "objects":

// of course not "valid" ObjectId here; but
let list = ["a", "a", "a", "b", "c"].map(e => new ObjectId(e)); // casting
let songs = await Song.find({ "_id": { "$in": list } });

songs = list.map(e => songs.find(s => s._id.equals(e)));        // compare
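If the list is long, the find() inside map() shown above scans the result array once per list entry. One possible variation, sketched here with plain objects standing in for the returned documents, is to index the results by id first. The `remapByList` name is just something made up for this example, and keying on String(id) is an assumption that lets the same helper handle plain string ids as well as ObjectId values (whose toString() gives the hex string):

```javascript
// Build an id -> document index once, then look each list entry up directly.
// String(id) is the key so string ids and ObjectId values are treated alike.
function remapByList(list, docs) {
  const byId = new Map(docs.map(doc => [String(doc._id), doc]));
  // Ids with no matching document come back as undefined, just like Array.find()
  return list.map(id => byId.get(String(id)));
}

// Plain objects standing in for the documents returned by Song.find()
const docs = [
  { _id: "a", name: "Song A" },
  { _id: "b", name: "Song B" },
  { _id: "c", name: "Song C" }
];

const remapped = remapByList(["a", "a", "a", "b", "c"], docs);
console.log(remapped.map(s => s.name));
// [ 'Song A', 'Song A', 'Song A', 'Song B', 'Song C' ]
```

The result is the same as with find() inside map(); the only difference is how the lookup is done.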

If async/await is not available (it is enabled by default from NodeJS 8.x releases, and can be enabled explicitly in some earlier versions), then standard Promise resolution will do:

// of course not "valid" ObjectId here; but
let list = ["a", "a", "a", "b", "c"].map(e => new ObjectId(e)); // casting

Song.find({ "_id": { "$in": list } }).then(songs =>
  list.map(e => songs.find(s => s._id.equals(e)))              // compare
).then(songs => {
  // do something
})

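Or with a callback (note that Mongoose 7 and later no longer accept callbacks, so this form only applies to older versions):

let list = ["a", "a", "a", "b", "c"].map(e => new ObjectId(e)); // casting
Song.find({ "_id": { "$in": list } }, (err, songs) => {
  if (err) throw err;                                          // handle the error as appropriate
  songs = list.map(e => songs.find(s => s._id.equals(e)));     // compare
})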

Note that this is significantly different to "mapping the function" as was mentioned in a comment on the question. There really is no point in "asking the database multiple times" when you already have the results returned from "one" request. So you would not do something like:

let songs = await Promise.all(list.map(_id => Song.findById(_id)));

That is horribly redundant, creating additional requests and overhead just for the sake of doing requests. Instead, do the "one" request and "re-map" onto the list, as that simply makes the most sense.

More to the point for your actual implementation, this "re-mapping" still really has no place at this level of the API. Ideally, your "front end" makes the request with the "unique" _id list "only". The request is then passed through, allowing the database to respond and simply return the matching documents. As a workflow:

Front End              Back End                            Front End
---------              ------------                        -------
List -> Unique List -> Endpoint => Database => Endpoint -> Doc List -> Remap List

So from the server "Endpoint" and "Database" perspective, the "documents" as returned should be all they handle. This decreases the payload of network traffic by removing all duplicates. Only at the "Front End", on receiving the response of those "three" documents in the sample, would you actually "re-map" to the final list containing the duplicate copies.
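A minimal sketch of that front-end side of the workflow might look like the following. Both `loadPlayHistory` and the `fetchSongs` parameter are hypothetical names: `fetchSongs` is assumed to be whatever function your client uses to call the endpoint with an array of ids and resolve to the matching documents, and it is replaced by a stub here so the example runs on its own:

```javascript
// Hypothetical front-end helper. `fetchSongs` is an assumed function that takes
// an array of unique ids and resolves to the matching documents.
async function loadPlayHistory(list, fetchSongs) {
  const unique = [...new Set(list)];           // List -> Unique List
  const docs = await fetchSongs(unique);       // one request, no duplicates on the wire
  const byId = new Map(docs.map(d => [d._id, d]));
  return list.map(id => byId.get(id));         // Doc List -> Remap List
}

// Stub standing in for the real HTTP call; it records what was requested
const requested = [];
const fakeFetchSongs = async ids => {
  requested.push(ids);
  return ids.map(_id => ({ _id, name: `Song ${_id.toUpperCase()}` }));
};

const playHistory = loadPlayHistory(["a", "a", "a", "b", "c"], fakeFetchSongs);
playHistory.then(songs => console.log(songs.map(s => s.name)));
// [ 'Song A', 'Song A', 'Song A', 'Song B', 'Song C' ]
```

Only the three unique ids travel to the back end, and the repeats are restored after the response arrives.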

Populate

On the other hand if you are actually using data already contained in a document, then Mongoose already supports this where your "list" is already an array within a document. For example as a document for a SongList model:

{
  "list": ["a", "a", "a", "b", "c"]
}

Calling populate where that "list" is actually a list of references to the Song model items will return each "copy", in the same order in which the list is stored in the document:

SongList.find().populate('list')

The reason for this is .populate() basically issues that same $in query anyway, using the arguments found in the "list" field for the document. Then those query results are actually "mapped" onto that array using what is essentially exactly the same code as demonstrated above.

So if that is your actual use case, this is already "built in" and there is no need to go and do the query yourself.

The following example listing adds "three" songs and uses the same "mapping" techniques, as well as showing what populate() simply does automatically:

const mongoose = require('mongoose');
const { Schema, Types: { ObjectId } } = mongoose;
const { uniq } = require('lodash');

const uri = 'mongodb://localhost/songs';

mongoose.set('debug', true);
mongoose.Promise = global.Promise;

const songSchema = new Schema({
  name: String
});

const songListSchema = new Schema({
  list: [{ type: Schema.Types.ObjectId, ref: 'Song' }]
});

const Song = mongoose.model('Song', songSchema);

const SongList = mongoose.model('SongList', songListSchema);

const log = data => console.log(JSON.stringify(data, undefined, 2));

(async function() {

  try {

    const conn = await mongoose.connect(uri);
    const db = conn.connections[0].db;

    let { version } = await db.command({ "buildInfo": 1 });
    version = parseFloat(version.match(new RegExp(/(?:(?!-).)*/))[0]);

    await Promise.all(Object.entries(conn.models).map(([k,m]) => m.remove()));

    let [a,b,c] = await Song.insertMany(['a','b','c'].map(name => ({ name })));

    await SongList.create({ list: [ a, a, b, a, c ] });


    // populate is basically mapping the list
    let popresult = await SongList.find().populate('list');
    log({ popresult });


    // Using an id list
    let list = [a, a, b, a, c].map(e => e._id);

    // Use a unique copy for the $in to save bandwidth
    let unique = uniq(list);


    // Map the result
    let songs = await Song.find({ _id: { $in: unique } });
    songs = list.map(e => songs.find(s => s._id.equals(e)));
    log({ songs })


    if ( version >= 3.4 ) {
    // Force the server to return copies
      let stupid = await Song.aggregate([
        { "$match": { "_id": { "$in": unique } } },
        { "$addFields": {
          "copies": {
            "$filter": {
              "input": {
                "$map": {
                  "input": {
                    "$zip":  {
                      "inputs": [
                        { "$literal": list },
                        { "$range": [0, { "$size": { "$literal": list } } ] }
                      ]
                    }
                  },
                  "in": {
                    "_id": { "$arrayElemAt": [ "$$this", 0 ] },
                    "idx": { "$arrayElemAt": [ "$$this", 1 ] }
                  }
                }
              },
              "cond": { "$eq": ["$$this._id", "$_id"] }
            }
          }
        }},
        { "$unwind": "$copies" },
        { "$sort": { "copies.idx": 1 } },
        { "$project": { "copies": 0 } }
      ]);
      log({ stupid })

    }

  } catch(e) {
    console.error(e)
  } finally {
    process.exit()
  }


})()

And this gives you output as follows:

Mongoose: songs.remove({}, {})
Mongoose: songlists.remove({}, {})
Mongoose: songs.insertMany([ { _id: 5b06c2ff373eb00d9610aa6e, name: 'a', __v: 0 }, { _id: 5b06c2ff373eb00d9610aa6f, name: 'b', __v: 0 }, { _id: 5b06c2ff373eb00d9610aa70, name: 'c', __v: 0 } ], {})
Mongoose: songlists.insertOne({ list: [ ObjectId("5b06c2ff373eb00d9610aa6e"), ObjectId("5b06c2ff373eb00d9610aa6e"), ObjectId("5b06c2ff373eb00d9610aa6f"), ObjectId("5b06c2ff373eb00d9610aa6e"), ObjectId("5b06c2ff373eb00d9610aa70") ], _id: ObjectId("5b06c2ff373eb00d9610aa71"), __v: 0 })
Mongoose: songlists.find({}, { fields: {} })
Mongoose: songs.find({ _id: { '$in': [ ObjectId("5b06c2ff373eb00d9610aa6e"), ObjectId("5b06c2ff373eb00d9610aa6f"), ObjectId("5b06c2ff373eb00d9610aa70") ] } }, { fields: {} })
{
  "popresult": [
    {
      "list": [
        {
          "_id": "5b06c2ff373eb00d9610aa6e",
          "name": "a",
          "__v": 0
        },
        {
          "_id": "5b06c2ff373eb00d9610aa6e",
          "name": "a",
          "__v": 0
        },
        {
          "_id": "5b06c2ff373eb00d9610aa6f",
          "name": "b",
          "__v": 0
        },
        {
          "_id": "5b06c2ff373eb00d9610aa6e",
          "name": "a",
          "__v": 0
        },
        {
          "_id": "5b06c2ff373eb00d9610aa70",
          "name": "c",
          "__v": 0
        }
      ],
      "_id": "5b06c2ff373eb00d9610aa71",
      "__v": 0
    }
  ]
}
Mongoose: songs.find({ _id: { '$in': [ ObjectId("5b06c2ff373eb00d9610aa6e"), ObjectId("5b06c2ff373eb00d9610aa6f"), ObjectId("5b06c2ff373eb00d9610aa70") ] } }, { fields: {} })
{
  "songs": [
    {
      "_id": "5b06c2ff373eb00d9610aa6e",
      "name": "a",
      "__v": 0
    },
    {
      "_id": "5b06c2ff373eb00d9610aa6e",
      "name": "a",
      "__v": 0
    },
    {
      "_id": "5b06c2ff373eb00d9610aa6f",
      "name": "b",
      "__v": 0
    },
    {
      "_id": "5b06c2ff373eb00d9610aa6e",
      "name": "a",
      "__v": 0
    },
    {
      "_id": "5b06c2ff373eb00d9610aa70",
      "name": "c",
      "__v": 0
    }
  ]
}
Mongoose: songs.aggregate([ { '$match': { _id: { '$in': [ 5b06c2ff373eb00d9610aa6e, 5b06c2ff373eb00d9610aa6f, 5b06c2ff373eb00d9610aa70 ] } } }, { '$addFields': { copies: { '$filter': { input: { '$map': { input: { '$zip': { inputs: [ { '$literal': [Array] }, { '$range': [Array] } ] } }, in: { _id: { '$arrayElemAt': [ '$$this', 0 ] }, idx: { '$arrayElemAt': [ '$$this', 1 ] } } } }, cond: { '$eq': [ '$$this._id', '$_id' ] } } } } }, { '$unwind': '$copies' }, { '$sort': { 'copies.idx': 1 } }, { '$project': { copies: 0 } } ], {})
{
  "stupid": [
    {
      "_id": "5b06c2ff373eb00d9610aa6e",
      "name": "a",
      "__v": 0
    },
    {
      "_id": "5b06c2ff373eb00d9610aa6e",
      "name": "a",
      "__v": 0
    },
    {
      "_id": "5b06c2ff373eb00d9610aa6f",
      "name": "b",
      "__v": 0
    },
    {
      "_id": "5b06c2ff373eb00d9610aa6e",
      "name": "a",
      "__v": 0
    },
    {
      "_id": "5b06c2ff373eb00d9610aa70",
      "name": "c",
      "__v": 0
    }
  ]
}

"Stupid" Aggregation Tricks

This is not really a solution, but more of a note on the subject before somebody else mentions it or something similar.

Falling under the category of "stupid tricks" is actually forcing the server to return the "copies" of the documents:

let stupid = await Song.aggregate([
  { "$match": { "_id": { "$in": list } } },
  { "$addFields": {
    "copies": {
      "$filter": {
        "input": {
          "$map": {
            "input": {
              "$zip":  {
                "inputs": [
                  { "$literal": list },
                  { "$range": [0, { "$size": { "$literal": list } } ] }
                ]
              }
            },
            "in": {
              "_id": { "$arrayElemAt": [ "$$this", 0 ] },
              "idx": { "$arrayElemAt": [ "$$this", 1 ] }
            }
          }
        },
        "cond": { "$eq": ["$$this._id", "$_id"] }
      }
    }
  }},
  { "$unwind": "$copies" },
  { "$sort": { "copies.idx": 1 } },
  { "$project": { "copies": 0 } }
]);

That actually will return all the document "copies" from the server. It does so via the $unwind on the list output processed with $filter to keep only those values which match the current document _id. Multiples will be retained in that array which when processed with $unwind effectively produces a "copy" of the document for each array entry.

As a bonus, we keep the "idx" of the items in the list by mapping an "index" position into the array via $zip and $range. The following $sort then places the documents in the order in which they appear in the input list, just to mimic the Array.map() which is being done in the code you should be using.

We can then simply $project to "exclude" that field which was only there as a temporary measure.
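To make those stages easier to follow, here is a plain JavaScript walk-through of roughly the same steps on in-memory data. It is only meant as an illustration of the logic, with string ids and sample data made up for brevity; it does not run on the server and is not a replacement for the pipeline:

```javascript
// Plain JavaScript walk-through of the pipeline stages; string ids for brevity.
const list = ["a", "a", "b", "a", "c"];
const matched = [{ _id: "a" }, { _id: "b" }, { _id: "c" }];  // what $match selects

// $zip + $range + $map: pair every list entry with its position
const indexed = list.map((_id, idx) => ({ _id, idx }));

const expanded = matched
  // $addFields + $filter: keep only the pairs that point at this document
  .map(doc => ({ ...doc, copies: indexed.filter(p => p._id === doc._id) }))
  // $unwind: one output document per remaining pair
  .flatMap(doc => doc.copies.map(copy => ({ ...doc, copies: copy })))
  // $sort on copies.idx: restore the order of the input list
  .sort((x, y) => x.copies.idx - y.copies.idx)
  // $project: drop the temporary field
  .map(({ copies, ...doc }) => doc);

console.log(expanded.map(d => d._id));
// [ 'a', 'a', 'b', 'a', 'c' ]
```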

All of that said, it's not really a great idea to do such a thing. As already mentioned, you are essentially increasing the payload by doing so, when it is far more logical to construct the "mapping" in the client, and ideally in the "end" client.

answered Sep 28 '22 05:09 by Neil Lunn