
Iterating over MongoDB cursor is slow

I need to iterate over a full MongoDB collection with ~2 million documents, so I am using the cursor feature and the eachAsync function. However, I noticed that it's pretty slow (it takes more than 40 minutes). I tried different batch sizes up to 5000 (which would be just 400 queries against MongoDB).

The application doesn't take much CPU (0.2% - 1%), nor much RAM or IOPS. So apparently my code can be optimized to speed up this process.

The code:

  const playerProfileCursor = PlayerProfile.find({}, { tag: 1 }).cursor({ batchSize: 5000 })
  const p2 = new Promise<Array<string>>((resolve, reject) => {
    const playerTags: Array<string> = []
    playerProfileCursor.eachAsync((playerProfile) => {
      playerTags.push(playerProfile.tag)
    }).then(() => {
      resolve(playerTags)
    }).catch((err) => {
      reject(err)
    })
  })

When I set a breakpoint inside of the eachAsync function body it will immediately hit. So there is nothing stuck, it's just so slow. Is there a way to speed this up?

asked Mar 07 '23 by kentor

1 Answer

The `parallel` option of `eachAsync` was added in Mongoose 4.12 (the most up-to-date version at the time of writing) and isn't really documented yet.

eachAsync runs with a concurrency of 1 by default, but you can raise it via the `parallel` option.

Thus your code could look something like this:

  const playerProfileCursor = PlayerProfile.find({}, { tag: 1 }).cursor({ batchSize: 5000 })
  const p2 = new Promise<Array<string>>((resolve, reject) => {
    const playerTags: Array<string> = []
    playerProfileCursor.eachAsync((playerProfile) => {
      playerTags.push(playerProfile.tag)
    }, { parallel: 50 }).then(() => {
      resolve(playerTags)
    }).catch((err) => {
      reject(err)
    })
  })
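For intuition, here is a minimal standalone sketch (plain TypeScript, no Mongoose; `eachWithConcurrency` is a hypothetical helper, not a Mongoose API) of what a concurrency limit like `parallel: 50` does: several workers pull items off a shared index and await the handler concurrently, instead of waiting for each document one at a time.

```typescript
// Sketch of concurrency-limited iteration, similar in spirit to
// eachAsync's `parallel` option. Not Mongoose code.
async function eachWithConcurrency<T>(
  items: T[],
  fn: (item: T) => Promise<void>,
  parallel: number
): Promise<void> {
  let next = 0
  // Each worker repeatedly claims the next unprocessed index.
  // JavaScript is single-threaded, so `next++` between awaits is safe.
  const worker = async () => {
    while (next < items.length) {
      const i = next++
      await fn(items[i])
    }
  }
  // Start `parallel` workers and wait until all items are processed.
  await Promise.all(Array.from({ length: parallel }, worker))
}

// Usage: collect tags from fake documents, 3 at a time.
const docs = [{ tag: 'a' }, { tag: 'b' }, { tag: 'c' }, { tag: 'd' }]
const tags: string[] = []
eachWithConcurrency(docs, async (d) => { tags.push(d.tag) }, 3)
  .then(() => console.log(tags.sort())) // → [ 'a', 'b', 'c', 'd' ]
```

With a handler that does real async work (a network round trip per document), the wall-clock time drops roughly by the concurrency factor, which is why raising `parallel` helps here.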
answered Mar 11 '23 by Riki