Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Sorting in mongo sharding environments degrades performance

I have a query that performs something like this,

last_shipment_id = OrderDelivery.where(platform: 'business').desc(:shipment_id).limit(1).pluck(:shipment_id)[0]

It works great with only 1 ~ 5ms when I have correctly index with { platform: 1, shipment_id: -1 } without sharding environment in staging machine

However, our production is set up with 4 sharding mongo db, the result ends up in 1000 ~ 3000ms.

Does anyone know how this could happen or how might solve this case?

I've read about this slides https://www.slideshare.net/mongodb/how-queries-work-with-sharding

Well, it's said on slide 13, but still not sure it has mentioned how to solve the case though.

like image 743
Bernie Chiu Avatar asked May 14 '17 10:05

Bernie Chiu


People also ask

Does sharding improve performance in MongoDB?

Sharded clusters in MongoDB are another way to potentially improve performance. Like replication, sharding is a way to distribute large data sets across multiple servers.

Does sharding improve write performance?

Advantages of shardingIncreased read/write throughput — By distributing the dataset across multiple shards, both read and write operation capacity is increased as long as read and write operations are confined to a single shard.

What are the benefits of sharding in MongoDB?

Advantages of Sharding MongoDB distributes the read and write workload across the shards in the sharded cluster, allowing each shard to process a subset of cluster operations. Both read and write workloads can be scaled horizontally across the cluster by adding more shards.

How sharding happens in MongoDB?

Sharding is the process of distributing data across multiple hosts. In MongoDB, sharding is achieved by splitting large data sets into small data sets across multiple MongoDB instances.


1 Answers

You don't mention what your shard key is, but this query has to be scattered to all three shards, and that means if any of the shards are slow, the overall result will be slow. In the gist you include explain for shard1 (which is fast) but omit it for shard2 and shard3 and overall numbers show that it's slow on one of those shards.

This means either the optimal index is not present on one of those shards, or there is a different index present that's being picked even though it's suboptimal. The solution in the former case is to build all correct indexes, the solution in the second case is to use hint with the query to force the use of the correct index.

like image 63
Asya Kamsky Avatar answered Oct 20 '22 17:10

Asya Kamsky