Cosmos db graph vs Azure Sql Server - Performance and cost

Tags:

Imagine a social network app. Users follow other users and users take photos. Photos have tags of other users.

I'm trying to get an effective Cosmos db implementation of a graph for that app. I provide an SQL Server version as well as a benchmark.

Here is the graph: enter image description here

Here is a table version of it:

enter image description here

Here is the Gremlin query:

Click to copy

g.V('c39f435b-350e-4d08-a7b6-dfcadbe4e9c5')
.out('follows').as('name')
.out('took').order(local).by('postedAt', decr).as('id', 'postedAt')
.select('id', 'name', 'postedAt').by(id).by('name').by('postedAt')
.limit(10)

Here is the equivalent SQL query (linq actually):

Click to copy

Follows
.Where(f => f.FollowerId == "c39f435b-350e-4d08-a7b6-dfcadbe4e9c5")
.Select(f => f.Followees)
.SelectMany(f => f.Photos)
.OrderByDescending(f => f.PostedAt)
.Select(f => new { f.User.Name, f.Id, f.PostedAt})
.Take(10)

That user follows 136 users who collectively took 257 photos.

Both SQL Server and Cosmos db are in West Europe Azure location. I'm in France. I did a bit of testing on Linpad.

The Gremlin Query runs in over 1.20s and consumes about 330 RU. FYI, 400RU/s costs 20$/month.
The SQL query runs in 70ms. The db is 10 DTU (1 instance of S0). So it costs 12.65eur / month

How can I get the feed faster and cheaper with cosmos db?

Note: In order to get the RU charged, I'm using Microsoft.Azure.Graph. But I can also use Gremlin.Net and get similar results.

727

asked Mar 02 '18 09:03

François

1 Answers

I know this question is old but here is my tip to help you use cosmos db in efficient way and reduce the RU/s as possible.

330 RU is a lot for such a query, the problem here that makes you consume a lot of RU is the partitioning, when you add partition to the database you are telling cosmos db to partition the data logically by the partition key that you provide, so in your case the best partition key is the user.

Generally to know the best partition key you should first start from your queries, so for example write down all you queries and check what is the top attribute or field you filter your queries with to get back your data. the attribute you choose is the partition key.

If you didn't add partition key you will tell cosmosdb to search for users and if users are spread on many servers and many partitions when scaled, the cosmosdb will search in all partitions ( servers ) and this will cost you a lot, so if you have for example 6 servers, cosmosdb will run queries on the 6 servers till it finds your user, it may find it in first server or the second but also it may find it in the last server so it will take a lot of time and not guaranteed.

The second thing is containers, container is the unit of scaling in cosmosdb, so when cosmosdb wants to scale, it scales the container and all the data in it. so a good practice is to add entities that are queried a lot in their own container so cosmosdb can scale them easily using the partition key assigned to each container.

Maybe I helped you to reduce the RU/s in a different way. hope this answer helps who face the same problem.

answered Oct 05 '22 23:10

ahmed nader

Related questions
                            
                                SSL Certificate Verification fails when logging into Azure Container Registry
                            
                                ConcurrentModificationException when using Spark collectionAccumulator
                            
                                Azure Service Bus queue messages got stuck
                            
                                Azure Powershell throws "Unknown User Type"
                            
                                How to measure exactly what "data out" in Azure web app?
                            
                                MSAL for Android fails performing B2C login
                            
                                Azure SignalR Error: (429) Too Many Requests
                            
                                Streaming an Azure blob to the client asynchronously with .NET 4.5 async, await
                            
                                Azure Autoscale Restarts Running Instances
                            
                                Azure Redis Session State error Timeout performing EVAL, inst:1 , queue:2
                            
                                HTTP 502 from one instance of an Azure Web App
                            
                                Occasional __RequestVerificationToken errors
                            
                                How to avoid concurrency issues when scaling writes horizontally?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Cosmos db graph vs Azure Sql Server - Performance and cost

Tags:

azure

graph-databases

gremlin

azure-cosmosdb

sql-graph

François

People also ask

1 Answers

ahmed nader

Recent Activity

Donate For Us