Imagine a social network app: users follow other users, users take photos, and photos carry tags of other users.
I'm trying to get an efficient Cosmos DB graph implementation for that app. I provide a SQL Server version as well as a benchmark.
Here is the graph:
Here is a table version of it:
Here is the Gremlin query:
g.V('c39f435b-350e-4d08-a7b6-dfcadbe4e9c5')
.out('follows').as('name')
.out('took').order().by('postedAt', decr).as('id', 'postedAt')
.select('id', 'name', 'postedAt').by(id).by('name').by('postedAt')
.limit(10)
Here is the equivalent SQL query (linq actually):
Follows
.Where(f => f.FollowerId == "c39f435b-350e-4d08-a7b6-dfcadbe4e9c5")
.Select(f => f.Followees)
.SelectMany(f => f.Photos)
.OrderByDescending(f => f.PostedAt)
.Select(f => new { f.User.Name, f.Id, f.PostedAt})
.Take(10)
That user follows 136 users who collectively took 257 photos.
Both the SQL Server and Cosmos DB instances are in the West Europe Azure region; I'm in France. I did a bit of testing in LINQPad.
How can I get the feed faster and cheaper with cosmos db?
Note: in order to get the RU charge, I'm using Microsoft.Azure.Graph. But I can also use Gremlin.Net and get similar results.
Cosmos DB fits if your requirement is a globally distributed, multi-model database and you have a bit more budget, as Cosmos DB is more expensive than SQL Server.
Serverless Cosmos DB bills per million request units; for example, 1 million RUs costs $0.25 in East US. The serverless tier also charges for transactional storage on top of request units, at $0.25 per GB/month.
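A quick back-of-the-envelope estimate using the rates quoted above (the monthly RU volume and storage size are invented figures for illustration, not from the question):

```python
# Serverless rates quoted above (East US)
RU_PRICE_PER_MILLION = 0.25   # USD per 1M RUs
STORAGE_PRICE_PER_GB = 0.25   # USD per GB/month

# Hypothetical monthly usage (assumptions for illustration):
# at 330 RU per feed query, 50M RUs is roughly 150,000 feed reads
monthly_rus = 50_000_000
storage_gb = 10

ru_cost = monthly_rus / 1_000_000 * RU_PRICE_PER_MILLION
storage_cost = storage_gb * STORAGE_PRICE_PER_GB
total = ru_cost + storage_cost

print(f"RU cost: ${ru_cost:.2f}, storage: ${storage_cost:.2f}, total: ${total:.2f}/month")
# → RU cost: $12.50, storage: $2.50, total: $15.00/month
```

This also shows why reducing the per-query RU charge matters: the RU line dominates the bill long before storage does.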
Azure SQL is based on the SQL Server engine, so you can easily migrate applications and continue to use the tools, languages, and resources you're familiar with. Azure Cosmos DB is aimed at web, mobile, gaming, and IoT applications that need to handle massive amounts of data, reads, and writes at global scale.
Cosmos DB is not a replacement for SQL Server. You would very, very rarely, if ever, migrate your data from an existing SQL Server database to Cosmos DB.
I know this question is old, but here is my tip to help you use Cosmos DB efficiently and reduce the RU/s as much as possible.
330 RUs is a lot for such a query. What makes you consume so many RUs here is the partitioning: when you add a partition key to the database, you tell Cosmos DB to partition the data logically by that key. In your case, the best partition key is the user.
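For example, in a Cosmos DB graph container partitioned on a key such as `/userId` (a hypothetical key name), every vertex must carry that property, and a query that supplies it is routed to a single logical partition:

```
// every vertex includes the container's partition key property ('/userId' here is an assumed choice)
g.addV('user').property('userId', 'c39f435b-350e-4d08-a7b6-dfcadbe4e9c5').property('name', 'Alice')

// this lookup names the partition key, so it stays inside one partition
g.V().has('user', 'userId', 'c39f435b-350e-4d08-a7b6-dfcadbe4e9c5')
```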
Generally, to find the best partition key you should start from your queries: write them all down and check which attribute you filter on most often to get your data back. That attribute is your partition key.
If you don't supply a partition key, Cosmos DB has to search for your users across partitions, and once the data is spread over many partitions (servers) at scale, it searches all of them, which costs a lot. If you have, say, 6 servers, Cosmos DB runs the query on all 6 until it finds your user; it may find it on the first or second server, but it may also sit on the last one, so the query is slow and the cost is not bounded.
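The fan-out described above can be sketched with a toy model (pure Python, invented data; this is not the Cosmos DB SDK): a lookup that knows the partition key touches exactly one partition, while one without it must probe partitions until it finds the item:

```python
# Toy model of partition routing across 6 "servers", as in the example above.
NUM_PARTITIONS = 6
partitions = [dict() for _ in range(NUM_PARTITIONS)]

def route(key):
    """Map a partition key to one partition, as a hash-based router would."""
    return hash(key) % NUM_PARTITIONS

# Insert some users; each lands in the partition its key hashes to.
for uid, name in [("u1", "Alice"), ("u2", "Bob"), ("u3", "Carol")]:
    partitions[route(uid)][uid] = name

def lookup_with_key(key):
    """Partition key known: touch exactly one partition."""
    return partitions[route(key)].get(key), 1  # (value, partitions touched)

def lookup_without_key(key):
    """No partition key: scan partitions until found (worst case: all of them)."""
    touched = 0
    for p in partitions:
        touched += 1
        if key in p:
            return p[key], touched
    return None, touched

name, cost_with = lookup_with_key("u2")
_, cost_without = lookup_without_key("u2")
print(f"with key: {cost_with} partition touched; without: up to {cost_without} of {NUM_PARTITIONS}")
```

In the real service the "partitions touched" difference is what shows up as extra RUs on a cross-partition query.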
The second thing is containers. A container is the unit of scaling in Cosmos DB: when Cosmos DB scales, it scales the container and all the data in it. So a good practice is to give frequently queried entities their own container, so Cosmos DB can scale them easily using the partition key assigned to each container.
Maybe this helps you reduce the RU/s in a different way. I hope this answer helps anyone facing the same problem.