 

Fastest way to insert 100,000+ records into DocumentDB

As the title suggests, I need to insert 100,000+ records into a DocumentDB collection programmatically. The data will be used for creating reports later on. I am using the Azure Documents SDK and a stored procedure for bulk inserting documents (see question Azure documentdb bulk insert using stored procedure).

The following console application shows how I'm inserting documents.

InsertDocuments generates 500 test documents to pass to the stored procedure. The main function calls InsertDocuments 10 times, inserting 5,000 documents overall. Running this application results in 500 documents being inserted every few seconds. If I increase the number of documents per call, I start to get errors and documents are lost.

Can anyone recommend a faster way to insert documents?

static void Main(string[] args)
{
    Console.WriteLine("Starting...");

    MainAsync().Wait();
}

static async Task MainAsync()
{
    int campaignId = 1001,
        count = 500;

    for (int i = 0; i < 10; i++)
    {
        await InsertDocuments(campaignId, (count * i) + 1, (count * i) + count);
    }
}

static async Task InsertDocuments(int campaignId, int startId, int endId)
{
    using (DocumentClient client = new DocumentClient(new Uri(documentDbUrl), documentDbKey))
    {
        List<dynamic> items = new List<dynamic>();

        // Create x number of documents to insert
        for (int i = startId; i <= endId; i++)
        {
            var item = new
            {
                id = Guid.NewGuid(),
                campaignId = campaignId,
                userId = i,
                status = "Pending"
            };

            items.Add(item);
        }

        try
        {
            var response = await client.ExecuteStoredProcedureAsync<dynamic>(
                "/dbs/default/colls/campaignusers/sprocs/bulkImport",
                new RequestOptions { PartitionKey = new PartitionKey(campaignId) },
                new { items });

            int insertCount = (int)response.Response;

            Console.WriteLine("{0} documents inserted...", insertCount);
        }
        catch (Exception e)
        {
            Console.WriteLine("Error: {0}", e.Message);
        }
    }
}
asked Jan 19 '17 by Mark Clancy

3 Answers

The fastest way to insert documents into Azure DocumentDB is available as a sample on GitHub: https://github.com/Azure/azure-documentdb-dotnet/tree/master/samples/documentdb-benchmark

The following tips will help you achieve the best throughput using the .NET SDK:

  • Initialize a singleton DocumentClient
  • Use Direct connectivity and TCP protocol (ConnectionMode.Direct and ConnectionProtocol.Tcp)
  • Use hundreds of Tasks in parallel (the optimal number depends on your hardware)
  • Increase the MaxConnectionLimit in the DocumentClient constructor to a high value, say 1000 connections
  • Turn server garbage collection on (gcServer) in your application config
  • Make sure your collection has the appropriate provisioned throughput (and a good partition key)
  • Running in the same Azure region will also help
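The client-side tips above might be sketched like this, using the older Microsoft.Azure.Documents SDK that the question uses (the endpoint, key, and collection link are placeholders, and the degree of parallelism is illustrative, not prescriptive):

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.Documents.Client;

class BulkInsertSketch
{
    // Tip 1: a singleton DocumentClient, reused for the lifetime of the process.
    private static readonly DocumentClient Client = new DocumentClient(
        new Uri("https://<account>.documents.azure.com:443/"),
        "<key>",
        new ConnectionPolicy
        {
            ConnectionMode = ConnectionMode.Direct, // Tip 2: Direct connectivity
            ConnectionProtocol = Protocol.Tcp,      // Tip 2: TCP instead of HTTPS
            MaxConnectionLimit = 1000               // Tip 4: many parallel connections
        });

    static async Task RunAsync()
    {
        // Tip 3: fan out many insert tasks in parallel; tune the count to your hardware.
        var tasks = Enumerable.Range(1, 100).Select(i =>
            Client.CreateDocumentAsync(
                "/dbs/default/colls/campaignusers",
                new { id = Guid.NewGuid().ToString(), campaignId = 1001, userId = i, status = "Pending" }));

        await Task.WhenAll(tasks);
    }
}
```

Tip 5 (gcServer) is not a code change; it is set in the application config, e.g. `<gcServer enabled="true"/>` under `<runtime>` in app.config.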

With 10,000 RU/s provisioned, you can insert 100,000 documents in about 50 seconds (at approximately 5 request units per write).

With 100,000 RU/s, the same insert takes about 5 seconds. You can make this as fast as you want by configuring throughput (and for very high numbers of inserts, spread inserts across multiple VMs/workers).
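Those timings follow from simple arithmetic, using the answer's estimate of roughly 5 RU per write:

```csharp
// time to insert = (documents × RU per write) / provisioned RU/s
int documents = 100_000;
double ruPerWrite = 5.0;                   // approximate cost of one insert
double totalRu = documents * ruPerWrite;   // 500,000 RU of total work

double secondsAt10K = totalRu / 10_000;    // 50 seconds at 10,000 RU/s
double secondsAt100K = totalRu / 100_000;  // 5 seconds at 100,000 RU/s
```

The real RU cost per write depends on document size and indexing policy, so measure it for your own documents before sizing throughput.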

EDIT (7/12/19): You can now use the bulk executor library: https://learn.microsoft.com/en-us/azure/cosmos-db/bulk-executor-overview

answered Nov 16 '22 by Aravind Krishna R.


The Cosmos DB team has just released a bulk import and update SDK. It is unfortunately only available in .NET Framework 4.5.1, but it does a lot of the heavy lifting for you and maximizes use of throughput. See:

https://learn.microsoft.com/en-us/azure/cosmos-db/bulk-executor-overview https://learn.microsoft.com/en-us/azure/cosmos-db/sql-api-sdk-bulk-executor-dot-net
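A minimal sketch of what using that library looks like, based on its documented API (the collection setup is omitted, and the exact `BulkImportAsync` signature should be checked against the docs linked above):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;
using Microsoft.Azure.CosmosDB.BulkExecutor;

class BulkExecutorSketch
{
    static async Task ImportAsync(DocumentClient client, DocumentCollection collection)
    {
        // The library handles batching, retries on throttles (429s), and
        // saturating the provisioned throughput for you.
        var executor = new BulkExecutor(client, collection);
        await executor.InitializeAsync();

        IEnumerable<object> docs = Enumerable.Range(1, 100_000).Select(i => new
        {
            id = Guid.NewGuid().ToString(),
            campaignId = 1001,
            userId = i,
            status = "Pending"
        });

        var response = await executor.BulkImportAsync(docs, enableUpsert: false);
        Console.WriteLine("{0} documents imported, {1} RUs consumed",
            response.NumberOfDocumentsImported, response.TotalRequestUnitsConsumed);
    }
}
```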

answered Nov 16 '22 by Glenit


The Cosmos DB .NET SDK has since been updated to support bulk insert via the AllowBulkExecution option: https://learn.microsoft.com/en-us/azure/cosmos-db/tutorial-sql-api-dotnet-bulk-import
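With the v3 SDK (Microsoft.Azure.Cosmos), bulk mode looks roughly like this: you opt in once on the client, then issue ordinary concurrent point operations and the SDK groups them into bulk batches behind the scenes (account endpoint, key, and database/container names below are placeholders):

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

class BulkModeSketch
{
    static async Task ImportAsync()
    {
        // Opt in to bulk execution once, at client construction time.
        var client = new CosmosClient(
            "https://<account>.documents.azure.com:443/", "<key>",
            new CosmosClientOptions { AllowBulkExecution = true });

        Container container = client.GetContainer("default", "campaignusers");

        // Issue plain point writes concurrently; the SDK batches them into
        // bulk requests per partition transparently.
        var tasks = new List<Task>();
        for (int i = 1; i <= 100_000; i++)
        {
            int userId = i;
            var item = new { id = Guid.NewGuid().ToString(), campaignId = 1001, userId, status = "Pending" };
            tasks.Add(container.CreateItemAsync(item, new PartitionKey(1001)));
        }
        await Task.WhenAll(tasks);
    }
}
```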

answered Nov 16 '22 by Fabian Nicollier