Batch Insert in Azure storage table

I am new to Azure Table storage. I was trying to insert my entities in a batch, but I found that a batch operation cannot include entities with different partition keys.

Is there some way I can do this? There are about 10,000 - 20,000 file details that I want to insert into a table.

Here is what I have tried so far:

public class Manifest : TableEntity
{
    private string name;
    private string extension;
    private string filePath;
    private string relativePath;
    private string mD5HashCode;
    private string lastModifiedDate;

    public void AssignRowKey()
    {
        this.RowKey = relativePath.ToString();
    }
    public void AssignPartitionKey()
    {
        this.PartitionKey = mD5HashCode;
    }
    public string Name { get { return name; } set { name = value; } }
    public string Extension { get { return extension; } set { extension = value; } }
    public string FilePath { get { return filePath; } set { filePath = value; } }
    public string RelativePath { get { return relativePath; } set { relativePath = value; } }
    public string MD5HashCode { get { return mD5HashCode; } set { mD5HashCode = value; } }
    public string LastModifiedDate { get { return lastModifiedDate; } set { lastModifiedDate = value; } }

}

My method, which is in a different class:

static async Task BatchInsert(CloudTable table, IEnumerable<FileDetails> files)
{
    int rowOffset = 0;

    var tasks = new List<Task>();

    while (rowOffset < files.Count())
    {
        // next batch
        var rows = files.Skip(rowOffset).Take(100).ToList();

        rowOffset += rows.Count;

        var task = Task.Factory.StartNew(() =>
        {
            var batch = new TableBatchOperation();

            foreach (var row in rows)
            {
                Manifest manifestEntity = new Manifest
                {
                    Name = row.Name,
                    Extension = row.Extension,
                    FilePath = row.FilePath,
                    RelativePath = row.RelativePath.Replace('\\', '+'),
                    MD5HashCode = row.Md5HashCode,
                    LastModifiedDate = row.LastModifiedDate.ToString()
                };
                manifestEntity.AssignPartitionKey();
                manifestEntity.AssignRowKey();
                batch.InsertOrReplace(manifestEntity);
            }

            // submit
            table.ExecuteBatch(batch);
        });

        tasks.Add(task);
    }

    await Task.WhenAll(tasks);
}
asked Dec 24 '22 by Sushant Baweja

2 Answers

If you want to use a batch operation, all entities in the batch must have the same PartitionKey. Unfortunately, in your case there is no other option but to save them individually.

The reason the partition key exists at all is so that Azure can distribute data across machines without any coordination between partitions. By design, entities in different partitions cannot take part in the same transaction or batch operation.
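
As a rough sketch, saving them one at a time could look something like the following, assuming the same Microsoft.WindowsAzure.Storage.Table client used in the question and that AssignPartitionKey/AssignRowKey have already been called on each entity (the method name InsertIndividually is just illustrative):

// Requires Microsoft.WindowsAzure.Storage.Table, System.Collections.Generic
// and System.Threading.Tasks.
static async Task InsertIndividually(CloudTable table, IEnumerable<Manifest> entities)
{
    foreach (var entity in entities)
    {
        // One TableOperation per entity, so no shared PartitionKey is needed.
        await table.ExecuteAsync(TableOperation.InsertOrReplace(entity));
    }
}

You can still issue several of these inserts concurrently (for example with Task.WhenAll over chunks), but each entity is its own request, so expect 10,000 - 20,000 round trips rather than a couple of hundred batches.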

You could upvote this issue to help get support for cross-partition batches implemented.

answered Dec 25 '22 by Joey Cai


To reiterate, there is no way to insert more than one entity in a single batch operation unless they share the same partition key.

Some limitations of batch operations are:

  • All entities in a single batch operation must have the same partition key.
  • A single batch operation can include at most 100 entities.

Alternatively, you can insert the entities one at a time using "TableOperation.Insert()", which does not require them to share a partition key.
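
If your data were keyed so that several entities shared a partition, a hedged sketch that respects both limits above would group on PartitionKey and flush a batch every 100 operations (InsertByPartition is an illustrative name, again assuming the Microsoft.WindowsAzure.Storage.Table client and that partition/row keys are already assigned):

// Requires System.Linq for GroupBy in addition to the table client namespaces.
static async Task InsertByPartition(CloudTable table, IEnumerable<Manifest> entities)
{
    // A TableBatchOperation may only touch a single partition, so group first.
    foreach (var group in entities.GroupBy(e => e.PartitionKey))
    {
        var batch = new TableBatchOperation();
        foreach (var entity in group)
        {
            batch.InsertOrReplace(entity);
            if (batch.Count == 100)   // hard limit of 100 entities per batch
            {
                await table.ExecuteBatchAsync(batch);
                batch = new TableBatchOperation();
            }
        }
        if (batch.Count > 0)
        {
            await table.ExecuteBatchAsync(batch);   // flush the remainder
        }
    }
}

With the MD5 hash as the partition key, as in the question, almost every group will contain a single entity, so this effectively degenerates into the individual inserts described above.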

answered Dec 25 '22 by Balasubramaniam