
Azure data factory copy activity performance tuning

According to https://learn.microsoft.com/en-us/azure/data-factory/data-factory-load-sql-data-warehouse, with 1000 DWU and PolyBase I should get 200 MBps throughput, but I am getting only 4.66 MBps. I have added the user to the xlargerc resource class to get the best possible throughput from Azure SQL Data Warehouse.

Below is the pipeline JSON:

{
    "name": "UCBPipeline-Copy",
    "properties": {
        "description": "pipeline with copy activity",
        "activities": [
            {
                "type": "Copy",
                "typeProperties": {
                    "source": {
                        "type": "BlobSource"
                    },
                    "sink": {
                        "type": "SqlDWSink",
                        "allowPolyBase": true,
                        "writeBatchSize": 0,
                        "writeBatchTimeout": "00:00:00"
                    },
                    "cloudDataMovementUnits": 4
                },
                "inputs": [
                    {
                        "name": "USBBlob_Concept"
                    }
                ],
                "outputs": [
                    {
                        "name": "AzureDW_Concept"
                    }
                ],
                "policy": {
                    "timeout": "01:00:00",
                    "concurrency": 1
                },
                "scheduler": {
                    "frequency": "Day",
                    "interval": 1
                },
                "name": "AzureBlobtoSQLDW_Concept",
                "description": "Copy Activity"
            }
        ],
        "start": "2017-02-28T18:00:00Z",
        "end": "2017-03-01T19:00:00Z",
        "isPaused": false,
        "hubName": "sampledf1_hub",
        "pipelineMode": "Scheduled"
    }
}

Input dataset:

{
    "name": "AzureBlob_Concept",
    "properties": {
        "published": false,
        "type": "AzureBlob",
        "linkedServiceName": "AzureZRSStorageLinkedService",
        "typeProperties": {
            "fileName": "conceptTab.txt",
            "folderPath": "source/",
            "format": {
                "type": "TextFormat",
                "columnDelimiter": "\t"
            }
        },
        "availability": {
            "frequency": "Day",
            "interval": 1
        },
        "external": true,
        "policy": {}
    }
}

Output dataset:

{
    "name": "AzureDW_Concept",
    "properties": {
        "published": false,
        "type": "AzureSqlDWTable",
        "linkedServiceName": "AzureSqlDWLinkedService",
        "typeProperties": {
            "tableName": "concept"
        },
        "availability": {
            "frequency": "Day",
            "interval": 1
        }
    }
}

Is anything missing in the configuration?

vidyak asked Dec 21 '25 07:12



1 Answer

I took a look at runId "e98ac557-a507-4a6e-8833-978eff1723c3", which should belong to your Copy Activity. According to our service logs, the source file is fairly small (270 MB in your case), so service call latency takes up a large share of the run time and pulls the measured throughput down. Try loading bigger files to get better throughput.
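To see why file size matters so much here, the following is a rough back-of-the-envelope sketch, not anything taken from the service logs. It assumes a simple model where each run pays a fixed overhead and then transfers data at the 200 MBps peak quoted in the question. The function names and the 10 GB comparison size are hypothetical, chosen only for illustration.

```python
def implied_overhead_s(size_mb, observed_mbps, peak_mbps):
    """Fixed per-run overhead implied by an observed throughput.

    Assumed model (a simplification): elapsed = overhead + size / peak.
    """
    elapsed_s = size_mb / observed_mbps   # total wall-clock time of the run
    transfer_s = size_mb / peak_mbps      # time the data alone would need at peak
    return elapsed_s - transfer_s


def effective_mbps(size_mb, peak_mbps, overhead_s):
    """Throughput a run would report under the same fixed-overhead model."""
    return size_mb / (overhead_s + size_mb / peak_mbps)


# Figures from the question and answer: 270 MB file, 4.66 MBps observed,
# 200 MBps expected peak.
overhead = implied_overhead_s(270, 4.66, 200)
print(f"implied fixed overhead: {overhead:.1f} s")

# Hypothetical larger load (10 GB) with the same overhead, for comparison.
print(f"270 MB file: {effective_mbps(270, 200, overhead):.2f} MBps")
print(f"10 GB file:  {effective_mbps(10240, 200, overhead):.2f} MBps")
```

Under these assumptions, the 270 MB run spends about 56.6 seconds on overhead and only about 1.4 seconds moving data, so the reported throughput says little about the peak rate. With a much larger file the same overhead is spread over more data and the reported number moves toward the peak. Real overhead is unlikely to be perfectly constant, so treat the result as an order-of-magnitude guide only.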

Yingqin answered Dec 24 '25 06:12



