 

Azure Data Factory copy data is slow

Source database: PostgreSQL hosted on an Azure VM (D16s_v3)

Destination database: SQL Server Developer Edition hosted on an Azure VM (D4s_v3)

The source database is around 1 TB in size.

The destination database is empty, with an existing schema identical to the source database.

Throughput is only 1 MB/s, and nothing helps (I've selected the maximum DIU). The SQL Server tables don't have any keys or indexes at this point.

Batch size is 10000


user194076 asked Nov 17 '22 23:11



1 Answer

I got nailed by something similar when using ADF to copy data from an on-premises Oracle source to an Azure SQL Database sink. The same exact job performed via SSIS was something like 5 times faster. We began to suspect that something was amiss with data types, because the problem disappeared if we cast all of our high-precision Oracle NUMBER columns to less precision, or to something like integer.

It got so bad that we opened a case with Microsoft about it, and our worst fears were confirmed.

The Azure Data Factory runtime decimal type has a maximum precision of 28. If a decimal/numeric value from the source has a higher precision, ADF will first cast it to a string. The performance of the string casting code is abysmal.
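To make the limit concrete, here is a minimal Python sketch that estimates how much precision a single value needs and compares it with the 28-digit limit mentioned above. The function names and the `ADF_MAX_PRECISION` constant are made up for illustration; this is not ADF code, just a way to spot-check sample values pulled from the source.

```python
from decimal import Decimal

# Limit quoted in the answer above; the name is illustrative, not an ADF setting.
ADF_MAX_PRECISION = 28


def required_precision(value: Decimal) -> int:
    """Return the SQL-style precision (total digits) needed to hold value."""
    _sign, digits, exponent = value.as_tuple()
    if not isinstance(exponent, int):
        # NaN and Infinity carry a string exponent and have no precision.
        raise ValueError("value must be a finite number")
    if exponent >= 0:
        # Trailing zeros implied by a positive exponent still need digits.
        return len(digits) + exponent
    # Values like 0.001 need at least as many digits as their scale.
    return max(len(digits), -exponent)


def exceeds_adf_precision(value: Decimal, limit: int = ADF_MAX_PRECISION) -> bool:
    """True if the value needs more digits than the given limit."""
    return required_precision(value) > limit


if __name__ == "__main__":
    for sample in ("1234.56", "0.001", "1" * 29):
        d = Decimal(sample)
        print(sample, required_precision(d), exceeds_adf_precision(d))
```

Spot-checking values only tells you about the rows you sampled; the declared column types are the more reliable thing to inspect.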

Check whether your source has any high-precision numeric data. If you have not explicitly defined a schema, also check whether columns are accidentally being mapped as strings.
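For a PostgreSQL source like the one in the question, one way to run that check is to read the declared precision of every numeric column from `information_schema.columns` and flag anything above 28, or with no declared precision at all. The sketch below is only a starting point: `NumericColumn`, `is_risky`, `cast_expression`, and the `fallback_scale` default are hypothetical names and choices, and the query text has to be run with whatever client you already use.

```python
from typing import NamedTuple, Optional

# Limit quoted in the answer above; the name is illustrative, not an ADF setting.
ADF_MAX_PRECISION = 28

# Run this against the PostgreSQL source with your own client.
# numeric_precision is NULL for columns declared as plain "numeric".
NUMERIC_COLUMNS_SQL = """
SELECT table_schema, table_name, column_name, numeric_precision, numeric_scale
FROM information_schema.columns
WHERE data_type = 'numeric'
  AND table_schema NOT IN ('pg_catalog', 'information_schema')
ORDER BY table_schema, table_name, ordinal_position
"""


class NumericColumn(NamedTuple):
    table: str
    column: str
    precision: Optional[int]  # None means unconstrained numeric
    scale: Optional[int]


def is_risky(col: NumericColumn, limit: int = ADF_MAX_PRECISION) -> bool:
    """Flag columns whose declared precision is missing or above the limit."""
    return col.precision is None or col.precision > limit


def cast_expression(col: NumericColumn, limit: int = ADF_MAX_PRECISION,
                    fallback_scale: int = 6) -> str:
    """Build a SELECT-list item that narrows a risky column.

    fallback_scale is an assumption for unconstrained columns; pick a value
    that fits your data, since the cast rounds extra fractional digits.
    """
    if not is_risky(col, limit):
        return col.column
    scale = min(col.scale if col.scale is not None else fallback_scale, limit)
    return f"CAST({col.column} AS numeric({limit}, {scale})) AS {col.column}"


if __name__ == "__main__":
    # Hypothetical metadata rows, shaped like the query result above.
    columns = [
        NumericColumn("orders", "amount", 38, 10),
        NumericColumn("orders", "quantity", 10, 0),
        NumericColumn("orders", "ratio", None, None),
    ]
    select_list = ", ".join(cast_expression(c) for c in columns)
    print(f"SELECT {select_list} FROM orders")
```

The generated statement could be used as the source query of the copy activity instead of a plain table read. Test it on a sample first: in PostgreSQL a cast to a narrower `numeric` raises an error if the integer part does not fit, and silently rounds the fractional part.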

Pittsburgh DBA answered Jan 12 '23 17:01
