We have a very long-running stored procedure doing ETL work to load data from a raw table into a star schema (fact and dimension tables) in an Azure SQL database. This stored procedure takes around 10 to 20 hours to run over 10 million rows (using a MERGE statement).
At the moment, we run the stored procedure from C# code (ADO.NET) with CommandTimeout = 0 (wait forever). But sometimes the connection is dropped, because the connection to the Azure database is unstable.
Is it possible to run the stored procedure at the database level, without keeping a connection open the whole time, and log its progress in a Progress table so we can track it?
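For reference, something like the sketch below is what I have in mind for the Progress table (a rough sketch only; the table and step names are placeholders, not our real schema):

CREATE TABLE dbo.EtlProgress
(
    ProgressId   int IDENTITY(1,1) PRIMARY KEY,
    StepName     nvarchar(100) NOT NULL,
    RowsAffected bigint        NULL,
    LoggedAt     datetime2     NOT NULL DEFAULT SYSUTCDATETIME()
);

-- Inside the stored procedure, after each major step (e.g. the MERGE):
DECLARE @rows bigint = @@ROWCOUNT; -- capture immediately after the step
INSERT INTO dbo.EtlProgress (StepName, RowsAffected)
VALUES (N'Merge fact table', @rows);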
I have seen some recommendations:
Agent Job: this seems not to be possible on Azure SQL Database, as it is not supported at the moment.
SqlCommand.BeginExecuteNonQuery: I am not 100% sure whether BeginExecuteNonQuery still keeps the connection open under the hood or not.
Is there any other way to do this?
You can use an Azure Automation runbook.
https://azure.microsoft.com/en-us/blog/azure-automation-your-sql-agent-in-the-cloud/?cdn=disable
https://gallery.technet.microsoft.com/scriptcenter/How-to-use-a-SQL-Command-be77f9d2
https://azure.microsoft.com/en-us/blog/azure-automation-runbook-management/?cdn=disable
http://www.sqlservercentral.com/articles/Azure+SQL+database/117804/
Hope this helps.
Regards,
Alberto Morillo
Azure Data Factory has a Stored Procedure activity which could do this. It has a timeout property in the policy section which is optional. If you leave it out, it defaults to infinite:
"policy": {
"concurrency": 1,
"retry": 3
},
If you specify the timeout as 0 when creating the activity, you'll see it disappear when you provision the activity in the portal. You could also try specifying the timeout as 1 day (24 hours), e.g. "timeout": "1.00:00:00", although I haven't tested that it times out correctly.
You could also set the timeout to 0 in the connection string, although again I haven't tested this option, e.g.:
{
    "name": "AzureSqlLinkedService",
    "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
            "connectionString": "Server=tcp:<servername>.database.windows.net,1433;Database=<databasename>;User ID=<username>@<servername>;Password=<password>;Trusted_Connection=False;Encrypt=True;Connection Timeout=0"
        }
    }
}
I would regard this as more straightforward than Azure Automation but that's a personal choice. Maybe try both options and see which works best for you.
I agree with some of the other comments being made about the MERGE taking too long for that volume of records. I suspect either your table does not have appropriate indexing to support the MERGE, or you're running on too low a service tier. What service tier are you running on, e.g. Basic, Standard, or Premium (P1-P15)? Consider raising a separate question with the DDL of your table including indexes, some sample data, the MERGE statement, and your service tier; I'm sure that can go faster.
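On the indexing point, MERGE wants an index on the join key of the target table so it can match rows efficiently. A minimal sketch, assuming a fact table dbo.FactSales merged on a business key called SourceSystemKey (both names are made up for illustration):

-- A unique index on the MERGE join key avoids scanning the target for every match.
CREATE UNIQUE NONCLUSTERED INDEX IX_FactSales_SourceSystemKey
    ON dbo.FactSales (SourceSystemKey);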
As a test / quick fix, you could always refactor the MERGE as the appropriate INSERT / UPDATE / DELETE, as sketched below - I bet it goes faster. Let us know.
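A minimal sketch of that refactor, assuming the same illustrative dbo.FactSales target loaded from a staging table dbo.StagingSales (again, all names are assumptions, not from the question):

-- Update rows that already exist in the target.
UPDATE f
SET    f.Quantity = s.Quantity,
       f.Amount   = s.Amount
FROM   dbo.FactSales AS f
JOIN   dbo.StagingSales AS s
    ON s.SourceSystemKey = f.SourceSystemKey;

-- Insert rows that are new.
INSERT INTO dbo.FactSales (SourceSystemKey, Quantity, Amount)
SELECT s.SourceSystemKey, s.Quantity, s.Amount
FROM   dbo.StagingSales AS s
WHERE  NOT EXISTS
(
    SELECT 1
    FROM dbo.FactSales AS f
    WHERE f.SourceSystemKey = s.SourceSystemKey
);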
The connection between Azure Data Factory and Azure database should be stable. If it isn't, you can raise support tickets. However, for cloud architecture (and really any architecture) you need to make good design decisions which allow for the possibility of things going wrong. That means architecturally, you have to design for the possibility of the connection dropping, or the job failing. For example, make sure your job is restartable from the point of failure, and make sure the error reporting is good.
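As a rough illustration of restartability (a sketch only; the checkpoint table, batch size, and the dbo.LoadBatch procedure are all hypothetical), the load can checkpoint after each batch so a failed run resumes where it left off rather than starting over:

-- Records the highest key successfully processed.
CREATE TABLE dbo.LoadCheckpoint (LastKey bigint NOT NULL);

DECLARE @lastKey  bigint = ISNULL((SELECT MAX(LastKey) FROM dbo.LoadCheckpoint), 0);
DECLARE @batchMax bigint;

WHILE 1 = 1
BEGIN
    -- Find the upper bound of the next batch of staging rows.
    SELECT @batchMax = MAX(batch.SourceKey)
    FROM (SELECT TOP (50000) SourceKey
          FROM dbo.StagingSales
          WHERE SourceKey > @lastKey
          ORDER BY SourceKey) AS batch;

    IF @batchMax IS NULL BREAK; -- nothing left to process

    BEGIN TRANSACTION;
        EXEC dbo.LoadBatch @FromKey = @lastKey, @ToKey = @batchMax; -- hypothetical proc
        INSERT INTO dbo.LoadCheckpoint (LastKey) VALUES (@batchMax);
    COMMIT;

    SET @lastKey = @batchMax;
END;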
Also, from experience, given your data volumes (which I regard as low), this job is taking far too long. There must be an issue with it or the design. It is my strongest recommendation that you attempt to resolve this issue.