Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Populate Azure Data Factory dataset from query

Cannot find an answer via google, msdn (and other microsoft) documentation, or SO.

In Azure Data Factory you can get data from a dataset by using copy activity in a pipeline. The pipeline definition includes a query. All the queries I have seen in documentation are simple, single table queries with no joins. In this case, a dataset is defined as a table in the database with "TableName"= "mytable". Additionally, one could retrieve data from a stored procedure, presumably allowing more complex sql.

Is there a way to define a more complex query in a pipeline that includes joins and/or transformation logic that alters the data from or pipeline from a query rather than stored procedure. I know that you can specify fields in a dataset, but don't know how to get around the "tablename" property.

If there is a way, what would that method be?

input is on-premises sql server. output is azure sql database.

UPDATED for clarity.

like image 467
Stephen Lloyd Avatar asked Feb 10 '23 01:02

Stephen Lloyd


1 Answers

Yes, the sqlReaderQuery can be much more complex than what is provided in the examples, and it doesn't have to only use the Table Name in the Dataset.

In one of my pipelines, I have a Dataset with the TableName "dbo.tbl_Build", but my sqlReaderQuery looks at several tables in that database. Here's a heavily truncated example:

with BuildErrorNodes as (select infoNode.BuildId, ...) as MessageValue from dbo.tbl_BuildInformation2 as infoNode inner join dbo.tbl_BuildInformationType as infoType on (infoNode.PartitionId = infoType), BuildInfo as ...

It's a bit confusing to list a single table name in the Dataset, then use multiple tables in the query, but it works just fine.

like image 135
Luke Pillar Avatar answered Mar 12 '23 04:03

Luke Pillar