Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

ETL/ Data Warehousing Approach using APIs

Tags:

I have about 20 different data sources, all small applications which have APIs (some soap and some rest). I need to combine all this data, transform it and store data into a data warehouse. I am considering Azure SQL database. However, I am not sure what tool I can use to actually obtain the APIs and data. Each data source has limitations in terms of connections that can be made in a day and most of these APis require running multiple APIs to get the data.

I looked at Azure Data Factory---- It does not support SOAP APIs I looked at Logic Apps ---- I'm not sure it can perform complex ETLs and can retrieve data

The other option I was thinking was to have Azure VM with SQL Server installed and then using Talend to get the data via REST AND SOAP Connectors and the perform the ETLs. The other approach I can take is instead of using Talend, using SSIS. But I do believe I would need third party Rest and Soap Connectors as well.

I am looking for the most cost effective and scaleable solution.

Any suggestion would be extremely helpful.

like image 899
a415 Avatar asked Mar 25 '17 06:03

a415


1 Answers

Azure Data Factory (currently) is more an orchestrator of transformation activities than it is a fully fledged ETL tool, and it can run custom activities (e.g. in C#) that can work with web service APIs. Its moving on the whole time. See here:

https://docs.microsoft.com/en-us/azure/data-factory/data-factory-use-custom-activities

Using Azure Data Factory to get data from a REST API

However you're right that given the scale of your transformations, you may wish to consider an alternative. At the time of speaking, I don't think many people have cracked the nut of how to replicate a properly complex on-premise ETL experience in the cloud- but this will change rapidly.

The other option you mentioned would work (IaaS Azure VM with ETL software installed). For Talend, you wouldn't need SQL Server installed, unless you wanted to store the data there instead of in Azure SQL Database. It comes pre-loaded with lots of connectors and is a good option if you have Talend skills.

Alternatively SSIS would also work: its pre-loaded Web Service task covers your SOAP APIs https://www.mssqltips.com/sqlservertip/3272/example-using-web-services-with-sql-server-integration-services/ and you can either use a 3rd party task/connector for REST, or use a Script Task/Transform, i.e. your own custom C# code.

like image 140
Rich Avatar answered Sep 21 '22 10:09

Rich