I'm considering Data Lake technologies, which I have been studying for the last few weeks, as an alternative to the traditional SSIS ETL scenarios I have worked with for many years.
I think of Data Lake as something closely linked to big data, but where is the line between using Data Lake technologies and SSIS?
Is there any advantage to using Data Lake technologies with files of 25MB, 100MB or 300MB? Parallelism? Flexibility? Extensibility in the future? Is there any performance gain when the files to be loaded are not as big as in U-SQL's best-case scenario?
What are your thoughts? Would it be like using a hammer to crack a nut? Please don't hesitate to ask me any questions to clarify the situation. Thanks in advance!!
21/03 EDIT More clarifications:
Don't get me wrong, I really like ADL technologies, but I think that for now they are for something very specific, and there is still no substitute for SSIS in the cloud. What do you think? Am I wrong?
It removes the complexities of ingesting and storing all of your data while making it faster to get up and running with batch, streaming and interactive analytics. Azure Data Lake works with existing IT investments for identity, management and security for simplified data management and governance.
ADF helps with transforming, scheduling and loading data as per project requirements, whereas Azure Data Lake is massively scalable and secure data lake storage for storing optimized workloads. It can store structured, semi-structured and unstructured data seamlessly.
However, organizations find that simply pouring all of the data into object storage such as Amazon S3 does not mean you have an operational data lake quite yet; to actually put that data to use in analytics or machine learning, developers need to build ETL flows that transform raw data into structured datasets they can ...
Traditional SMP dedicated SQL pools use an Extract, Transform, and Load (ETL) process for loading data. Synapse SQL, within Azure Synapse Analytics, uses distributed query processing architecture that takes advantage of the scalability and flexibility of compute and storage resources.
For me, if the data is highly structured and relational, the right place for it is a relational database. In Azure you have several choices:
For all database options you can use clustered columnstore indexes (the default in ADW), which can give massive compression, between 5x and 10x.
400MB per day for a year totals ~143GB, which honestly is not that much in modern data warehouse terms; warehouses are normally measured in terabytes (TB).
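As a rough check of that arithmetic, here is a back-of-the-envelope sketch in Python. It assumes 1 GB = 1024 MB (which is what gives ~143GB) and reuses the 5x to 10x columnstore compression range mentioned above. The function names are made up for illustration, and real compression depends heavily on the data.

```python
# Back-of-the-envelope sizing. The 400 MB/day figure and the 5x-10x
# compression range come from the answer; everything else is an assumption.
DAILY_MB = 400
DAYS_PER_YEAR = 365


def yearly_volume_gb(daily_mb: float, days: int = DAYS_PER_YEAR) -> float:
    """Raw yearly volume in GB, assuming 1 GB = 1024 MB."""
    return daily_mb * days / 1024


def compressed_range_gb(raw_gb: float, low_ratio: float = 5.0,
                        high_ratio: float = 10.0) -> tuple[float, float]:
    """Return (best case, worst case) size in GB for the given ratios."""
    return raw_gb / high_ratio, raw_gb / low_ratio


raw = yearly_volume_gb(DAILY_MB)
best, worst = compressed_range_gb(raw)
print(f"raw: {raw:.1f} GB, compressed: {best:.1f} to {worst:.1f} GB")
# prints: raw: 142.6 GB, compressed: 14.3 to 28.5 GB
```

So even a full year of loads would plausibly fit in a few tens of GB once compressed, which supports the point that this volume alone does not call for big data tooling.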
Where Azure Data Lake Analytics (ADLA) comes in is doing things you cannot do in ordinary SQL, like:
ADLA also offers federated queries, the ability to "query data where it lives", i.e. bring together structured data from your database and unstructured data from your lake.
Your decision seems to have more to do with whether or not you should be using the cloud. If you need the elastic and scalable features of the cloud, then Azure Data Factory is the tool for moving data from place to place in the cloud.
HTH
Be careful. This question is likely to get closed for being too broad.
There are many arguments for and against. We can't discuss them all here.
ADL isn't a replacement for SSIS. The consultant's answer, as always, will be: it depends on what you're doing or trying to do.
A simplistic answer might be: ADL is unlimited and highly scalable; SSIS is not. But yes, ADL has a high entry point for small files because of that scalability.
Generally I don't think the two technologies are comparable.
If you want SSIS in Azure, wait for MS to release it as a PaaS, or use a virtual machine.
I think for simpler transformations it may be a good solution; however, if you have complexities, notifications etc., it may be incompatible. A typical scenario would be something like transforming a JSON document to CSV, then taking the CSV and running that through SSIS for further transforms. There is certainly a future state that will enable U-SQL to be much more powerful; for now I think there are separate and distinct uses for U-SQL/ADLA/ADLS and SSIS.
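To make the shape of that JSON-to-CSV step concrete, here is a minimal sketch. It is written in Python rather than U-SQL, purely as an illustration, and it assumes the JSON document is an array of flat objects. The function name and the sample data are hypothetical.

```python
import csv
import io
import json


def json_records_to_csv(json_text: str) -> str:
    """Convert a JSON array of flat objects into CSV text."""
    records = json.loads(json_text)
    # Collect column names in first-seen order so the header is stable.
    fieldnames: list[str] = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)  # keys missing from a record become empty fields
    return buffer.getvalue()


# Hypothetical sample input; the second record has an extra column.
sample = '[{"id": 1, "name": "a"}, {"id": 2, "name": "b", "city": "x"}]'
print(json_records_to_csv(sample))
```

The resulting CSV is the kind of flat file that could then be handed to an SSIS package for the further transforms described above.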