I was asked to prototype two ETL frameworks. The requirements are as follows:
The raw file can be anything (Excel, CSV, HTML page, etc.). The target database is MySQL.
Don't just drop names; please indicate the advantages and disadvantages based on your experience.
Thanks!
Open source ETL tools: Bubbles, CloverETL, Pentaho Data Integration (Kettle), petl.
ETL is outdated. It was designed for traditional data center infrastructures, which cloud technologies are already replacing. Loading can take hours, even for businesses whose data sets are only a few terabytes in size. ELT is the future of data warehousing and makes efficient use of current cloud technologies.
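To make the ETL/ELT distinction concrete, here is a minimal sketch of the ELT order of operations: the raw values are loaded untouched into a staging table, and the cleanup happens afterwards inside the database with SQL. It uses Python's built-in sqlite3 as a stand-in for a real warehouse, and the table names (staging_sales, sales) and sample rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite is only a stand-in for a real warehouse (assumption).
conn = sqlite3.connect(":memory:")

# Load: raw values go into a staging table as plain text, no cleanup yet.
conn.execute("CREATE TABLE staging_sales (id TEXT, name TEXT, amount TEXT)")
raw_rows = [("1", " Alice ", "10.5"), ("", "Bob", "3"), ("2", "Carol", "7")]
conn.executemany("INSERT INTO staging_sales VALUES (?, ?, ?)", raw_rows)

# Transform: done inside the database, after loading, using SQL.
conn.execute("""
    CREATE TABLE sales AS
    SELECT CAST(id AS INTEGER)  AS id,
           TRIM(name)           AS name,
           CAST(amount AS REAL) AS amount
    FROM staging_sales
    WHERE TRIM(id) <> ''
""")

elt_rows = conn.execute("SELECT id, name, amount FROM sales ORDER BY id").fetchall()
print(elt_rows)
```

In an ETL tool the same trimming and casting would run in the tool before anything reaches the target database.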
Hevo Data is a closed-source, managed ETL service that was created in 2017. As of September 2021, they have built 110 data connectors and have hundreds of customers. Hevo Data offers real-time replication to their destinations.
Not open source: Informatica does not have an open-source version, and its maintenance fees are quite expensive, which makes it hard for some customers to afford. Most companies prefer flexible licensing that lets them access, use, and distribute the tool among their users.
One of the most popular Java-based ETL tools is Talend.
Jaspersoft ETL is another one; it is built on Talend and has a nice Eclipse-based UI.
I've used Kettle. It has its own GUI, but if you would rather use the API to do the ETL yourself, that is also supported. It has proved very useful to me, and there are a few plugins already available for it.
Another option is CloverETL. It is written in Java, and there is an open source, LGPL version of its Engine. It also has a free version of its GUI, called CloverETL Community.
It can process any of the indicated sources and connects to a number of databases, including MySQL.
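Whichever tool you pick, the job it automates has the same three steps. As a rough baseline to compare the prototypes against, here is a hand-rolled sketch using only the Python standard library. It is not how Kettle, Talend, or CloverETL work internally; the column names and sample data are invented, and sqlite3 stands in for MySQL so the snippet runs without a server.

```python
import csv
import io
import sqlite3

def extract(csv_text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def transform(rows):
    """Transform: skip rows without an id, trim names, cast id and amount."""
    cleaned = []
    for row in rows:
        if not (row.get("id") or "").strip():
            continue
        cleaned.append((int(row["id"]), row["name"].strip(), float(row["amount"])))
    return cleaned

def load(conn, records):
    """Load: create the target table if needed and insert the cleaned records."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales (id, name, amount) VALUES (?, ?, ?)", records)
    conn.commit()

# Hypothetical sample input; a real run would read a file instead.
RAW = "id,name,amount\n1, Alice ,10.5\n,Bob,3\n2,Carol,7\n"

conn = sqlite3.connect(":memory:")  # stand-in for the MySQL target (assumption)
load(conn, transform(extract(RAW)))
rows = conn.execute("SELECT id, name, amount FROM sales ORDER BY id").fetchall()
print(rows)
```

Against a real MySQL target you would swap the connection for a MySQL driver; drivers such as PyMySQL use %s placeholders instead of ?. Excel and HTML sources would need their own extract functions, which is where the tools above save the most work.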