Straight Java/Groovy versus ETL tool (Talend/etc) - what libraries would you use?

Assume you have a small project which on the surface looks like a good match for an ETL tool like Talend.

But assume further that you have never used Talend, that you do not trust "visual programming" tools in general, and that you would rather code everything the old-fashioned way (text in a nice IDE!) with the help of an appropriate language and support libraries.

What are some language patterns & support libraries that could help you stay away from the ETL tool temptation/trap?

Alex R asked Mar 12 '10 01:03


2 Answers

It depends on whether the deliverable is the processor or the output itself. If you just need to deliver the output, you don't need to maintain the code. If the code does need to be maintained, will you be the one maintaining it, or somebody else?

If somebody else needs to maintain it, I'd use Java or give them Talend.

If it's throwaway code, I'd use whatever is easiest or most fun to program with.

If you need to maintain it and the processing is complex, I'd use Scala. It has:

  • some libraries to interact with databases
  • XML literals
  • parser combinators
  • useful operations in its collections library (map, filter, groupBy, partition, ...)
  • and of course access to any existing Java library.
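
As a rough illustration of the collection operations in the list above, here is a minimal sketch of a small extract-transform-load step. It is written in plain Java with the standard streams API rather than Scala, since the question asks about Java/Groovy; Scala's map, filter, groupBy and partition correspond roughly to map, filter, Collectors.groupingBy and Collectors.partitioningBy here. The CSV layout, the sample rows, and the names EtlSketch, Order, totalByCountry and splitByThreshold are all hypothetical and exist only for this example.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class EtlSketch {

    // Hypothetical row type for illustration: one order per CSV line.
    static final class Order {
        final String customer;
        final String country;
        final double amount;

        Order(String customer, String country, double amount) {
            this.customer = customer;
            this.country = country;
            this.amount = amount;
        }
    }

    // Extract: parse an assumed "customer,country,amount" line into an Order.
    static Order parse(String line) {
        String[] fields = line.split(",");
        return new Order(fields[0].trim(), fields[1].trim(),
                Double.parseDouble(fields[2].trim()));
    }

    // Transform: drop non-positive amounts, then total the amounts per country.
    static Map<String, Double> totalByCountry(List<String> lines) {
        return lines.stream()
                .map(EtlSketch::parse)
                .filter(o -> o.amount > 0)
                .collect(Collectors.groupingBy(o -> o.country,
                        Collectors.summingDouble(o -> o.amount)));
    }

    // Partition: split orders into large (key true) and small (key false).
    static Map<Boolean, List<Order>> splitByThreshold(List<String> lines, double threshold) {
        return lines.stream()
                .map(EtlSketch::parse)
                .collect(Collectors.partitioningBy(o -> o.amount >= threshold));
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "alice,DE,120.0",
                "bob,FR,30.5",
                "carol,DE,-5.0",
                "dave,FR,200.0");
        // Load: printing stands in for writing to the real target system.
        System.out.println(totalByCountry(lines));
        System.out.println(splitByThreshold(lines, 100.0).get(true).size());
    }
}
```

In a real job the extract and load steps would talk to files or a database (for example through JDBC), but the transform in the middle can often stay this small.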
huynhjl answered Nov 15 '22 22:11


I used to think that "visual programming" was something for people who can't program. Then I was exposed to Talend on a project, and I realized that this type of tool is exactly right for the job when it comes to moving data from A to B and transforming it along the way. To give it a more academic label, it's component-oriented software design.

I still consider myself a decent programmer who can do anything, and then some, with a text editor and a shell prompt. But I've become a big fan of Talend as well.

Full disclosure: I now work for the company :-)

drmirror answered Nov 15 '22 21:11