 

How to log progress of tasks in Talend Open Studio?

Tags:

talend

I have some sample jobs that migrate data from one database to another, and I would like some information about the current progress, like what you see when the job is run interactively from Talend Open Studio itself (I export the job and run it from the command line). I use tFlowMeter and tStatCatcher, but all I get is the overall time and the overall number of records processed (e.g. 4657 s, 50,000,000 rows). Is there any way to get a decent progress log?

asked Jun 17 '13 at 14:06 by Bax





3 Answers

The solution is to make the logging conditional: log only when a condition is true once every, say, 50,000 rows. This condition, which uses a sequence, should work:

Numeric.sequence("log_seq",1,1) % 50000 == 0 
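To see what that condition does outside Talend, here is a minimal Java sketch. It assumes that Talend's Numeric.sequence(name, start, step) returns start on the first call for a given name and adds step on every later call; the sequence method below is a hypothetical stand-in for that routine, not Talend's own code. With start 1 and step 1, the modulo test is true on rows 50,000, 100,000 and so on.

```java
import java.util.HashMap;
import java.util.Map;

// Runnable outside Talend. sequence() is a HYPOTHETICAL stand-in that mimics
// the assumed behaviour of Talend's Numeric.sequence routine.
class ProgressCondition {
    private static final Map<String, Integer> SEQUENCES = new HashMap<>();

    // First call for a name returns startValue; each later call adds step.
    static int sequence(String name, int startValue, int step) {
        Integer current = SEQUENCES.get(name);
        int next = (current == null) ? startValue : current + step;
        SEQUENCES.put(name, next);
        return next;
    }

    // The condition from the answer: true once every `every` rows.
    static boolean shouldLog(int every) {
        return sequence("log_seq", 1, 1) % every == 0;
    }

    public static void main(String[] args) {
        // Simulate 200,000 rows flowing through the filter.
        for (int row = 1; row <= 200_000; row++) {
            if (shouldLog(50_000)) {
                System.out.println("Processed " + row + " rows");
            }
        }
    }
}
```

Because the counter is keyed by name, each condition that should count independently needs its own sequence name.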

You can use the custom component bcLogBack, which outputs your log through an SLF4J facade stack. The component has a "Conditional logging" option that sends the message only when the condition evaluates to true.

Alternatively, if you don't want to install a custom component, you can end your subjob with the standard tLogRow (or tWarn, tDie, or whatever) preceded by a tFilter that uses the same expression as its advanced condition. This way the stream passes through (and the log message is triggered) only once every 50,000 rows. Here's a very basic job diagram:

//---->tMySqlOutput--->tFilter-----//filter--->tWarn (or tLogRow)
answered Oct 16 '22 at 16:10 by Gabriele B



As far as I know, tLogRow writes to the console, so you can simply connect your flow to it.

If tLogRow isn't enough, you can connect your flow to a tJavaFlex component instead. There you can use something like log4j or any custom output.
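A tJavaFlex has Start, Main and End code sections. The sketch below shows, as a standalone Java class, what those sections might contain for a progress log: a row counter, an elapsed-time reading, and a message every LOG_EVERY rows. It uses java.util.logging only to stay self-contained; inside a job you would swap in log4j or whichever logger you prefer. The class and method names and the 50,000 interval are illustrative assumptions, not Talend-generated code.

```java
import java.util.logging.Logger;

// Standalone sketch of tJavaFlex sections; names and interval are illustrative.
class FlexProgressLogger {
    private static final Logger LOG = Logger.getLogger("job.progress");
    private static final long LOG_EVERY = 50_000;

    private long count;
    private long startNanos;

    // Start code: runs once, before the first row.
    void startCode() {
        count = 0;
        startNanos = System.nanoTime();
    }

    // Main code: runs once per row. Returns the logged message, or null.
    String mainCode() {
        count++;
        if (count % LOG_EVERY != 0) {
            return null;
        }
        double seconds = (System.nanoTime() - startNanos) / 1e9;
        // Guard against division by zero on extremely fast runs.
        double rate = count / Math.max(seconds, 1e-9);
        String msg = String.format("%d rows in %.1f s (%.0f rows/s)", count, seconds, rate);
        LOG.info(msg);
        return msg;
    }

    // End code: runs once, after the last row. Returns the final count.
    long endCode() {
        LOG.info("Finished: " + count + " rows");
        return count;
    }

    public static void main(String[] args) {
        FlexProgressLogger flex = new FlexProgressLogger();
        flex.startCode();
        for (int i = 0; i < 120_000; i++) {
            flex.mainCode(); // logs at rows 50,000 and 100,000
        }
        flex.endCode();
    }
}
```

Reporting elapsed time and throughput at each checkpoint is what turns a bare row count into something close to the interactive progress display.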

You can also use tFileOutputDelimited as a log file. This component has an "Append" option that works very well for this use case.
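What the Append option gives you is that each run adds lines to the same file instead of overwriting it. A minimal Java sketch of that behaviour, assuming a semicolon-delimited line of timestamp, job name and row count (the column layout and the job name "migrate_customers" are hypothetical, not something the component prescribes):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.time.LocalDateTime;
import java.util.Collections;

class AppendProgressLog {
    // Appends one "timestamp;job;rows" line. CREATE + APPEND keeps earlier
    // lines, which is the behaviour an append option gives you.
    static void appendLine(Path logFile, String jobName, long rows) throws IOException {
        String line = String.join(";", LocalDateTime.now().toString(), jobName, Long.toString(rows));
        Files.write(logFile, Collections.singletonList(line), StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException {
        // Temporary file, deleted at the end of the demo.
        Path log = Files.createTempFile("progress", ".csv");
        appendLine(log, "migrate_customers", 50_000); // hypothetical job name
        appendLine(log, "migrate_customers", 100_000);
        Files.readAllLines(log, StandardCharsets.UTF_8).forEach(System.out::println);
        Files.deleteIfExists(log);
    }
}
```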


As for your question above, how to obtain the log information:

From experience, I can tell you that some components pass the flow on. For example, tMysqlOutput outputs the rows it has successfully inserted.

Generally, to log the information I use the tReplicate component, which lets me send a copy of the flow to a log file.

 tMySqlInput ----- tReplicate ----- tMap -------- tMySqlOutput (insert in DB)
                              +---- tMap -------- tFileOutputDelimited (log info)
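tReplicate hands every incoming row, unchanged, to each of its outgoing branches. A small Java sketch of that idea, with two in-memory lists standing in for the database insert and the log file (both are illustrative placeholders, not Talend APIs):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class ReplicateSketch {
    // Like tReplicate: every incoming row is passed, unchanged, to each branch.
    @SafeVarargs
    static <T> Consumer<T> replicate(Consumer<T>... branches) {
        return row -> {
            for (Consumer<T> branch : branches) {
                branch.accept(row);
            }
        };
    }

    public static void main(String[] args) {
        List<String> db = new ArrayList<>();      // placeholder for the DB insert
        List<String> logFile = new ArrayList<>(); // placeholder for the log file
        Consumer<String> insertBranch = db::add;
        Consumer<String> logBranch = row -> logFile.add("log;" + row);
        Consumer<String> flow = replicate(insertBranch, logBranch);

        for (String row : new String[] {"id=1", "id=2", "id=3"}) {
            flow.accept(row);
        }
        System.out.println("db: " + db);
        System.out.println("log: " + logFile);
    }
}
```

Note that in this layout the log branch sees rows as they are read, in parallel with the insert branch; it does not by itself confirm that the insert succeeded.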
answered Oct 16 '22 at 17:10 by Jean-Michel Garcia



You can also use tWarn in combination with tLogCatcher:

tMySqlOutput ---- tFilter ---- tWarn

tLogCatcher ---- tMap ---- tLogRow

tFilter keeps you from logging progress on every row (see Gabriele B's answer). tWarn carries the actual message you want to log.

tLogCatcher receives the messages from all of the tWarn components, tMap transforms each row from tLogCatcher into an output row, and tLogRow logs it.
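The tWarn/tLogCatcher pattern separates raising a message from writing it out. The sketch below models that in plain Java: warn() queues an entry, and drain() plays the combined role of tLogCatcher, tMap and tLogRow by formatting each entry as one line. The LogEntry fields are a simplified, assumed subset of the tLogCatcher schema; the origin label "tWarn_1" and the priority value 4 are only illustrative.

```java
import java.time.LocalDateTime;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

class WarnCatcherSketch {
    // Simplified, assumed subset of the tLogCatcher output schema.
    static final class LogEntry {
        final LocalDateTime moment;
        final String origin;
        final int priority;
        final String message;

        LogEntry(LocalDateTime moment, String origin, int priority, String message) {
            this.moment = moment;
            this.origin = origin;
            this.priority = priority;
            this.message = message;
        }
    }

    private static final Queue<LogEntry> CAUGHT = new ArrayDeque<>();

    // Role of tWarn: record a message rather than print it directly.
    static void warn(String origin, int priority, String message) {
        CAUGHT.add(new LogEntry(LocalDateTime.now(), origin, priority, message));
    }

    // Role of tLogCatcher -> tMap -> tLogRow: take every queued entry and
    // turn it into one output line.
    static List<String> drain() {
        List<String> lines = new ArrayList<>();
        LogEntry e;
        while ((e = CAUGHT.poll()) != null) {
            lines.add(e.moment + "|" + e.origin + "|" + e.priority + "|" + e.message);
        }
        return lines;
    }

    public static void main(String[] args) {
        for (int row = 1; row <= 120_000; row++) {
            if (row % 50_000 == 0) { // the tFilter step
                warn("tWarn_1", 4, row + " rows processed");
            }
        }
        drain().forEach(System.out::println);
    }
}
```

The benefit of this split is that all progress messages, from any number of tWarn components, end up formatted and written in one place.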

This approach is described in more detail (with pictures) here: http://blog.wdcigroup.net/2012/05/error-handling-in-talend-using-tlogcatcher/

answered Oct 16 '22 at 17:10 by Denise
