
Pentaho Data Integration: Error Handling

I'm building out an ETL process with Pentaho Data Integration (CE) and I'm trying to operationalize my Transformations and Jobs so that they can be monitored. Specifically, I want to catch any errors and send them to an error reporting service like Honeybadger or New Relic. I understand how to do row-level error reporting, but I don't see a way to do job or transformation failure reporting.

Here is an example job.

  • The down path is where the transformation succeeds but has row errors. There we can just filter the results and log them.
  • The path to the right is the case where the transformation fails altogether (e.g. DB credentials are wrong). This is where I'm having trouble: I can't figure out how to get the error info so that it can be sent.

Example job
How do I capture transformation failures to be logged?

jonnysamps asked Oct 19 '22 11:10



1 Answer

You cannot capture job-level error details inside the job itself. However, there are other options for monitoring.

The first option is to use database logging for transformations or jobs (see the "Log" tab in the job/transformation settings dialog). This way you always have up-to-date information about the execution status, so you can, say, write a job that periodically scans the logging database and sends error reports wherever you need.
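As a rough illustration of the "periodically scan the logging database" idea, here is a minimal Python sketch. It assumes the job log table is named `job_log` (the name is whatever you configure in the "Log" tab) and that it has the `ID_JOB`, `JOBNAME`, `ERRORS` and `LOGDATE` columns; check these against your own logging setup. The in-memory SQLite database and `make_demo_db` are hypothetical stand-ins so the sketch can run on its own; a real logging database would use its own driver, and the placeholder style (`?` here) differs between drivers.

```python
import sqlite3


def find_failed_jobs(conn, since):
    """Return (ID_JOB, JOBNAME, ERRORS, LOGDATE) for runs with errors logged after `since`.

    Assumes a log table named job_log with these columns (an assumption,
    adjust to match the table configured in the job's "Log" tab).
    """
    query = (
        "SELECT ID_JOB, JOBNAME, ERRORS, LOGDATE FROM job_log "
        "WHERE ERRORS > 0 AND LOGDATE > ? ORDER BY LOGDATE"
    )
    return conn.execute(query, (since,)).fetchall()


def make_demo_db():
    """Build a throwaway in-memory table that stands in for the real logging database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE job_log (ID_JOB INTEGER, JOBNAME TEXT, ERRORS INTEGER, LOGDATE TEXT)"
    )
    conn.executemany(
        "INSERT INTO job_log VALUES (?, ?, ?, ?)",
        [
            (1, "load_orders", 0, "2022-10-19 10:00:00"),  # clean run
            (2, "load_orders", 1, "2022-10-19 11:10:00"),  # run with errors
            (3, "load_users", 2, "2022-10-18 09:00:00"),   # errors, but before the cutoff
        ],
    )
    return conn
```

A scheduled script could call `find_failed_jobs` with the timestamp of its previous scan and forward each returned row to the error reporting service.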

However, this option seems rather heavyweight to develop and support, and not very flexible to modify later. So in our company we ended up monitoring at the job-execution level: when you run a job with kitchen.bat and it fails for any reason, Kitchen exits with an "error" status, so you can easily check it and take the necessary action with whatever tools you like: .bat commands, PowerShell or (in our case) Jenkins CI.
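The same exit-status check can be sketched in Python instead of .bat or PowerShell. This is only an outline: `run_and_report` and its `report` callback are hypothetical names, and the callback is where a call to your error reporting service would go. It treats any non-zero exit code as a failure and passes along the last lines of the captured log output.

```python
import subprocess
from typing import Callable, Sequence


def run_and_report(command: Sequence[str], report: Callable[[int, str], None]) -> int:
    """Run a command and call report(exit_code, log_tail) if it exits non-zero."""
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout so the log stays in order
        text=True,
    )
    if result.returncode != 0:
        # Keep only the last 50 lines so the report stays small.
        log_tail = "\n".join(result.stdout.splitlines()[-50:])
        report(result.returncode, log_tail)
    return result.returncode
```

The command would be the Kitchen invocation for your job, for example `["kitchen.sh", "-file=/path/to/job.kjb"]` on Linux (the path is a placeholder; kitchen.bat on Windows takes `/file:` style options).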

morincer answered Oct 21 '22 06:10
