 

Logging hadoop map process

Tags:

hadoop

I'm working with Hadoop. I have 100k zip files and I'm processing them with MapReduce. Now I have a task where I need to keep track of some logs:

1. Zip files already processed
2. Zip files still to be processed
3. Status of each process (error or success)

I'm currently doing it like this:

    } catch (Exception ex) {
        System.out.println("Killing task");
        runningJob.killTask((TaskAttemptID) context.getTaskAttemptID(), true);
    }

But now I need to store these logs in a common place. How can I do that?

I thought of storing them in HBase. Ideas are welcome. Kindly help me.

asked Nov 02 '22 by backtrack

1 Answer

Here are some ideas for you:

  1. Use custom task counters (http://lintool.github.io/Cloud9/docs/content/counters.html). They are very lightweight and a great way to keep track of small values.

  2. If you need to record more details, you can output log statements as part of your map job and then split your pipeline using two simple filters (map jobs): the first filter takes the output of your zip processing and feeds it into the rest of your pipeline; the second takes the log statements and saves them to a separate location for further analysis.

    Using HBase would work too, but it adds extra complexity and uses a lot more resources on your cluster — unless HBase is already part of your pipeline.
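To illustrate idea 1: in a real mapper you would call `context.getCounter(ZipStatus.PROCESSED).increment(1)` and Hadoop would aggregate the counts across tasks for you. Since that only runs on a cluster, here is a minimal self-contained sketch of the same bookkeeping pattern, with a hypothetical `ZipStatus` enum and an `EnumMap` standing in for Hadoop's counter framework:

```java
import java.util.EnumMap;
import java.util.Map;

// Local sketch of the custom-counter pattern. ZipStatus and the
// process() logic are illustrative assumptions, not from the question.
public class ZipCounterSketch {

    // One counter per outcome the question wants to track.
    enum ZipStatus { PROCESSED, FAILED }

    private final Map<ZipStatus, Long> counters = new EnumMap<>(ZipStatus.class);

    // In a Hadoop mapper this would be context.getCounter(status).increment(1).
    void increment(ZipStatus status) {
        counters.merge(status, 1L, Long::sum);
    }

    long get(ZipStatus status) {
        return counters.getOrDefault(status, 0L);
    }

    // Simulates processing one zip entry: success increments PROCESSED,
    // a thrown exception increments FAILED (mirroring the catch block
    // in the question).
    void process(String zipName) {
        try {
            if (zipName.isEmpty()) {
                throw new IllegalArgumentException("empty zip name");
            }
            increment(ZipStatus.PROCESSED);
        } catch (Exception ex) {
            increment(ZipStatus.FAILED);
        }
    }

    public static void main(String[] args) {
        ZipCounterSketch job = new ZipCounterSketch();
        for (String name : new String[] {"a.zip", "b.zip", ""}) {
            job.process(name);
        }
        System.out.println("processed=" + job.get(ZipStatus.PROCESSED)
                + " failed=" + job.get(ZipStatus.FAILED));
    }
}
```

The appeal of counters is exactly this shape: tiny per-record increments that the framework sums globally, so you get "files processed / files failed" totals for free from the job UI without writing any extra output.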

answered Nov 15 '22 by Vlad