I'm working on Hadoop. I have 100k zip files that I process with MapReduce, but now I have a task that requires keeping track of some logs:
1. Zip files processed
2. Zip files still to be processed
3. Status of the process, like error or success
I'm currently doing it with the following method:
try {
    // ... process one zip file ...
} catch (Exception ex) {
    System.out.println("Killing task " + context.getTaskAttemptID());
    // 'true' fails the attempt (counts against retries) rather than just killing it
    runningJob.killTask((TaskAttemptID) context.getTaskAttemptID(), true);
}
But now I need to store this information in a common place. How can I do it? I thought of storing it in HBase. Ideas are welcome; kindly help me.
Here are some ideas for you:
Use custom task counters (http://lintool.github.io/Cloud9/docs/content/counters.html). They are very lightweight and a great way to keep track of small values; see the sketch below.
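A minimal sketch of enum-based counters, assuming one zip file per input record; the names ZipProcessingMapper and ZipCounters are hypothetical:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ZipProcessingMapper extends Mapper<LongWritable, Text, Text, Text> {

    // Enum counters are aggregated by the framework across all task attempts.
    public enum ZipCounters { PROCESSED, FAILED }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        try {
            // ... process the zip file referenced by `value` ...
            context.getCounter(ZipCounters.PROCESSED).increment(1);
        } catch (Exception ex) {
            context.getCounter(ZipCounters.FAILED).increment(1);
        }
    }
}

After the job finishes, the driver can read the totals with job.getCounters().findCounter(ZipCounters.PROCESSED).getValue().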
If you need to record more details, there are two ways of doing this. First, you can simply output log statements as part of your map job. Then you split your pipeline using two simple filters (map jobs): the first filter takes the output of your zip processing and plugs into the rest of your pipeline, while the second takes the log statements and saves them to a separate location for further analysis; see the sketch below.
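One way to get the same split inside a single job is MultipleOutputs. A sketch, where the mapper name and the named output "logs" are assumptions, and the driver would register the side output with MultipleOutputs.addNamedOutput(job, "logs", TextOutputFormat.class, Text.class, Text.class):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public class ZipMapperWithLogs extends Mapper<LongWritable, Text, Text, Text> {

    private MultipleOutputs<Text, Text> mos;

    @Override
    protected void setup(Context context) {
        mos = new MultipleOutputs<>(context);
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        try {
            // ... process the zip file ...
            context.write(value, new Text("SUCCESS"));            // main pipeline output
        } catch (Exception ex) {
            // route the log record to the side location instead of the main stream
            mos.write("logs", value, new Text("ERROR: " + ex.getMessage()));
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        mos.close(); // flush the side outputs
    }
}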
Using HBase would work too, but it adds extra complexity and uses far more resources on your cluster, unless you already have HBase as part of your pipeline.
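If you do go the HBase route, here is a minimal sketch of recording per-file status, assuming a pre-created table zip_status with column family s (the table and column names are hypothetical):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class StatusRecorder {
    // Writes one status row per zip file; row key = file name.
    public static void recordStatus(String zipFileName, String status) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("zip_status"))) {
            Put put = new Put(Bytes.toBytes(zipFileName));
            put.addColumn(Bytes.toBytes("s"), Bytes.toBytes("status"), Bytes.toBytes(status));
            put.addColumn(Bytes.toBytes("s"), Bytes.toBytes("ts"),
                          Bytes.toBytes(System.currentTimeMillis()));
            table.put(put);
        }
    }
}

Note that opening a Connection per call is expensive; in a real mapper you would create it once in setup() and close it in cleanup().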