Standard practices for logging in MapReduce jobs

I'm trying to find the best approach for logging in MapReduce jobs. I'm using slf4j with a log4j appender, as in my other Java applications. But since a MapReduce job runs in a distributed manner across the cluster, I don't know where I should set the log file location, especially because it is a shared cluster with limited access privileges.

Are there any standard practices for logging in MapReduce jobs that make it easy to look at the logs from across the cluster after the job completes?

asked Jan 23 '15 21:01

Frank

1 Answer

You could use log4j, which is the default logging framework that Hadoop uses. From your MapReduce application you could do something like this:

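import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.log4j.Logger;

public class SampleMapper extends Mapper<LongWritable, Text, Text, Text> {
    // One logger per class; its output ends up in the task attempt's log
    private static final Logger logger = Logger.getLogger(SampleMapper.class);

    @Override
    protected void setup(Context context) {
        logger.info("Initializing NoSQL Connection.");
        try {
            // logic for connecting to NoSQL - omitted
        } catch (Exception ex) {
            // Pass the exception itself so the stack trace is logged too
            logger.error("Failed to initialize NoSQL connection", ex);
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // mapper code omitted
    }
}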

This sample code uses a log4j logger to log events from the mapper. All log events are written to the log of the task attempt that produced them. You can view the task logs from the JobTracker (MRv1) or ResourceManager (MRv2) web page.

If you are using YARN, you can access the application logs from the command line with the following command (this relies on log aggregation being enabled on the cluster):

yarn logs -applicationId <application_id>
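If you would rather save that output to a file from Java, the command above can be wrapped with ProcessBuilder. This is only a rough sketch: the class name YarnLogsFetcher and its methods are made up for illustration, and it assumes the yarn executable is on the PATH of the machine where it runs.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hadoop: wraps the `yarn logs` CLI call.
final class YarnLogsFetcher {
    private YarnLogsFetcher() {}

    // Builds the same command line shown above for the given application id.
    static List<String> command(String applicationId) {
        return Arrays.asList("yarn", "logs", "-applicationId", applicationId);
    }

    // Runs the command and writes stdout and stderr to outputFile.
    // Returns the exit code of the yarn process.
    static int fetch(String applicationId, File outputFile)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(applicationId))
                .redirectErrorStream(true)   // merge stderr into stdout
                .redirectOutput(outputFile)  // send everything to the file
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: YarnLogsFetcher <application_id> <output_file>");
            System.exit(2);
        }
        System.exit(fetch(args[0], new File(args[1])));
    }
}
```

For a one-off look at the logs, plain shell redirection of the yarn logs command does the same job; the wrapper is mostly useful if you already drive your jobs from Java.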

If you are using MapReduce v1, there is no single point of access from the command line; you have to log into each TaskTracker and look in the configured path, generally /var/log/hadoop/userlogs/attempt_<job_id>/syslog. The userlogs directory under ${hadoop.log.dir} is where the log4j output ends up.
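To make the MRv1 path layout concrete, here is a small sketch that assembles the syslog location from its parts. TaskLogPaths and syslogFor are hypothetical names, not Hadoop APIs, and the attempt directory name used below is a placeholder; check the actual log directory configured on your TaskTrackers.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Hadoop: shows how the MRv1 task log path is laid out.
final class TaskLogPaths {
    private TaskLogPaths() {}

    // Builds <hadoopLogDir>/userlogs/<attemptDir>/syslog.
    // hadoopLogDir stands in for the value of ${hadoop.log.dir}.
    static Path syslogFor(String hadoopLogDir, String attemptDir) {
        return Paths.get(hadoopLogDir, "userlogs", attemptDir, "syslog");
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: TaskLogPaths <hadoop_log_dir> <attempt_dir>");
            System.exit(2);
        }
        // Prints the file to open on the TaskTracker that ran the attempt.
        System.out.println(syslogFor(args[0], args[1]));
    }
}
```

Called with /var/log/hadoop and the attempt directory name, this yields the file to look at on the TaskTracker that ran that attempt.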

answered Oct 16 '22 09:10

Ashrith

Tags:

java

hadoop

hadoop2

mapreduce

mapr
