
Custom log4j appender in Hadoop 2

How to specify custom log4j appender in Hadoop 2 (amazon emr)?

Hadoop 2 ignores my log4j.properties file that contains a custom appender, overriding it with an internal log4j.properties file. There is a flag, -Dhadoop.root.logger, that specifies the logging threshold, but it does not help with a custom appender.
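For context, a typical custom-appender configuration looks something like this (the appender name, file path, and pattern here are just illustrative):

```properties
# Illustrative example: route root logging through a custom rolling file appender
log4j.rootLogger=INFO, myAppender
log4j.appender.myAppender=org.apache.log4j.RollingFileAppender
log4j.appender.myAppender.File=/var/log/myapp/job.log
log4j.appender.myAppender.MaxFileSize=10MB
log4j.appender.myAppender.MaxBackupIndex=5
log4j.appender.myAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.myAppender.layout.ConversionPattern=%d{ISO8601} %-5p [%t] %c: %m%n
```

It is this kind of appender definition that Hadoop's internal log4j.properties silently replaces.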

asked Dec 09 '22 by Artiom Gourevitch

1 Answer

I know this question has been answered already, but there is a better way of doing this, and this information isn't easily available anywhere. There are actually at least two log4j.properties files that get used in Hadoop (at least for YARN). I'm using Cloudera, but it will be similar for other distributions.

Local properties file

Location: /etc/hadoop/conf/log4j.properties (on the client machines)

This is the log4j.properties used by the normal Java process. It affects the logging of everything that happens in the Java process itself, but nothing inside YARN/MapReduce. So all your driver code, and anything that plugs MapReduce jobs together (e.g., cascading initialization messages), will log according to the rules you specify here. This is almost never the logging properties file you care about.

As you'd expect, this file is parsed each time you invoke the hadoop command, so you don't need to restart any services when you update your configuration.

If this file exists, it will take priority over the one sitting in your jar (because it's usually earlier in the classpath). If this file doesn't exist the one in your jar will be used.

Container properties file

Location: /etc/hadoop/conf/container-log4j.properties (on the data node machines)

This file decides the logging properties of the output from all the map and reduce tasks, and is nearly always the one you want to change when you're talking about Hadoop logging.

In newer versions of Hadoop/YARN someone caught a dangerously virulent strain of logging fever, and now the default logging configuration ensures that a single job can generate several hundred megabytes of unreadable junk, making your logs quite hard to read. I'd suggest putting something like this at the bottom of the container-log4j.properties file to get rid of most of the extremely helpful messages about how many bytes have been processed:

log4j.logger.org.apache.hadoop.mapreduce=WARN
log4j.logger.org.apache.hadoop.mapred=WARN
log4j.logger.org.apache.hadoop.yarn=WARN
log4j.logger.org.apache.hadoop.hive=WARN
log4j.security.logger=WARN

By default this file usually doesn't exist, in which case the copy of this file found in hadoop-yarn-server-nodemanager-stuff.jar (as mentioned by uriah kremer) will be used. However, as with the other log4j.properties file, if you do create /etc/hadoop/conf/container-log4j.properties it will be used for all your YARN stuff. Which is good!

Note: No matter what you do, a copy of container-log4j.properties in your jar will not be used for these properties, because the YARN nodemanager jars are higher in the classpath. Similarly, despite what the internet tells you, -Dlog4j.configuration=PATH_TO_FILE will not alter your container logging properties, because that option doesn't get passed on to YARN when the container is initialized.
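One thing that does get passed through per job, for what it's worth, is the container log threshold (though not the appenders themselves). If you only need to quiet the tasks down for a single job, these standard MapReduce properties can be set on the job configuration or with -D on the command line, no file edits required:

```properties
# Per-job task log thresholds; these adjust the level only, while the
# appenders still come from container-log4j.properties on the node managers.
mapreduce.map.log.level=WARN
mapreduce.reduce.log.level=WARN
```

For anything beyond the level (layouts, rolling policies, extra appenders), you're back to editing container-log4j.properties as described above.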

answered Dec 10 '22 by Jeffrey Theobald