How to specify a custom log4j appender in Hadoop 2 (Amazon EMR)?
Hadoop 2 ignores my log4j.properties file, which contains a custom appender, and overrides it with its internal log4j.properties file. There is a flag, -Dhadoop.root.logger, that specifies the logging threshold, but it does not help with a custom appender.
I know this question has been answered already, but there is a better way of doing this, and this information isn't easily available anywhere. There are actually at least two log4j.properties that get used in Hadoop (at least for YARN). I'm using Cloudera, but it will be similar for other distributions.
Location: /etc/hadoop/conf/log4j.properties
(on the client machines)
This is the log4j.properties that gets used by the normal Java process. It affects the logging of everything that happens in that Java process, but nothing inside YARN/MapReduce. So all your driver code, and anything that plugs MapReduce jobs together (e.g., Cascading initialization messages), will log according to the rules you specify here. This is almost never the logging properties file you care about.
As you'd expect, this file is parsed after invoking the hadoop command, so you don't need to restart any services when you update your configuration.
If this file exists, it will take priority over the one sitting in your jar (because it's usually earlier in the classpath). If this file doesn't exist, the one in your jar will be used.
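For a concrete picture, here is a minimal sketch of what could be appended to the client-side /etc/hadoop/conf/log4j.properties to send driver logging to its own file. The appender name DRIVERFILE, the package com.example.driver and the file path are made-up placeholders, not anything Hadoop defines. RollingFileAppender and PatternLayout are the stock log4j 1.2 classes.

```properties
# Hypothetical appender name, package and path: adjust to your own setup.
log4j.appender.DRIVERFILE=org.apache.log4j.RollingFileAppender
log4j.appender.DRIVERFILE.File=/tmp/my-driver.log
log4j.appender.DRIVERFILE.MaxFileSize=10MB
log4j.appender.DRIVERFILE.MaxBackupIndex=5
log4j.appender.DRIVERFILE.layout=org.apache.log4j.PatternLayout
log4j.appender.DRIVERFILE.layout.ConversionPattern=%d{ISO8601} %-5p [%t] %c: %m%n

# Route the (hypothetical) driver package to the new appender at INFO.
log4j.logger.com.example.driver=INFO, DRIVERFILE
```

Keep in mind this only changes what the client process logs. The map and reduce tasks never read this file.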
Location: /etc/hadoop/conf/container-log4j.properties
(on the data node machines)
This file decides the properties of the output from all the map and reduce tasks, and is nearly always what you want to change when you're talking about hadoop logging.
In newer versions of Hadoop/YARN someone caught a dangerously virulent strain of logging fever, and the default logging configuration now ensures that a single job can generate several hundred megs of unreadable junk, making your logs quite hard to read. I'd suggest putting something like this at the bottom of the container-log4j.properties file to get rid of most of the extremely helpful messages about how many bytes have been processed:
log4j.logger.org.apache.hadoop.mapreduce=WARN
log4j.logger.org.apache.hadoop.mapred=WARN
log4j.logger.org.apache.hadoop.yarn=WARN
log4j.logger.org.apache.hadoop.hive=WARN
log4j.security.logger=WARN
By default this file usually doesn't exist, in which case the copy of this file found in hadoop-yarn-server-nodemanager-stuff.jar (as mentioned by uriah kremer) will be used. However, as with the other log4j.properties file, if you do create /etc/hadoop/conf/container-log4j.properties it will be used for all your YARN stuff. Which is good!
Note: No matter what you do, a copy of container-log4j.properties in your jar will not be used for these properties, because the YARN nodemanager jars are higher in the classpath. Similarly, despite what the internet tells you, -Dlog4j.configuration=PATH_TO_FILE will not alter your container logging properties, because the option doesn't get passed on to YARN when the container is initialized.
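To tie this back to the original question, a custom appender for the map and reduce tasks would then be declared in /etc/hadoop/conf/container-log4j.properties on the data nodes. The following is only a sketch: com.example.logging.MyAppender, the appender name CUSTOM and the package com.example are hypothetical, and I'm assuming the appender class is already on the classpath of the task containers (how you get it there depends on your distribution).

```properties
# Hypothetical custom appender class: it must be loadable by the container JVM.
log4j.appender.CUSTOM=com.example.logging.MyAppender
log4j.appender.CUSTOM.layout=org.apache.log4j.PatternLayout
log4j.appender.CUSTOM.layout.ConversionPattern=%d{ISO8601} %-5p [%t] %c: %m%n

# Send the (hypothetical) application package to the custom appender.
log4j.logger.com.example=INFO, CUSTOM

# Optional: stop these events from also going to the root logger's appenders.
# log4j.additivity.com.example=false
```

With additivity left at its default, events from com.example go to CUSTOM and still propagate to whatever appenders the root logger already has, so the normal container logs keep working while you test the new appender.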