As explained in previous answers, the ideal way to change the verbosity of a Spark cluster is to change the corresponding log4j.properties. However, on Dataproc Spark runs on YARN, so we have to adjust the global configuration and not just /usr/lib/spark/conf.
Several suggestions:
On Dataproc we have several gcloud commands and properties we can pass during cluster creation (see the documentation). Is it possible to change the log4j.properties under /etc/hadoop/conf by specifying
--properties 'log4j:hadoop.root.logger=WARN,console'
Maybe not, as from the docs:
The --properties command cannot modify configuration files not shown above.
Another way would be to use a shell script during cluster init and run sed:
# change log level for each node to WARN
sudo sed -i -- 's/log4j.rootCategory=INFO, console/log4j.rootCategory=WARN, console/g' \
  /etc/spark/conf/log4j.properties
sudo sed -i -- 's/hadoop.root.logger=INFO,console/hadoop.root.logger=WARN,console/g' \
  /etc/hadoop/conf/log4j.properties
But is it enough, or do we also need to change the hadoop.root.logger environment variable?
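For reference, here is a minimal sketch of wiring those sed commands into an initialization action; the script name, bucket, cluster name, and region are placeholders chosen for illustration, not anything Dataproc prescribes (init actions already run as root, so sudo can be dropped):

#!/bin/bash
# set-log-level.sh -- hypothetical init action, executed on every node at startup.
# Lower Spark and Hadoop console logging from INFO to WARN.
set -euo pipefail
sed -i 's/log4j.rootCategory=INFO, console/log4j.rootCategory=WARN, console/g' \
  /etc/spark/conf/log4j.properties
sed -i 's/hadoop.root.logger=INFO,console/hadoop.root.logger=WARN,console/g' \
  /etc/hadoop/conf/log4j.properties

Stage the script in GCS and reference it at cluster creation:

# Bucket and cluster names are placeholders.
gsutil cp set-log-level.sh gs://my-bucket/actions/set-log-level.sh
gcloud dataproc clusters create my-cluster \
  --region=us-central1 \
  --initialization-actions=gs://my-bucket/actions/set-log-level.sh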
Access job logs in Logging: You can access Dataproc job logs using the Logs Explorer, the gcloud logging command, or the Logging API. Dataproc job driver and YARN container logs are listed under the Cloud Dataproc Job resource.
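For example, a quick sketch with the gcloud CLI (the job ID is a placeholder; the filter assumes the cloud_dataproc_job monitored-resource type that backs the Cloud Dataproc Job resource mentioned above):

# List recent log entries for a single Dataproc job.
gcloud logging read \
  'resource.type=cloud_dataproc_job AND resource.labels.job_id=my-job-id' \
  --limit=20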
Dataproc should be used if the processing has any dependencies on tools in the Hadoop ecosystem. Dataflow/Beam provides a clear separation between processing logic and the underlying execution engine.
Dataproc is a fully managed and highly scalable service for running Apache Hadoop, Apache Spark, Apache Flink, Presto, and 30+ open source tools and frameworks.
At the moment, you're right that --properties doesn't support extra log4j settings, but it's certainly something we've talked about adding; some considerations include how to balance fine-grained control over logging configs for Spark vs. YARN vs. other long-running daemons (hiveserver2, HDFS daemons, etc.) against keeping a minimal/simple setting that is plumbed through to everything in a shared way.
At least for Spark driver logs, you can use the --driver-log-levels setting at job-submission time, which should take precedence over any of the /etc/*/conf settings. Otherwise, as you describe, init actions are a reasonable way to edit the files on cluster startup for now, keeping in mind that they may change over time and across releases.
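As an illustration of that flag, a sketch of a submit command (cluster name, region, and job arguments are placeholders; the SparkPi example jar path is the one the Dataproc quickstarts typically use):

# Quiet the driver's root logger while keeping Spark's own classes at INFO.
gcloud dataproc jobs submit spark \
  --cluster=my-cluster \
  --region=us-central1 \
  --class=org.apache.spark.examples.SparkPi \
  --jars=file:///usr/lib/spark/examples/jars/spark-examples.jar \
  --driver-log-levels=root=WARN,org.apache.spark=INFO \
  -- 1000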
Recently, support for log4j properties has been added via the --properties flag. For example, you can now use --properties 'hadoop-log4j:hadoop.root.logger=WARN,console'. See this page (https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/cluster-properties) for more details.
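For instance, a sketch of a cluster-creation command; the cluster name and region are placeholders, and because the property value itself contains a comma, the gcloud list delimiter is switched to '#' (see gcloud topic escaping). The analogous spark-log4j prefix from the same page should cover /etc/spark/conf/log4j.properties as well:

# Quiet both the Hadoop and Spark console loggers at cluster creation time.
gcloud dataproc clusters create my-cluster \
  --region=us-central1 \
  --properties='^#^hadoop-log4j:hadoop.root.logger=WARN,console#spark-log4j:log4j.rootCategory=WARN,console'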