I'm running Spark on EMR as described in Run Spark and Spark SQL on Amazon Elastic MapReduce:
This tutorial walks you through installing and operating Spark, a fast and general engine for large-scale data processing, on an Amazon EMR cluster. You will also create and query a dataset in Amazon S3 using Spark SQL, and learn how to monitor Spark on an Amazon EMR cluster with Amazon CloudWatch.
I'm trying to suppress the INFO logs by editing $HOME/spark/conf/log4j.properties, to no avail.
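The edit I'm making looks roughly like this (paraphrased; the stock log4j.properties template ships with the root logger at INFO):

```properties
# Raise the root logger threshold from INFO to WARN
log4j.rootCategory=WARN, console
```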
Output looks like:
$ ./spark/bin/spark-sql
Spark assembly has been built with Hive, including Datanucleus jars on classpath
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/.versions/2.4.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/.versions/spark-1.1.1.e/lib/spark-assembly-1.1.1-hadoop2.4.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2014-12-14 20:59:01,819 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1009)) - mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
2014-12-14 20:59:01,825 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1009)) - mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
2014-12-14 20:59:01,825 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1009)) - mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
2014-12-14 20:59:01,825 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1009)) - mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
How can I suppress the INFO messages above?
You can also just add the configuration option at cluster creation, if you know you want to suppress logging for a new EMR cluster.
EMR accepts configuration options as JSON, which you can enter directly into the AWS console, or pass in via a file when using the CLI.
In this case, to change the log level to WARN, here's the JSON:
[
  {
    "classification": "spark-log4j",
    "properties": {"log4j.rootCategory": "WARN, console"}
  }
]
In the console, you'd add this JSON under the software settings in the first creation step.
Or if you're creating the cluster using the CLI:
aws emr create-cluster <options here> --configurations file://config_file.json
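For example, you could write the JSON to a local file and sanity-check it before passing it to the CLI (the filename here is arbitrary):

```shell
# Save the spark-log4j classification to a local file
# (same JSON as above).
cat > config_file.json <<'EOF'
[
  {
    "classification": "spark-log4j",
    "properties": {"log4j.rootCategory": "WARN, console"}
  }
]
EOF

# Verify the file is valid JSON before using it.
python -m json.tool config_file.json > /dev/null && echo "JSON OK"

# Then reference it with a file:// URI when creating the cluster:
# aws emr create-cluster <options here> --configurations file://config_file.json
```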
You can read more in the EMR documentation.