How can I stop messages like these from appearing on my spark-shell console?
5 May, 2015 5:14:30 PM INFO: parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
5 May, 2015 5:14:30 PM INFO: parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 89213 records.
5 May, 2015 5:14:30 PM INFO: parquet.hadoop.InternalParquetRecordReader: block read in memory in 2 ms. row count = 120141
5 May, 2015 5:14:30 PM INFO: parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
5 May, 2015 5:14:30 PM INFO: parquet.hadoop.InternalParquetRecordReader: block read in memory in 2 ms. row count = 89213
5 May, 2015 5:14:30 PM WARNING: parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutp
[Stage 12:=================================================> (184 + 4) / 200]
Thanks
Parquet offers higher execution speed than other standard file formats such as Avro and JSON, and it also consumes less disk space than either.
The solution from a SPARK-8118 issue comment seems to work:
You can disable the chatty output by creating a properties file with these contents:
org.apache.parquet.handlers=java.util.logging.ConsoleHandler
java.util.logging.ConsoleHandler.level=SEVERE
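For convenience, the file can be written in one step. The `/tmp/parquet.logging.properties` path matches the spark-submit example in this answer; any path readable on every node will do. Note the extra `parquet.handlers` line is my addition: the log messages in the question come from the old `parquet.*` package (Parquet moved to `org.apache.parquet.*` in 1.7.0), so covering both names seems safer:

```shell
# Write the java.util.logging config (the path is just the example used here).
cat > /tmp/parquet.logging.properties <<'EOF'
# Route Parquet's loggers through a console handler that only passes SEVERE.
org.apache.parquet.handlers=java.util.logging.ConsoleHandler
parquet.handlers=java.util.logging.ConsoleHandler
java.util.logging.ConsoleHandler.level=SEVERE
EOF
```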
And then passing the path of the file to Spark when the application is submitted. Assuming the file lives in /tmp/parquet.logging.properties (of course, that needs to be available on all worker nodes):
spark-submit \
--conf spark.driver.extraJavaOptions="-Djava.util.logging.config.file=/tmp/parquet.logging.properties" \
--conf spark.executor.extraJavaOptions="-Djava.util.logging.config.file=/tmp/parquet.logging.properties" \
...
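Since the question is about spark-shell rather than spark-submit: spark-shell accepts the same --conf flags, so, assuming the properties file above is in place, an interactive session can be started like this (not tested against every Spark version):

```shell
# spark-shell takes the same conf flags as spark-submit.
spark-shell \
  --conf spark.driver.extraJavaOptions="-Djava.util.logging.config.file=/tmp/parquet.logging.properties" \
  --conf spark.executor.extraJavaOptions="-Djava.util.logging.config.file=/tmp/parquet.logging.properties"
```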
Credits go to Justin Bailey.