I am having issues reading an ORC file directly from the Spark shell. Note: running Hadoop 1.2, and Spark 1.2, using pyspark shell, can use spark-shell (runs scala).
I have used this resource http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.2.4/Apache_Spark_Quickstart_v224/content/ch_orc-spark-quickstart.html .
from pyspark.sql import HiveContext
hiveCtx = HiveContext(sc)
inputRead = sc.hadoopFile("hdfs://user@server:/file_path",
classOf[inputFormat:org.apache.hadoop.hive.ql.io.orc.OrcInputFormat],
classOf[outputFormat:org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat])
I get an error generally saying wrong syntax. One time, the code seemed to work, I used just the 1st of three arguments passed to hadoopFile, but when I tried to use
inputRead.first()
the output was RDD[nothing, nothing]. I don't know if this is because the inputRead variable did not get created as an RDD or if it was not created at all.
I appreciate any help!
Use Spark DataFrameReader's orc() method to read ORC file into DataFrame. This supports reading snappy, zlib or no compression, it is not necessary to specify in compression option while reading a ORC file.
Spark on HDP supports the Optimized Row Columnar ("ORC") file format, a self-describing, type-aware column-based file format that is one of the primary file formats supported in Apache Hive. The columnar format lets the reader read, decompress, and process only the columns that are required for the current query.
To read ORC files, use the OrcFile class to create a Reader that contains the metadata about the file. There are a few options to the ORC reader, but far fewer than the writer and none of them are required. The reader has methods for getting the number of rows, schema, compression, etc. from the file.
In Spark 1.5, I'm able to load my ORC file as:
val orcfile = "hdfs:///ORC_FILE_PATH"
val df = sqlContext.read.format("orc").load(orcfile)
df.show
You can try this code, it's working for me.
val LoadOrc = spark.read.option("inferSchema", true).orc("filepath")
LoadOrc.show()
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With