
How do I use Spark ORC indexes?

What is the option to enable ORC indexing from Spark?

    df
        .write()
        .option("mode", "DROPMALFORMED")
        .option("compression", "snappy")
        .mode("overwrite")
        .format("orc")
        .option("index", "user_id")
        .save(...);

I made up .option("index", "user_id"); what would I have to put there instead to index the "user_id" column in the ORC output?

ForeverConfused asked Oct 29 '17 21:10



2 Answers

Have you tried .partitionBy("user_id")?

    df
        .write()
        .option("mode", "DROPMALFORMED")
        .option("compression", "snappy")
        .mode("overwrite")
        .format("orc")
        .partitionBy("user_id")
        .save(...);
Malik Fassi answered Sep 30 '22 09:09


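According to the original blog post on bringing ORC support to Apache Spark, there is a configuration setting to turn on in your SQL context so that ORC indexes get used. With filter pushdown enabled, Spark hands your filter predicates to the ORC reader, which can then use the indexes stored in the file to skip data that cannot match.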

# enable filters in ORC
sqlContext.setConf("spark.sql.orc.filterPushdown", "true")

Reference: https://databricks.com/blog/2015/07/16/joint-blog-post-bringing-orc-support-into-apache-spark.html
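
For intuition about what filter pushdown buys you, here is a toy Java model of min/max index skipping. It is not ORC's or Spark's actual reader code: the class MinMaxSkipDemo, the RowGroup type, and the group size of 100 are all made up for illustration. The idea it sketches is that the writer records the smallest and largest user_id for each block of rows, and a reader that is given the predicate user_id = 542 only opens the blocks whose range could contain 542.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of min/max index skipping. Hypothetical names, not ORC's real reader. */
public class MinMaxSkipDemo {

    /** A block of rows plus the min/max statistics a writer would record for it. */
    static final class RowGroup {
        final long[] userIds;
        final long min;
        final long max;

        RowGroup(long[] userIds) {
            this.userIds = userIds;
            long lo = Long.MAX_VALUE;
            long hi = Long.MIN_VALUE;
            for (long id : userIds) {
                lo = Math.min(lo, id);
                hi = Math.max(hi, id);
            }
            this.min = lo;
            this.max = hi;
        }

        /** True when the statistics cannot rule out the target value. */
        boolean mightContain(long target) {
            return min <= target && target <= max;
        }
    }

    /** Splits ids into fixed-size row groups, keeping the order they were written in. */
    static List<RowGroup> toRowGroups(long[] ids, int groupSize) {
        List<RowGroup> groups = new ArrayList<>();
        for (int start = 0; start < ids.length; start += groupSize) {
            int end = Math.min(start + groupSize, ids.length);
            groups.add(new RowGroup(Arrays.copyOfRange(ids, start, end)));
        }
        return groups;
    }

    /** Counts the row groups a reader would still have to open for user_id == target. */
    static int groupsScanned(List<RowGroup> groups, long target) {
        int scanned = 0;
        for (RowGroup group : groups) {
            if (group.mightContain(target)) {
                scanned++;
            }
        }
        return scanned;
    }

    public static void main(String[] args) {
        int n = 1000;
        long[] sorted = new long[n];
        long[] interleaved = new long[n];
        for (int i = 0; i < n; i++) {
            sorted[i] = i;
            // Same 1000 ids, but written so every block spans almost the full range.
            interleaved[i] = (i % 10) * 100 + i / 10;
        }

        List<RowGroup> sortedGroups = toRowGroups(sorted, 100);
        List<RowGroup> interleavedGroups = toRowGroups(interleaved, 100);

        System.out.println("sorted:      " + groupsScanned(sortedGroups, 542L)
                + " of " + sortedGroups.size() + " row groups read");
        System.out.println("interleaved: " + groupsScanned(interleavedGroups, 542L)
                + " of " + interleavedGroups.size() + " row groups read");
    }
}
```

In this toy model the sorted layout reads 1 of 10 row groups and the interleaved one reads all 10, which suggests that ordering the data by user_id before writing (for example with sortWithinPartitions) may matter as much as the pushdown setting itself. Recent Spark documentation also shows ORC writer options such as orc.bloom.filter.columns being passed through .option(...); whether that works for you depends on your Spark version and ORC implementation, so check the docs for your release.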

louis_guitton answered Sep 30 '22 10:09