
SparkSQL - Read parquet file directly

I am migrating from Impala to SparkSQL, using the following code to read a table:

my_data = sqlContext.read.parquet('hdfs://my_hdfs_path/my_db.db/my_table')

How do I invoke SparkSQL on the data read above, so that I can run a query like:

'select col_A, col_B from my_table'
asked Dec 21 '16 by Edamame


People also ask

Can we read a Parquet file?

We can always read a Parquet file into a DataFrame in Spark and inspect its content. Parquet is a columnar format, better suited to analytical environments: write once, read many. Parquet files are well suited to read-intensive applications.
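For example, a minimal PySpark sketch along these lines (assuming a Spark 2.x SparkSession and reusing the question's HDFS path):

from pyspark.sql import SparkSession

# Assumption: Spark 2.x-style entry point; on Spark 1.x use sqlContext.read.parquet instead
spark = SparkSession.builder.appName("read-parquet").getOrCreate()

# Read the columnar Parquet data into a DataFrame and inspect it
df = spark.read.parquet('hdfs://my_hdfs_path/my_db.db/my_table')
df.printSchema()
df.show(5)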

How do I read a Parquet file from HDFS spark?

Use the textFile() and wholeTextFiles() methods of the SparkContext to read files from any file system; to read from HDFS, pass the HDFS path as an argument to the function. The same applies when reading a text file from HDFS into a DataFrame.
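For illustration only, a rough PySpark sketch of those two calls (the HDFS paths below are hypothetical placeholders):

# Assumption: an existing SparkContext `sc` (e.g. spark.sparkContext); paths are placeholders
lines = sc.textFile('hdfs://my_hdfs_path/some_dir/data.txt')    # RDD of lines
files = sc.wholeTextFiles('hdfs://my_hdfs_path/some_dir/')      # RDD of (path, content) pairs
print(lines.count(), files.count())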

Can you use spark SQL to read a Parquet data?

Spark SQL provides support for both reading and writing Parquet files that automatically preserves the schema of the original data. When reading Parquet files, all columns are automatically converted to be nullable for compatibility reasons.
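A small sketch of that behaviour (the /tmp output path and sample rows are made up for the example):

# Assumption: a SparkSession named `spark`; the path and data are illustrative only
people = spark.createDataFrame([("Alice", 30), ("Bob", 25)], ["name", "age"])
people.write.mode("overwrite").parquet("/tmp/people.parquet")

# Reading the files back recovers the column names and types (columns come back nullable)
restored = spark.read.parquet("/tmp/people.parquet")
restored.printSchema()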


2 Answers

After creating a DataFrame from a Parquet file, you have to register it as a temp table to run SQL queries on it.

val sqlContext = new org.apache.spark.sql.SQLContext(sc)

val df = sqlContext.read.parquet("src/main/resources/peopleTwo.parquet")

df.printSchema

// after registering as a table you will be able to run sql queries
df.registerTempTable("people")

sqlContext.sql("select * from people").collect.foreach(println)
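Since the question uses PySpark, a rough Python equivalent (reusing the question's sqlContext and HDFS path; on Spark 2.x, createOrReplaceTempView replaces the deprecated registerTempTable):

# Assumption: the Spark 1.x sqlContext from the question
my_data = sqlContext.read.parquet('hdfs://my_hdfs_path/my_db.db/my_table')
my_data.registerTempTable('my_table')   # use my_data.createOrReplaceTempView on Spark 2.x+
sqlContext.sql('select col_A, col_B from my_table').show()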
answered Sep 18 '22 by bob


With plain SQL

JSON, ORC, Parquet, and CSV files can be queried with plain SQL without first registering them as a table or creating a DataFrame.

import org.apache.spark.sql.SparkSession

// This is Spark 2.x code; you can do the same on sqlContext as well
val spark: SparkSession = SparkSession.builder.master("set_the_master").getOrCreate

spark.sql("select col_A, col_B from parquet.`hdfs://my_hdfs_path/my_db.db/my_table`")
  .show()
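The same direct-path query in PySpark (assuming a Spark 2.x SparkSession named spark) would be:

# Assumption: a Spark 2.x+ SparkSession `spark`
spark.sql("select col_A, col_B from parquet.`hdfs://my_hdfs_path/my_db.db/my_table`").show()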
answered Sep 17 '22 by mrsrinivas