I am new to Spark. I found that using HiveContext we can connect to Hive and run HiveQL queries. I ran it and it worked.
My doubt is whether Spark does this through Spark jobs. That is, does it use HiveContext only to access the corresponding Hive table files from HDFS, or does it internally call Hive to execute the query?
public class HiveContext extends SQLContext implements Logging. An instance of the Spark SQL execution engine that integrates with data stored in Hive. Configuration for Hive is read from hive-site.xml on the classpath.
HiveContext is a superset of SQLContext. Additional features include the ability to write queries using the more complete HiveQL parser, access to Hive UDFs, and the ability to read data from Hive tables. And if you want to work with Hive, you have to use HiveContext.
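As a minimal sketch (assuming a Spark 1.x setup and a hypothetical Hive table named sales), this is what running a HiveQL query through HiveContext looks like:

    // Build a HiveContext on top of an existing SparkContext (Spark 1.x API).
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    val conf = new SparkConf().setAppName("HiveQLExample")
    val sc = new SparkContext(conf)
    val hiveContext = new HiveContext(sc)

    // Run a HiveQL query; the result comes back as a Spark DataFrame.
    val df = hiveContext.sql("SELECT region, SUM(amount) AS total FROM sales GROUP BY region")
    df.show()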
Spark SQL also supports reading and writing data stored in Apache Hive. However, since Hive has a large number of dependencies, these dependencies are not included in the default Spark distribution. If Hive dependencies can be found on the classpath, Spark will load them automatically.
Spark SQL is a Spark module for structured data processing. It provides a programming abstraction called DataFrames and can also act as a distributed SQL query engine. It enables unmodified Hadoop Hive queries to run up to 100x faster on existing deployments and data.
No, Spark doesn't call Hive to execute the query. Spark only reads the metadata from Hive and executes the query within the Spark engine. Spark has its own SQL execution engine, which includes components such as Catalyst and Tungsten to optimize queries and give faster results. It uses the metadata from Hive and Spark's execution engine to run the queries.
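You can check this yourself: calling explain() on the result of a HiveQL query prints the plan that Catalyst produced and that Spark's own engine will run; there is no Hive or MapReduce stage in it (a sketch reusing the hypothetical hiveContext and sales table from above):

    // The plan is built and optimized by Catalyst and executed by Spark itself;
    // nothing is handed back to Hive for execution.
    hiveContext.sql("SELECT region, SUM(amount) FROM sales GROUP BY region").explain(true)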
One of the greatest advantages of Hive is its metastore. It acts as a single metastore for a lot of components in the Hadoop ecosystem.
Coming to your question, when you use HiveContext, it gets access to the metastore database and all your Hive metadata, which describes what type of data you have, where the data is located, the serializers and deserializers, compression codecs, columns, data types, and literally every detail about the table and its data. That is enough for Spark to understand the data.
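As a rough illustration (again assuming the hypothetical sales table), the details Spark picks up from the metastore are immediately visible:

    // Column names and data types come straight from the Hive metastore.
    hiveContext.table("sales").printSchema()

    // Table-level details (location, SerDe, compression, etc.) can be inspected with HiveQL.
    hiveContext.sql("DESCRIBE FORMATTED sales").show(100, false)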
Overall, Spark only needs the metastore, which gives complete details of the underlying data, and once it has the metadata, it executes the queries you ask for over its own execution engine. Hive is slower than Spark, as it uses MapReduce. So there is no point in going back to Hive and asking it to run the query.
Let me know if this answers your question.