So, when running from the pyspark shell, I would type in (without specifying any contexts):
df_openings_latest = sqlContext.sql('select * from experian_int_openings_latest_orc')
... and it works fine.
However, when I run my script via spark-submit, like
spark-submit script.py
I put the following in the script:
from pyspark.sql import SQLContext
from pyspark import SparkConf, SparkContext
conf = SparkConf().setAppName('inc_dd_openings')
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)
df_openings_latest = sqlContext.sql('select * from experian_int_openings_latest_orc')
But it gives me an error:
pyspark.sql.utils.AnalysisException: u'Table not found: experian_int_openings_latest_orc;'
So it doesn't see my table.
What am I doing wrong? Please help.
P.S. The Spark version is 1.6, running on Amazon EMR.
To connect to the Hive metastore you need to copy the hive-site.xml file into the spark/conf directory. After that, Spark will be able to connect to the Hive metastore. Which Spark version are you using?
It's a known issue. You get that error because you're trying to read a Hive ACID table, but Spark still doesn't have support for this.
Spark 2.x
The same problem may occur in Spark 2.x if the SparkSession has been created without enabling Hive support.
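For Spark 2.x, a minimal sketch of a session with Hive support enabled (reusing the app name and table from the question; both are specific to the asker's setup):

from pyspark.sql import SparkSession

# enableHiveSupport() wires the session to the Hive metastore,
# so tables registered there become visible to spark.sql()
spark = SparkSession.builder \
    .appName('inc_dd_openings') \
    .enableHiveSupport() \
    .getOrCreate()

df_openings_latest = spark.sql(
    'select * from experian_int_openings_latest_orc')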
Spark 1.x
It is pretty simple. When you use the PySpark shell and Spark has been built with Hive support, the default SQLContext implementation (the one available as sqlContext) is a HiveContext.
In your standalone application you use a plain SQLContext, which doesn't provide Hive capabilities.
Assuming the rest of the configuration is correct, just replace:
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
with
from pyspark.sql import HiveContext
sqlContext = HiveContext(sc)
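Putting it together, a sketch of the corrected standalone script (keeping the original app name and table, and assuming hive-site.xml is visible to Spark, as the comment above notes):

from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext

conf = SparkConf().setAppName('inc_dd_openings')
sc = SparkContext(conf=conf)

# HiveContext resolves table names through the Hive metastore,
# so tables defined there can be queried by name
sqlContext = HiveContext(sc)

df_openings_latest = sqlContext.sql(
    'select * from experian_int_openings_latest_orc')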