 

pyspark.sql.utils.IllegalArgumentException: "Error while instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuilder'" in Windows 10

I have installed Spark 2.2 with winutils on Windows 10. When I run pyspark I get the exception below:

pyspark.sql.utils.IllegalArgumentException: "Error while instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuilder'

I have already tried granting 777 permissions on the tmp/hive folder as well, but that has not helped:

winutils.exe chmod -R 777 C:\tmp\hive

After applying this the problem remains the same. I am using pyspark 2.2 on Windows 10. Here is the spark-shell environment (screenshot):

Here is the pyspark shell (screenshot):
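For reference, a plain script hits the same error when it builds a Hive-enabled session (a minimal sketch; the pyspark shell does the equivalent at startup):

# Minimal reproduction sketch: building a Hive-enabled session triggers
# HiveSessionStateBuilder, which is where the exception is raised.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .master("local[*]")
         .enableHiveSupport()
         .getOrCreate())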

Kindly help me figure this out. Thank you.

asked Jul 19 '17 by Kalyan


3 Answers

I had the same problem using the 'pyspark' command as well as 'spark-shell' (for Scala) on macOS with apache-spark 2.2. Based on some research I figured it was because of my JDK version, 9.0.1, which does not work well with Apache Spark. Both errors were resolved by switching back from JDK 9 to JDK 8.

That might help with your Windows Spark installation too.
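If you want to confirm which Java your session picks up before reinstalling anything, a quick check from Python looks like this (an illustrative sketch, not part of Spark itself; it assumes java is on your PATH):

# Sketch: show the Java that Spark's launch scripts would consult and use.
import os
import subprocess

print("JAVA_HOME =", os.environ.get("JAVA_HOME", "<not set>"))
subprocess.run(["java", "-version"])  # prints the version banner (on stderr)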

answered Nov 10 '22 by Chetan Shetty


Port 9000?! It must be something Hadoop-related, as I don't remember that port for Spark. I'd recommend using spark-shell first, as that would eliminate any additional "hops", i.e. spark-shell does not require two runtimes, one for Spark itself and one for Python.

Given the exception, I'm pretty sure the issue is that you've got some Hive- or Hadoop-related configuration lying around somewhere, and Spark apparently picks it up.

The "Caused by" seems to show that 9000 is used when Spark SQL is created which is when Hive-aware subsystem is loaded.

Caused by: org.apache.spark.sql.AnalysisException: java.lang.RuntimeException: java.net.ConnectException: Call From DESKTOP-SDNSD47/192.168.10.143 to 0.0.0.0:9000 failed on connection exception: java.net.ConnectException: Connection refused

Please review the environment variables in Windows 10 (possibly using the set command on the command line) and remove anything Hadoop-related.
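If it helps, the same check can be done from Python; this is just a convenience sketch of what set would show, filtered down to the likely suspects:

# Sketch: list environment variables that look Hadoop/Hive-related.
# Equivalent to running `set` and scanning the output by eye.
import os

for key in sorted(os.environ):
    if any(token in key.upper() for token in ("HADOOP", "HIVE", "YARN")):
        print(key, "=", os.environ[key])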

answered Nov 10 '22 by Jacek Laskowski


Posting this answer for posterity. I faced the same error. The way I solved it was by first trying spark-shell instead of pyspark; the error message there was more direct.

This gave a better idea: there was an S3 access error. Next, I checked the EC2 role/instance profile for that instance; it had S3 administrator access.

Then I grepped for s3:// in all the conf files under the /etc/ directory, and found that core-site.xml contains this property:

<!-- URI of NN. Fully qualified. No IP.--> <name>fs.defaultFS</name> <value>s3://arvind-glue-temp/</value> </property>

Then I remembered: I had removed HDFS as the default file system and set it to S3. I had created the EC2 instance from an earlier AMI and had forgotten to update the S3 bucket to the one corresponding to the newer account.

Once I updated the S3 bucket to one accessible by the current EC2 instance profile, it worked.
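If you want to confirm which default filesystem Spark actually resolved, this sketch reads it back from a running session (spark.sparkContext._jsc is an internal handle, so treat it purely as a debugging aid):

# Sketch: print the fs.defaultFS value from Spark's resolved Hadoop configuration.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
print(spark.sparkContext._jsc.hadoopConfiguration().get("fs.defaultFS"))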

answered Nov 10 '22 by AravindR