I am trying to install PySpark on Google Colab using the code given below, but I am getting the following error.
This code ran successfully once, but it has been throwing this error ever since the notebook restarted. I have even tried running it from a different Google account, but I get the same error.
(Also, is there any way to avoid having to install PySpark every time the notebook restarts?)
!apt-get install openjdk-8-jdk-headless -qq > /dev/null
!wget -q http://apache.osuosl.org/spark/spark-2.3.2/spark-2.3.2-bin-hadoop2.7.tgz
The following line seems to cause the problem, as it cannot find the downloaded file.
!tar xvf spark-2.3.2-bin-hadoop2.7.tgz
I have also tried the following two lines (instead of the two lines above), suggested in a Medium blog post, but it made no difference.
!wget -q http://mirror.its.dal.ca/apache/spark/spark-2.4.0/spark-2.4.0-bin-hadoop2.7.tgz
!tar xvf spark-2.4.0-bin-hadoop2.7.tgz
!pip install -q findspark
Any ideas on how to resolve this error and install PySpark on Colab?
I am running PySpark on Colab by just using
!pip install pyspark
and it works fine.
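After the install, a quick sanity check (a minimal sketch; the exact version printed depends on what pip resolved) confirms that PySpark is usable:
import pyspark
from pyspark.sql import SparkSession

print(pyspark.__version__)  # confirm the installed version

# Start a local session and run a trivial query to verify the install
spark = SparkSession.builder.master('local[*]').getOrCreate()
spark.range(5).show()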
Date: 6-09-2020
Step 1: Install PySpark on Google Colab
!pip install pyspark
Step 2: Deal with pandas and Spark DataFrames inside the Spark session
!pip install pyarrow
PyArrow facilitates communication between many components; for example, it lets you read a Parquet file with Python (pandas) and transform it into a Spark DataFrame, or hand data to Falcon Data Visualization or Cassandra, without worrying about conversion (see the sketch after Step 3).
Step 3: Create a Spark session
from pyspark.sql import SparkSession
spark = SparkSession.builder.master('local').getOrCreate()
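To tie Steps 2 and 3 together, here is a minimal sketch of the pandas-to-Spark round trip that PyArrow speeds up. The file name data.parquet is a placeholder, and the config key shown is the Spark 3.x name (on Spark 2.x it is spark.sql.execution.arrow.enabled):
import pandas as pd

# Reuse the session from Step 3 and enable Arrow-based transfers
# (Spark 3.x key; on Spark 2.x use 'spark.sql.execution.arrow.enabled')
spark.conf.set('spark.sql.execution.arrow.pyspark.enabled', 'true')

pdf = pd.read_parquet('data.parquet')  # 'data.parquet' is a placeholder path
sdf = spark.createDataFrame(pdf)       # pandas -> Spark DataFrame, via Arrow
back = sdf.toPandas()                  # Spark -> pandas, via Arrow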
Done ⭐