 

Visual Studio Code using pytest for PySpark gets stuck at SparkSession creation

I am trying to run a PySpark unit test in Visual Studio Code on my local Windows machine. When I debug the test, it gets stuck at the line where I create a SparkSession. It doesn't show any error or failure; the status bar just shows "Running Tests". Once it works, I can refactor the test to create the SparkSession in a test fixture, but at the moment the test gets stuck at SparkSession creation.

Do I have to install or configure anything on my local machine for the SparkSession to work?

I tried a simple test with assert 'a' == 'b' and I can debug and run it successfully, so I assume my pytest configuration is correct. The issue I am facing is with creating the SparkSession.

# test code

import pytest
from pyspark.sql import SparkSession, Row, DataFrame

def test_poc():
    spark_session = SparkSession.builder.master('local[2]').getOrCreate()  # this line never returns when debugging the test
    spark_session.createDataFrame(data, schema)  # data and schema not shown here

Thanks

asked by user9297554

1 Answer

What I did to make it work was:

  1. Create a .env file in the root of the project

  2. Add the following content to the created file:

SPARK_LOCAL_IP=127.0.0.1
JAVA_HOME=<java_path>/jdk/<jdk_version>/Contents/Home
SPARK_HOME=<spark_path>/spark-3.0.1-bin-hadoop2.7
PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.9-src.zip:$PYTHONPATH
  3. Go to the .vscode folder in the root, expand it and open settings.json. Add the following line (replace <workspace_path> with your actual workspace path):

"python.envFile": "<workspace_path>/.env"

After refreshing the Testing section in Visual Studio Code, the setup should succeed.
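With the environment in place, the fixture-based refactor mentioned in the question could look roughly like this (a minimal sketch; the fixture name spark_session, the app name and the sample row are my own choices, assuming pytest and pyspark are installed):

# conftest.py
import pytest
from pyspark.sql import SparkSession

@pytest.fixture(scope="session")
def spark_session():
    # one SparkSession shared by every test in the run
    spark = (SparkSession.builder
             .master("local[2]")
             .appName("pytest-poc")
             .getOrCreate())
    yield spark
    spark.stop()

# test_poc.py
from pyspark.sql import Row

def test_poc(spark_session):
    df = spark_session.createDataFrame([Row(id=1, name="a")])
    assert df.count() == 1

Scoping the fixture to the session avoids paying the SparkSession start-up cost for every single test.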

Note: I use pyenv to set up my Python version, so I had to make sure that VS Code was using the correct Python interpreter with all the expected dependencies installed.
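If in doubt, a throwaway test like the sketch below can confirm which interpreter pytest is running under and whether the values from .env were picked up (the file and test names are hypothetical):

# test_env_sanity.py (temporary diagnostic, delete once things work)
import os
import sys

def test_interpreter_and_env():
    print("interpreter:", sys.executable)               # which Python pytest is running under
    print("SPARK_HOME:", os.environ.get("SPARK_HOME"))  # should match the value from .env
    assert os.environ.get("SPARK_HOME"), ".env was not picked up by VS Code"

Run it with pytest -s so the printed paths are not captured.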

Solution inspired by py4j.protocol.Py4JError: org.apache.spark.api.python.PythonUtils.getEncryptionEnabled does not exist in the JVM and https://github.com/microsoft/vscode-python/issues/6594

answered by user3582348


