Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python Kedro PySpark : py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext

it's my first project using kedro with Pyspark and I have an issue. I work with the new Mac (M1). When I do spark-shell in the terminal, spark is successfully installed and I have the right output (welcome to spark version 3.2.1 with the picture). However, I tried to run spark using Kedro project, I have a trouble. I tried to find solutions thanks to stack overflow discussion but nothing linked with this.

Version:

  • Python : 3.8
  • Java : openjdk version "18" 2022-03-22
  • PySpark : 3.2.1

Spark conf :

spark.driver.maxResultSize: 3g
spark.hadoop.fs.s3a.impl: org.apache.hadoop.fs.s3a.S3AFileSystem
spark.sql.execution.arrow.pyspark.enabled: true

And in my project context of Kedro :

class ProjectContext(KedroContext):
    """A subclass of KedroContext to add Spark initialisation for the pipeline."""

    def __init__(
        self,
        package_name: str,
        project_path: Union[Path, str],
        env: str = None,
        extra_params: Dict[str, Any] = None,
    ):
        super().__init__(package_name, project_path, env, extra_params)
        if not os.getenv('DISABLE_SPARK'):
            self.init_spark_session()

    def init_spark_session(self) -> None:
        """Initialises a SparkSession using the config
        defined in project's conf folder.
        """

        parameters = self.config_loader.get("spark*", "spark*/**")
        spark_conf = SparkConf().setAll(parameters.items())

        # Initialise the spark session
        spark_session_conf = (
            SparkSession.builder.appName(self.package_name)
            .enableHiveSupport()
            .config(conf=spark_conf)
            .master("local[*]")
        )
        _spark_session = spark_session_conf.getOrCreate()

When I run it, I have this error :

py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.IllegalAccessError: class org.apache.spark.storage.StorageUtils$ (in unnamed module @0x3c60b7e7) cannot access class sun.nio.ch.DirectBuffer (in module java.base) because module java.base does not export sun.nio.ch to unnamed module @0x3c60b7e7
    at org.apache.spark.storage.StorageUtils$.<init>(StorageUtils.scala:213)
    at org.apache.spark.storage.StorageUtils$.<clinit>(StorageUtils.scala)
    at org.apache.spark.storage.BlockManagerMasterEndpoint.<init>(BlockManagerMasterEndpoint.scala:110)
    at org.apache.spark.SparkEnv$.$anonfun$create$9(SparkEnv.scala:348)
    at org.apache.spark.SparkEnv$.registerOrLookupEndpoint$1(SparkEnv.scala:287)
    at org.apache.spark.SparkEnv$.create(SparkEnv.scala:336)
    at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:191)
    at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:277)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:460)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
    at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
    at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:480)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:238)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
    at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
    at java.base/java.lang.Thread.run(Thread.java:833)

In my terminal, I adapted the commands to match my Python path :

export HOMEBREW_OPT="/opt/homebrew/opt"
export JAVA_HOME="$HOMEBREW_OPT/openjdk/"
export SPARK_HOME="$HOMEBREW_OPT/apache-spark/libexec"
export PATH="$JAVA_HOME:$SPARK_HOME:$PATH"
export SPARK_LOCAL_IP=localhost

Thanks you for your help

like image 290
Mathilde Roblot Avatar asked Apr 17 '26 23:04

Mathilde Roblot


2 Answers

Hi @Mathilde Roblot thanks for the detailed report -

The specific error 'cannot access class sun.nio.ch.DirectBuffer (in module java.base) because module java.base does not export sun.nio.ch to unnamed module' sticks out to me.

Googling suggests that you may be retrieving the wrong Java (not 8.0 as required by spark)

  • https://stackoverflow.com/a/49453770/2010808
  • https://stackoverflow.com/a/69851663/2010808
like image 168
datajoely Avatar answered Apr 20 '26 12:04

datajoely


You can use some SparkConf to set needed --add-opens see : https://stackoverflow.com/a/71855571/13547620.

like image 38
bloussou Avatar answered Apr 20 '26 14:04

bloussou



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!