
Scala module requiring specific version of Jackson Databind for Spark

I am having issues getting Spark to load, read, and query a parquet file. The infrastructure seems to be set up correctly (Spark standalone 3.0): the cluster is visible and picks up jobs.

The issue appears when this line is called

    Dataset<Row> parquetFileDF = sparkSession.read().parquet(parquePath);

and the following error is thrown:

    Caused by: com.fasterxml.jackson.databind.JsonMappingException: Scala module 2.10.0 requires Jackson Databind version >= 2.10.0 and < 2.11.0
    at com.fasterxml.jackson.module.scala.JacksonModule.setupModule(JacksonModule.scala:61)

I looked into JacksonModule.setupModule, and when it reaches context.getMapperVersion, the version being reported is 2.9.10. It appears that DefaultScalaModule is pulling in some older version of jackson-databind.
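
To double-check which jackson-databind actually wins on the runtime classpath, and which jar it comes from, a couple of throwaway lines can be dropped in before the SparkSession is built (just a sketch; the printed location is whatever jar supplies ObjectMapper):

    // Which jackson-databind version is on the classpath?
    System.out.println("jackson-databind version: "
            + com.fasterxml.jackson.databind.cfg.PackageVersion.VERSION);
    // Which jar did ObjectMapper actually come from?
    // (getCodeSource() can be null for bootstrap classes)
    System.out.println("loaded from: "
            + com.fasterxml.jackson.databind.ObjectMapper.class
                  .getProtectionDomain().getCodeSource().getLocation());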

I'm using Gradle to build, with the dependencies set up like this:

    implementation 'com.fasterxml.jackson.core:jackson-core:2.10.0'
    implementation 'com.fasterxml.jackson.core:jackson-databind:2.10.0'
    implementation 'org.apache.spark:spark-core_2.12:3.0.0'
    implementation 'org.apache.spark:spark-sql_2.12:3.0.0'
    implementation 'org.apache.spark:spark-launcher_2.12:3.0.0'
    implementation 'org.apache.spark:spark-catalyst_2.12:3.0.0'
    implementation 'org.apache.spark:spark-streaming_2.12:3.0.0'

That didn't work, so I tried forcing databind with a strict version constraint:

    implementation ('com.fasterxml.jackson.core:jackson-databind') {
        version {
            strictly '2.10.0'
        }
    }
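
Another option (only a sketch, and it only helps if the stale copy actually comes through Gradle's own dependency resolution) is to force the version across all configurations in build.gradle:

    configurations.all {
        resolutionStrategy {
            // resolve jackson-databind to 2.10.0 in every configuration
            force 'com.fasterxml.jackson.core:jackson-databind:2.10.0'
        }
    }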

I've tried a few different versions and still keep hitting this issue. Maybe I'm missing something super simple, but right now, I can't seem to get past this error.

Any help would be appreciated.

asked Oct 15 '22 by Emmanuel F


1 Answer

I was able to figure out the issue. I was pulling in a jar file from another project. The functionality in that jar wasn't being used at all, so it wasn't suspect. Unfortunately, that project hadn't been updated, and it bundled some older Spark libraries that were somehow being picked up by my running app. Once I removed that jar, the error went away. What's interesting is that the dependency graph didn't show anything about the libraries the other jar file was using.

I suppose if you run into a similar issue, double-check any jar files being imported.
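
For what it's worth, the versions Gradle resolves on its own can be inspected with the dependency-insight report; something along these lines (the task and flags are standard Gradle, and the configuration name assumes the Java plugin) would show where a given jackson-databind version is coming from, if it is part of the graph at all:

    ./gradlew dependencyInsight --dependency jackson-databind --configuration runtimeClasspath

A flat jar pulled in directly from another project carries no POM metadata, though, which is presumably why nothing about its bundled libraries showed up in the dependency graph.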

answered Oct 17 '22 by Emmanuel F