We have a custom file system class that extends org.apache.hadoop.fs.FileSystem. This file system uses the URI scheme abfs://. External Hive tables have been created over this data.
CREATE EXTERNAL TABLE testingCustomFileSystem (a string, b int, c double) PARTITIONED BY (dt <type>)
STORED AS PARQUET
LOCATION 'abfs://<host>:<port>/user/name/path/to/data/'
Using beeline, I am able to query the table and fetch the results.
Now I'm trying to load the same table into a Spark DataFrame using spark.table('testingCustomFileSystem'), and it throws the following exception:
java.io.IOException: No FileSystem for scheme: abfs
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2586)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2593)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2632)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2614)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
at org.apache.spark.sql.execution.datasources.CatalogFileIndex$$anonfun$2.apply(CatalogFileIndex.scala:77)
at org.apache.spark.sql.execution.datasources.CatalogFileIndex$$anonfun$2.apply(CatalogFileIndex.scala:75)
at scala.collection.immutable.Stream.map(Stream.scala:418)
The jar containing the CustomFileSystem (defining the abfs:// scheme) was loaded into the classpath and was also available.
How does spark.table parse a Hive table definition from the metastore and resolve the URI?
After looking into Spark's configuration, I noticed that setting the following Hadoop configuration property resolved the problem:
hadoopConfiguration.set("fs.abfs.impl",<fqcn of the FileSystemImplementation>)
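The reason this property matters can be sketched as follows. In the Hadoop 2.x code path shown in the stack trace, FileSystem.getFileSystemClass first looks up fs.<scheme>.impl in the configuration, then falls back to implementations registered through META-INF/services/org.apache.hadoop.fs.FileSystem, and throws the IOException above when neither yields a class. The snippet below is a simplified, dependency-free model of that lookup, not Hadoop's actual code: SchemeLookupSketch, resolveImpl, the two Map stand-ins, and the class name com.example.CustomFileSystem are all hypothetical. If the custom jar ships no service registration file (an assumption; the post does not say), this would explain why having the jar on the classpath was not enough on its own.

```scala
import java.io.IOException
import java.net.URI

// Simplified model of the scheme-to-class lookup; NOT Hadoop's real code.
object SchemeLookupSketch {
  // Stand-in for the Hadoop Configuration (property name -> value).
  type Conf = Map[String, String]

  // Stand-in for classes discovered via
  // META-INF/services/org.apache.hadoop.fs.FileSystem (scheme -> class name).
  type ServiceRegistry = Map[String, String]

  def resolveImpl(location: String, conf: Conf, services: ServiceRegistry): String = {
    // The scheme is taken from the table LOCATION URI, e.g. "abfs".
    val scheme = new URI(location).getScheme
    conf.get(s"fs.$scheme.impl")        // 1. explicit configuration property
      .orElse(services.get(scheme))     // 2. service-registered implementation
      .getOrElse(throw new IOException(s"No FileSystem for scheme: $scheme"))
  }
}
```

With an empty configuration and an empty registry, resolveImpl fails with the same message as the exception in the question; adding the fs.abfs.impl entry makes the lookup succeed.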
In Spark, I apply this setting right after creating the SparkSession (I set only the appName and master), like this:
import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .appName("Name")
  .master("yarn")
  .getOrCreate()
// Register the implementation class for the abfs scheme
spark.sparkContext
  .hadoopConfiguration.set("fs.abfs.impl", <fqcn of the FileSystemImplementation>)
and it worked!
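As an alternative, the same property can probably be supplied while the session is being built. Spark copies any setting prefixed with spark.hadoop. into the Hadoop configuration it creates, so the sketch below should have the same effect as the call above. I have not run it against this setup. The object name AbfsTableExample and the class name com.example.CustomFileSystem are placeholders; substitute the real fully qualified class name of the file system implementation.

```scala
import org.apache.spark.sql.SparkSession

object AbfsTableExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName("Name")
      .master("yarn")
      // spark.hadoop.* entries are copied into the Hadoop Configuration.
      // "com.example.CustomFileSystem" is a placeholder class name.
      .config("spark.hadoop.fs.abfs.impl", "com.example.CustomFileSystem")
      .getOrCreate()

    // Load the external Hive table whose LOCATION uses the abfs scheme.
    val df = spark.table("testingCustomFileSystem")
    df.printSchema()
  }
}
```

The same key can also be passed on the command line with --conf spark.hadoop.fs.abfs.impl=<fqcn of the FileSystemImplementation>, which keeps the class name out of the application code.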