spark.table fails with java.io.Exception: No FileSystem for Scheme: abfs

We have a custom file system class that extends hadoop.fs.FileSystem. This file system uses the URI scheme abfs://. External Hive tables have been created over this data:

CREATE EXTERNAL TABLE testingCustomFileSystem (a string, b int, c double)
PARTITIONED BY (dt string)
STORED AS PARQUET
LOCATION 'abfs://<host>:<port>/user/name/path/to/data/'
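For the table above to be queryable from Hive clients, the cluster presumably registers the custom file system in its Hadoop configuration. A hypothetical core-site.xml fragment (the class name is a placeholder, not from the original post) might look like:

```xml
<!-- core-site.xml: maps the abfs:// scheme to its FileSystem class.
     com.example.fs.CustomAbfsFileSystem is a hypothetical placeholder. -->
<property>
  <name>fs.abfs.impl</name>
  <value>com.example.fs.CustomAbfsFileSystem</value>
</property>
```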

Using beeline, I'm able to query the table and fetch results.

Now I'm trying to load the same table into a Spark dataframe using spark.table('testingCustomFileSystem'), and it throws the following exception:

    java.io.IOException: No FileSystem for scheme: abfs
  at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2586)
  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2593)
  at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
  at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2632)
  at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2614)
  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
  at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
  at org.apache.spark.sql.execution.datasources.CatalogFileIndex$$anonfun$2.apply(CatalogFileIndex.scala:77)
  at org.apache.spark.sql.execution.datasources.CatalogFileIndex$$anonfun$2.apply(CatalogFileIndex.scala:75)
  at scala.collection.immutable.Stream.map(Stream.scala:418)

The jar containing the CustomFileSystem (which defines the abfs:// scheme) was on the classpath and available.

How does spark.table parse a Hive table definition in the metastore and resolve the URI?

venBigData asked Apr 29 '19

1 Answer

After looking into Spark's configuration, I noticed that setting the following Hadoop configuration property resolved the issue:

hadoopConfiguration.set("fs.abfs.impl", <fqcn of the FileSystemImplementation>)
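Equivalently, the property can be supplied cluster-wide through spark-defaults.conf: Spark copies any property prefixed with spark.hadoop. into every Hadoop Configuration it creates. This is a sketch, and the class name is a hypothetical placeholder:

```properties
# spark-defaults.conf: the spark.hadoop.* prefix propagates the property
# into Spark's Hadoop Configuration at session startup.
spark.hadoop.fs.abfs.impl  com.example.fs.CustomAbfsFileSystem
```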

In Spark, this setting can be applied right after the SparkSession is created (only the appName and master are set here), like:

val spark = SparkSession
            .builder()
            .appName("Name")
            .master("yarn")
            .getOrCreate()

spark.sparkContext
     .hadoopConfiguration.set("fs.abfs.impl", <fqcn of the FileSystemImplementation>)

and it worked!
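An alternative sketch is to pass the same property at build time via the spark.hadoop. prefix, so it is already present when the session initializes (the class name is a placeholder, not from the original answer):

```scala
import org.apache.spark.sql.SparkSession

// Sketch: the spark.hadoop.* prefix routes the setting into
// sparkContext.hadoopConfiguration automatically at startup.
val spark = SparkSession
  .builder()
  .appName("Name")
  .master("yarn")
  .config("spark.hadoop.fs.abfs.impl", "com.example.fs.CustomAbfsFileSystem")
  .getOrCreate()

// The table backed by abfs:// should now resolve.
val df = spark.table("testingCustomFileSystem")
```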

venBigData answered Nov 07 '22