 

Why does Spark report "java.net.URISyntaxException: Relative path in absolute URI" when working with DataFrames?


I am running Spark locally on a Windows machine. I was able to launch the Spark shell successfully and read text files in as RDDs. I was also able to follow along with various online tutorials on the subject and perform various operations on the RDDs.

However, when I try to convert an RDD into a DataFrame I am getting an error. This is what I am doing:

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._

// convert rdd to df
val df = rddFile.toDF()
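
(For context, rddFile above is simply a text-file RDD; it could have been created along these lines, with a hypothetical input path:)

// hypothetical local input file; sc.textFile yields an RDD[String]
val rddFile = sc.textFile("C:/tmp/input.txt")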

This code generates a long series of error messages that seem to relate to the following one:

Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:C:/Users/spark/spark-warehouse
        at org.apache.hadoop.fs.Path.initialize(Path.java:205)
        at org.apache.hadoop.fs.Path.<init>(Path.java:171)
        at org.apache.hadoop.hive.metastore.Warehouse.getWhRoot(Warehouse.java:159)
        at org.apache.hadoop.hive.metastore.Warehouse.getDefaultDatabasePath(Warehouse.java:177)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB_core(HiveMetaStore.java:600)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:620)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:461)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:66)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72)
        at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:199)
        at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74)
        ... 85 more
Caused by: java.net.URISyntaxException: Relative path in absolute URI: file:C:/Users/spark/spark-warehouse
        at java.net.URI.checkPath(URI.java:1823)
        at java.net.URI.<init>(URI.java:745)
        at org.apache.hadoop.fs.Path.initialize(Path.java:202)
        ... 96 more

The entire stack trace follows.

16/08/16 12:36:20 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
16/08/16 12:36:20 WARN Hive: Failed to access metastore. This class should not accessed in runtime.
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
        at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1236)
        at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:174)
        at org.apache.hadoop.hive.ql.metadata.Hive.<clinit>(Hive.java:166)
        at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503)
        at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:171)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:258)
        at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:359)
        at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:263)
        at org.apache.spark.sql.hive.HiveSharedState.metadataHive$lzycompute(HiveSharedState.scala:39)
        at org.apache.spark.sql.hive.HiveSharedState.metadataHive(HiveSharedState.scala:38)
        at org.apache.spark.sql.hive.HiveSharedState.externalCatalog$lzycompute(HiveSharedState.scala:46)
        at org.apache.spark.sql.hive.HiveSharedState.externalCatalog(HiveSharedState.scala:45)
        at org.apache.spark.sql.hive.HiveSessionState.catalog$lzycompute(HiveSessionState.scala:50)
        at org.apache.spark.sql.hive.HiveSessionState.catalog(HiveSessionState.scala:48)
        at org.apache.spark.sql.hive.HiveSessionState$$anon$1.<init>(HiveSessionState.scala:63)
        at org.apache.spark.sql.hive.HiveSessionState.analyzer$lzycompute(HiveSessionState.scala:63)
        at org.apache.spark.sql.hive.HiveSessionState.analyzer(HiveSessionState.scala:62)
        at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49)
        at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
        at org.apache.spark.sql.SparkSession.baseRelationToDataFrame(SparkSession.scala:382)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:143)
        at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:401)
        at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:342)
        at $line14.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:24)
        at $line14.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:29)
        at $line14.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:31)
        at $line14.$read$$iw$$iw$$iw$$iw$$iw.<init>(<console>:33)
        at $line14.$read$$iw$$iw$$iw$$iw.<init>(<console>:35)
        at $line14.$read$$iw$$iw$$iw.<init>(<console>:37)
        at $line14.$read$$iw$$iw.<init>(<console>:39)
        at $line14.$read$$iw.<init>(<console>:41)
        at $line14.$read.<init>(<console>:43)
        at $line14.$read$.<init>(<console>:47)
        at $line14.$read$.<clinit>(<console>)
        at $line14.$eval$.$print$lzycompute(<console>:7)
        at $line14.$eval$.$print(<console>:6)
        at $line14.$eval.$print(<console>)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:786)
        at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1047)
        at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:638)
        at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:637)
        at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
        at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19)
        at scala.tools.nsc.interpreter.IMain$WrappedRequest.loadAndRunReq(IMain.scala:637)
        at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:569)
        at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:565)
        at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:807)
        at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:681)
        at scala.tools.nsc.interpreter.ILoop.processLine(ILoop.scala:395)
        at scala.tools.nsc.interpreter.ILoop.loop(ILoop.scala:415)
        at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:923)
        at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
        at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
        at scala.reflect.internal.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:97)
        at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:909)
        at org.apache.spark.repl.Main$.doMain(Main.scala:68)
        at org.apache.spark.repl.Main$.main(Main.scala:51)
        at org.apache.spark.repl.Main.main(Main.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:729)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
        at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
        at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005)
        at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024)
        at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1234)
        ... 74 more
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521)
        ... 80 more
Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:C:/Users/spark/spark-warehouse
        at org.apache.hadoop.fs.Path.initialize(Path.java:205)
        at org.apache.hadoop.fs.Path.<init>(Path.java:171)
        at org.apache.hadoop.hive.metastore.Warehouse.getWhRoot(Warehouse.java:159)
        at org.apache.hadoop.hive.metastore.Warehouse.getDefaultDatabasePath(Warehouse.java:177)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB_core(HiveMetaStore.java:600)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:620)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:461)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:66)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72)
        at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:199)
        at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74)
        ... 85 more
Caused by: java.net.URISyntaxException: Relative path in absolute URI: file:C:/Users/spark/spark-warehouse
        at java.net.URI.checkPath(URI.java:1823)
        at java.net.URI.<init>(URI.java:745)
        at org.apache.hadoop.fs.Path.initialize(Path.java:202)
        ... 96 more
asked Aug 14 '16 by Dataminer


1 Answer

This is SPARK-15565, a known issue in Spark 2.0 on Windows. It has a simple workaround, and a proper fix is already in Spark's codebase (it may soon be released as 2.0.2 or 2.1.0).

The workaround in Spark 2.0.0 is to set spark.sql.warehouse.dir to a properly referenced directory URI, say file:///c:/Spark/spark-2.0.0-bin-hadoop2.7/spark-warehouse, i.e. one that uses /// (three slashes).
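
To see why the extra slashes matter, here is a minimal sketch of the underlying JDK behaviour (not Hadoop's exact code, but org.apache.hadoop.fs.Path builds a java.net.URI in essentially this way): an absolute URI, i.e. one with a scheme, must have a path component that starts with /, and a bare Windows path like C:/... fails that check.

import java.net.URI

// The path component "C:/..." does not start with "/", so this throws
// java.net.URISyntaxException: Relative path in absolute URI
new URI("file", null, "C:/Users/spark/spark-warehouse", null)

// The triple-slash form yields the path "/C:/...", which parses fine
new URI("file", null, "/C:/Users/spark/spark-warehouse", null)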

Start spark-shell with the --conf argument as follows:

spark-shell --conf spark.sql.warehouse.dir=file:///c:/tmp/spark-warehouse 
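
Once the shell is up, you can check that the setting took effect (the output below is what I would expect, assuming the shell started cleanly):

scala> spark.conf.get("spark.sql.warehouse.dir")
res0: String = file:///c:/tmp/spark-warehouse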

Or create a SparkSession in your Spark application using the new fluent builder pattern (in Spark 2.0, SparkSession supersedes the SQLContext created in the question):

import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .config("spark.sql.warehouse.dir", "file:///c:/tmp/spark-warehouse")
  .getOrCreate()

Or create conf/spark-defaults.conf with the following content:

spark.sql.warehouse.dir file:///c:/tmp/spark-warehouse 
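
Whichever of the three options you pick, once spark.sql.warehouse.dir is a proper file:/// URI the conversion from the question should go through. A short sketch, assuming rddFile is an RDD of strings as in the question:

import spark.implicits._

// the metastore warehouse path now parses, so toDF() no longer fails
// during Hive metastore initialisation
val df = rddFile.toDF()
df.show()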
answered Oct 21 '22 by Jacek Laskowski