 

Why does Spark append 'WHERE 1=0' to the end of my SQL query?

I am trying to execute a simple MySQL query using Apache Spark and create a DataFrame from the result. But for some reason Spark appends 'WHERE 1=0' to the end of the query I want to execute and throws an exception stating 'You have an error in your SQL syntax'.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master("local[*]").appName("rddjoin").getOrCreate()
val mhost = "jdbc:mysql://localhost:3306/registry"
val mprop = new java.util.Properties
mprop.setProperty("driver", "com.mysql.jdbc.Driver")
mprop.setProperty("user", "root")
mprop.setProperty("password", "root")
val q = """select id from loaded_item"""
val res = spark.read.jdbc(mhost, q, mprop)
res.show(10)

The exception is as follows:

18/02/16 17:53:49 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
Exception in thread "main" com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'select id from loaded_item WHERE 1=0' at line 1
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
    at com.mysql.jdbc.Util.getInstance(Util.java:408)
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:944)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3973)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3909)
    at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2527)
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2680)
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2484)
    at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1858)
    at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:1966)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:62)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:114)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:52)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:307)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
    at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:193)
    at GenerateReport$.main(GenerateReport.scala:46)
    at GenerateReport.main(GenerateReport.scala)
18/02/16 17:53:50 INFO SparkContext: Invoking stop() from shutdown hook
asked Feb 16 '18 by sam N

People also ask

What does WHERE 1=0 mean in SQL?

The condition 1=0 is always false, so it can be used to stop a query from returning any rows: the query returns an empty result set.
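A minimal Spark sketch of the idea (runnable in spark-shell; the numbers are illustrative): an always-false filter returns zero rows, yet the schema is fully preserved.

// An always-false predicate yields an empty result,
// but the DataFrame's schema is still fully known.
val empty = spark.range(5).where("1 = 0")
empty.show()          // prints an empty table
empty.printSchema()   // still reports the column: id (long)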

How do I escape a single quote in Spark SQL?

The simplest method to escape single quotes in SQL is to use two single quotes for every one quote you want to display. For example, if you wanted to show the value O'Reilly, you would use two quotes in the middle instead of one.
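As a hedged illustration (reusing mhost and mprop from the question; the alias t is arbitrary), the doubled quote inside the literal stands for one displayed quote:

// MySQL reads '' inside a string literal as a single quote character,
// so this subquery returns the single value O'Reilly.
val quoted = spark.read.jdbc(mhost, "(SELECT 'O''Reilly' AS author) AS t", mprop)
quoted.show()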

How does Spark SQL work?

Spark SQL is a Spark module for structured data processing. It provides a programming abstraction called DataFrames and can also act as a distributed SQL query engine. It enables unmodified Hadoop Hive queries to run up to 100x faster on existing deployments and data.
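A short sketch of that DataFrame/SQL duality, reusing the connection values from the question (the view name is illustrative):

// DataFrame API: load the table as a DataFrame
val items = spark.read.jdbc(mhost, "registry.loaded_item", mprop)
// expose it to the SQL engine under a temporary name
items.createOrReplaceTempView("loaded_item")
// run a distributed SQL query over the same data
spark.sql("SELECT id FROM loaded_item").show()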

How many rows of data can Spark handle?

As a quick strength test of PySpark: on first testing, PySpark can perform joins and aggregation of 1.5 billion rows (roughly 1 TB of data) in 38 seconds, and of 130 billion rows (roughly 60 TB) in 21 minutes.


1 Answer

The second parameter of your call to spark.read.jdbc is not correct. Instead of specifying a bare SQL query, you should pass either a table name qualified with its schema or a valid SQL query wrapped in parentheses and given an alias. In your case this would be val q = "registry.loaded_item". Another option, if you want to provide additional parameters (for example a WHERE clause), is to use one of the other overloads of DataFrameReader.jdbc. Both variants are sketched below.
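A sketch of both working variants, keeping the connection values from your snippet (the alias items is arbitrary):

// Variant 1: pass a schema-qualified table name instead of a raw query.
val res1 = spark.read.jdbc(mhost, "registry.loaded_item", mprop)

// Variant 2: wrap an arbitrary query in parentheses and give it an alias;
// Spark then treats the whole subquery as the "table" to read from.
val res2 = spark.read.jdbc(mhost, "(select id from loaded_item) AS items", mprop)
res2.show(10)

If you need to push filters down per partition, there is also the overload jdbc(url, table, predicates: Array[String], connectionProperties), where each predicate becomes the WHERE clause of one partition's query.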

By the way, the reason you see the strange-looking query with WHERE 1=0 is that Spark tries to infer the schema of your DataFrame without loading any actual data. The predicate guarantees that the query never returns a single row, but the metadata of its (empty) result set still tells Spark the column names and types.
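For reference, the probe that fails here (JDBCRDD.resolveTable in the stack trace above) builds roughly the following statement; this is a simplified sketch, not the actual Spark source:

// Simplified sketch of Spark's schema probe:
val table = "registry.loaded_item"
val schemaQuery = s"SELECT * FROM $table WHERE 1=0"
// => SELECT * FROM registry.loaded_item WHERE 1=0          (valid, returns zero rows)
// With q = "select id from loaded_item" it instead becomes
// => SELECT * FROM select id from loaded_item WHERE 1=0    (the syntax error above)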

answered Nov 10 '22 by werner