 

How to use a subquery for dbtable option in jdbc data source?

I want to use Spark to process some data from a JDBC source. But to begin with, instead of reading the original tables over JDBC, I want to run some queries on the JDBC side to filter columns and join tables, and then load the query result as a table in Spark SQL.

The following syntax to load a raw JDBC table works for me:

df_table1 = sqlContext.read.format('jdbc').options(
    url="jdbc:mysql://foo.com:3306",
    dbtable="mydb.table1",
    user="me",
    password="******",
    driver="com.mysql.jdbc.Driver" # mysql JDBC driver 5.1.41
).load() 
df_table1.show() # succeeded

According to the Spark documentation (I'm using PySpark 1.6.3):

dbtable: The JDBC table that should be read. Note that anything that is valid in a FROM clause of a SQL query can be used. For example, instead of a full table you could also use a subquery in parentheses.

So just for experiment, I tried something simple like this:

df_table1 = sqlContext.read.format('jdbc').options(
    url="jdbc:mysql://foo.com:3306",
    dbtable="(SELECT * FROM mydb.table1) AS table1",
    user="me",
    password="******",
    driver="com.mysql.jdbc.Driver"
).load() # failed

It threw the following exception:

com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'table1 WHERE 1=0' at line 1

I also tried a few other variations of the syntax (adding/removing parentheses, removing the AS clause, switching case, etc.) without any luck. So what is the correct syntax? Where can I find more detailed documentation for it? Also, where does the odd "WHERE 1=0" in the error message come from? Thanks!

Asked Apr 02 '17 by Dichen

2 Answers

To read data from a JDBC source using a SQL query in Spark SQL, you can try something like this:

val df_table1 = sqlContext.read.format("jdbc").options(Map(
    "url" -> "jdbc:postgresql://localhost:5432/mydb",
    "dbtable" -> "(select * from table1) as table1",
    "user" -> "me",
    "password" -> "******",
    "driver" -> "org.postgresql.Driver"
)).load()

I tried this with PostgreSQL; you can adapt the URL and driver for MySQL.
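
For reference, the same pattern in PySpark against the MySQL source from the question would look roughly like this (a sketch only; the selected columns and the filter condition are made up for illustration, and it has not been verified against the asker's server):

df_filtered = sqlContext.read.format('jdbc').options(
    url="jdbc:mysql://foo.com:3306",
    dbtable="(SELECT id, name FROM mydb.table1 WHERE id > 100) AS t1",  # hypothetical columns/filter
    user="me",
    password="******",
    driver="com.mysql.jdbc.Driver"
).load()
df_filtered.show()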

Answered by himanshuIIITian


table = "(SELECT id, person, manager, CAST(tdate AS CHAR) AS tdate, CAST(start AS   CHAR) AS start, CAST(end AS CHAR) as end, CAST(duration AS CHAR) AS duration FROM EmployeeTimes) AS EmployeeTimes",

spark = get_spark_session()
df = spark.read.format("jdbc"). \
    options(url=ip,
            driver='com.mysql.jdbc.Driver',
            dbtable=table,
            user=username,
            password=password).load()
return df

I had heaps of trouble with Spark JDBC incompatibility with MySQL timestamps. The trick is to convert all your timestamp or duration values to strings before the JDBC driver touches them: simply cast the values as strings in the query and it will work.

Note: You will also have to use AS to give the query an alias for it to work.
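
If you need real timestamp types back on the Spark side, one option (a sketch, assuming Spark 2.2+ for to_timestamp and the column names from the query above) is to convert the string columns after loading:

from pyspark.sql.functions import to_timestamp

# Sketch: turn the CHAR-cast columns back into Spark timestamps after the read.
df = df.withColumn("tdate", to_timestamp("tdate")) \
       .withColumn("start", to_timestamp("start")) \
       .withColumn("end", to_timestamp("end"))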

Answered by Zack