We have found that loading data from Oracle databases through Spark's JDBC API has always been slow, from Spark 1.3 up to the current Spark 2.0.1. The typical code, in Java, looks like this:
import java.util.HashMap;
import java.util.Map;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// JDBC options for the Oracle source
Map<String, String> options = new HashMap<String, String>();
options.put("url", ORACLE_CONNECTION_URL);
options.put("dbtable", dbTable);
options.put("batchsize", "100000");
options.put("driver", "oracle.jdbc.OracleDriver");

Dataset<Row> jdbcDF = sparkSession.read().options(options)
        .format("jdbc")
        .load().cache();
jdbcDF.createTempView("my");

jdbcDF.printSchema();
jdbcDF.show();
System.out.println(jdbcDF.count());
One of our team members once customized this part and improved it a lot at the time (Spark 1.3.0), but some of the Spark core code he relied on later became internal to Spark, so his approach cannot be used after that version. We also see that Hadoop's Sqoop is much faster than Spark for this kind of load (though it writes to HDFS, which needs a lot of extra work to be converted into a Dataset for Spark to use). Writing to Oracle using Spark's Dataset write method, on the other hand, performs well for us. It is puzzling why this happens!
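Note that Spark's JDBC source reads over a single connection unless partitioning options are supplied, which alone can explain a slow load. A minimal sketch of a partitioned read, where the column ID, its bounds, and the partition count are placeholders that assume a numeric, roughly evenly distributed key:
// Partitioning options split the read across numPartitions parallel queries;
// "ID", the bounds, and the partition count are placeholder values
options.put("partitionColumn", "ID");
options.put("lowerBound", "1");
options.put("upperBound", "10000000");
options.put("numPartitions", "16");
Dataset<Row> partitionedDF = sparkSession.read().options(options)
        .format("jdbc")
        .load();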
Hope that helps! While Sqoop is easier to use out of the box, the fact that it is based on MapReduce likely means Spark will be superior in some scenarios, and Spark should be your go-to option when you want to save the data as Parquet or ORC (neither of which Sqoop supports).
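For example, once the Dataset is loaded, saving it as Parquet is a one-liner (the output path below is a placeholder):
// Write the loaded Dataset as Parquet; the path is a placeholder
jdbcDF.write().mode("overwrite").parquet("hdfs:///tmp/my_table_parquet");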
Spark uses 200 shuffle partitions by default for transformations that shuffle data (the spark.sql.shuffle.partitions setting). 200 partitions may be far too many when working with small data, and the per-task overhead can slow the query down.
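A minimal sketch of lowering that setting (the value 8 is an arbitrary example; tune it to your data size):
// spark.sql.shuffle.partitions controls the partition count after shuffles
sparkSession.conf().set("spark.sql.shuffle.partitions", "8");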
There is no performance difference whatsoever: both methods use exactly the same execution engine and internal data structures. At the end of the day, it all boils down to personal preference. Arguably, DataFrame queries are much easier to construct programmatically and provide minimal type safety.
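To illustrate, the following two queries are equivalent and compile to the same physical plan (this assumes the temp view "my" registered in the question; the column id is a placeholder):
// SQL variant against the registered temp view
Dataset<Row> viaSql = sparkSession.sql("SELECT * FROM my WHERE id > 100");
// DataFrame API variant; Catalyst produces the same plan for both
Dataset<Row> viaApi = sparkSession.table("my").filter("id > 100");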
Sometimes, Spark runs slowly because there are too many concurrent tasks running. The capacity for high concurrency is beneficial in itself, since it gives Spark fine-grained resource sharing, maximizing utilization while cutting query latencies, but past a point the scheduling overhead of the extra tasks outweighs the gain.
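If the task count itself is the bottleneck, one way to rein it in after a load is to coalesce the Dataset; a minimal sketch (the target of 16 partitions is an arbitrary example):
// Reduce the number of partitions, and hence concurrent tasks,
// without triggering a full shuffle; 16 is an arbitrary example
Dataset<Row> fewerPartitions = jdbcDF.coalesce(16);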
Well, @Pau Z Wu already answered the question in the comments, but the problem was options.put("batchsize", "100000");
This needed to be options.put("fetchsize", "100000");
since fetchsize controls how many rows are retrieved from the database per round trip, so raising it cuts the number of round trips and makes the load faster. (batchsize, by contrast, only applies to JDBC writes.)
More information can be found here: https://docs.oracle.com/cd/A87860_01/doc/java.817/a83724/resltse5.htm
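Applied to the question's code, the fix is a one-line change:
// fetchsize (not batchsize) is the read option Spark passes through
// as the JDBC row-prefetch count for each round trip
options.put("fetchsize", "100000");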