Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

One SQL query to access multiple data sources in Java (from oracle, excel, sql server)

I need to develop application that can be getting data from multiple data sources ( Oracle, Excel, Microsoft Sql Server, and so on) using one SQL query. For example:

 SELECT o.employeeId, count(o.orderId) 
    FROM employees@excel e. customers@microsoftsql c, orders@oracle o 
    WHERE o.employeeId = e.employeeId and o.customerId = c.customerId 
    GROUP BY o.employeeId;

This sql and data sources must be changes dynamically by java program. My customers want to write and run sql-like query from different database and storage in same time with group by, having, count, sum and so on in web interface of my aplication. Other requirements is perfomance and light-weight.

I find this way to do it (and what drawbacks I see, please, fix me if I wrong):

  1. Apache Spark (drawbacks: heavy solution, more better for BigData, slow if you need getting up-to-date informations without cached it in Spark),

  2. Distributed queries in SQL server (Database link of Oracle, Linked server of Microsoft SQL Server, Power Query of Excel) - drawbacks: problem with change data sources dynamically by java program and problem with working with Excel,

  3. Prestodb (drawbacks: heavy solution, more better for BigData),

  4. Apache Drill (drawbacks: quite young solution, some problem with not latest odbc drivers and some bugs when working),

  5. Apache Calcite (ligth framework that be used by Apache Drill, drawbacks: quite young solution yet),

  6. Do join from data sources manually (drawbacks: a lot of work to develop correct join, "group by" in result set, find best execution plan and so on)

May be, do you know any other way (using free open-source solutions) or give me any advice from your experience about ways in above? Any help would be greatly appreciated.

like image 461
Slava Vedenin Avatar asked Jan 20 '16 11:01

Slava Vedenin


People also ask

Can you query multiple databases?

In summary, if all your databases are on one server instance, then multiple database querying is as easy as prefixing the table name with the database or schema name depending on your database software.

How do I extract data from multiple databases?

Create a Java data access class that contains the columns you want to retrieve. Retrieve the rows from each of the databases separately, creating a List of data access objects. Combine this Lists in Java to create one master List of data access objects.

How can I use Java and SQL together?

STEP 1: Allocate a Connection object, for connecting to the database server. STEP 2: Allocate a Statement object, under the Connection created earlier, for holding a SQL command. STEP 3: Write a SQL query and execute the query, via the Statement and Connection created. STEP 4: Process the query result.


1 Answers

UnityJDBC is a commercial JDBC Driver that wraps multiple datasoruces and allows you to treat them as if they were all part of the same database. It works as follows:

You define a "schema file" to describe each of your databases. The schema file resembles something like:

...
<TABLE>
    <semanticTableName>Database1.MY_TABLE</semanticTableName>
    <tableName>MY_TABLE</tableName>
    <numTuples>2000</numTuples>
 <FIELD>
    <semanticFieldName>MY_TABLE.MY_ID</semanticFieldName>
    <fieldName>MY_ID</fieldName>
    <dataType>3</dataType>
    <dataTypeName>DECIMAL</dataTypeName>
    ...

You also have a central "sources file" that references all of your schema files and gives connection information, and it looks like this:

<SOURCES>
    <DATABASE>
        <URL>jdbc:oracle:thin:@localhost:1521:xe</URL>
        <USER>scott</USER>
        <PASSWORD>tiger</PASSWORD>
        <DRIVER>oracle.jdbc.driver.OracleDriver</DRIVER>
        <SCHEMA>MyOracleSchema.xml</SCHEMA>
    </DATABASE>
    <DATABASE>
        <URL>jdbc:sqlserver://localhost:1433</URL>
        <USER>sa</USER>
        <PASSWORD>Password123</PASSWORD>
        <DRIVER>com.microsoft.sqlserver.jdbc.SQLServerDriver</DRIVER>
        <SCHEMA>MySQLServerSchema.xml</SCHEMA>
    </DATABASE> 
</SOURCES>

You can then use unity.jdbc.UnityDriver to allow your Java code to run SQL that joins across databases, like so:

String sql = "SELECT *\n" +
"FROM MyOracleDB.Whatever, MySQLServerDB.Something\n" +
"WHERE MyOracleDB.Whatever.whatever_id = MySQLServerDB.Something.whatever_id";
stmt.execute(sql);

So it looks like UnityJDBC provides the functionality that you need, however, I have to say that any solution that allows users to execute arbitrary SQL that joins tables across different databases sounds like a recipe to bring your databases to their knees. The solution that I would actually recommend to your type of requirements is to do ETL processes from all of your data sources into a single data warehouse and allow your users to query that; how to define those processes and your data warehouse is definitely too broad for a stackoverflow question.

like image 127
heenenee Avatar answered Sep 20 '22 20:09

heenenee