Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Compare millions of records from Oracle to SQL server

I have an Oracle database and a SQL Server database. There is one table say Inventory which contains millions of rows in both database tables and it keeps growing.

I want to compare the Oracle table data with the SQL Server data to find out which records are missing in the SQL Server table on daily basis.

Which is best approach for this?

  1. Create SSIS package.
  2. Create Windows service.

I want to consume less resource to achieve this functionality which takes less time and less resource.

Eg : 18 millions records in oracle and 16/17 millions in SQL Server

This situation of two different database arise because two different application online and offline

EDIT : How about connecting SQL server from oracle through Oracle Gateway to SQL server to

1) Direct query to SQL server from Oracle to update missing record in SQL server for 1st time.

2) Create a trigger on Oracle which gets executed when record is deleted from Oracle and it insert deleted record in new oracle table.

3) Create SSIS package to map newly created oracle table with SQL server to update SQL server record.This way only few records have to process daily through SSIS.

What do you think of this approach ?

like image 634
Mahajan344 Avatar asked Jan 15 '16 06:01

Mahajan344


2 Answers

I would create an SSIS package and load the data from the Oracle table use a Data Flow / OLE DB Data Source. If you have SQL Enterprise, the Attunity Connectors are a bit faster.

Then I would load key from the SQL Server table into a Lookup transformation, where I would match the 2 sources on the key, and direct unmatched rows into a separate output.

Finally I would direct the unmatched rows output to a OLE DB Command, to update the SQL Server table.

This SSIS package will require a lot of memory, but as the matching is done in memory with minimal IO, it will probably outperform other solutions for speed. It will need enough free memory to cache all the keys from the SQL Server Table.

SSIS also has the advantage that it has lots of other transformation functions available if you need them later.

like image 50
Mike Honey Avatar answered Oct 15 '22 21:10

Mike Honey


What you basically want to do is replication from Oracle to SQL Server.

You could do this in SSIS, A windows Service or indeed a multitude of platforms. The real trick is using the correct design pattern.

There are two general design patterns

  1. Snapshot Replication

You take all records from both systems and compare them somewhere (so far we have suggestions to compare in SSIS or compare on Oracle but not yet a suggestion to compare on SQL Server, although this is valid)

You are comparing 18 million records here so this is a lot of work

  1. Differential replication

You record the changes in the publisher (i.e. Oracle) since the last replication then you apply those changes to the subscriber (i.e. SQL Server)

You can do this manually by implementing triggers and log tables on the Oracle side, then use a regular ETL process (SSIS, command line tools, text files, whatever), probably scheduled in SQL Agent to apply these to the SQL Server.

Or you could do this by using the out of the box replication capability to set up Oracle as a publisher and SQL as a subscriber: https://msdn.microsoft.com/en-us/library/ms151149(v=sql.105).aspx

You're going to have to try a few of these and see what works for you.

Given this objective:

I want to consume less resource to achieve this functionality which takes less time and less resource

transactional replication is far more efficient but complicated. For maintenance purposes, which platforms (.Net, SSIS, Python etc.) are you most comfortable with?

like image 41
Nick.McDermaid Avatar answered Oct 15 '22 21:10

Nick.McDermaid