
Best way to download and process a Very Large Compressed MS Access File to be loaded in Oracle

I have to download a password-protected ZIP file of about 250 MB over FTP. Once it is downloaded, I have to unzip it with a common password. The ZIP file contains a 1.5 GB MS Access database that I have to read, join against some tables in my Oracle database, and then transform and load into that Oracle database.

I'm looking for the best way to do this. I'm a C# developer, so my first thought was to use C#: download the file via FtpClient or FtpWebRequest, unzip it with a library like DotNetZip, open the MS Access database via ODBC, and load the records into Oracle with ODP.NET. I think of that as my "easy way", because I know how to do it.

But since this is a big file and I know this could take a long time, I'm concerned about efficiency and about how to reduce the run time of this process.

So I'm thinking that doing everything directly in Oracle (downloading the file over FTP from there, unzipping it there, and processing the data there) should reduce the run time, since it avoids passing record by record from C# to Oracle. But I'm not sure whether this is the correct way of doing it.

So I started to look for Oracle libraries that could do what I'm trying to achieve, and I found PLSQL-utils. It seems they can do everything I need except read the MS Access database. Looking into that, I found Heterogeneous Services, but I have never used them, so I'm a little lost there.

I also heard once that I could use Java directly from Oracle, and I know Java can connect to MS Access via JDBC. So I searched for that and found something about Calling Java Methods in Oracle Database.

That's what I have so far, but I don't know which method I should use. As far as I know, an RDBMS is meant for processing data, not for programming tasks like downloading files; that's what we have OOP languages for.

As additional information, this process is going to run once or twice a month, so I have to schedule it. In Oracle that can easily be done with a scheduled job; in C#, with a Scheduled Task or a Windows Service (those are the tools that I know).

Some restrictions that I have

  • My client doesn't have MS SQL Server and can't buy a license for it (so I cannot use DTSX for this process)
  • On the Oracle production server I may not have enough permissions to do all of this, but I can apply for them if that is the best approach for the process
  • If a backend server (Java or C#, hosted on IIS, WebLogic, JBoss, or any other kind) is going to be required, that server and the Oracle server would be different machines
  • The Oracle database is hosted on a Unix server

With all that said, how can I do this whole process efficiently? Should I use .NET and load record by record into my Oracle database? Should I do everything in Oracle? Or neither? Is there a better way to do this?

Hector Sanchez asked Oct 15 '13 21:10



1 Answer

I think you're on the right track with a C# console application to make it a repeatable process. For the zip step, DotNetZip is a great free library that I've used on many projects. First download the archive:

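using System.IO;
using System.Net;

// Stream the archive from FTP straight to disk so it is never held in memory.
using (var client = new WebClient())
using (var stream = client.OpenRead(@"ftp://mysite.com/mydb.zip"))
using (var file = File.Create(@"c:\temp\mydb.zip"))
{
    stream.CopyTo(file, 32000); // copy into the file stream with a 32000-byte buffer
}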

using Ionic.Zip; // DotNetZip

using (ZipFile zip = ZipFile.Read(@"c:\temp\mydb.zip"))
{
    ZipEntry e = zip["bigdb.mdb"];
    e.Password = "yourpassword";
    e.Extract(@"c:\temp"); // Extract takes the target directory; the entry keeps its own name
}

Once unpacked, you can create a data connection to the access DB and datareader object. Then use the dbreader to read rows and write to flat file (avoids out of memory exception with large data sets).

// Requires: using System.Data.OleDb; using System.IO; using System.Text;
string constr = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=yourdbfile.mdb;Jet OLEDB:Database Password=yourpassword;";
string query = "SELECT * FROM [YourTable]";

using (OleDbConnection conn = new OleDbConnection(constr))
using (OleDbCommand cmd = new OleDbCommand(query, conn))
{
    conn.Open(); // the connection must be open before ExecuteReader
    using (OleDbDataReader reader = cmd.ExecuteReader())
    {
        int rowNum = 0;
        StringBuilder sb = new StringBuilder();
        while (reader.Read())
        {
            sb.Append(reader["FieldA"].ToString() + "|");
            sb.Append(reader["FieldB"].ToString() + "|");
            sb.Append(reader["FieldC"].ToString() + System.Environment.NewLine);
            rowNum++;

            // write rows to flat file in chunks of 10K rows.
            if (rowNum % 10000 == 0)
            {
                File.AppendAllText(@"c:\temp\data.psv", sb.ToString());
                sb.Clear();
            }
        }
        // flush whatever is left from the last partial chunk
        File.AppendAllText(@"c:\temp\data.psv", sb.ToString());
    }
}
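
One thing the loop above does not handle is a field value that itself contains the | delimiter. Here is a minimal sketch of one way to deal with that, assuming the SQL*Loader control file declares the fields as optionally enclosed by double quotes. PsvField, Quote and Join are hypothetical helper names of mine, not part of any library.

```csharp
using System;

static class PsvField
{
    // Wraps a value in double quotes when it contains the delimiter or a quote,
    // doubling any embedded quotes so the loader can read them back as literals.
    public static string Quote(string value)
    {
        if (value == null) return string.Empty;
        if (value.IndexOf('|') < 0 && value.IndexOf('"') < 0) return value;
        return "\"" + value.Replace("\"", "\"\"") + "\"";
    }

    // Builds one pipe-separated record (without the line terminator).
    public static string Join(params string[] fields)
    {
        return string.Join("|", Array.ConvertAll<string, string>(fields, Quote));
    }
}
```

In the read loop you could then write something like sb.AppendLine(PsvField.Join(reader["FieldA"].ToString(), reader["FieldB"].ToString(), reader["FieldC"].ToString())). Values with embedded line breaks would still need separate handling.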

With the rows written out to a flat file, you can bulk load them. I would not suggest inserting data row by row: that will be incredibly slow and it will bloat your Oracle redo logs. I don't believe Oracle 10g has a .NET driver that supports bulk loading, so you'll probably need to bulk load via a flat file.

Next, import the file into Oracle with SQL*Loader from the command line; you can invoke this from your C# console app. Before you do, you'll need to create a control file, ctl.ldr, which SQL*Loader uses for the bulk load.

-- no OPTIONS (SKIP=1) here: data.psv has no header row, so skipping would drop the first record
load data
 INFILE 'c:\temp\data.psv'
 INTO table tblTest
 APPEND
 FIELDS TERMINATED BY "|" optionally enclosed by '"'
 ( fielda,fieldb,etc...)

and then run it as follows from the command line:

sqlldr username/pswd@oracle_sid control=ctl.ldr
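
To invoke that command from the C# console app, something along these lines should work. This is only a sketch: it assumes sqlldr is on the PATH of the machine running the app, and SqlLoaderRunner, BuildArguments and Run are hypothetical names I made up for illustration.

```csharp
using System;
using System.Diagnostics;

static class SqlLoaderRunner
{
    // Builds the same argument string as the command line shown above.
    public static string BuildArguments(string user, string password, string sid, string controlFile)
    {
        return string.Format("{0}/{1}@{2} control={3}", user, password, sid, controlFile);
    }

    // Starts sqlldr, echoes its console output, waits for it to finish,
    // and returns the process exit code.
    public static int Run(string arguments)
    {
        var startInfo = new ProcessStartInfo("sqlldr", arguments)
        {
            UseShellExecute = false,       // required to redirect output
            RedirectStandardOutput = true,
            CreateNoWindow = true
        };
        using (Process process = Process.Start(startInfo))
        {
            Console.Write(process.StandardOutput.ReadToEnd());
            process.WaitForExit();
            return process.ExitCode;
        }
    }
}
```

A call would look like SqlLoaderRunner.Run(SqlLoaderRunner.BuildArguments("username", "pswd", "oracle_sid", "ctl.ldr")). An exit code of 0 means the load succeeded; for anything else, check the SQL*Loader log file.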

Hopefully this helps, good luck!

[Edit]

You might also have a look at the OracleBulkCopy class in ODP.NET. It shipped with the Oracle 11g client drivers. Perhaps it will still work against your 10g server. A potential problem there is that all your other apps on that same application server would need to work with these newer 11g client drivers too. Another option is to build a Java application that uses the Jena framework, which supports bulk loading.

13 revs answered Nov 10 '22 07:11