 

Oracle 11gR2 loading multiple files: sqlldr or external tables?

I have 471 files totaling about 100 GB. The files are tab-separated ("\t"), with transaction data in the following column format:

char(10) not null,
char(8) not null,
char(1) not null,
char(4) not null,
number not null,
char(1) not null,
char(1) not null,
char(1) not null,
number not null

The order of the transactions within each file is important and needs to be preserved, ideally with a primary key id. Initially I loaded these files with sqlldr, but it takes a very long time. I recently learned about external tables. From a strategic perspective, which method is better? How do external tables work? Thank you.

asked Oct 24 '22 by anti_ml

1 Answer

The record parsing of External Tables and SQL*Loader is very similar, so for the same record format there is normally no major performance difference. However, External Tables may be more appropriate in the following situations (a minimal sketch follows the list):

  • You want to transform the data as it is being loaded into the database.
  • You want to load data, and additional indexing of the staging table is required.
  • You want to use transparent parallel processing without having to split the external data first.
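As a rough illustration, here is a minimal external table sketch for the nine-column layout in the question. The directory object, table name, column names, and file names (txn_dir, transactions_ext, c1..c9, file001.txt) are placeholders, not anything confirmed by the question:

    CREATE TABLE transactions_ext (
      c1 CHAR(10),
      c2 CHAR(8),
      c3 CHAR(1),
      c4 CHAR(4),
      c5 NUMBER,
      c6 CHAR(1),
      c7 CHAR(1),
      c8 CHAR(1),
      c9 NUMBER
    )
    ORGANIZATION EXTERNAL (
      TYPE ORACLE_LOADER
      DEFAULT DIRECTORY txn_dir           -- directory object pointing at the data files
      ACCESS PARAMETERS (
        RECORDS DELIMITED BY NEWLINE
        FIELDS TERMINATED BY 0X'09'       -- tab-delimited
        MISSING FIELD VALUES ARE NULL
      )
      LOCATION ('file001.txt', 'file002.txt')  -- all 471 files would be listed here
    )
    REJECT LIMIT UNLIMITED
    PARALLEL;

The staging data can then be loaded with plain SQL, e.g. INSERT /*+ APPEND */ INTO the target table selecting from transactions_ext. One caveat: a PARALLEL scan does not guarantee row order, so if the file order must drive the primary key, drop PARALLEL and load serially, or capture the per-file order as in the SQL*Loader sketch further down.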

Conversely, use SQL*Loader for the best load performance in the following situations:

  • You want to load data remotely.
  • Transformations are not required on the data, and the data does not need to be loaded in parallel.

To improve SQL*Loader performance, the following suggestions have been made (a sample control file and command line follow the list):

  • Do not have any indexes or constraints (such as primary keys) on the target table during the load; rebuild them afterwards
  • Add DIRECT=TRUE to the command line. This bypasses most of the RDBMS processing by using the direct path loader instead of the conventional path loader. There are some cases where direct load cannot be used; the restrictions are listed in the Oracle Server Utilities Guide
  • Use fixed-width data rather than delimited data. With delimited data, each record must be scanned for the delimiter
  • Avoid character set conversions, as they are both time- and CPU-intensive
  • For conventional path loads, tune the READSIZE and BINDSIZE parameters. READSIZE grabs larger chunks of data per read system call, and BINDSIZE sets the size of the bind array, which determines the number of rows loaded per batch
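For illustration, a minimal SQL*Loader control file for this layout might look like the following. The file name, table name, and column names are assumptions; RECNUM captures each record's position in the data file, which addresses the ordering id the question asks for:

    -- txn.ctl : control file (table and column names are placeholders)
    LOAD DATA
    INFILE 'file001.txt'
    APPEND
    INTO TABLE transactions
    FIELDS TERMINATED BY X'09'       -- tab-delimited
    TRAILING NULLCOLS
    (
      seq_id RECNUM,                 -- record number within the file, preserves order
      c1     CHAR(10),
      c2     CHAR(8),
      c3     CHAR(1),
      c4     CHAR(4),
      c5     DECIMAL EXTERNAL,
      c6     CHAR(1),
      c7     CHAR(1),
      c8     CHAR(1),
      c9     DECIMAL EXTERNAL
    )

A direct path invocation, with placeholder credentials, could then be run once per file (the DATA parameter overrides the INFILE in the control file for each run):

    sqlldr userid=scott/tiger control=txn.ctl data=file002.txt log=file002.log direct=true

Note that RECNUM restarts for each data file, so a global primary key would need to combine it with a per-file identifier; and with DIRECT=TRUE the READSIZE and BINDSIZE parameters do not apply, since they are conventional-path settings as noted above.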

Source: http://download.oracle.com/otndocs/products/database/enterprise_edition/utilities/pdf/sql_loader_faq.pdf

answered Oct 31 '22 by Dennis