
Perl DBI alternative to LongReadLen

Tags:

linux

oracle

perl

I'd like to know the most memory-efficient way to pull arbitrarily large data fields from an Oracle database with Perl DBI. The method I know of is to set the 'LongReadLen' attribute on the database handle to something sufficiently large. However, my application needs to pull several thousand records, so sizing the buffer arbitrarily is extremely memory-inefficient.

The DBI documentation suggests doing a query up front to find the largest potential value, and setting LongReadLen to that:

$dbh->{LongReadLen} = $dbh->selectrow_array(qq{
    SELECT MAX(OCTET_LENGTH(long_column_name))
    FROM table WHERE ...
});
$sth = $dbh->prepare(qq{
    SELECT long_column_name, ... FROM table WHERE ...
});

However, this is still inefficient, since the outlying data is not representative of the typical record. The largest values are in excess of a MB, but the average record is less than a KB. I want to be able to pull all of the information (i.e., no truncation) while wasting as little memory on unused buffers as possible.

A method I've considered is to pull the data in chunks, say 50 records at a time, and set LongReadLen to the maximum length within that chunk (see the sketch below). Another workaround, which could, but doesn't have to, build on the chunk idea, would be to fork a child process, retrieve the data, and then kill the child (taking the wasted memory with it). The most wonderful thing would be the ability to force-free the DBI buffers, but I don't think that's possible.
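For reference, here is a rough sketch of the chunked approach, assuming a hypothetical numeric id column to page on (the table and column names are placeholders, not from the actual schema):

my @ids = @{ $dbh->selectcol_arrayref(q{SELECT id FROM my_table WHERE ...}) };
my $chunk_size = 50;

while ( my @chunk = splice @ids, 0, $chunk_size ) {
    my $in = join ',', ('?') x @chunk;

    # Size the buffer for the largest value in this chunk only, not the whole table.
    $dbh->{LongReadLen} = $dbh->selectrow_array(
        qq{SELECT MAX(OCTET_LENGTH(long_column_name)) FROM my_table WHERE id IN ($in)},
        undef, @chunk
    ) || 0;

    # Prepare inside the loop: LongReadLen is applied at prepare time.
    my $sth = $dbh->prepare(
        qq{SELECT id, long_column_name FROM my_table WHERE id IN ($in)}
    );
    $sth->execute(@chunk);
    while ( my $row = $sth->fetchrow_hashref ) {
        # process $row here
    }
}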

Has anyone addressed a similar problem with any success? Thanks for the help!

EDIT

Perl v5.8.8, DBI v1.52

To clarify: the memory inefficiency is coming from using 'LongReadLen' together with {ora_pers_lob => 1} in the prepare. Using this code:

my $sql = "select myclob from my_table where id = 68683";
my $dbh = DBI->connect( "dbi:Oracle:$db", $user, $pass ) or croak $DBI::errstr;

print "before";
readline( *STDIN );

$dbh->{'LongReadLen'} = 2 * 1024 * 1024;
my $sth = $dbh->prepare( $sql, {'ora_pers_lob' => 1} ) or croak $dbh->errstr;
$sth->execute() or croak( 'Cant execute_query '. $dbh->errstr . ' sql: ' . $sql );
my $row = $sth->fetchrow_hashref;

print "after";
readline( *STDIN );

Resident memory usage "before" is at 18MB and usage "after" is at 30MB. This is unacceptable over a large number of queries.

asked Dec 08 '11 by Christopher Neylan


1 Answer

Are your columns with large data LOBs (CLOBs or BLOBs)? If so, you don't need to use LongReadLen at all; DBD::Oracle provides a LOB streaming interface.

What you want to do is to bind the param as type ORA_CLOB or ORA_BLOB, which will get you a "LOB locator" returned from the query instead of the data itself. Then you use ora_lob_read together with the LOB locator to get the data. Here's an example of code that's worked for me:

sub read_lob {
  my ( $dbh, $clob ) = @_;

  # Read the LOB through its locator in fixed-size pieces; Oracle LOB offsets are 1-based.
  my $BLOCK_SIZE = 16384;

  my $out;
  my $offset = 1;

  # ora_lob_read returns an empty string once we've read past the end of the LOB.
  while ( my $data = $dbh->ora_lob_read( $clob, $offset, $BLOCK_SIZE ) ) {
    $out .= $data;
    $offset += $BLOCK_SIZE;
  }
  return $out;
}
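To tie it together, here is a hedged sketch of how the locator might be fetched in the first place, reusing the hypothetical myclob/my_table query from the question; it relies on DBD::Oracle's ora_auto_lob => 0 prepare attribute, which makes fetches return LOB locators instead of the LOB contents:

my $sth = $dbh->prepare(
    'SELECT myclob FROM my_table WHERE id = ?',
    { ora_auto_lob => 0 }    # fetch LOB locators, not the LOB data
);
$sth->execute(68683);

while ( my ($lob_locator) = $sth->fetchrow_array ) {
    my $clob_text = read_lob( $dbh, $lob_locator );
    # work with $clob_text; memory use is bounded by this one value plus the read buffer
}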
answered by hobbs