Speeding up perl DBI fetchrow_hashref

I have something that looks like this:

my $report = ReportGenerator->new; # custom object
my $sth = $dbc->prepare('SELECT * FROM some_table WHERE some_condition'); # DBI statement handle
$sth->execute();
while (my $href = $sth->fetchrow_hashref) {
    $report->process_record($href);
}
$sth->finish();
print $report->printReport();

My problem is that each iteration of the loop is very slow; the bottleneck is MySQL. I was wondering whether it is possible to put some kind of wrapper around the while loop so that it fetches more than one record at a time. At the same time, fetching all of the records into memory is not practical either. I am not worried about the micro-efficiency of the code (hashref vs. arrayref, etc.). Rather, I am interested in fetching, say, 10,000 records at a time.

The database has ~5 million records, and I cannot change or upgrade the server.

Thanks

asked Dec 22 '11 by Smartelf

3 Answers

You can use the fetchall_arrayref method, which accepts a 'maxrows' argument:

while (my $data = $sth->fetchall_arrayref(undef, 10_000)) {
  last unless @$data; # an exhausted handle returns an empty (but true) arrayref
  for my $row ( @{$data} ) {
    $report->process_record($row);
  }
}

You could also look at the RowCacheSize attribute, which hints at how many rows the driver should fetch and cache at a time (drivers are free to ignore it).
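
For instance, a minimal sketch using the handles from the question ($dbc is the database handle, $report the report object). RowCacheSize and RowsInCache are standard DBI attributes, but they are only hints, and many drivers, DBD::mysql included as far as I know, don't implement them:

# hint to the driver: keep roughly this many rows in its local cache
$dbc->{RowCacheSize} = 10_000;

my $sth = $dbc->prepare('SELECT * FROM some_table WHERE some_condition');
$sth->execute();

# RowsInCache reports how many unfetched rows the driver is caching,
# or undef if the driver has no local row cache
warn "rows cached by driver: ", $sth->{RowsInCache} // 'none (not supported)', "\n";

while (my $href = $sth->fetchrow_hashref) {
    $report->process_record($href);
}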

answered by Richard

Which bit is slow? Is it the call to execute, fetchrow_hashref or process_record? It seems unlikely to me that fetchrow_hashref is the problem. It's far more likely to be the execution of the query or the black-box of process_record.

But this is all guesswork. It's impossible to really help here. I recommend you get some real data about the performance of the code by using Devel::NYTProf.
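
For example, from the shell (report.pl being a made-up name for the script above):

perl -d:NYTProf report.pl   # run under the profiler; writes nytprof.out
nytprofhtml                 # turn nytprof.out into an HTML report in ./nytprof/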

answered by Dave Cross

The fastest way to fetch rows as hashes using the DBI is to use bind_columns() like this:

  $sth->execute;
  my %row;
  $sth->bind_columns( \( @row{ @{$sth->{NAME_lc} } } ));
  while ($sth->fetch) {
      print "$row{region}: $row{sales}\n";
  }

That's only appropriate if you're happy for each row to reuse the same hash.
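
As a sketch of how that might plug into the loop from the question ($report and process_record come from the question; the shallow copy is only needed if the record has to outlive the current iteration):

$sth->execute;
my %row;
$sth->bind_columns( \( @row{ @{ $sth->{NAME_lc} } } ) );
while ($sth->fetch) {
    # %row is overwritten in place on every fetch, so pass a shallow
    # copy if process_record keeps the record around
    $report->process_record( { %row } );
}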

Beyond that, I agree with davorg: avoid guesswork and measure first.

For much more information on using the DBI, including performance, see my tutorial slides (from 2007, but still relevant).

answered by Tim Bunce