What's wrong with this alternative mechanism to make DBI queries?

Question

In the DBI documentation, this is the recommended code for executing a query many times:

$sth = $dbh->prepare_cached($statement);
$sth->execute(@bind);
$data = $sth->fetchall_arrayref(@attrs);
$sth->finish;

However, I see that many* query methods allow passing a prepared and cached statement handle in place of a query string, which makes this possible:

$sth = $dbh->prepare_cached($statement);
$data = $dbh->selectall_arrayref($sth, \%attrs, @bind);

Is there anything wrong with this approach? I haven't seen it used in the wild.

FWIW, I have benchmarked these two implementations. The second approach appears marginally (4%) faster when querying for two consecutive rows, using fetchall_arrayref in the first implementation and selectall_arrayref in the second.

* The full list of query methods that support this is:

selectrow_arrayref (the usual method with a prepared statement is fetchrow_arrayref)

selectrow_hashref (the usual method with a prepared statement is fetchrow_hashref)

selectall_arrayref (the usual method with a prepared statement is fetchall_arrayref)

selectall_hashref (the usual method with a prepared statement is fetchall_hashref)

selectcol_arrayref (doesn't really count, as it has no parallel method using the first code path described above, so the only way to use a prepared statement with it is the second code path)

cjm · Accepted Answer

There's nothing wrong with it, as long as you were planning to do only one fetch. When you use the select*_* methods, all the data comes back in one chunk. My DBI code more often looks like this:

$sth = $dbh->prepare_cached($statement);
$sth->execute(@bind);
while (my $row = $sth->fetch) { # alias for fetchrow_arrayref
  # do something with @$row here
}

There's no equivalent to this using a select*_* method.

If you're going to call fetchall_* (or you're only fetching 1 row), then go ahead and use a select*_* method with a statement handle.
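To make the trade-off concrete, here is a minimal sketch of both styles side by side. It assumes DBD::SQLite is installed and uses a made-up in-memory table; the `users` table, its columns, and the helper names `all_at_once` and `names_one_by_one` are illustrative only and not part of the question.

```perl
use strict;
use warnings;
use DBI;

# Assumption: DBD::SQLite is available; the "users" table is hypothetical.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '', { RaiseError => 1 });
$dbh->do('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)');
$dbh->do('INSERT INTO users (id, name) VALUES (?, ?)', undef, @$_)
    for [1, 'alice'], [2, 'bob'], [3, 'carol'];

my $sql = 'SELECT id, name FROM users WHERE id >= ? ORDER BY id';

# One-shot: pass the cached handle to selectall_arrayref and get every row back.
sub all_at_once {
    my ($min_id) = @_;
    my $sth = $dbh->prepare_cached($sql);
    return $dbh->selectall_arrayref($sth, undef, $min_id);
}

# Row-at-a-time: handle each row as it arrives instead of building one big list.
sub names_one_by_one {
    my ($min_id) = @_;
    my $sth = $dbh->prepare_cached($sql);
    $sth->execute($min_id);
    my @names;
    while (my $row = $sth->fetch) {    # the same array ref is reused each time
        push @names, $row->[1];
    }
    return \@names;
}

print scalar @{ all_at_once(2) }, " rows\n";
print join(',', @{ names_one_by_one(1) }), "\n";
```

Both subs reuse the same cached handle; the difference is only whether the rows are collected in one call or consumed as they are fetched.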

ikegami · Answer

No, there's nothing wrong with that approach. There is something wrong with your benchmark or its analysis, though.

You've claimed that

$sth->execute(@bind);
$data = $sth->fetchall_arrayref(@attrs);
$sth->finish;

is slower than a call to

sub selectall_arrayref {
    my ($dbh, $stmt, $attr, @bind) = @_;
    my $sth = (ref $stmt) ? $stmt : $dbh->prepare($stmt, $attr)
        or return;
    $sth->execute(@bind) || return;
    my $slice = $attr->{Slice}; # typically undef, else hash or array ref
    if (!$slice and $slice=$attr->{Columns}) {
        if (ref $slice eq 'ARRAY') { # map col idx to perl array idx
            $slice = [ @{$attr->{Columns}} ];   # take a copy
            for (@$slice) { $_-- }
        }
    }
    my $rows = $sth->fetchall_arrayref($slice, my $MaxRows = $attr->{MaxRows});
    $sth->finish if defined $MaxRows;
    return $rows;
}

Maybe if you got rid of the useless call to finish, you'd find the first faster? Note that benchmark differences of less than 5% are not very telling; the accuracy isn't that high.

Update: s/faster than/slower than/
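One rough way to see how much noise a benchmark carries is to time the same code under two labels and look at the gap. The sketch below uses only the core Benchmark module; `work` is a hypothetical stand-in workload (in practice it would be the DBI call being measured), and `noise_percent` is a name made up for this example.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);

# Stand-in workload (hypothetical); replace with the code you actually measure.
sub work { my $sum = 0; $sum += $_ for 1 .. 1_000; return $sum }

# Time the *same* sub under two labels; any gap between them is measurement noise.
# A negative count means "run each entry for at least that many CPU seconds".
sub noise_percent {
    my $t = timethese(-1, { copy_a => \&work, copy_b => \&work }, 'none');
    my ($rate_a, $rate_b) =
        map { $_->iters / ($_->cpu_a || 1e-9) } @{$t}{qw(copy_a copy_b)};
    my $max = $rate_a > $rate_b ? $rate_a : $rate_b;
    return 100 * abs($rate_a - $rate_b) / $max;
}

printf "identical code differed by %.1f%%\n", noise_percent();
```

If two copies of identical code already differ by a few percent on a given machine, a 4% gap between two different implementations says very little.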

Schwern · Answer

The performance difference to look at is not between selectall_arrayref() and fetchall_arrayref() but between fetchall_arrayref() and doing a fetch() in a loop yourself. fetchall_arrayref() may be faster as it is hand-optimized in C.

The docs for fetchall_arrayref discuss performance...

   If $max_rows is defined and greater than or equal to zero then it is
   used to limit the number of rows fetched before returning.
   fetchall_arrayref() can then be called again to fetch more rows.  This
   is especially useful when you need the better performance of
   fetchall_arrayref() but don't have enough memory to fetch and return
   all the rows in one go.

   Here's an example (assumes RaiseError is enabled):

     my $rows = []; # cache for batches of rows
     while( my $row = ( shift(@$rows) || # get row from cache, or reload cache:
                        shift(@{$rows=$sth->fetchall_arrayref(undef,10_000)||[]}) )
     ) {
       ...
     }

   That might be the fastest way to fetch and process lots of rows using
   the DBI, but it depends on the relative cost of method calls vs memory
   allocation.

   A standard "while" loop with column binding is often faster because the
   cost of allocating memory for the batch of rows is greater than the
   saving by reducing method calls. It's possible that the DBI may provide
   a way to reuse the memory of a previous batch in future, which would
   then shift the balance back towards fetchall_arrayref().

So that's a definitive "maybe". :-)
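For reference, the two styles the docs compare can be sketched as follows. This assumes DBD::SQLite is installed; the `nums` table and the sub names `sum_batched` and `sum_bound` are invented for the example, and it makes no claim about which one is faster on a real workload.

```perl
use strict;
use warnings;
use DBI;

# Assumption: DBD::SQLite is available; the "nums" table is hypothetical.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '', { RaiseError => 1 });
$dbh->do('CREATE TABLE nums (n INTEGER)');
my $ins = $dbh->prepare('INSERT INTO nums (n) VALUES (?)');
$ins->execute($_) for 1 .. 25;

my $sql = 'SELECT n FROM nums ORDER BY n';

# Batched fetch: pull up to $batch rows per call until nothing is left.
sub sum_batched {
    my ($batch) = @_;
    my $sth = $dbh->prepare($sql);
    $sth->execute;
    my $sum = 0;
    # With $max_rows set, fetchall_arrayref returns undef once the handle is
    # no longer active, or an empty array ref if no rows remained.
    while (my $rows = $sth->fetchall_arrayref(undef, $batch)) {
        last unless @$rows;
        $sum += $_->[0] for @$rows;
    }
    return $sum;
}

# Column binding: each fetch updates $n in place, with no per-row allocation.
sub sum_bound {
    my $sth = $dbh->prepare($sql);
    $sth->execute;
    $sth->bind_columns(\my $n);
    my $sum = 0;
    $sum += $n while $sth->fetch;
    return $sum;
}

print sum_batched(10), "\n";
print sum_bound(), "\n";
```

Timing both subs against your own driver and data is the only way to turn the "maybe" into an answer for your case.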

Tags: bind, perl, weak-typing, dbi

