
Why does this postgres stored procedure want to `use utf8`?

I have come across a peculiarity in a plperl stored procedure on Postgres 9.2 with Perl 5.12.4.

The curious behavior can be reproduced using this "broken" SP:

CREATE FUNCTION foo(VARCHAR) RETURNS VARCHAR AS $$
    my ( $re ) = @_;
    $re = ''.qr/\b($re)\b/i;
    return $re;
$$ LANGUAGE plperl;

When executed:

# select foo('foo');
ERROR:  Unable to load utf8.pm into plperl at line 3.
BEGIN failed--compilation aborted.
CONTEXT:  PL/Perl function "foo"

However, if I move the qr// operation into an eval, it works:

CREATE OR REPLACE FUNCTION bar(VARCHAR) RETURNS VARCHAR AS $$
    my ( $re ) = @_;
    eval "\$re = ''.qr/\\b($re)\\b/i;";
    return $re;
$$ LANGUAGE plperl;

Result:

# select bar('foo');
       bar       
-----------------
 (?^i:\b(foo)\b)
(1 row)

  1. Why does the eval bypass the automatic `use utf8`?

  2. Why is `use utf8` even required in the first place? My code is not in UTF-8, which is said to be the only time one should `use utf8`.

     If anything, I would expect the eval version to break without `use utf8` when the input to the script contains non-ASCII values. (Further testing shows that passing non-ASCII values to bar() does indeed cause the eval to fail with the same error.)


Note that many Postgres installations automatically load utf8 when the Perl interpreter starts. This is the default on Debian at least, as shown by running DO 'elog(WARNING, join ", ", sort keys %INC)' language plperl; which lists the loaded modules:

WARNING: Carp.pm, Carp/Heavy.pm, Exporter.pm, feature.pm, overload.pm, strict.pm, unicore/Heavy.pl, unicore/To/Fold.pl, unicore/lib/Perl/SpacePer.pl, utf8.pm, utf8_heavy.pl, vars.pm, warnings.pm, warnings/register.pm
CONTEXT: PL/Perl anonymous code block
DO

But not so on the machine demonstrating the odd behavior:

WARNING: Carp.pm, Carp/Heavy.pm, Exporter.pm, feature.pm, overload.pm, overloading.pm, strict.pm, vars.pm, warnings.pm, warnings/register.pm
CONTEXT: PL/Perl anonymous code block
DO

This question is not about how to get my target machine to load utf8 automatically; I know how to do that. I'm curious why it seems to be necessary in the first place.

Flimzy asked Dec 03 '13 15:12


2 Answers

In the version that's failing, you're executing

$re = ''.qr/\b($re)\b/i

In the version that's succeeding, $re is interpolated into the string before the eval compiles it, so you're executing

$re = ''.qr/\b(foo)\b/i

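It sounds like qr// needs utf8.pm when the pattern is compiled as a Unicode pattern. That would be the case in the first version, presumably because the interpolated $re arrives as a UTF8-flagged string, whereas the literal pattern in the second version isn't compiled as a Unicode pattern.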
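To make "compiled as a Unicode pattern" a little more concrete, here is a minimal standalone Perl sketch (plain Perl, not PL/Perl). It assumes that PL/Perl hands the function argument over as a UTF8-flagged string and imitates that with utf8::upgrade; the variable names are only illustrative. On a modern Perl outside a Safe compartment both patterns compile without trouble, so the script only shows how the two patterns differ, not the error itself.

```perl
use strict;
use warnings;

# Assumption: PL/Perl passes arguments as UTF8-flagged strings.
# utf8::upgrade() imitates that here; it is built in and needs no "use utf8".
my $arg = 'foo';
utf8::upgrade($arg);

# Same pattern as in foo(): the flagged $arg is interpolated at run time.
my $interpolated = qr/\b($arg)\b/i;

# Same pattern as in bar() once the eval string has been built:
# a plain ASCII literal in the source code.
my $literal = qr/\b(foo)\b/i;

# Print each pattern's stringified form and whether that string is UTF8-flagged.
for my $case ( [ interpolated => $interpolated ], [ literal => $literal ] ) {
    my ( $name, $pattern ) = @$case;
    printf "%-12s %-20s UTF8-flagged: %s\n",
        $name, "$pattern", utf8::is_utf8("$pattern") ? 'yes' : 'no';
}
```

Comparing the two printed lines shows whether the interpolated pattern was treated differently from the literal one on your Perl version.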


The failure to load utf8.pm is due to the limitations imposed by the Safe compartment created by plperl.

The fix is to load the module outside the Safe compartment.

A workaround, which is also more efficient, is to build the pattern string directly:

$re = '(?^u:\\b(?i:'.$re.')\\b)';
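Wrapped into a complete function, the workaround could look like the sketch below. The name foo_workaround is made up for illustration. Because the function only concatenates strings and never compiles a regex itself, nothing inside it should trigger a load of utf8.pm. Note that the (?^u:...) syntax needs Perl 5.14 or later wherever the returned string is eventually used as a pattern.

```sql
CREATE OR REPLACE FUNCTION foo_workaround(VARCHAR) RETURNS VARCHAR AS $$
    my ( $re ) = @_;
    # Build the pattern text by hand instead of stringifying a qr// object,
    # so no regex is compiled inside the function.
    # Note: (?i:...) is non-capturing, unlike the ($re) group in foo().
    return '(?^u:\\b(?i:' . $re . ')\\b)';
$$ LANGUAGE plperl;
```

With this definition, select foo_workaround('foo'); should return the string (?^u:\b(?i:foo)\b).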
ikegami answered Nov 06 '22 13:11


I had the same issue and I fixed it by adding

plperl.on_init = 'use utf8; use re; package utf8; require "utf8_heavy.pl";'

to the postgresql.conf file.

I hope this will help someone.

Vajira Lasantha answered Nov 06 '22 13:11