Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

postgresql weird invalid input syntax for type numeric: "" while value is not an empty varchar

Tags:

postgresql

I am trying to debug a function not made by myself (dms2dd). I made my own test function (see below) and boiled my problem down to a specific row/value.

If I run the following query:

SELECT "Lat", "Long", test_dolf("Lat"), test_dolf("Long") FROM pawikan WHERE "Lat" IS NOT NULL AND "Long" IS NOT NULL ORDER BY index LIMIT 1 OFFSET 29130

I get the following output:

'N6° 6' 9.4824"';'E118° 26' 49.1172'' ';'9.4824';'49.1172'

which is exactly what I expect. But with the following query:

SELECT "Lat", "Long", CAST(test_dolf("Lat") as numeric), test_dolf("Long") FROM pawikan WHERE "Lat" IS NOT NULL AND "Long" IS NOT NULL ORDER BY index  LIMIT 1 OFFSET 29130

I get the error

ERROR: invalid input syntax for type numeric: ""
SQL state: 22P02

The error suggests that the varchar value I was trying to cast to numeric is empty, but as you can see from the previous query, it isn't. It is just a valid numeric varchar. Actually, if I copy-paste the value and run:

SELECT CAST('9.4824' AS numeric);

It totally works and the query actually results in a valid numeric. Even more, if I store the results of the first query in an intermediary table with:

SELECT "Lat", "Long", test_dolf("Lat") as lat_sec, test_dolf("Long") as long_sec INTO dms2dd_test FROM pawikan WHERE "Lat" IS NOT NULL AND "Long" IS NOT NULL ORDER BY index LIMIT 11 OFFSET 29120

and then issue a

SELECT CAST(long_sec as numeric), CAST(lat_sec AS numeric) FROM dms2dd_test;

it totally works. Even this works just fine:

SELECT test_dolf(E'N6° 6\' 9.4824"')::numeric as lat_sec

So what is going wrong here? It looks like in the second query where I cast to numeric, a different value is passed to my function, but I tested the sort column (index) and it contains only unique bigints.

This is the code for the test_dolf function:

CREATE OR REPLACE FUNCTION public.test_dolf(strdegminsec character varying)
  RETURNS varchar AS
$BODY$
    DECLARE
       i               numeric;
       intDmsLen       numeric;          -- Length of original string
       strCompassPoint Char(1);
       strNorm         varchar(16) = ''; -- Will contain normalized string
       strDegMinSecB   varchar(100);
       blnGotSeparator integer;          -- Keeps track of separator sequences
       arrDegMinSec    varchar[];        -- TYPE stringarray is table of varchar(2048) ;
       strChr          Char(1);
    BEGIN
       strDegMinSec := regexp_replace(replace(strdegminsec,E'\'\'','"'),' "([0-9]+)',E' \\1"');
       -- Remove leading and trailing spaces
       strDegMinSecB := REPLACE(strDegMinSec,' ','');
       intDmsLen := Length(strDegMinSecB);

       blnGotSeparator := 0; -- Not in separator sequence right now

       -- Loop over string, replacing anything that is not a digit or a
       -- decimal separator with
       -- a single blank
       FOR i in 1..intDmsLen LOOP
          -- Get current character
          strChr := SubStr(strDegMinSecB, i, 1);
          -- either add character to normalized string or replace
          -- separator sequence with single blank         
          If strpos('0123456789,.', strChr) > 0 Then
             -- add character but replace comma with point
             If (strChr <> ',') Then
                strNorm := strNorm || strChr;
             Else
                strNorm := strNorm || '.';
             End If;
             blnGotSeparator := 0;
          ElsIf strpos('neswNESW',strChr) > 0 Then -- Extract Compass Point if present
            strCompassPoint := strChr;
          Else
             -- ensure only one separator is replaced with a blank -
             -- suppress the rest
             If blnGotSeparator = 0 Then
                strNorm := strNorm || ' ';
                blnGotSeparator := 0;
             End If;
          End If;
       End Loop;

       -- Split normalized string into array of max 3 components
       arrDegMinSec := string_to_array(strNorm, ' ');
       return arrDegMinSec[3];
    End 
$BODY$
  LANGUAGE plpgsql IMMUTABLE
  COST 100;
like image 230
Dolf Andringa Avatar asked Jun 24 '17 03:06

Dolf Andringa


2 Answers

I figured out what the problem was. It looks like postgresql, even though I do a LIMIT and OFFSET, still calls the functions in the select for other rows outside that frame.

I figured this out by putting the code that raised the exception inside my function and catching the resulting error, and raising a NOTICE error when that exception occurs (see function below, specifically the BEGIN EXCEPTION END block at the end of the function). The notice is displayed as a warning, but doesn't cause the code execution to stop. All of a sudden it turned out that the function wasn't just called for the row that I was expecting it to be called for, but also for a whole bunch of other rows. This is totally not what I expected, and for me kind of counter intuitive, but I guess it is how postgresql is supposed to work.

Since catching exceptions is quite expensive in postgresql, I guess I need to add a test which prevents the exception in the first place (I could test the length of arrDegMinSec and the value of items 1-3 of that array and return NULL in case of invalid values.

CREATE OR REPLACE FUNCTION public.test_dolf(strdegminsec character varying)
  RETURNS numeric AS
$BODY$
    DECLARE
       i               numeric;
       intDmsLen       numeric;          -- Length of original string
       strCompassPoint Char(1);
       strNorm         varchar(16) = ''; -- Will contain normalized string
       strDegMinSecB   varchar(100);
       blnGotSeparator integer;          -- Keeps track of separator sequences
       arrDegMinSec    varchar[];        -- TYPE stringarray is table of varchar(2048) ;
       strChr          Char(1);
       retval          numeric;
    BEGIN

       strDegMinSec := regexp_replace(replace(strdegminsec,E'\'\'','"'),' "([0-9]+)',E' \\1"');
       -- Remove leading and trailing spaces
       strDegMinSecB := REPLACE(strDegMinSec,' ','');
       intDmsLen := Length(strDegMinSecB);

       blnGotSeparator := 0; -- Not in separator sequence right now

       -- Loop over string, replacing anything that is not a digit or a
       -- decimal separator with
       -- a single blank
       FOR i in 1..intDmsLen LOOP
          -- Get current character
          strChr := SubStr(strDegMinSecB, i, 1);
          -- either add character to normalized string or replace
          -- separator sequence with single blank         
          If strpos('0123456789,.', strChr) > 0 Then
             -- add character but replace comma with point
             If (strChr <> ',') Then
                strNorm := strNorm || strChr;
             Else
                strNorm := strNorm || '.';
             End If;
             blnGotSeparator := 0;
          ElsIf strpos('neswNESW',strChr) > 0 Then -- Extract Compass Point if present
            strCompassPoint := strChr;
          Else
             -- ensure only one separator is replaced with a blank -
             -- suppress the rest
             If blnGotSeparator = 0 Then
                strNorm := strNorm || ' ';
                blnGotSeparator := 0;
             End If;
          End If;
       End Loop;
       -- Split normalized string into array of max 3 components
       arrDegMinSec := string_to_array(strNorm, ' ');
       BEGIN
          retval := arrDegMinSec[3]::numeric;
          return retval;
       EXCEPTION
          WHEN SQLSTATE '22P02' THEN
             RAISE NOTICE 'Incorrect value %', strDegMinSec;
             RETURN NULL;
       END;
    End 
$BODY$
  LANGUAGE plpgsql IMMUTABLE
  COST 100;

EDIT

Courtesy of @michel.milezzi another solution that doesn't require modification of the function is to change the function call in the query to

CAST(NULLIF(test_dolf("Lat"), '') as numeric)

And indeed as @abelisto suggests, I could also have put the query in a subquery and only casting it to numeric in the main query like this:

SELECT "Lat", "Long", CAST(test_dolf("Lat") as numeric), test_dolf("Long") FROM (SELECT * FROM pawikan WHERE "Lat" IS NOT NULL AND "Long" IS NOT NULL ORDER BY index LIMIT 1 OFFSET 29130) as t

This would indeed have prevented the problem, which would have indeed simplified the debugging process.

That being said, I was going to modify the function anyway (to make it more robust for dirty data) so for me that was in this case the best solution.

like image 126
Dolf Andringa Avatar answered Nov 13 '22 16:11

Dolf Andringa


The error your getting is the following:

ERROR: invalid input syntax for type numeric: ""

So, it's trying to cast a empty string to a numeric value. What about using NULLIF function to address this?

SELECT "Lat", "Long", CAST(NULLIF(test_dolf("Lat"), '') as numeric), test_dolf("Long") FROM pawikan WHERE "Lat" IS NOT NULL AND "Long" IS NOT NULL ORDER BY index  LIMIT 1 OFFSET 29130;

Also you might want to see execution plan to understand this issue. What may be happening is the LIMIT and OFFSET is executed just after the casting. This explain why you're not seeing the row with empty string.


EDIT

Oops, I should read your answer before posting this. Anyway you still can use NULLIF to workaround your issue.

like image 30
Michel Milezzi Avatar answered Nov 13 '22 16:11

Michel Milezzi