PostgreSQL function confusion

If I write a query like this:

with WordBreakDown (idx, word, wordlength) as (
    select 
        row_number() over () as idx,
        word,
        character_length(word) as wordlength
    from
    unnest(string_to_array('yo momma so fat', ' ')) as word
)
select 
    cast(wbd.idx + (
        select SUM(wbd2.wordlength)
        from WordBreakDown wbd2
        where wbd2.idx <= wbd.idx
        ) - wbd.wordlength as integer) as position,
    cast(wbd.word as character varying(512)) as part
from
    WordBreakDown wbd;  

... I get a table of 4 rows like so:

1;"yo"
4;"momma"
10;"so"
13;"fat"

... this is what I want. HOWEVER, if I wrap this into a function like so:

drop type if exists split_result cascade;
create type split_result as(
    position integer,
    part character varying(512)
);

drop function if exists split(character varying(512), character(1));    
create function split(
    _s character varying(512), 
    _sep character(1)
    ) returns setof split_result as $$
begin

    return query
    with WordBreakDown (idx, word, wordlength) as (
        select 
            row_number() over () as idx,
            word,
            character_length(word) as wordlength
        from
        unnest(string_to_array(_s, _sep)) as word
    )
    select 
        cast(wbd.idx + (
            select SUM(wbd2.wordlength)
            from WordBreakDown wbd2
            where wbd2.idx <= wbd.idx
            ) - wbd.wordlength as integer) as position,
        cast(wbd.word as character varying(512)) as part
    from
        WordBreakDown wbd;  

end;
$$ language plpgsql;

select * from split('yo momma so fat', ' ');

... I get:

1;"yo momma so fat"

I'm scratching my head on this. What am I screwing up?

UPDATE: Per the suggestions below, I have replaced the function with this:

CREATE OR REPLACE FUNCTION split(_string character varying(512), _sep character(1))
  RETURNS TABLE (postition int, part character varying(512)) AS
$BODY$
BEGIN
    RETURN QUERY
    WITH wbd AS (
        SELECT (row_number() OVER ())::int AS idx
              ,word
              ,length(word) AS wordlength
        FROM   unnest(string_to_array(_string, rpad(_sep, 1))) AS word
        )
    SELECT (sum(wordlength) OVER (ORDER BY idx))::int + idx - wordlength
          ,word::character varying(512) -- AS part
    FROM wbd;  
END;
$BODY$ LANGUAGE plpgsql;

... which keeps my original function signature for maximum compatibility and still gets the lion's share of the performance gains. Thanks to the answerers; I found this to be a multifaceted learning experience. Your explanations really helped me understand what was going on.
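For anyone following along, the running sum that replaces the correlated subquery can be inspected on its own. This is only a sketch: it assumes PostgreSQL 9.4 or later for WITH ORDINALITY (which the function above does not use), and the names t, running_length and pos are illustrative.

```sql
-- The ' ' literal here is plain text, so the split works as expected.
select idx,
       word,
       -- cumulative length of all words up to and including this one
       sum(length(word)) over (order by idx) as running_length,
       -- 1-based start of the word: running sum + idx - own length
       sum(length(word)) over (order by idx) + idx - length(word) as pos
from unnest(string_to_array('yo momma so fat', ' '))
     with ordinality as t(word, idx);
-- running_length: 2, 7, 9, 12
-- pos:            1, 4, 10, 13
```

The window function makes a single pass over the rows, whereas the original subquery re-scans the CTE once per word.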


1 Answer

Observe this:

select length(' '::character(1));
 length
--------
      0
(1 row)

The cause of this confusion is the bizarre definition of the character type in the SQL standard. From the Postgres documentation for character types:

Values of type character are physically padded with spaces to the specified width n, and are stored and displayed that way. However, the padding spaces are treated as semantically insignificant. Trailing spaces are disregarded when comparing two values of type character, and they will be removed when converting a character value to one of the other string types.

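So when _sep is converted from character(1) to text, its single space is stripped, string_to_array receives an empty delimiter, and the whole input comes back as one element. You should use string_to_array(_s, rpad(_sep, 1)), which pads the delimiter back to one character.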
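The effect can be checked outside the function. A minimal sketch, in which the delimiter literal is cast to character(1) only to mimic the _sep parameter:

```sql
-- The trailing space is dropped when character(1) is converted to text.
select length(' '::character(1));            -- 0
-- rpad pads the now-empty string back to a single space.
select length(rpad(' '::character(1), 1));   -- 1

-- Empty delimiter: the input comes back as one element.
select string_to_array('yo momma so fat', ' '::character(1));
-- Restored delimiter: the input is split into four words.
select string_to_array('yo momma so fat', rpad(' '::character(1), 1));
```

If changing the signature is acceptable, declaring _sep as text or character varying should also avoid the problem, since those types keep trailing spaces; that is a suggestion beyond what this answer proposes.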

answered Jan 23 '26 11:01 by Tometzky


