if I write a query as such:
with WordBreakDown (idx, word, wordlength) as (
select
row_number() over () as idx,
word,
character_length(word) as wordlength
from
unnest(string_to_array('yo momma so fat', ' ')) as word
)
select
cast(wbd.idx + (
select SUM(wbd2.wordlength)
from WordBreakDown wbd2
where wbd2.idx <= wbd.idx
) - wbd.wordlength as integer) as position,
cast(wbd.word as character varying(512)) as part
from
WordBreakDown wbd;
... I get a table of 4 rows like so:
1;"yo"
4;"momma"
10;"so"
13;"fat"
... this is what I want. HOWEVER, if I wrap this into a function like so:
drop type if exists split_result cascade;
create type split_result as(
position integer,
part character varying(512)
);
drop function if exists split(character varying(512), character(1));
create function split(
_s character varying(512),
_sep character(1)
) returns setof split_result as $$
begin
return query
with WordBreakDown (idx, word, wordlength) as (
select
row_number() over () as idx,
word,
character_length(word) as wordlength
from
unnest(string_to_array(_s, _sep)) as word
)
select
cast(wbd.idx + (
select SUM(wbd2.wordlength)
from WordBreakDown wbd2
where wbd2.idx <= wbd.idx
) - wbd.wordlength as integer) as position,
cast(wbd.word as character varying(512)) as part
from
WordBreakDown wbd;
end;
$$ language plpgsql;
select * from split('yo momma so fat', ' ');
... I get:
1;"yo momma so fat"
I'm scratching my head on this. What am I screwing up?
UPDATE Per the suggestions below, I have replaced the function as such:
CREATE OR REPLACE FUNCTION split(_string character varying(512), _sep character(1))
RETURNS TABLE (postition int, part character varying(512)) AS
$BODY$
BEGIN
RETURN QUERY
WITH wbd AS (
SELECT (row_number() OVER ())::int AS idx
,word
,length(word) AS wordlength
FROM unnest(string_to_array(_string, rpad(_sep, 1))) AS word
)
SELECT (sum(wordlength) OVER (ORDER BY idx))::int + idx - wordlength
,word::character varying(512) -- AS part
FROM wbd;
END;
$BODY$ LANGUAGE plpgsql;
... which keeps my original function signature for maximum compatibility, and the lion's share of the performance gains. Thanks to the answerers, I found this to be a multifaceted learning experience. Your explanations really helped me understand what was going on.
Observe this:
select length(' '::character(1));
length
--------
0
(1 row)
A cause of this confusion is a bizarre definition of character type in SQL standard. From Postgres documentation for character types:
Values of type character are physically padded with spaces to the specified width n, and are stored and displayed that way. However, the padding spaces are treated as semantically insignificant. Trailing spaces are disregarded when comparing two values of type character, and they will be removed when converting a character value to one of the other string types.
So you should use string_to_array(_s, rpad(_sep,1)).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With