
PostgreSQL Group By Sum

Tags: sql, postgresql

I've been scratching my head over this problem in PostgreSQL. I have a table test with two columns, id and content, e.g.:

create table test (id integer, 
                   content varchar(1024));

insert into test (id, content) values 
    (1, 'Lorem Ipsum is simply dummy text of the printing and typesetting industry.'),
    (2, 'Lorem Ipsum has been the industrys standard dummy text '),
    (3, 'ever since the 1500s, when an unknown printer took a galley of type and scrambled it to'),
    (4, 'make a type specimen book.'),
    (5, 'It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged.'),
    (6, 'It was popularised in the 1960s with the release of Letraset sheets containing Lorem '),
    (7, 'Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker'),
    (8, ' including versions of Lorem Ipsum.');

If I run the following query ...

select id, length(content) as characters from test order by id

... then I get:

id | characters
---+-----------
 1 |         74
 2 |         55
 3 |         87
 4 |         26
 5 |        120
 6 |         85
 7 |         87
 8 |         35

What I want to do is group the ids into rows, closing each group as soon as the summed length of its content goes over a threshold. For example, if the threshold is 100, the desired result would look like this:

ids | characters
----+-----------   
1,2 |        129
3,4 |        113    
5   |        120
6,7 |        172    
8   |         35

NOTE (1): The query doesn't need to generate a characters column, just the ids. The character counts are shown only to make clear that every group is over 100, except for the last row, which is 35.

NOTE (2): ids could be a comma-delimited string or a PostgreSQL array; the type is less important than the values.

Can I use a window function to do this or do I need something more complex like a lateral join?


asked Nov 09 '16 12:11

bobmarksie

2 Answers

This type of problem requires a recursive CTE (or similar functionality). Here is an example:

with recursive t as (
      -- number the rows so that each one can be joined to its predecessor
      select id, length(content) as len,
             row_number() over (order by id) as seqnum
      from test 
     ),
     cte(id, len, cumelen, ids, seqnum, grp) as (
      -- anchor: the first row opens group 1
      select id, len, len as cumelen, t.id::text, 1::int as seqnum, 1 as grp
      from t
      where seqnum = 1
      union all
      -- step: once the running total has reached 100, the next row starts a new group
      select t.id,
             t.len,
             (case when cte.cumelen >= 100 then t.len else cte.cumelen + t.len end) as cumelen,
             (case when cte.cumelen >= 100 then t.id::text else cte.ids || ',' || t.id::text end) as ids,
             t.seqnum::int,
             (case when cte.cumelen >= 100 then cte.grp + 1 else cte.grp end) as grp
      from t join
           cte
           on cte.seqnum = t.seqnum - 1
     )
-- within a group every ids string is a prefix of the next, so max() returns the complete one
select grp, max(ids)
from cte
group by grp;

Here is a small working example:

with recursive test as (
      select 1 as id, 'abcd'::text as content union all
      select 2 as id, 'abcd'::text as content union all
      select 3 as id, 'abcd'::text as content 
     ),
     t as (
      select id, length(content) as len,
             row_number() over (order by id) as seqnum
      from test 
     ),
     cte(id, len, cumelen, ids, seqnum, grp) as (
      select id, len, len as cumelen, t.id::text, 1::int as seqnum, 1 as grp
      from t
      where seqnum = 1
      union all
      select t.id,
             t.len,
             (case when cte.cumelen >= 5 then t.len else cte.cumelen + t.len end) as cumelen,
             (case when cte.cumelen >= 5 then t.id::text else cte.ids || ',' || t.id::text end) as ids,
             t.seqnum::int,
             (case when cte.cumelen >= 5 then cte.grp + 1 else cte.grp end)
      from t join
           cte
           on cte.seqnum = t.seqnum - 1
     )
select grp, max(ids)
from cte
group by grp;
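
Since the question allows ids to be an array, the same recursion can also be sketched so that it only assigns a group number to each row and leaves the collecting to array_agg. This is a rough variant, not a tuned solution: the inline test rows are stand-ins built with repeat() so that their lengths match the ones listed in the question, and the output column names ids and characters are borrowed from the desired result there.

```sql
with recursive test(id, content) as (
      -- stand-in data (assumption): strings whose lengths match the question's output
      values (1, repeat('x', 74)), (2, repeat('x', 55)), (3, repeat('x', 87)),
             (4, repeat('x', 26)), (5, repeat('x', 120)), (6, repeat('x', 85)),
             (7, repeat('x', 87)), (8, repeat('x', 35))
     ),
     t as (
      select id, length(content) as len,
             (row_number() over (order by id))::int as seqnum
      from test
     ),
     cte(id, len, cumelen, seqnum, grp) as (
      -- anchor: the first row opens group 1
      select id, len, len, seqnum, 1
      from t
      where seqnum = 1
      union all
      -- step: start a new group once the previous running total has reached 100
      select t.id,
             t.len,
             (case when cte.cumelen >= 100 then t.len else cte.cumelen + t.len end),
             t.seqnum,
             (case when cte.cumelen >= 100 then cte.grp + 1 else cte.grp end)
      from t join
           cte
           on cte.seqnum = t.seqnum - 1
     )
-- one output row per group: the ids as an int[] and the summed length
select array_agg(id order by id) as ids, sum(len) as characters
from cte
group by grp
order by grp;
```

With these stand-in lengths the groups come out as {1,2}, {3,4}, {5}, {6,7} and {8}. Dropping the inline test CTE should make the query read from the real table instead.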

answered Sep 28 '22 02:09

Gordon Linoff

Using a stored function can sometimes spare you a head-breaking query like this.

create or replace function fn_foo(ids out int[], characters out int) returns setof record language plpgsql as $$
declare
  r record;
  threshold int := 100;
begin
  ids := '{}'; characters := 0;
  for r in (
    select id, coalesce(length(content),0) as lng
    from test order by id)
  loop
    characters := characters + r.lng;
    ids := ids || r.id;
    if characters > threshold then
      return next;
      ids := '{}'; characters := 0;
    end if;
  end loop;
  if ids <> '{}' then
    return next;
  end if;
end $$;

select * from fn_foo();

╔═══════╤════════════╗
║  ids  │ characters ║
╠═══════╪════════════╣
║ {1,2} │        129 ║
║ {3,4} │        113 ║
║ {5}   │        120 ║
║ {6,7} │        172 ║
║ {8}   │         35 ║
╚═══════╧════════════╝
(5 rows)
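
If the threshold needs to vary, one possible variation is to pass it in as an argument instead of declaring it inside the function. The name fn_foo_threshold and the parameter p_threshold are made up for this sketch; the loop itself is the same as in fn_foo.

```sql
-- Hypothetical variant of fn_foo: the threshold is an argument (p_threshold)
create or replace function fn_foo_threshold(p_threshold int, ids out int[], characters out int)
returns setof record language plpgsql as $$
declare
  r record;
begin
  ids := '{}'; characters := 0;
  for r in (
    select id, coalesce(length(content), 0) as lng
    from test order by id)
  loop
    characters := characters + r.lng;
    ids := ids || r.id;
    -- close the current group as soon as it goes over the threshold
    if characters > p_threshold then
      return next;
      ids := '{}'; characters := 0;
    end if;
  end loop;
  -- emit the trailing group even though it did not go over the threshold
  if ids <> '{}' then
    return next;
  end if;
end $$;
```

Calling select * from fn_foo_threshold(100); should reproduce the table above. With the lengths listed in the question, a threshold of 200 would give {1,2,3}, {4,5,6} and {7,8}.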

answered Sep 28 '22 02:09

Abelisto
