How to create buckets and groups within those buckets using PostgreSQL

Tags:

postgresql

How can I find the distribution of credit cards by year opened and by number of completed transactions? The credit cards should be grouped into three buckets: fewer than 10 transactions, between 10 and 30 transactions, and more than 30 transactions.

I first tried the width_bucket function in PostgreSQL, but the documentation says it only creates equal-width buckets, which is not what I want here. Because of that, I turned to CASE expressions. However, I'm not sure how to use a CASE expression with a GROUP BY.

This is the data I am working with:

table 1 - credit_cards table
credit_card_id
year_opened


table 2 - transactions table
transaction_id
credit_card_id - matches credit_cards.credit_card_id
transaction_status ("complete" or "incomplete")

This is what I have gotten so far:

SELECT 

CASE WHEN transaction_count < 10 THEN “Less than 10”
WHEN transaction_count >= 10 and transaction_count < 30 THEN “10 <= transaction count < 30”
ELSE transaction_count>=30 THEN “Greater than or equal to 30”
END as buckets

count(*) as ct.transaction_count
FROM credit_cards c
INNER JOIN transactions t
ON c.credit_card_id = t.credit_card_id
WHERE t.status = “completed”
GROUP BY v.year_opened

GROUP BY buckets
ORDER BY buckets

Expected output

credit card count | year opened | transaction count bucket
23421             | 2002        | Less than 10
etc

asked Jul 03 '18 17:07

pr338

2 Answers

You can specify the bin sizes in width_bucket by passing a sorted array of the lower bounds of each bin.

In your case, it would be array[10,30]: anything less than 10 gets bin 0, 10 through 29 gets bin 1, and 30 or more gets bin 2.

WITH a AS (select generate_series(5,35) cnt)
SELECT  cnt, width_bucket(cnt, array[10,30]) 
FROM a;
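
To connect this to the tables in the question, here is a minimal sketch (not tested against real data) that swaps the CASE expression for width_bucket. It assumes the credit_cards and transactions tables exactly as described in the question; the CTE name per_card, the bucket_no column, and the label texts are made up for illustration. The cast to bigint[] is there because count(*) returns bigint, and older PostgreSQL versions require the operand and the array to have matching types.

```sql
-- Step 1: count completed transactions per card and map the count to a bin number.
with per_card as (
  select c.year_opened
       , c.credit_card_id
       -- 0 = fewer than 10, 1 = 10 through 29, 2 = 30 or more
       , width_bucket(count(*), array[10, 30]::bigint[]) as bucket_no
    from credit_cards c
    join transactions t
      on t.credit_card_id = c.credit_card_id
   where t.transaction_status = 'complete'
   group by c.year_opened
       , c.credit_card_id
)
-- Step 2: count cards per year and bin, turning the bin number into a label.
-- PostgreSQL arrays are 1-based, hence bucket_no + 1.
select count(*) as credit_card_count
     , year_opened
     , (array['Less than 10'
            , 'Between [10 and 30)'
            , 'Greater than or equal to 30'])[bucket_no + 1] as buckets
  from per_card
 group by year_opened
     , bucket_no
 order by year_opened
     , bucket_no;
```

Grouping and ordering by the numeric bucket_no rather than by the label keeps the buckets in their natural order, which sorting the label strings would not.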

answered Sep 19 '22 13:09

JGH

To figure this out you need to count transactions per credit card in order to figure out the right bucket, then you need to count the credit cards per bucket per year. There are a couple of different ways to get the final result. One way is to first join up all your data and compute the first level of aggregate values. Then compute the final level of aggregate values:

with t1 as (
  select year_opened
     , c.credit_card_id
     , case when count(*) < 10 then 'Less than 10'
            when count(*) < 30 then 'Between [10 and 30)'
            else 'Greater than or equal to 30'
       end buckets
  from credit_cards c
  join transactions t
    on t.credit_card_id = c.credit_card_id
 where t.transaction_status = 'complete'
 group by year_opened
     , c.credit_card_id
)
select count(*) credit_card_count
     , year_opened
     , buckets
  from t1
 group by year_opened
     , buckets;

However, it may be more performant to calculate the first level of aggregation on the transactions table before joining it to the credit cards table:

select count(*) credit_card_count
     , year_opened
     , buckets
  from credit_cards c
  join (select credit_card_id
             , case when count(*) < 10 then 'Less than 10'
                    when count(*) < 30 then 'Between [10 and 30)'
                    else 'Greater than or equal to 30'
               end buckets
          from transactions
         where transaction_status = 'complete'
         group by credit_card_id) t
    on t.credit_card_id = c.credit_card_id
 group by year_opened
     , buckets;

If you prefer to unroll the above query and use Common Table Expressions, you can do that too (I find this easier to read and follow):

with bkt as (
  select credit_card_id
       , case when count(*) < 10 then 'Less than 10'
              when count(*) < 30 then 'Between [10 and 30)'
              else 'Greater than or equal to 30'
         end buckets
    from transactions
   where transaction_status = 'complete'
   group by credit_card_id
)
select count(*) credit_card_count
     , year_opened
     , buckets
  from credit_cards c
  join bkt t
    on t.credit_card_id = c.credit_card_id
 group by year_opened
     , buckets;
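
One caveat with the inner joins above: a card that has no completed transactions at all never reaches the aggregate, so it is missing from the 'Less than 10' bucket. If such cards should be counted, a left join is one option. The following is only a sketch under that assumption; the CTE name per_card and the column completed_count are hypothetical, and it has not been run against real data.

```sql
-- Keep every card; the status filter lives in the ON clause so that
-- cards without completed transactions survive the left join.
with per_card as (
  select c.year_opened
       , c.credit_card_id
       -- count(column) skips the NULLs produced by unmatched rows, giving 0
       , count(t.transaction_id) as completed_count
    from credit_cards c
    left join transactions t
      on t.credit_card_id = c.credit_card_id
     and t.transaction_status = 'complete'
   group by c.year_opened
       , c.credit_card_id
)
select count(*) as credit_card_count
     , year_opened
     , case when completed_count < 10 then 'Less than 10'
            when completed_count < 30 then 'Between [10 and 30)'
            else 'Greater than or equal to 30'
       end as buckets
  from per_card
 group by year_opened
     , buckets
 order by year_opened
     , buckets;
```

Putting the status condition in a WHERE clause instead would filter out the NULL rows again and turn the left join back into an inner join.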

answered Sep 19 '22 13:09

Sentinel
