Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to count all combined occurrences in SQL?

Is there any option to gain counted combinations of all elements in one SQL query, without using temp tables or procedures?

Consider these three tables:

  • products (id, product_name)

  • transactions (id, date)

  • transaction_has_product (id, product_id, transaction_id)

Sample data

  • products

    1   AAA
    2   BBB
    3   CCC
    
  • transactions

    1   some_date
    2   some_date
    
  • transaction_has_products

    1   1   1
    2   2   1
    3   3   1
    4   1   2
    5   2   2
    

Result should be:

AAA, BBB = 2   
AAA, CCC = 1   
BBB, CCC = 1   
AAA, BBB, CCC = 1
like image 905
objah Avatar asked Nov 08 '10 20:11

objah


2 Answers

Not easily because you have different numbers of matched products in the last row compared with the other rows. You might be able to do it with some sort of GROUP_CONCAT() operator (available in MySQL; implementable in other DBMS, such as Informix and probably PostgreSQL), but I'm not confident of that.

Pairwise Matching

SELECT p1.product_name AS name1, p2.product_name AS name2, COUNT(*)
  FROM (SELECT p.product_name, h.transaction_id
          FROM products AS p
          JOIN transactions_has_products AS h ON h.product_id = p.product_id
       ) AS p1
  JOIN (SELECT p.product_name, h.transaction_id
          FROM products AS p
          JOIN transactions_has_products AS h ON h.product_id = p.product_id
       ) AS p2
    ON p1.transaction_id = p2.transaction_id
   AND p1.product_name   < p2.product_name
 GROUP BY p1.name, p2.name;

Handling the triple match is non-trivial; extending it further than that is definitely rather difficult.

like image 62
Jonathan Leffler Avatar answered Oct 17 '22 03:10

Jonathan Leffler


If you know what all the products will be up front, you could do it by pivoting the data like this.

If you don't know what the products will be up front, you could build this query dynamically in a stored procedure. The practicality of either approach would break down if the number of products is large, but I think that would probably be true no matter how this requirement is accomplished.

select
    product_combination, 
    case product_combination
        when 'AAA, BBB' then aaa_bbb
        when 'AAA, CCC' then aaa_ccc
        when 'BBB, CCC' then bbb_ccc
        when 'AAA, BBB, CCC' then aaa_bbb_ccc
    end as number_of_transactions
from
(
    select 'AAA, BBB' as product_combination union all
    select 'AAA, CCC' union all
    select 'BBB, CCC' union all
    select 'AAA, BBB, CCC'
) as combination_list
cross join
(
    select
        sum(case when aaa = 1 and bbb = 1 then 1 else 0 end) as aaa_bbb,
        sum(case when aaa = 1 and ccc = 1 then 1 else 0 end) as aaa_ccc,
        sum(case when bbb = 1 and ccc = 1 then 1 else 0 end) as bbb_ccc,
        sum(case when aaa = 1 and bbb = 1 and ccc = 1 then 1 else 0 end) as aaa_bbb_ccc
    from
    (
        select
            count(case when a.product_name = 'AAA' then 1 else null end) as aaa,
            count(case when a.product_name = 'BBB' then 1 else null end) as bbb,
            count(case when a.product_name = 'CCC' then 1 else null end) as ccc,
            b.transaction_id
        from
            products a
        inner join
            transaction_has_products b
        on
            a.id = b.product_id
        group by
            b.transaction_id
    ) as product_matrix
) as combination_counts

Result:

product_combination  number_of_transactions
AAA, BBB             2
AAA, CCC             1
BBB, CCC             1
AAA, BBB, CCC        1
like image 22
Jeremy Shimanek Avatar answered Oct 17 '22 02:10

Jeremy Shimanek