Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL query Frequency Distribution matrix for product

i want to create a frequency distribution matrix

1.Create a matrix.**Is it possible to get this in separate columns**

  customer1       p1         p2      p3
  customer 2      p2         p3
  customer 3      p2         p3      p1
  customer 4      p2         p1

2. Then I have to count the number of products that come together the most

   For eg  
    p2 and p3 comes together 3 times
    p1 p3   comes 2 times
    p1 p2  comes  2 times

I want to recommend products to customers ,frequency of products that comes together

 select customerId,product,count(*) from sales group by customerId,product

Can anyone please help me for a solution to this

like image 640
coder25 Avatar asked Mar 12 '17 15:03

coder25


People also ask

What is frequency distribution in SQL?

In SQL, frequency distribution (usually presented in the form of a table) is used to get a quick overview of discrete variables. It can show the actual values as well as their: Absolute frequency. Absolute percentage. Cumulative frequency.

What is histogram in SQL?

A histogram is a special type of column statistic that sorts values into buckets – as you might sort coins into buckets. Generating a histogram is a great way to understand the distribution of data.


2 Answers

If you want pairs of products that customers purchase, then you can use a self join:

select s1.product, s2.product, count(*) as cnt
from sales s1 join
     sales s2
     on s1.customerId = s2.customerId
where s1.product < s2.product
group by s1.product, s2.product
order by cnt desc;

You can extend this to more than two products by using more joins.

like image 76
Gordon Linoff Avatar answered Sep 28 '22 05:09

Gordon Linoff


1.

Array

select      customerId
           ,sort_array(collect_set (product))   as products

from        sales 

group by    customerId
;

Multiple columns

select      customerId

           ,products[0] as p0
           ,products[1] as p1
           ,products[2] as p2
           ,products[3] as p3
           ,products[4] as p4
           ,products[5] as p5
           ,products[6] as p6
           ,products[7] as p7
           ,products[8] as p8
           ,products[9] as p9

from       (select      customerId
                       ,sort_array(collect_set (product))   as products

            from        sales 

            group by    customerId
            ) s
;

2.

This returns the frequency of the whole products' combinations.
In your example (p1,p2,p3) is the most frequent (appears twice).
(p1,p2) appears once and so is (p2,p3).
For frequency of tuples, see @GordonLinoff answer.

select      s.products
           ,count(*)    as frequency 

from       (select      customerId
                       ,sort_array(collect_set (product))   as products

            from        sales 

            group by    customerId
            ) s

group by    s.products           

order by    frequency desc
like image 22
David דודו Markovitz Avatar answered Sep 28 '22 05:09

David דודו Markovitz