i want to create a frequency distribution matrix
1.Create a matrix.**Is it possible to get this in separate columns**
customer1 p1 p2 p3
customer 2 p2 p3
customer 3 p2 p3 p1
customer 4 p2 p1
2. Then I have to count the number of products that come together the most
For eg
p2 and p3 comes together 3 times
p1 p3 comes 2 times
p1 p2 comes 2 times
I want to recommend products to customers ,frequency of products that comes together
select customerId,product,count(*) from sales group by customerId,product
Can anyone please help me for a solution to this
In SQL, frequency distribution (usually presented in the form of a table) is used to get a quick overview of discrete variables. It can show the actual values as well as their: Absolute frequency. Absolute percentage. Cumulative frequency.
A histogram is a special type of column statistic that sorts values into buckets – as you might sort coins into buckets. Generating a histogram is a great way to understand the distribution of data.
If you want pairs of products that customers purchase, then you can use a self join:
select s1.product, s2.product, count(*) as cnt
from sales s1 join
sales s2
on s1.customerId = s2.customerId
where s1.product < s2.product
group by s1.product, s2.product
order by cnt desc;
You can extend this to more than two products by using more joins.
1.
Array
select customerId
,sort_array(collect_set (product)) as products
from sales
group by customerId
;
Multiple columns
select customerId
,products[0] as p0
,products[1] as p1
,products[2] as p2
,products[3] as p3
,products[4] as p4
,products[5] as p5
,products[6] as p6
,products[7] as p7
,products[8] as p8
,products[9] as p9
from (select customerId
,sort_array(collect_set (product)) as products
from sales
group by customerId
) s
;
2.
This returns the frequency of the whole products' combinations.
In your example (p1,p2,p3)
is the most frequent (appears twice).(p1,p2)
appears once and so is (p2,p3)
.
For frequency of tuples, see @GordonLinoff answer.
select s.products
,count(*) as frequency
from (select customerId
,sort_array(collect_set (product)) as products
from sales
group by customerId
) s
group by s.products
order by frequency desc
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With