
How does statistical calculation of "similar products/music/..." from customer buying/listening behaviour work?

I mean product suggestions on Amazon or, more specifically, similar-band recommendations on Last.fm.

Given that you can store the complete listening/buying behaviour of your users (WHO listened to WHAT, and how OFTEN?), how do you calculate which bands are similar to any given band, and by how much?

I've found some pages on Wikipedia (Association rule learning, Affinity analysis), but I'd like to get some information from a programmer's point of view, preferably with some pseudocode or Python code.

Given that I have

    dic = {
        "Alice"   : { "AC/DC" : 2, "The Raconteurs" : 3, "Mogwai" : 1 },
        "Bob"     : { "The XX" : 4, "Lady Gaga" : 3, "Mogwai" : 1, "The Raconteurs" : 1 },
        "Charlie" : { "AC/DC" : 7, "Lady Gaga" : 7 }
    }

where the numbers are play counts, how would I iterate over this to find the similarity of the bands?

Felix Dombek asked Jan 20 '23 10:01



2 Answers

The book "Programming Collective Intelligence: Building Smart Web 2.0 Applications" is a classic and uses Python. Among other things, it covers recommendation engines.
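To give a feel for what such a recommendation engine does with the data from the question, here is a minimal sketch of item-based similarity. It is not taken from the book: it assumes cosine similarity over per-user play counts as the measure, and the names `plays`, `invert`, `cosine` and `band_similarities` are made up for this example.

```python
from itertools import combinations
from math import sqrt

# Play counts from the question (with the missing comma added).
plays = {
    "Alice":   {"AC/DC": 2, "The Raconteurs": 3, "Mogwai": 1},
    "Bob":     {"The XX": 4, "Lady Gaga": 3, "Mogwai": 1, "The Raconteurs": 1},
    "Charlie": {"AC/DC": 7, "Lady Gaga": 7},
}

def invert(user_plays):
    """Turn {user: {band: count}} into {band: {user: count}}."""
    by_band = {}
    for user, bands in user_plays.items():
        for band, count in bands.items():
            by_band.setdefault(band, {})[user] = count
    return by_band

def cosine(a, b):
    """Cosine similarity of two sparse {user: count} vectors."""
    shared = set(a) & set(b)
    dot = sum(a[u] * b[u] for u in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def band_similarities(user_plays):
    """Score every unordered pair of bands; keys are alphabetically sorted pairs."""
    by_band = invert(user_plays)
    return {
        (x, y): cosine(by_band[x], by_band[y])
        for x, y in combinations(sorted(by_band), 2)
    }

sims = band_similarities(plays)
# Print the pairs from most to least similar.
for (x, y), score in sorted(sims.items(), key=lambda kv: -kv[1]):
    print(f"{x} / {y}: {score:.2f}")
```

A score of 0 means no user played both bands, and values close to 1 mean the same users played both in similar proportions. With only three users the numbers are not meaningful, and raw play counts let heavy listeners dominate, so in practice you would probably normalise the counts per user first.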


nikow answered Jan 23 '23 01:01



You might find the Association Rules widget (among others) in Orange helpful in getting started. Another useful package, available with source, is pysuggest, which implements a number of recsys/collaborative filtering algorithms.
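If you want to see the idea behind association rules before reaching for a package, the following is a rough sketch in plain Python. It does not use the Orange or pysuggest APIs. It assumes the simplest case, rules between pairs of bands, and it ignores the play counts and only looks at whether a user played a band at all. `plays` and `pair_rules` are names invented for this example.

```python
from collections import Counter
from itertools import permutations

# Play counts from the question; only the band names are used here.
plays = {
    "Alice":   {"AC/DC": 2, "The Raconteurs": 3, "Mogwai": 1},
    "Bob":     {"The XX": 4, "Lady Gaga": 3, "Mogwai": 1, "The Raconteurs": 1},
    "Charlie": {"AC/DC": 7, "Lady Gaga": 7},
}

def pair_rules(user_plays):
    """Compute support and confidence for every rule 'played a -> played b'."""
    n_users = len(user_plays)
    listeners = Counter()  # band -> number of users who played it
    together = Counter()   # (a, b) -> number of users who played both
    for bands in user_plays.values():
        listeners.update(bands.keys())
        together.update(permutations(bands.keys(), 2))
    rules = {}
    for (a, b), n_both in together.items():
        rules[(a, b)] = {
            # Fraction of all users who played both a and b.
            "support": n_both / n_users,
            # Fraction of a's listeners who also played b.
            "confidence": n_both / listeners[a],
        }
    return rules

rules = pair_rules(plays)
# Print the rules from highest to lowest confidence.
for (a, b), r in sorted(rules.items(), key=lambda kv: -kv[1]["confidence"]):
    print(f"{a} -> {b}: support={r['support']:.2f}, confidence={r['confidence']:.2f}")
```

Unlike a symmetric similarity score, confidence is directional: "AC/DC -> Lady Gaga" and "Lady Gaga -> AC/DC" can differ when the two bands have different numbers of listeners. Pairs that no user played together simply do not appear in `rules`.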

ars answered Jan 23 '23 01:01
