Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

"People who watched this also watched" algorithm

Tags:

algorithm

I am trying to code an algorithm that acts a bit like Amazon's "People who bought this also bought".

The difference between the two is that mine is just counting the "products" you watched in a single session, while Amazon is counting every purchase/checkout.

I have a bit of difficulty in implementing and figuring out what the algo should be.

  1. So far, I am counting by SessionID the productID that was watched.
  2. By the end of a day, I have many ProductIDs watched by many SessionIDs.
  3. Now, I need to create some sort of cliques in the DB. That is, going one-by-one on the SessionsIDs and extracting all the products they've viewed. then, writing it as a clique in a DB table.
  4. Once I have cliques, and a product is being viewed, I am scanning this table to look in which clique it is in, and then extracting all the rest of the productIDs.

Do you have any reference/idea if my algorithm is correct? Is there a better one?

like image 580
Himberjack Avatar asked Oct 16 '11 06:10

Himberjack


1 Answers

I was able to achieve your desired result using a simple DB structure, and a pretty simple query:

Table

TABLE `exa`

| sesh_id | prod_id |
---------------------
| 1       | 1       |
| 1       | 2       |
| 1       | 3       |
| 1       | 4       |
| 2       | 2       |
| 2       | 3       |
| 2       | 4       |
| 3       | 3       |
| 3       | 4       |
| 4       | 1       |
| 4       | 2       |
| 4       | 5       |

Query

SELECT c.prod_id, COUNT(*)
FROM `exa` a
JOIN `exa` b ON a.prod_id=b.prod_id
JOIN `exa` c ON b.sesh_id=c.sesh_id
WHERE a.`prod_id`=3 AND c.prod_id!=3
GROUP BY c.prod_id
ORDER BY 2 DESC;

Result

| prod_id | COUNT |
| 4       | 9     |
| 2       | 6     |
| 1       | 3     |

The idea is that every time a session views a product, it gets inserted into the table [in this case exa]

Then, on any particular product view, you can check and see what other products people who viewed this product also viewed, weighted by frequency. So in this particular example, EVERYONE that viewed product #3 viewed product #4, so it comes up first in the sort. Product #5 was only viewed by session #4, and session #4 didn't view product #3, so product #5 doesn't come up in the results.

like image 118
Shad Avatar answered Sep 24 '22 17:09

Shad