I'm trying to develop a site that recommends items (e.g. books) to users based on their preferences. So far, I've read O'Reilly's "Collective Intelligence" and numerous other online articles. They all, however, seem to deal with single instances of recommendation: for example, if you like book A, then you might like book B.
What I'm trying to do is create a set of 'preference-nodes' for each user on my site. Let's say a user likes books A, B and C. Then, when they add book D, I don't want the system to recommend other books based solely on other users' experience with book D. I want the system to look up similar 'preference-nodes' and recommend books based on those.
Here's an example of 4 nodes:
User1: 'book A'->'book B'->'book C'
User2: 'book A'->'book B'->'book C'->'book D'
User3: 'book X'->'book Y'->'book C'->'book Z'
User4: 'book W'->'book Q'->'book C'->'book Z'
So a recommendation system, as described in the material I've read, would recommend book Z to User1, because there are two people who recommend Z in conjunction with liking C (i.e. Z weighs more than D), even though a user with a similar 'preference-node', User2, would be more qualified to recommend book D because he has a more similar interest pattern.
So do any of you have any experience with this sort of thing? Is there anything I should read, or are there any open source systems for this?
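To make the problem concrete, here is a minimal Python sketch of the item-to-item counting described above, run on the four example users. The `prefs` dictionary and the `co_occurrence` helper are names invented for this illustration; they are not taken from any library or from the book.

```python
from collections import Counter

# The four example users and the books they like (from the question).
prefs = {
    "User1": {"A", "B", "C"},
    "User2": {"A", "B", "C", "D"},
    "User3": {"X", "Y", "C", "Z"},
    "User4": {"W", "Q", "C", "Z"},
}

def co_occurrence(prefs, book, exclude_user):
    """Count how often each other book appears alongside `book`,
    ignoring the user we are recommending for."""
    counts = Counter()
    for user, books in prefs.items():
        if user != exclude_user and book in books:
            counts.update(books - {book})
    return counts

counts = co_occurrence(prefs, "C", "User1")
# Drop books User1 already has, then rank by raw count (ties broken by name).
candidates = {b: n for b, n in counts.items() if b not in prefs["User1"]}
print(sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0])))
# Z comes first with a count of 2; D only has a count of 1.
```

Every user who likes C gets an equal vote here, which is why Z beats D even though User2 overlaps with User1 far more than User3 or User4 do.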
Thanks for your time!
Small edit: I think last.fm's algorithm is doing exactly what I want my system to do: using the preference-trees of people to recommend music more personally, instead of just saying "you might like B because you liked A".
Collaborative filtering is a technique that picks out items a user might like on the basis of reactions by similar users. It works by searching a large group of people and finding a smaller set of users with tastes similar to a particular user.
Amazon is known for its use of collaborative filtering, matching products to users based on past purchases. For example, the system can identify all of the products a customer and users with similar behaviors have purchased and/or positively rated.
Collaborative filtering uses rating information from all other users to predict a user-item interaction and thereby whittles the complete item set down to a smaller set of choices for the user. Hence the name collaborative filtering.
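The "find a smaller set of users with similar tastes" step can be sketched in a few lines of Python. This is only one possible reading: it assumes plain like/don't-like data (no ratings) and uses Jaccard similarity as the measure, which is my choice for the example rather than something the text above prescribes. The names `jaccard` and `nearest_users` are hypothetical.

```python
def jaccard(a, b):
    """Size of the overlap divided by size of the union (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def nearest_users(prefs, target, k=2):
    """Return the k users most similar to `target`, best first."""
    scores = [
        (jaccard(prefs[target], books), user)
        for user, books in prefs.items()
        if user != target
    ]
    # Highest similarity first; ties broken by user name for stable output.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [(user, round(score, 2)) for score, user in scores[:k]]

# Example data taken from the question.
prefs = {
    "User1": {"A", "B", "C"},
    "User2": {"A", "B", "C", "D"},
    "User3": {"X", "Y", "C", "Z"},
    "User4": {"W", "Q", "C", "Z"},
}

print(nearest_users(prefs, "User1"))
# [('User2', 0.75), ('User3', 0.17)]
```

With real rating data, a measure such as cosine similarity or Pearson correlation over the rating vectors would be a more usual choice than Jaccard.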
Create a table and insert the test data:
CREATE TABLE `ub` (
`user_id` int(11) NOT NULL,
`book_id` varchar(10) NOT NULL,
PRIMARY KEY (`user_id`,`book_id`),
UNIQUE KEY `book_id` (`book_id`,`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
insert into ub values (1, 'A'), (1, 'B'), (1, 'C');
insert into ub values (2, 'A'), (2, 'B'), (2, 'C'), (2,'D');
insert into ub values (3, 'X'), (3, 'Y'), (3, 'C'), (3,'Z');
insert into ub values (4, 'W'), (4, 'Q'), (4, 'C'), (4,'Z');
Join the test data onto itself by book_id, and create a temporary table to hold each user_id and the number of books it has in common with the target user_id:
-- `rank` is a reserved word in MySQL 8.0+, so the alias is quoted with backticks
create temporary table ub_rank as
select similar.user_id, count(*) as `rank`
from ub target
join ub similar on target.book_id = similar.book_id and target.user_id != similar.user_id
where target.user_id = 1
group by similar.user_id;
select * from ub_rank;
+---------+------+
| user_id | rank |
+---------+------+
| 2 | 3 |
| 3 | 1 |
| 4 | 1 |
+---------+------+
3 rows in set (0.00 sec)
We can see that user_id 2 has 3 books in common with user_id 1, but user_id 3 and user_id 4 only have 1 each.
Next, select all the books that the users in the temporary table have that do not match the target user_id's books, and arrange these by rank. Note that the same book might appear in different users' lists, so we sum the rankings for each book so that common books get a higher ranking.
select similar.book_id, sum(ub_rank.rank) total_rank
from ub_rank
join ub similar on ub_rank.user_id = similar.user_id
left join ub target on target.user_id = 1 and target.book_id = similar.book_id
where target.book_id is null
group by similar.book_id
order by total_rank desc;
+---------+------------+
| book_id | total_rank |
+---------+------------+
| D | 3 |
| Z | 2 |
| X | 1 |
| Y | 1 |
| Q | 1 |
| W | 1 |
+---------+------------+
6 rows in set (0.00 sec)
Book Z appeared in two users' lists, and so was ranked above X, Y, Q and W, which each appeared in only one user's list. Book D did best because it appeared in user_id 2's list, which had 3 items in common with target user_id 1.
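For anyone who would rather prototype this outside the database, the same two steps can be sketched in plain Python. This is an illustrative translation under the same assumptions as the queries (one row per user/book pair, overlap count used as the weight); the `recommend` function and the `rows` list are names made up for the example.

```python
from collections import Counter

# Same test data as the `ub` table: (user_id, book_id) pairs.
rows = [
    (1, "A"), (1, "B"), (1, "C"),
    (2, "A"), (2, "B"), (2, "C"), (2, "D"),
    (3, "X"), (3, "Y"), (3, "C"), (3, "Z"),
    (4, "W"), (4, "Q"), (4, "C"), (4, "Z"),
]

def recommend(rows, target):
    """Rank books the target lacks by the summed overlap of their owners."""
    books_by_user = {}
    for user_id, book_id in rows:
        books_by_user.setdefault(user_id, set()).add(book_id)
    mine = books_by_user.get(target, set())

    # Step 1 (the ub_rank table): books each other user shares with the target.
    rank = {
        user: len(books & mine)
        for user, books in books_by_user.items()
        if user != target and books & mine
    }

    # Step 2: for every book the target lacks, sum the ranks of its owners.
    total_rank = Counter()
    for user, weight in rank.items():
        for book in books_by_user[user] - mine:
            total_rank[book] += weight

    # Highest total first; ties broken alphabetically for stable output.
    return sorted(total_rank.items(), key=lambda kv: (-kv[1], kv[0]))

print(recommend(rows, 1))
# [('D', 3), ('Z', 2), ('Q', 1), ('W', 1), ('X', 1), ('Y', 1)]
```

The tie order among the rank-1 books differs from the SQL output only because the query leaves ties unordered, while the sketch sorts them alphabetically.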