
Collaborative Filtering: Ways to determine implicit scores for products for each user?

Having implemented an algorithm to recommend products with some success, I'm now looking at ways to calculate the initial input data for this algorithm.

My objective is to calculate a score for each product that a user has some sort of history with.

The data I am currently collecting:

User order history
Product pageview history for both anonymous and registered users

All of this data is timestamped.

What I'm looking for

There are a couple of things I'm looking for suggestions on, and ideally this question should be treated as a discussion rather than as a search for a single 'right' answer.

Any additional data I can collect for a user that can directly imply an interest in a product
Algorithms/equations for turning this data into scores for each product

What I'm NOT looking for

Just to avoid this question being derailed with the wrong kind of answers, here is what I'm doing once I have this data for each user:

Generating a number of user clusters (21 at the moment) using the k-means clustering algorithm, with the Pearson correlation coefficient as the distance score
For each user (on demand), calculating a graph of similar users by looking for their most and least similar users within their cluster, and repeating to an arbitrary depth.
Calculating a score for each product based on the preferences of other users within the user's graph
Sorting the scores to return a list of recommendations

Basically, I'm not looking for ideas on what to do once I have the input data (I may need further help with that later, but it's not the point of this question), just for ideas on how to generate this input data in the first place.


Here's a haymaker of a response:

time spent looking at a product
semantic interpretation of comments left about the product
make a discussion page about a product, brand, or product category and semantically interpret the comments
whether they shared a product page (email, del.icio.us, etc.)
browser (on mobile they might spend less time on the page than on a laptop while still being very interested) and connection speed (affects the amount of time spent on the page)
Facebook profile similarity
heatmap data (e.g. à la kissmetrics)
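
As a rough illustration of the time-spent and browser points above, the sketch below scales each pageview's dwell time by the median dwell time for its device class, so that a short mobile view and a long laptop view can count as equally strong signals. The tuple layout, the device labels, and the choice of the median are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import median

def normalized_dwell(views):
    """Scale each pageview's dwell time by the median for its device class.

    `views` is a hypothetical list of (user, product, device, seconds) tuples.
    Returns {(user, product): relative dwell}, keeping the longest relative
    view per pair; 1.0 means "typical for that device".
    """
    by_device = defaultdict(list)
    for _, _, device, seconds in views:
        by_device[device].append(seconds)
    typical = {d: median(s) for d, s in by_device.items()}

    scores = {}
    for user, product, device, seconds in views:
        if typical[device] <= 0:
            continue  # no usable baseline for this device class
        relative = seconds / typical[device]
        key = (user, product)
        scores[key] = max(scores.get(key, 0.0), relative)
    return scores
```

With this normalization, a 20-second view on a device class whose median is 10 seconds scores the same as a 120-second view on a class whose median is 60 seconds.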

What kind of products are you selling? That might help us answer you better. (Since this is an old question, I am addressing both @Andrew Ingram and anyone else who has the same question and found this thread through search.)

You can allow users to explicitly state their preferences, the way Netflix allows users to assign stars.
You can assign a positive numeric value for all the stuff they bought, since you say you do have their purchase history. Assign zero for stuff they didn't buy.
You could do some sort of weighted value for stuff they bought, adjusted for what's popular. (If nearly everybody bought a product, it doesn't tell you much about a person that they also bought it.) See "term frequency–inverse document frequency".
You could also assign some lesser numeric value for items that users looked at but did not buy.
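
The purchase, popularity, and pageview suggestions above could be combined along these lines. This is only a sketch under stated assumptions: the input dicts, the `buy_weight` and `view_weight` defaults, and the particular smoothed inverse-frequency formula are illustrative choices, not values taken from the question.

```python
from math import log

def implicit_scores(purchases, views, buy_weight=1.0, view_weight=0.3):
    """Build per-user product scores from purchases and pageviews.

    `purchases` and `views` are hypothetical dicts: user -> set of product ids.
    Products a user neither bought nor viewed are left out (implicit zero).
    """
    users = set(purchases) | set(views)
    n_users = len(users)

    # How many users bought each product (the "document frequency").
    buyers = {}
    for products in purchases.values():
        for p in products:
            buyers[p] = buyers.get(p, 0) + 1

    scores = {}
    for user in users:
        bought = purchases.get(user, set())
        row = {}
        # Viewed but not bought: a smaller fixed value.
        for p in views.get(user, set()) - bought:
            row[p] = view_weight
        # Bought: weight down products that nearly everybody buys.
        for p in bought:
            idf = log((1 + n_users) / (1 + buyers[p])) + 1.0
            row[p] = buy_weight * idf
        scores[user] = row
    return scores
```

Because the smoothed inverse-frequency term never drops below 1.0, a purchase always outranks a view under the default weights, while rarely bought products still score higher than bestsellers.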

Tags: data-mining, collaborative-filtering

Asked by Andrew Ingram. 2 Answers, by isomorphismes and nont.
