My question: How can I train a classifier with only positive and neutral data?
I am building a personalized article recommendation system for educational purposes. The data I use comes from Instapaper.
Datasets
I only have positive data:
- Articles that I have read and "liked", regardless of read/unread status
And neutral data (because I have expressed interest in it, but I may not like it later anyway):
- Articles that are unread
- Articles that I have read and marked as read but did not "like"
The data I do not have is negative data:
- Articles that I did not send to Instapaper to read later (I am not interested, even though I browsed that page/article)
- Articles that I might not even have clicked into, and might or might not have archived
My problem
In such a problem, negative data is basically missing. I have thought of the following solutions but have not committed to any of them yet:
1) Feed a number of negative examples to the classifier
Pros: Immediate negative data to teach the classifier
Cons: As the number of articles I like increases, the effect of the negative data on the classifier fades out
2) Turn the "neutral" data into negative data
Pros: Now I have all the positive and (new) negative data I need
Cons: Although the neutral data is only of mild interest to me, I would still like to get recommendations for such articles, perhaps as a lower-value class.
Learning from positive and unlabeled data, or PU learning, is the setting where a learner only has access to positive examples and unlabeled data. The assumption is that the unlabeled data can contain both positive and negative examples.
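In your case, the "liked" articles would form the positive set and everything else (unread plus read-but-not-liked) the unlabeled set. A minimal sketch of that split, assuming each article has already been reduced to a plain text string (the variable names are just for illustration):

```python
# Hypothetical containers; in practice these would come from an Instapaper export.
liked_articles = ["text of a liked article", "another liked article"]   # read and "liked"
unread_articles = ["text of an unread article"]                         # saved, not yet read
read_not_liked = ["text of a read-but-not-liked article"]               # read, no "like"

# PU-learning view of the data:
P = liked_articles                      # positive examples
U = unread_articles + read_not_liked    # unlabeled examples (may hide positives)
```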
Naive Bayes classifiers tend to give good results on this kind of data compared with alternatives such as logistic regression, tree-based methods, and support vector machines, which is why they are commonly used in text applications like spam filtering and sentiment analysis.
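For reference, a bag-of-words naive Bayes text classifier is only a few lines with scikit-learn; the documents and labels below are made up purely for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy documents and labels (1 = liked, 0 = not liked); purely illustrative.
docs = [
    "deep dive into spaced repetition for learning",
    "how to take smart notes while reading",
    "celebrity gossip roundup of the week",
    "ten shocking facts you will not believe",
]
labels = [1, 1, 0, 0]

# Bag-of-words features followed by multinomial naive Bayes.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(docs, labels)

# Probability that a new article belongs to the "liked" class.
print(model.predict_proba(["note taking strategies for students"])[:, 1])
```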
The Spy EM algorithm solves exactly this problem.
S-EM is a text learning or classification system that learns from a set of positive and unlabeled examples (no negative examples). It is based on a "spy" technique, naive Bayes, and the EM algorithm.
The basic idea is to combine your positive set with a whole bunch of random documents, some of which you hold out. You initially treat all the random documents as the negative class, and learn a naive Bayes classifier on that set. Now some of those crawled documents will actually be positive, and you can conservatively relabel any documents that are scored higher than the lowest-scoring held-out true positive document. Then you iterate this process until it stabilizes.
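A rough sketch of that loop is below. It is a simplified illustration (hard relabeling with scikit-learn's MultinomialNB rather than the soft EM updates of the original S-EM), and the function name, the 15% spy fraction, and the TF-IDF features are my own assumptions:

```python
import numpy as np
from scipy.sparse import vstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

def spy_relabel(positive_docs, unlabeled_docs, spy_frac=0.15, max_iter=10, seed=0):
    """Spy-style relabeling sketch; assumes enough positives for at least one spy."""
    rng = np.random.default_rng(seed)
    vec = TfidfVectorizer()
    X = vec.fit_transform(list(positive_docs) + list(unlabeled_docs))
    n_pos = len(positive_docs)
    X_pos, X_unl = X[:n_pos], X[n_pos:]

    # Hold out a fraction of the true positives as "spies".
    spy_mask = rng.random(n_pos) < spy_frac
    X_train_pos, X_spies = X_pos[~spy_mask], X_pos[spy_mask]

    # Start by treating every unlabeled document as negative.
    unl_is_pos = np.zeros(X_unl.shape[0], dtype=bool)
    for _ in range(max_iter):
        X_train = vstack([X_train_pos, X_unl])
        y_train = np.concatenate(
            [np.ones(X_train_pos.shape[0], dtype=int), unl_is_pos.astype(int)]
        )
        clf = MultinomialNB().fit(X_train, y_train)

        # Threshold: lowest score among the held-out true positives.
        threshold = clf.predict_proba(X_spies)[:, 1].min()
        new_labels = clf.predict_proba(X_unl)[:, 1] >= threshold

        if np.array_equal(new_labels, unl_is_pos):
            break  # labeling has stabilized
        unl_is_pos = new_labels
        if unl_is_pos.all():
            break  # everything promoted; a further fit would see only one class

    return unl_is_pos  # True where an unlabeled doc now looks positive
```

With the P and U sets from earlier, you would call something like `spy_relabel(P, U)` and treat the unlabeled articles that remain labeled negative as a reliable negative set for training a final classifier.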