Algorithm to identify similarity between text messages

Tags: c#, .net, text, algorithm, similarity

I'm looking for an algorithm that can compare two text messages (let's say forum posts) and report their similarity as a percentage.

What would be the most efficient solution for this purpose?

The idea is to use this algorithm to identify users on a forum who have more than one nickname and pretend to be different people.

I'm going to build a program that will read all their posts and compare each post from the first account with the posts of the second account, to find out whether they are genuinely two different people or just two registrations of a single user.

746

asked Feb 28 '14 23:02

SharpAffair

1 Answer

The first thing that came to my mind was the Levenshtein distance, but it counts character-level edits, so it is better suited to comparing individual words or short strings than whole posts.
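For reference, here is a minimal sketch of the Levenshtein distance in Python (chosen only for brevity; the same logic ports directly to C#). The percentage helper `levenshtein_similarity_percent` is a hypothetical name of mine, and dividing by the longer string's length is just one possible way to normalise the distance.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity_percent(a: str, b: str) -> float:
    """Hypothetical helper: distance normalised by the longer length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0  # two empty strings are treated as identical
    return 100.0 * (1 - levenshtein(a, b) / longest)


print(levenshtein("kitten", "sitting"))  # 3
```

The cost grows with the product of the two lengths, which is one more reason it fits short strings better than comparing every post of one account against every post of another.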

You could use tf-idf, but its inverse document frequency weights are estimated from the whole corpus, so it will probably work better if your corpus contains more than just two documents.
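A rough sketch of the tf-idf idea, using only the Python standard library. Note that tf-idf only produces weights; to turn them into a score I am assuming cosine similarity here, which is a common pairing but not the only option. The smoothed idf formula, the tokenizer regex, and the function names are illustrative choices of mine.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split into simple word tokens (illustrative tokenizer).
    return re.findall(r"[a-z0-9']+", text.lower())


def tfidf_vectors(documents):
    """Return one sparse {term: weight} dict per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    # Smoothed idf variant: terms found in every document keep a nonzero weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    return [
        {term: count * idf[term] for term, count in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


posts = [
    "the cat sat on the mat",
    "the cat sat on a mat",
    "stock prices fell sharply today",
]
vectors = tfidf_vectors(posts)
print(cosine(vectors[0], vectors[1]) > cosine(vectors[0], vectors[2]))  # True
```

In your case the corpus could be all posts on the forum, so that words everybody uses get a low weight and a user's unusual vocabulary stands out.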

An alternative could be to represent the documents (posts) using a vector space model, like:

(w_1, w_2, ..., w_k)

where

k is the total number of terms (words) in your document
w_i is the i-th term.

and then compute the Hamming distance, which compares two vectors (arrays) of equal length and counts the positions where they differ. You can discard stop words first (e.g. prepositions and articles).
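Since the Hamming distance needs both vectors to have the same length, the sketch below assumes each post is turned into a binary vector over the combined vocabulary of the two posts (1 if the term occurs, 0 otherwise). That encoding, the tiny stop-word list, and the function names are assumptions of mine for illustration, not the only way to build the vectors.

```python
import re

# Illustrative stop-word list only; a real one would be much longer.
STOP_WORDS = {"a", "an", "the", "in", "on", "of", "to", "and", "is", "at"}


def terms(text):
    """Set of lowercase word tokens with stop words discarded."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {w for w in words if w not in STOP_WORDS}


def binary_vectors(post_a, post_b):
    """Encode both posts over their shared, sorted vocabulary."""
    terms_a, terms_b = terms(post_a), terms(post_b)
    vocabulary = sorted(terms_a | terms_b)
    vec_a = [1 if t in terms_a else 0 for t in vocabulary]
    vec_b = [1 if t in terms_b else 0 for t in vocabulary]
    return vec_a, vec_b


def hamming(u, v):
    """Number of positions at which two equal-length vectors differ."""
    if len(u) != len(v):
        raise ValueError("Hamming distance needs vectors of equal length")
    return sum(x != y for x, y in zip(u, v))


def hamming_similarity_percent(post_a, post_b):
    """Hypothetical helper: share of vocabulary positions that agree."""
    vec_a, vec_b = binary_vectors(post_a, post_b)
    if not vec_a:
        return 100.0  # nothing left to compare after stop-word removal
    return 100.0 * (1 - hamming(vec_a, vec_b) / len(vec_a))


print(hamming_similarity_percent("The cat sat on the mat", "A cat sat on a rug"))  # 50.0
```

Here the vocabulary is cat, mat, rug, sat; the two posts differ on mat and rug, so two of four positions disagree.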

Take into account that the user might change some words, use synonyms, etc. There are many models for representing documents and computing the similarity between them. Some of them take word dependencies into account, which adds more semantics to the process, and others don't.

168

answered Sep 21 '22 05:09

Oscar Mederos
