User recognition algorithm

Tags:

algorithm

Let's say you have a big IRC channel log and you want to find out which users are using multiple accounts. As input you have the times each user connects to the server and some sort of text analysis (word frequency and so on), and as output you want the likelihood that two users "match".

Is it possible to do this using an artificial neural network (ANN)? Are there better algorithms for this task?

PS: Using IP addresses is not an accepted solution :)

asked Feb 21 '11 16:02

kaharas

2 Answers

The problem with using neural networks is that you need a robust set of training data--that is, you need to have lots of examples of people using multiple accounts where you already know that's what they're doing. Furthermore, if the people you're trying to identify have ever played a role-playing game, they'll probably be able to make themselves seem quite a bit different if they want to.

So, if people are acting just like themselves and you have a pretty good training data set, then you stand a chance. You should probably start with methods used by forensic linguistics.

But I suspect that what you'll probably end up doing is identifying people who are sort of similar to each other. Good for a matchmaking site, perhaps; not so cool for most other things. (For example, I would think this would be a perfectly dreadful way to try to find members of Anonymous in other guises.)
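As a rough illustration of the kind of measure forensic linguistics works with, here is a minimal sketch that compares two users by how often they use common function words. The word list, the function names, and the choice of cosine similarity are all assumptions made for this example, not an established method from the field. Note that a high score only says the two writing styles resemble each other, which is exactly the limitation described above.

```python
import math
import re
from collections import Counter

# Hypothetical, hand-picked list of function words; a real study would use
# a much larger, validated list.
FUNCTION_WORDS = ("the", "a", "and", "but", "of", "to", "in", "that",
                  "it", "i", "you", "is", "not", "so", "just")


def style_vector(text):
    """Relative frequency of each function word in one user's messages."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    nick_a = "i think that it is not the right way to do it, but so be it"
    nick_b = "you just have to put the file in the folder and it is done"
    print(cosine_similarity(style_vector(nick_a), style_vector(nick_b)))
```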

answered Nov 03 '22 05:11

Rex Kerr

This problem is known as "authorship detection", also called authorship attribution (or sometimes, in a particular domain, "plagiarism detection"). It can be done using a variety of statistical algorithms, and neural networks aren't the easiest of them to apply.

Check out the Cavnar & Trenkle algorithm for N-gram-based text categorization. It could serve as a useful baseline for this task, and implementations in various languages are available on the web. You may want to turn it into a clustering algorithm instead of a classifier, since you are grouping accounts rather than assigning texts to known authors.
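A minimal sketch of the idea, under some assumptions: each nickname's messages are concatenated into one string, a ranked profile of the most frequent character n-grams is built per nickname, and profiles are compared with the "out-of-place" rank distance. The function names, the underscore padding, the top_k cutoff of 300, and the symmetrised pairwise ranking (used here in place of a classifier) are choices made for this example, not details taken from a reference implementation.

```python
import re
from collections import Counter
from itertools import combinations


def ngram_profile(text, max_n=5, top_k=300):
    """Ranked list of the top_k most frequent character n-grams (n = 1..max_n)."""
    counts = Counter()
    for token in re.findall(r"[a-z']+", text.lower()):
        padded = f"_{token}_"  # mark word boundaries
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Most frequent first; ties broken alphabetically so results are stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]


def out_of_place(profile_a, profile_b):
    """Sum of rank differences; n-grams missing from profile_b get a fixed penalty."""
    rank_b = {gram: rank for rank, gram in enumerate(profile_b)}
    penalty = len(profile_b)
    return sum(abs(rank - rank_b[gram]) if gram in rank_b else penalty
               for rank, gram in enumerate(profile_a))


def rank_pairs(logs):
    """Return (distance, nick_a, nick_b) for every pair, closest pair first.

    logs maps a nickname to all of that nickname's messages joined together.
    """
    profiles = {nick: ngram_profile(text) for nick, text in logs.items()}
    pairs = []
    for a, b in combinations(sorted(profiles), 2):
        # The raw measure is not symmetric, so average both directions.
        d = (out_of_place(profiles[a], profiles[b]) +
             out_of_place(profiles[b], profiles[a])) / 2
        pairs.append((d, a, b))
    return sorted(pairs)
```

Calling rank_pairs on a dictionary of nickname to text gives a list of candidate pairs ordered by distance. The lowest-distance pairs are the ones worth inspecting by hand, and a threshold or a hierarchical clustering step could be layered on top of these distances.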

answered Nov 03 '22 05:11

Fred Foo
