Let's say you have a big IRC channel log, and you want to find out which users are using multiple accounts. As input you have the times each user connects to the server and some sort of text analysis (word frequency and so on), and as output you want the likelihood that two users match.
Is it possible to do this using an ANN? Are there better algorithms for the task?
PS: using IP addresses is not an accepted solution :)
The problem with using neural networks is that you need a robust set of training data--that is, you need to have lots of examples of people using multiple accounts where you already know that's what they're doing. Furthermore, if the people you're trying to identify have ever played a role-playing game, they'll probably be able to make themselves seem quite a bit different if they want to.
So, if people are acting just like themselves and you have a pretty good training data set, then you stand a chance. You should probably start with the methods used in forensic linguistics.
But I suspect that what you'll probably end up doing is identifying people who are sort of similar to each other. Good for a matchmaking site, perhaps; not so cool for most other things. (For example, I would think this would be a perfectly dreadful way to try to find members of Anonymous in other guises.)
This problem is known as "authorship detection" or "authorship attribution" (or sometimes, in a particular domain, "plagiarism detection"). It can be done using a variety of statistical algorithms, and neural networks aren't the easiest of them to apply.
Check out the Cavnar & Trenkle algorithm for text classification. That may be made into a useful baseline algorithm for this task. Implementations in various languages are available on the web. You may want to turn it into a clustering algorithm instead of a classifier.
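To make that suggestion concrete, here is a minimal Python sketch of a Cavnar & Trenkle style baseline: build a ranked character n-gram profile for each nick, then compare profiles with the "out-of-place" rank distance. It is a simplified illustration, not a reference implementation. The function names (`ngram_profile`, `out_of_place`, `rank_pairs`), the single-underscore padding, the symmetrized distance, and the sample nicks are all assumptions made for this example. Instead of classifying against known authors, it simply ranks every pair of nicks by distance, which is a first step toward clustering.

```python
from collections import Counter
from itertools import combinations


def ngram_profile(text, n_max=5, top_k=300):
    """Return the top_k most frequent character n-grams (n = 1..n_max), most frequent first."""
    counts = Counter()
    for token in text.lower().split():
        padded = f"_{token}_"  # underscores mark word boundaries (simplified padding)
        for n in range(1, n_max + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count; break ties alphabetically so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]


def out_of_place(profile_a, profile_b):
    """Sum of rank differences; n-grams absent from profile_b get a fixed maximum penalty."""
    rank_b = {gram: rank for rank, gram in enumerate(profile_b)}
    max_penalty = len(profile_b)
    return sum(
        abs(rank - rank_b[gram]) if gram in rank_b else max_penalty
        for rank, gram in enumerate(profile_a)
    )


def rank_pairs(logs_by_nick):
    """Return (distance, nick_a, nick_b) tuples, most similar pair first."""
    profiles = {nick: ngram_profile(text) for nick, text in logs_by_nick.items()}
    pairs = []
    for a, b in combinations(sorted(profiles), 2):
        # Assumption: add both directions so the distance does not depend on argument order.
        dist = out_of_place(profiles[a], profiles[b]) + out_of_place(profiles[b], profiles[a])
        pairs.append((dist, a, b))
    return sorted(pairs)


if __name__ == "__main__":
    # Made-up sample data: all lines said by each nick, concatenated.
    logs = {
        "alice": "lol yeah i dunno lol that is soooo weird lol",
        "alice2": "lol yeah i dunno lol that is soooo funny lol",
        "bob": "Indeed, the configuration requires recompiling the kernel module.",
    }
    for dist, a, b in rank_pairs(logs):
        print(f"{a} / {b}: {dist}")
```

A small distance only says that two nicks write similarly, not that they are the same person, so treat the ranking as a list of candidates to inspect. With real logs you would want far more text per nick than in this toy sample, and the connection times from the question could be added as a separate signal.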