
Detecting random keyboard hits considering QWERTY keyboard layout

The winner of a recent Wikipedia vandalism detection competition suggests that detection could be improved by "detecting random keyboard hits considering QWERTY keyboard layout".

Example: woijf qoeoifwjf oiiwjf oiwj pfowjfoiwjfo oiwjfoewoh

Is there any software that does this already (preferably free and open source)?

If not, is there an active FOSS project whose goal is to achieve this?

If not, how would you suggest implementing such software?

Nicolas Raoul asked Sep 27 '10 08:09



2 Answers

If the two letters of a bigram in the analyzed text are close together on a QWERTY keyboard but the bigram has near-zero frequency in English (for example "fg" or "cd"), there is a chance that random keyboard hits are involved. The more such bigrams are found, the greater that chance.
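A minimal sketch of that check in Python, under some loud assumptions: the key coordinates and row stagger are approximations, the bigram statistics come from whatever `reference_text` the caller supplies (a real detector would need a large English corpus), and `mash_score`, `max_dist` and `max_ref_count` are hypothetical names and thresholds chosen for illustration.

```python
import math
from collections import Counter

# Approximate key centres as (row, column); the row stagger is an assumption.
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
STAGGER = [0.0, 0.25, 0.75]
KEY_POS = {
    ch: (r, c + STAGGER[r])
    for r, keys in enumerate(ROWS)
    for c, ch in enumerate(keys)
}


def qwerty_distance(a, b):
    """Euclidean distance between two letter keys, in key widths."""
    (r1, c1), (r2, c2) = KEY_POS[a], KEY_POS[b]
    return math.hypot(r1 - r2, c1 - c2)


def letter_bigrams(text):
    """Adjacent letter pairs; pairs spanning a space or punctuation are skipped."""
    s = text.lower()
    return [a + b for a, b in zip(s, s[1:]) if a in KEY_POS and b in KEY_POS]


def mash_score(text, reference_text, max_dist=1.5, max_ref_count=0):
    """Fraction of bigrams that are close on the keyboard yet rare in the reference."""
    ref = Counter(letter_bigrams(reference_text))
    bigrams = letter_bigrams(text)
    if not bigrams:
        return 0.0
    hits = sum(
        1
        for bg in bigrams
        if qwerty_distance(bg[0], bg[1]) <= max_dist and ref[bg] <= max_ref_count
    )
    return hits / len(bigrams)
```

A score near 1 means most bigrams look like neighbouring-key noise; where to put the cut-off would have to be calibrated on real edits.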

To account for someone bashing the keyboard with both hands, test letters that are separated by one other letter for QWERTY closeness, but still test the adjacent bigrams (or even trigrams) for frequency. For example, in the text "flsjf" you would check F and S for QWERTY distance, but the bigrams FL and LS (or the trigram FLS) for frequency.
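The two-hand variant could be sketched like this. Again this is only an illustration under assumptions: the key coordinates are approximate, "rare" is simplified to "never seen in the caller's `reference_text`", and `two_hand_hits` and the `max_dist=2.0` threshold are hypothetical choices.

```python
import math

# Approximate key centres as (row, column); the row stagger is an assumption.
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
STAGGER = [0.0, 0.25, 0.75]
KEY_POS = {
    ch: (r, c + STAGGER[r])
    for r, keys in enumerate(ROWS)
    for c, ch in enumerate(keys)
}


def two_hand_hits(text, reference_text, max_dist=2.0):
    """Return three-letter windows whose outer letters are close on the keyboard
    and whose two inner bigrams never occur in the reference text."""
    ref = reference_text.lower()
    known = {a + b for a, b in zip(ref, ref[1:])}
    s = text.lower()
    hits = []
    for x, y, z in zip(s, s[1:], s[2:]):
        if not (x in KEY_POS and y in KEY_POS and z in KEY_POS):
            continue  # window contains a space, digit or punctuation
        (r1, c1), (r2, c2) = KEY_POS[x], KEY_POS[z]
        outer_close = math.hypot(r1 - r2, c1 - c2) <= max_dist
        bigrams_unseen = (x + y) not in known and (y + z) not in known
        if outer_close and bigrams_unseen:
            hits.append(x + y + z)
    return hits
```

For "flsjf", the window "fls" pairs F with S for distance while FL and LS are looked up for frequency, which is the check described above.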

Dialecticus answered Sep 28 '22 09:09



Consider the empirical distribution of two-letter sequences, i.e. the probability of seeing letter a given that it follows letter b. These probabilities fill a 27x27 table (treating the space as a letter).

Now compare this table with one built from a large body of existing English, French, or other text in the relevant language. Use the Kullback-Leibler divergence for the comparison.
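As a rough sketch of this idea in Python: the code below builds the 27x27 table of P(next | previous) for both texts and sums the per-row Kullback-Leibler divergences, weighted by how often each previous letter occurs in the sample. The add-one smoothing (`alpha`), the row weighting, and the function names are my assumptions, not part of the suggestion above; smoothing is only there so that unseen pairs do not produce a division by zero.

```python
import math
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz "  # 26 letters plus space


def normalize(text):
    """Lowercase, turn anything outside a-z into a space, collapse space runs."""
    cleaned = "".join(ch if ch in ALPHABET else " " for ch in text.lower())
    return " ".join(cleaned.split())


def transition_table(text, alpha=1.0):
    """27x27 table of P(next | prev) with add-alpha smoothing (an assumption)."""
    s = normalize(text)
    pairs = Counter(zip(s, s[1:]))
    table = {}
    for prev in ALPHABET:
        total = sum(pairs[(prev, nxt)] for nxt in ALPHABET) + alpha * len(ALPHABET)
        table[prev] = {nxt: (pairs[(prev, nxt)] + alpha) / total for nxt in ALPHABET}
    return table


def kl_score(sample, reference, alpha=1.0):
    """Weighted sum over rows of KL(P_sample(. | prev) || P_reference(. | prev))."""
    p = transition_table(sample, alpha)
    q = transition_table(reference, alpha)
    s = normalize(sample)
    prev_counts = Counter(s[:-1])
    total = sum(prev_counts.values())
    if total == 0:
        return 0.0  # fewer than two characters: nothing to compare
    return sum(
        prev_counts[a] / total * p[a][b] * math.log(p[a][b] / q[a][b])
        for a in ALPHABET
        for b in ALPHABET
    )
```

The score is 0 when the sample has the same table as the reference and grows as the tables diverge. The reference needs to be large for the numbers to mean much, and the threshold for "random" would have to be tuned on known good and bad text.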

Alexandre C. answered Sep 28 '22 10:09
