How to check if a string looks randomized, or human-generated and pronounceable?

This is for identifying possibly bot-generated usernames.

Suppose you have a username like "bilbomoothof". It may be nonsense, but it still contains pronounceable sounds and so appears human-generated.

I accept that it could have been randomly generated from a dictionary of syllables, or word parts, but let's assume for a moment that the bot in question is a bit rubbish.

  1. Suppose you have a username like "sdfgbhm342r3f". To a human this is clearly a random string, but can it be identified programmatically?
  2. Are there any algorithms available (similar to Soundex, etc.) that can identify pronounceable sounds within a string like this?
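One possible approach to both questions is sketched here under our own assumptions; it is not taken from the thread. It is a letter-bigram check: collect the letter pairs that occur in a list of real English words, then score a username by the fraction of its pairs that also appear in that set. Keyboard mashes contain many pairs (sd, fg, bh, digit pairs) that never occur in real words. Soundex itself is not quite the right tool: it encodes how a name sounds so that similar names can be matched, and it does not judge whether a string is pronounceable. The names `SAMPLE_WORDS`, `pronounceability` and `looks_random` and the 0.5 threshold are hypothetical, and a real check would train on a much larger word list.

```python
# Hypothetical training sample: common English words. A real check would
# load a large dictionary (tens of thousands of words) instead.
SAMPLE_WORDS = """
the of and to in is it you that he was for on are with as his they be at
one have this from or had by hot word but what some we can out other were
all there when up use your how said an each she which do their time if will
way about many then them write would like so these her long make thing see
him two has look more day could go come did number sound no most people my
over know water than call first who may down side been now find
""".split()


def bigrams(word: str) -> list[str]:
    """Adjacent character pairs, with ^ and $ marking the start and end."""
    padded = "^" + word.lower() + "$"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def train(words: list[str]) -> set[str]:
    """Collect every bigram that occurs in the reference words."""
    seen: set[str] = set()
    for w in words:
        seen.update(bigrams(w))
    return seen


SEEN = train(SAMPLE_WORDS)


def pronounceability(username: str, seen: set[str] = SEEN) -> float:
    """Fraction of the username's bigrams that also occur in real words (0.0 to 1.0)."""
    grams = bigrams(username)
    return sum(g in seen for g in grams) / len(grams)


def looks_random(username: str, seen: set[str] = SEEN, threshold: float = 0.5) -> bool:
    """Flag usernames whose bigrams are mostly unseen; the threshold is an assumption."""
    return pronounceability(username, seen) < threshold


if __name__ == "__main__":
    for name in ["bilbomoothof", "sdfgbhm342r3f"]:
        print(name, round(pronounceability(name), 2), looks_random(name))
```

Digits never occur in the training words, so mixed letter/digit strings drift toward the random end. Weighting pairs by how often they occur (a character-level Markov chain) is a common refinement of the same idea, and the logic ports directly to PHP.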

Solutions applicable in PHP/MySQL most appreciated.

Tim Whitlock asked Jul 22 '09 at 09:07

1 Answer

I guess you could attempt something like that if you restricted yourself to sounds that are pronounceable in English. For me (I am French), words like szczepan or wawrzyniec are unpronounceable and certainly look somewhat random.

But they are actually Polish first names (the equivalents of Steven and Lawrence)...
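The caveat can be made concrete with a deliberately naive, English-centric rule. This is a hypothetical sketch: the function names and the cutoff of more than three consonants in a row are our own choices, and y is counted as a vowel. The rule rejects the keyboard mash and szczepan (because of "szcz"), but it also rejects the ordinary English word "strengths". It lets wawrzyniec through only because its longest cluster, "wrz", is three letters long, which shows how arbitrary such cutoffs are.

```python
import re

# Runs of letters other than a, e, i, o, u, y (y is treated as a vowel here).
# Digits and other non-letters break a run.
CONSONANT_RUN = re.compile(r"[b-df-hj-np-tv-xz]+")


def longest_consonant_run(word: str) -> int:
    """Length of the longest run of consecutive consonants in word."""
    return max((len(run) for run in CONSONANT_RUN.findall(word.lower())), default=0)


def naive_unpronounceable(word: str, max_run: int = 3) -> bool:
    """Hypothetical English-centric rule: more than max_run consonants in a row."""
    return longest_consonant_run(word) > max_run


if __name__ == "__main__":
    for name in ["bilbomoothof", "sdfgbhm342r3f", "szczepan", "wawrzyniec", "strengths"]:
        print(name, longest_consonant_run(name), naive_unpronounceable(name))
```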

Mac answered Sep 23 '22 at 23:09