
How to validate text as not gibberish in PHP?

What is the best way to validate a string as not gibberish using PHP?

For example, if I get a string input from a user that must be at least 250 characters long, how can I tell whether they entered legitimate text (e.g. real words) or just gibberish to comply with the minimum characters (e.g. asdlfkjefksjlfkjldskfjelkef)?

I've thought about counting the number of words as one option, but the user could still space out their gibberish (e.g. asdlf kjef ksjlf kjl dskfje lkef), so it needs another kind of check on top of that.

Is there any way to check that at least half of a string contains real dictionary words, or something to that effect?

What is the best solution to this problem?

Thanks.

PleaseHelpMe asked Oct 19 '25 23:10


2 Answers

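You cannot do that properly, because "Colorless green ideas sleep furiously": a sentence can consist entirely of real dictionary words, and even be grammatical, while still being meaningless.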

Gordon answered Oct 21 '25 14:10


You could look at Markov chains. Simply put, the idea is that the algorithm determines whether sequences of characters look like they belong together, based on how often they follow each other in real text. It won't necessarily tell you that the input is not gibberish, but it should catch things like "ksjhglah".

See Markov text generators
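
As a rough illustration of the idea, here is a minimal sketch of a character-level Markov model in PHP. It counts how often each character follows another in some training text, then scores an input by its average log-probability per character transition. Everything here is an assumption for illustration: the function names (`normalize`, `train`, `score`) are hypothetical, the alphabet is limited to a-z plus space, and the training text is far too small for real use. In practice you would train on a large body of text in the expected language and calibrate a cut-off score against known good and bad inputs.

```php
<?php
// Sketch only: a character-bigram Markov model for scoring "gibberishness".
// All names are hypothetical and the training text below is a toy sample.

const ALPHABET = 'abcdefghijklmnopqrstuvwxyz ';

// Lowercase the text and collapse every run of non-letters into one space.
function normalize(string $text): string
{
    return trim(preg_replace('/[^a-z]+/', ' ', strtolower($text)));
}

// Build a table of log-probabilities for every character pair (a followed by b).
// Counts start at 1 (add-one smoothing) so unseen pairs keep a small probability.
function train(string $corpus): array
{
    $chars = str_split(ALPHABET);
    $counts = [];
    foreach ($chars as $a) {
        foreach ($chars as $b) {
            $counts[$a][$b] = 1;
        }
    }

    $text = normalize($corpus);
    for ($i = 0, $n = strlen($text) - 1; $i < $n; $i++) {
        $counts[$text[$i]][$text[$i + 1]]++;
    }

    $model = [];
    foreach ($counts as $a => $row) {
        $total = array_sum($row);
        foreach ($row as $b => $count) {
            $model[$a][$b] = log($count / $total);
        }
    }
    return $model;
}

// Average log-probability per transition. Values closer to zero mean the
// character sequences look more like the training text.
function score(array $model, string $text): float
{
    $text = normalize($text);
    $n = strlen($text) - 1;
    if ($n < 1) {
        return -INF; // nothing to score
    }
    $sum = 0.0;
    for ($i = 0; $i < $n; $i++) {
        $sum += $model[$text[$i]][$text[$i + 1]];
    }
    return $sum / $n;
}

// Toy training text (an assumption; use a large corpus in practice).
$sample = 'What is the best way to validate a string as not gibberish? '
    . 'If a user has to enter at least a few hundred characters, how can we '
    . 'tell whether they typed real words or just random letters to reach the '
    . 'minimum length? Counting the number of words is one option, but the '
    . 'user could still put spaces into the random letters, so another kind '
    . 'of check is needed on top of that.';

$model = train($sample);

printf("real text: %.3f\n", score($model, 'the user typed real words'));
printf("gibberish: %.3f\n", score($model, 'asdlfkjefksjlfkjldskfjelkef'));
```

With a reasonable training corpus, real text should generally score closer to zero than keyboard mashing, and you would reject inputs that fall below a threshold you have tuned yourself. As the other answer points out, this only catches unlikely character sequences; it says nothing about whether the words make sense together.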

xzyfer answered Oct 21 '25 15:10


