 

Is there a way to filter out offensive words from JCaptcha?

Tags: java, captcha

We are using JCaptcha as the captcha tool in a small app that my team is writing. However, even during development (on a small team of four), we've run across a number of curse words and other potentially offensive words in the generated captchas. Is there a way to filter out potentially offensive words so that they are not presented to the user?

asked Mar 31 '10 19:03 by elduff


1 Answer

I spent some time downloading JCaptcha and looking at the source. Basically, JCaptcha works like most captchas other than ReCaptcha: it generates the words itself. Hence what you want to do is trivial.

JCaptcha uses a very simple concept, the WordGenerator, which is an interface:

public interface WordGenerator {
    String getWord(Integer length);
    String getWord(Integer length, Locale locale);
}

Let us ignore localization.

Typical use is like this:

WordGenerator words = ...
WordToImage word2image = new SimpleWordToImage();
ImageCaptchaFactory factory = new GimpyFactory(words, word2image);
ImageCaptcha pixCaptcha = factory.getImageCaptcha();

In their unit tests, for testing purposes, they plug in a DummyWordGenerator:

WordGenerator words = new DummyWordGenerator("TESTING");
WordToImage word2image = new SimpleWordToImage();
ImageCaptchaFactory factory = new GimpyFactory(words, word2image);
pixCaptcha = factory.getImageCaptcha();

Note that we have complete control over which WordGenerator is used.
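Because the factory only depends on the WordGenerator interface, one way to use that control is to leave the generator alone and wrap it in a filtering decorator. Below is a minimal sketch, not JCaptcha code: the WordGenerator interface is re-declared locally with the same two methods quoted above, only so the example compiles without JCaptcha on the classpath, and FilteringWordGenerator, blocked and maxAttempts are hypothetical names.

```java
import java.util.Locale;
import java.util.Set;
import java.util.function.Supplier;

// Local stand-in for JCaptcha's WordGenerator (same two methods as quoted above),
// declared here only so this sketch compiles on its own.
interface WordGenerator {
    String getWord(Integer length);
    String getWord(Integer length, Locale locale);
}

// Hypothetical decorator: delegates to any WordGenerator and re-draws
// whenever the candidate word is on the blocklist.
class FilteringWordGenerator implements WordGenerator {
    private final WordGenerator delegate;
    private final Set<String> blocked;   // entries expected in lower case
    private final int maxAttempts;       // guards against looping forever

    FilteringWordGenerator(WordGenerator delegate, Set<String> blocked, int maxAttempts) {
        this.delegate = delegate;
        this.blocked = blocked;
        this.maxAttempts = maxAttempts;
    }

    @Override
    public String getWord(Integer length) {
        return draw(() -> delegate.getWord(length));
    }

    @Override
    public String getWord(Integer length, Locale locale) {
        return draw(() -> delegate.getWord(length, locale));
    }

    // Asks the source for words until one is not blocked (case-insensitive).
    private String draw(Supplier<String> source) {
        for (int i = 0; i < maxAttempts; i++) {
            String cand = source.get();
            if (!blocked.contains(cand.toLowerCase(Locale.ROOT))) {
                return cand;
            }
        }
        throw new IllegalStateException("no acceptable word after " + maxAttempts + " attempts");
    }
}
```

The maxAttempts cap is a design choice: with a small blocklist the loop almost never repeats, but a bounded retry fails loudly instead of spinning forever if the blocklist ever covers everything the delegate can produce.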

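Here's one working, fully functional word generator I just wrote (its getWord(Integer) method; since we ignore localization, the Locale overload can simply delegate to it):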

private static final Random r = new Random( System.currentTimeMillis() );

// Builds a string of `length` random letters drawn from a-z and A-Z.
public String getWord( final Integer length ) {
    final StringBuilder sb = new StringBuilder();
    for (int i = 0; i < length; i++) {
        final int rnd = r.nextInt( 52 );   // 0-25 -> 'a'..'z', 26-51 -> 'A'..'Z'
        final char c = (char) (rnd < 26 ? 'a' + rnd : 'A' + (rnd-26));
        sb.append( c );
    }
    return sb.toString();
}
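One thing worth changing before production: java.util.Random seeded with System.currentTimeMillis() is predictable, and a captcha whose text can be predicted offers little protection. Here is a minimal sketch of the same a-z/A-Z generator drawing from java.security.SecureRandom instead; the class name SecureLetterWords is hypothetical.

```java
import java.security.SecureRandom;

// Hypothetical class: same a-z / A-Z alphabet as the generator above,
// but drawn from SecureRandom, whose output is not reproducible from a clock seed.
class SecureLetterWords {
    private static final String ALPHABET =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    private static final SecureRandom RNG = new SecureRandom();

    static String getWord(final int length) {
        final StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(ALPHABET.charAt(RNG.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }
}
```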

It generates random "words" like these:

fqXVxId
cdVWBSZ
zXeJFaY
aeoSeEb
OuBfzvL
unYewjG
EhbzRup
GkXkTyQ
yDGnHmh
mRFgHWM
FFBkTLF
DvCHIIT
fDmjqLH
XMWSOpa
muukLLN
jUedgYK
FlbWARe
WohMMgZ
lmeLHau
djHRqlc

Note that if you prefer real words, that's not an issue either: simply change getWord(...) to pick words at random from a dictionary. (ReCaptcha also uses real words, but for a different purpose altogether: it helps OCR scanned books!)
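A minimal sketch of such a dictionary-backed getWord(...), assuming the dictionary is already loaded into a List&lt;String&gt; (how you load it is up to you). DictionaryWords is a hypothetical name; the Random is passed in so you can supply a SecureRandom.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical dictionary-backed generator: indexes the word list by length
// once, then picks a random entry of the requested length.
class DictionaryWords {
    private final Map<Integer, List<String>> byLength = new HashMap<>();
    private final Random rng;

    DictionaryWords(List<String> dictionary, Random rng) {
        this.rng = rng;
        for (String w : dictionary) {
            byLength.computeIfAbsent(w.length(), k -> new ArrayList<>()).add(w);
        }
    }

    String getWord(final Integer length) {
        List<String> candidates = byLength.get(length);
        if (candidates == null || candidates.isEmpty()) {
            throw new IllegalArgumentException("no dictionary word of length " + length);
        }
        return candidates.get(rng.nextInt(candidates.size()));
    }
}
```

With real words, offensive entries can also simply be removed from the dictionary file up front, which is cheaper than rejecting them at draw time.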

Now, how do you prevent offensive words from being picked? This is trivial. Here is just one example (please, no arguing about the code; it's really just one example of how it could be done):

// Blocklist; entries must be lower case because isSwearWord() lower-cases the candidate.
private static final Set<String> s = new HashSet<String>();

static {
    s.add( "f**k" );
    s.add( "suck" );
    s.add( "dick" );
}

private static final Random r = new Random( System.currentTimeMillis() );

// Keep drawing random words until one is not on the blocklist.
public String getWord( Integer length ) {
    String cand = getRandomWord( length );
    while ( isSwearWord(cand) ) {
        cand = getRandomWord( length );
    }
    return cand;
}

private boolean isSwearWord( final String w ) {
    return s.contains( w.toLowerCase() );
}

public String getRandomWord( final Integer length ) {
    final StringBuilder sb = new StringBuilder();
    for (int i = 0; i < length; i++) {
        final int rnd = r.nextInt( 52 );
        final char c = (char) (rnd < 26 ? 'a' + rnd : 'A' + (rnd-26));
        sb.append( c );
    }
    return sb.toString();
}
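One limitation of the set lookup above: it only rejects a candidate that is exactly a blocked word. With random 7-letter strings, a blocked word is more likely to appear embedded inside the candidate (e.g. "xSuCkab") than to be the whole candidate. A minimal sketch of a case-insensitive substring check follows; SubstringFilter and its placeholder blocklist are hypothetical. The trade-off is over-blocking: innocent dictionary words that happen to contain a blocked word are rejected too, which is harmless for random letter strings.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: rejects a candidate if any blocked word occurs
// anywhere inside it, ignoring case.
class SubstringFilter {
    private static final List<String> BLOCKED = Arrays.asList("suck", "dick");

    static boolean containsSwearWord(final String w) {
        final String lower = w.toLowerCase(Locale.ROOT);
        for (final String bad : BLOCKED) {
            if (lower.contains(bad)) {
                return true;
            }
        }
        return false;
    }
}
```

It can be used in the getWord(...) loop in place of isSwearWord(...).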

Now, if you want to block swear words, you probably also want to block words close to them (e.g. "fvck", "dikk", etc.). This is once again trivial:

private boolean isSwearWord( final String w ) {
    // ls should include w itself, otherwise exact swear words would slip through.
    List<String> ls = generateAllPermutationsWithLevenhsteinEditDistanceOne(w);
    for ( final String cand : ls ) {
        if ( s.contains( cand.toLowerCase()) ) {
            return true;
        }
    }
    return false;
}

Writing the method "generateAllPermutationsWithLevenhsteinEditDistanceOne(w)" is left as an exercise for the reader.
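For completeness, one possible answer to that exercise: a sketch that returns w itself plus every string one Levenshtein edit away (one deletion, substitution or insertion). It keeps the method name used in isSwearWord(...) so it drops in directly; the class EditDistanceOne and the restriction to a lower-case a-z alphabet are assumptions, so look-alikes using digits or symbols (e.g. "d1ck") would need a larger alphabet. Transpositions are not included.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class EditDistanceOne {
    private static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz";

    // Returns w (lower-cased) plus every string reachable from it by one
    // deletion, one substitution or one insertion of a letter a-z.
    // Duplicates are possible; callers only test membership.
    static List<String> generateAllPermutationsWithLevenhsteinEditDistanceOne(final String w) {
        final String s = w.toLowerCase(Locale.ROOT);
        final List<String> out = new ArrayList<>();
        out.add(s);
        for (int i = 0; i <= s.length(); i++) {
            final String head = s.substring(0, i);
            final String tail = s.substring(i);
            if (!tail.isEmpty()) {
                out.add(head + tail.substring(1));              // deletion
            }
            for (char c : ALPHABET.toCharArray()) {
                if (!tail.isEmpty()) {
                    out.add(head + c + tail.substring(1));      // substitution
                }
                out.add(head + c + tail);                       // insertion
            }
        }
        return out;
    }
}
```

An alternative that avoids building the list is to compute the edit distance between w and each blocked word directly; for a handful of blocked words either approach is cheap.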

answered Sep 20 '22 02:09 by SyntaxT3rr0r