
regular expression to remove all non-printable characters

I wish to remove all non-printable ASCII characters from a string while retaining invisible ones. I thought this would work because whitespace, \n, and \r are invisible characters but not non-printable ones? Basically, I am getting a byte array with � characters in it, and I don't want them there. So I am trying to convert it to a string and remove the � characters before using it as a byte array again.

Space works fine in my code now, but \r and \n do not. What would be the correct regex to retain these as well? Or is there a better way than what I am doing?

public void write(byte[] bytes, int offset, int count) {
    try {
        String str = new String(bytes, "ASCII");
        // Strip everything except printable ASCII, tab and newline
        String str2 = str.replaceAll("[^\\p{Print}\\t\\n]", "");
        GraphicsTerminalActivity.sendOverSerial(str2.getBytes("ASCII"));
    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    }
}

} 

EDIT: I tried [^\x00-\x7F], which is the range of ASCII characters, but the � symbols still get through. Weird.

Paul asked Jan 28 '13 15:01



1 Answer

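The following regex matches only text that is free of these control characters (NUL, backspace, vertical tab, form feed, and \x0E-\x1F). Tab, \n, and \r are not in the class, so they are kept:

[^\x00\x08\x0B\x0C\x0E-\x1F]*

The following regex finds those control characters:

[\x00\x08\x0B\x0C\x0E-\x1F]

Note that neither class covers \x01-\x07, \x7F (DEL), or anything above \x7F, so extend the ranges if those must go as well.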

Java code:

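import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// subjectString is the text to check
boolean foundMatch = false;
try {
    Pattern regex = Pattern.compile("[\\x00\\x08\\x0B\\x0C\\x0E-\\x1F]");
    Matcher regexMatcher = regex.matcher(subjectString);
    foundMatch = regexMatcher.find();
    // Replace the found text with whatever you want, e.g. regexMatcher.replaceAll("")
} catch (PatternSyntaxException ex) {
    // Syntax error in the regular expression
}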
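
If the goal is just to drop everything except printable ASCII plus tab, \n, and \r, here is a rough sketch of two ways to do it. This is not the asker's code: the class name PrintableFilter and both helper methods are made up for illustration, and it assumes "printable" means the range \x20-\x7E. The second helper works on the bytes directly and honors offset and count, which avoids the String round trip where bytes above \x7F get decoded into � in the first place.

```java
import java.io.ByteArrayOutputStream;

public class PrintableFilter {

    // Hypothetical helper: removes every character that is not printable
    // ASCII (0x20-0x7E), tab, line feed, or carriage return.
    static String stripWithRegex(String s) {
        return s.replaceAll("[^\\x20-\\x7E\\t\\n\\r]", "");
    }

    // Hypothetical helper: applies the same rule to raw bytes in the
    // range [offset, offset + count) without converting to a String.
    static byte[] filterBytes(byte[] bytes, int offset, int count) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(count);
        for (int i = offset; i < offset + count; i++) {
            int b = bytes[i] & 0xFF; // read the byte as an unsigned value
            boolean printable = b >= 0x20 && b <= 0x7E;
            boolean keptControl = b == '\t' || b == '\n' || b == '\r';
            if (printable || keptControl) {
                out.write(b);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // 0xFF and 0x07 (bell) should be dropped; \r and \n should survive.
        byte[] raw = {'O', 'K', (byte) 0xFF, '\r', '\n', 0x07, '!'};
        byte[] clean = filterBytes(raw, 0, raw.length);
        System.out.println(clean.length); // prints 5
    }
}
```

In the question's write method, the filtered array from filterBytes(bytes, offset, count) could presumably be passed straight to sendOverSerial, with no charset handling or try/catch needed.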
abc123 answered Sep 19 '22 21:09