Match non printable/non ascii characters and remove from text

My JavaScript is quite rusty, so any help with this would be great. I have a requirement to detect non-printable characters (control characters like SOH, BS, etc.) as well as extended ASCII characters such as Ž in a string and remove them, but I am not sure how to write the code.

Can anyone point me in the right direction for how to go about this? This is what I have so far:

$(document).ready(function() {
    $('.jsTextArea').blur(function() {
        var pattern = /[^\000-\031]+/gi;
        var val = $(this).val();
        if (pattern.test(val)) {
            for (var i = 0; i < val.length; i++) {
                var res = val.charAt([i]);
                alert("Character " + [i] + " " + res);
            }
        }
        else {
            alert("It failed");
        }
    });
});

asked Jun 15 '14 11:06

Grant Doole

4 Answers

To target characters that are not part of the printable basic ASCII range, you can use this simple regex:

[^ -~]+

Explanation: in the first 128 characters of the ASCII table, the printable range starts with the space character and ends with a tilde. These are the characters you want to keep. That range is expressed with [ -~], and the characters not in that range are expressed with [^ -~]. These are the ones we want to replace. Therefore:

result = string.replace(/[^ -~]+/g, "");
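As a quick check of the range described above, here is a minimal sketch. The function name stripNonPrintableAscii and the sample string are made up for illustration and are not part of the answer.

```javascript
// Hypothetical helper wrapping the replace call above.
function stripNonPrintableAscii(text) {
    // [^ -~] matches anything outside space (0x20) through tilde (0x7E)
    return text.replace(/[^ -~]+/g, "");
}

// Sample input with SOH (\u0001), BS (\u0008) and the extended character Ž
var sample = "He\u0001llo\u0008 WŽorld~";
var cleaned = stripNonPrintableAscii(sample);
console.log(cleaned); // "Hello World~"
```

Note that tab, line feed and carriage return are below the space character, so they are removed as well.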

answered Oct 19 '22 04:10

zx81

No need to test, you can directly process the text box content:

textBoxContent = textBoxContent.replace(/[^\x20-\x7E]+/g, '');

where the range \x20-\x7E covers the printable part of the ASCII table.

Example with your code:

$('.jsTextArea').blur(function() {
    this.value = this.value.replace(/[^\x20-\x7E]+/g, '');
});
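One thing to keep in mind: line feed, carriage return and tab are control characters too, so the pattern above also removes line breaks from a textarea. If that is not wanted, a possible variant adds them to the set of characters to keep. This is a sketch rather than part of the answer above, and the function name is hypothetical.

```javascript
// Hypothetical variant: keep printable ASCII plus tab, LF and CR
function stripKeepingLineBreaks(text) {
    return text.replace(/[^\x20-\x7E\t\n\r]+/g, '');
}

// BEL (\u0007) and Ž are removed, the line break and tab survive
var input = "line one\r\n\tline\u0007 two Ž";
console.log(JSON.stringify(stripKeepingLineBreaks(input)));
// "line one\r\n\tline two "
```

Inside the blur handler this would be used as this.value = stripKeepingLineBreaks(this.value);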

answered Oct 19 '22 06:10

Casimir et Hippolyte

For anyone looking for a solution that works beyond ASCII and does not strip out Unicode characters (the \p{...} property escapes require the u flag):

function stripNonPrintableAndNormalize(text) {
    // normalize newlines first: \p{C} below also matches \r and \n,
    // so stripping control chars first would delete every line break
    text = text.replace(/\r\n?/g, '\n');
    text = text.replace(/\p{Zl}/gu, '\n');
    text = text.replace(/\p{Zp}/gu, '\n');

    // normalize space
    text = text.replace(/\p{Zs}/gu, ' ');

    // strip control chars (and the rest of category C), except \n
    text = text.replace(/[^\P{C}\n]/gu, '');

    return text;
}

The Unicode general category identifiers (e.g. Zl for line separator) are defined at https://www.unicode.org/reports/tr44/ and are listed below:

Abbr	Long	Description
Lu	Uppercase_Letter	an uppercase letter
Ll	Lowercase_Letter	a lowercase letter
Lt	Titlecase_Letter	a digraphic character, with first part uppercase
LC	Cased_Letter	Lu | Ll | Lt
Lm	Modifier_Letter	a modifier letter
Lo	Other_Letter	other letters, including syllables and ideographs
L	Letter	Lu | Ll | Lt | Lm | Lo
Mn	Nonspacing_Mark	a nonspacing combining mark (zero advance width)
Mc	Spacing_Mark	a spacing combining mark (positive advance width)
Me	Enclosing_Mark	an enclosing combining mark
M	Mark	Mn | Mc | Me
Nd	Decimal_Number	a decimal digit
Nl	Letter_Number	a letterlike numeric character
No	Other_Number	a numeric character of other type
N	Number	Nd | Nl | No
Pc	Connector_Punctuation	a connecting punctuation mark, like a tie
Pd	Dash_Punctuation	a dash or hyphen punctuation mark
Ps	Open_Punctuation	an opening punctuation mark (of a pair)
Pe	Close_Punctuation	a closing punctuation mark (of a pair)
Pi	Initial_Punctuation	an initial quotation mark
Pf	Final_Punctuation	a final quotation mark
Po	Other_Punctuation	a punctuation mark of other type
P	Punctuation	Pc | Pd | Ps | Pe | Pi | Pf | Po
Sm	Math_Symbol	a symbol of mathematical use
Sc	Currency_Symbol	a currency sign
Sk	Modifier_Symbol	a non-letterlike modifier symbol
So	Other_Symbol	a symbol of other type
S	Symbol	Sm | Sc | Sk | So
Zs	Space_Separator	a space character (of various non-zero widths)
Zl	Line_Separator	U+2028 LINE SEPARATOR only
Zp	Paragraph_Separator	U+2029 PARAGRAPH SEPARATOR only
Z	Separator	Zs | Zl | Zp
Cc	Control	a C0 or C1 control code
Cf	Format	a format control character
Cs	Surrogate	a surrogate code point
Co	Private_Use	a private-use character
Cn	Unassigned	a reserved unassigned code point or a noncharacter
C	Other	Cc | Cf | Cs | Co | Cn
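
To see which category a given character falls into, the abbreviations from the table can be tried one at a time with \p{...}. This is only an illustrative sketch: the helper name and the short list of categories it checks are assumptions, and property escapes need the u flag (ES2018 or later).

```javascript
// Hypothetical helper: returns the first matching abbreviation from a short list
var CATEGORIES = ['Lu', 'Ll', 'Nd', 'Zs', 'Zl', 'Zp', 'Cc', 'Cf'];

function generalCategory(ch) {
    for (var i = 0; i < CATEGORIES.length; i++) {
        // Build e.g. /^\p{Cc}$/u for each abbreviation
        var re = new RegExp('^\\p{' + CATEGORIES[i] + '}$', 'u');
        if (re.test(ch)) {
            return CATEGORIES[i];
        }
    }
    return null; // not in the short list above
}

console.log(generalCategory('Ž'));      // "Lu"
console.log(generalCategory('\u0001')); // "Cc" (SOH)
console.log(generalCategory('\n'));     // "Cc" (line feed is a control code too)
console.log(generalCategory('\u00A0')); // "Zs" (no-break space)
console.log(generalCategory('\u2028')); // "Zl"
```

The third call shows why \p{C} on its own also removes line breaks: line feed and carriage return are in Cc.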

answered Oct 19 '22 04:10

mwag

You have to assign a regex pattern (not a string) to the pattern variable, then use test() to check whether it matches. test() returns true or false.

$(document).ready(function() {
    $('.jsTextArea').blur(function() {
        var pattern = /[^\x20-\x7E]/; // matches any character outside printable ASCII
        var val = $(this).val();
        if (pattern.test(val)) {
            alert("It matched");
        }
        else {
            alert("It did NOT match");
        }
    });
});

Check jsFiddle
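
If the goal is to report which characters are offending rather than just whether any exist, a loop over the matches can replace the per-character alerts from the question. A minimal sketch, assuming the printable range \x20-\x7E used in the other answers; the function name and the shape of the returned objects are hypothetical.

```javascript
function findNonPrintable(text) {
    var pattern = /[^\x20-\x7E]/g; // one offending character per match
    var found = [];
    var match;
    // exec() with the g flag advances pattern.lastIndex on each call
    while ((match = pattern.exec(text)) !== null) {
        found.push({ index: match.index, code: match[0].charCodeAt(0) });
    }
    return found;
}

console.log(findNonPrintable("a\u0001bŽ"));
// [ { index: 1, code: 1 }, { index: 3, code: 381 } ]
```

An empty array means the text contains only printable ASCII.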

answered Oct 19 '22 05:10

kosmos

Tags:

javascript

regex

control-characters