
How to reliably strip invisible characters that break code?

I am trying to build a bookmarklet and got slammed with this issue, which I was only just able to figure out: a \u8203 character (decimal 8203, i.e. U+200B, the zero-width space) in my block of code, which Chrome unhelpfully reports (upon pasting into the JS console) as "Invalid character ILLEGAL".

Luckily Safari was the one that told me it was a \u8203.

I am editing the code in the Sublime Text 2 editor and somehow copying in and out of it (I also tried TextEdit) fails to remove it.

Is there some sort of website somewhere that will strip all characters other than ASCII?

When I try to save as ISO 8859, it saves the file back as UTF-8 "because of unsupported characters".

... Yeah, that's the point. Get rid of my unsupported evil characters.

What am I supposed to do? Edit my file in a hex editor?

FYI I actually solved it by re-typing the code (which originated from this site by the way).

Steven Lu asked Jul 19 '12 05:07



2 Answers

Is there some sort of website somewhere that will strip all characters other than ASCII?

You could use this website

You can recreate the website using this code:

<!DOCTYPE html>
<html>

    <head>
        <meta http-equiv="content-type" content="text/html; charset=UTF-8">
        <title>- jsFiddle demo</title>
        <script type="text/javascript" src="https://ajax.googleapis.com/ajax/libs/jquery/1.7.2/jquery.min.js"></script>
        <link rel="stylesheet" type="text/css" href="/css/normalize.css">
        <link rel="stylesheet" type="text/css" href="/css/result-light.css">
        <style type="text/css">
            textarea {
                width: 800px;
                height: 480px;
                outline: none;
                font-family: Monaco, Consolas, monospace;
                border: 0;
                padding: 15px;
                color: hsl(0, 0%, 27%);
                background-color: #F6F6F6;
            }
        </style>
        <script type="text/javascript">
            //<![CDATA[ 
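            $(function () {
                // On click, strip every character outside U+0000-U+007E
                // (ASCII including control characters, but not DEL, U+007F).
                $("button").click(function () {
                    var $textarea = $("textarea");
                    $textarea.val($textarea.val().replace(/[^\u0000-\u007E]/g, ""));
                    // Focus and select the cleaned text so it is ready to copy.
                    $textarea.focus()[0].select();
                });
            }); //]]>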
        </script>
    </head>

    <body>
        <textarea></textarea>
        <button>Remove</button>
    </body>

</html>
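Before stripping anything, it can help to see exactly which characters are hiding in the pasted code. The following is a minimal sketch, not part of the page above: `findNonAscii` is a hypothetical helper name, and it uses `\x7F` as the upper bound (the full ASCII range) rather than the page's `\u007E`. It assumes an environment that supports `String.prototype.matchAll` (modern browsers and Node.js).

```javascript
// Hypothetical helper: lists every non-ASCII character in `text`
// together with its position and code point.
function findNonAscii(text) {
  const hits = [];
  // The "u" flag makes characters outside the BMP match as one code point.
  const pattern = /[^\x00-\x7F]/gu;
  for (const match of text.matchAll(pattern)) {
    const codePoint = match[0].codePointAt(0);
    hits.push({
      index: match.index, // offset in UTF-16 code units
      decimal: codePoint,
      hex: "U+" + codePoint.toString(16).toUpperCase().padStart(4, "0"),
    });
  }
  return hits;
}

// "\u200B" is the zero-width space, which is 8203 in decimal.
console.log(findNonAscii("var a\u200B = 1;"));
// one hit: index 5, decimal 8203, hex "U+200B"
```

Reporting both the decimal and the hex form shows that the "8203" Safari mentions and U+200B are the same character.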
Esailija answered Oct 26 '22 00:10


You can use a regex to strip every character outside the range 0-127. For example, in JavaScript:

text.replace(/[^\x00-\x7F]/g, "")

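Here \x00 is 0 and \x7F is 127, the bounds of the ASCII range. Note that replace returns a new string, so assign the result to a variable.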
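As a rough illustration of that one-liner in use, here is a small self-contained sketch. The function name `stripNonAscii` and the sample string are assumptions made for the example, not part of the answer.

```javascript
// Hypothetical helper: removes every character outside the ASCII
// range U+0000-U+007F and returns the cleaned string.
function stripNonAscii(text) {
  // replace() does not modify `text`; it returns a new string.
  return text.replace(/[^\x00-\x7F]/g, "");
}

// "\u200B" is the zero-width space (8203 in decimal), invisible in most editors.
const dirty = "var a\u200B = 1;";
const clean = stripNonAscii(dirty);

console.log(dirty.length, clean.length); // 11 10
console.log(clean === "var a = 1;");     // true
```

This removes all non-ASCII characters, including legitimate ones such as accented letters inside string literals, so it is best suited to code that is meant to be pure ASCII in the first place.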

Matt Kim answered Oct 25 '22 23:10