I'm getting strange characters when pulling data from a website:
Â
How can I remove anything that isn't a non-extended ASCII character?
A more appropriate question can be found here: PHP - replace all non-alphanumeric chars for all languages supported
In Python, to remove non-ASCII characters, call string.encode() with the encoding set to ASCII and the error handler set to ignore, then call decode() on the resulting bytes to get back a string without the non-ASCII characters.
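The encode/decode round trip described above can be sketched as follows. This is a minimal Python 3 example; the helper name strip_non_ascii is made up for illustration and does not come from the original answer.

```python
def strip_non_ascii(text: str) -> str:
    """Drop every character outside the 7-bit ASCII range."""
    # errors="ignore" silently discards characters that cannot be
    # represented in ASCII; decode() turns the bytes back into a str.
    return text.encode("ascii", errors="ignore").decode("ascii")


print(strip_non_ascii("aAÂ"))  # prints: aA
```

Note that this drops accented letters outright rather than converting them to an unaccented form, so "café" becomes "caf".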
In Java, replaceAll("\\p{Cntrl}", "?") replaces every ASCII control character with "?". A related pattern replaces all ASCII non-printable characters, including accented characters (the printable class is shorthand for [\p{Graph}\x20]): my_string.
In order to use non-ASCII characters, Python requires explicit encoding and decoding of strings into Unicode. In IBM® SPSS® Modeler, Python scripts are assumed to be encoded in UTF-8, which is a standard Unicode encoding that supports non-ASCII characters.
A regex replace would be the best option. Using $str as an example string and matching it using :print:, which is a POSIX Character Class:
$str = 'aAÂ';
$str = preg_replace('/[[:^print:]]/', '', $str); // should be aA
What :print: does is look for all printable characters. The reverse, :^print:, looks for all non-printable characters. Any characters that are not part of the current character set will be removed.
Note: Before using this method, you must ensure that your current character set is ASCII. POSIX Character Classes support both ASCII and Unicode and will match only according to the current character set. As of PHP 5.6, the default charset is UTF-8.
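For comparison, the same idea can be sketched in Python with an explicit range instead of a POSIX class, so the result does not depend on the current character set. This is only an illustrative sketch: the name keep_printable_ascii and the choice of \x20-\x7E (space through tilde) as "printable ASCII" are assumptions, not part of the answer above.

```python
import re

# Matches any character outside printable ASCII (space through tilde).
NON_PRINTABLE_ASCII = re.compile(r"[^\x20-\x7E]")


def keep_printable_ascii(text: str) -> str:
    """Remove control characters and everything beyond 7-bit ASCII."""
    return NON_PRINTABLE_ASCII.sub("", text)


print(keep_printable_ascii("aAÂ\t!"))  # prints: aA!
```

Be aware that this range also strips tabs and newlines; widen the character class if those need to survive.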