
Remove non-ascii characters from string

Tags:

php

I'm getting strange characters when pulling data from a website:

 

How can I remove every character that is not in the standard (non-extended) ASCII range?


A more appropriate question can be found here: PHP - replace all non-alphanumeric chars for all languages supported

LordZardeck asked Jan 08 '12 22:01





1 Answer

A regex replace is the simplest option. Taking $str as an example string, match it against the negated form of [:print:], which is a POSIX character class:

$str = 'aAÂ';
$str = preg_replace('/[[:^print:]]/', '', $str); // $str is now 'aA'

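[:print:] matches all printable characters. The negated form, [:^print:], matches everything else. Without the /u modifier the pattern is applied byte by byte, so each byte of a multibyte UTF-8 character such as Â falls outside the printable ASCII range and is removed. Control characters such as tabs and newlines are also non-printable, so they are removed as well.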
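
If control characters such as tabs and newlines need to survive, an explicit byte range is one alternative to the POSIX class. The following is a minimal sketch, assuming the input is a UTF-8 byte string and the patterns run without the /u modifier; both function names are hypothetical and not part of the original answer.

```php
<?php
// Hypothetical helpers for illustration only.

// Removes every byte outside printable ASCII (0x20-0x7E).
// Tabs, newlines and other control characters are removed too.
function strip_non_printable_ascii(string $s): string
{
    // preg_replace returns null on error; fall back to an empty string.
    return preg_replace('/[^\x20-\x7E]/', '', $s) ?? '';
}

// Removes only bytes outside 7-bit ASCII (0x00-0x7F).
// ASCII control characters such as tabs and newlines are kept.
function strip_non_ascii(string $s): string
{
    return preg_replace('/[^\x00-\x7F]/', '', $s) ?? '';
}

// "aAÂ" (Â is the UTF-8 byte pair C3 82), then a tab, "b" and a newline.
$input = "aA\xC3\x82\tb\n";

var_dump(strip_non_printable_ascii($input) === "aAb");  // bool(true)
var_dump(strip_non_ascii($input) === "aA\tb\n");        // bool(true)
```

The first helper behaves like the [[:^print:]] pattern under PCRE's default tables; the second is the narrower choice when only non-ASCII bytes should go.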

Note: Before using this method, make sure the pattern is being matched against ASCII. POSIX character classes support both ASCII and Unicode and match only according to the character set in effect. As of PHP 5.6, the default charset (the default_charset ini setting) is UTF-8.
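
One way to confirm that a string is plain ASCII, before or after cleaning, is to look for any byte above 0x7F. This is a minimal sketch, assuming byte-wise matching (no /u modifier) and no locale change via setlocale(); the helper name is_plain_ascii is hypothetical.

```php
<?php
// Hypothetical helper: true when every byte of $s is in the 7-bit ASCII range.
function is_plain_ascii(string $s): bool
{
    // Without the /u modifier, PCRE inspects the string byte by byte.
    return preg_match('/[^\x00-\x7F]/', $s) === 0;
}

$raw   = "aA\xC3\x82";                               // "aAÂ" encoded as UTF-8
$clean = preg_replace('/[[:^print:]]/', '', $raw);   // pattern from the answer

var_dump(is_plain_ascii($raw));   // bool(false)
var_dump(is_plain_ascii($clean)); // bool(true)
```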

Chris Bornhoft answered Sep 18 '22 23:09
