Determine if UTF-8 text is all ASCII?

What's the fastest way, in PHP, to determine if some given UTF-8 text is purely ASCII or not?

philfreo asked Nov 10 '10 18:11


1 Answer

A possibly faster approach is to use a negated character class: the regex engine can stop as soon as it hits the first non-ASCII byte, and there's no need to internally capture anything:

function isAscii($str) {
    return 0 == preg_match('/[^\x00-\x7F]/', $str);
}
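As a rough illustration of why a byte-level pattern is enough here: in UTF-8, every non-ASCII code point is encoded entirely with bytes in the range 0x80-0xFF, so the `/u` modifier is not needed. The sketch below uses a hypothetical name, `isAsciiRegex`, and a strict `=== 0` comparison (my assumption, not part of the answer) so that a `preg_match` error, which returns `false`, is not reported as "ASCII". The `\u{...}` escape assumes PHP 7 or later.

```php
<?php
// Hypothetical variant of the regex check above, with a strict comparison.
function isAsciiRegex(string $str): bool
{
    // preg_match returns 1 on a match, 0 on no match, false on error.
    // Only "no match" (0) means every byte is in 0x00-0x7F.
    return preg_match('/[^\x00-\x7F]/', $str) === 0;
}

var_dump(isAsciiRegex('Hello, world!'));  // bool(true)
var_dump(isAsciiRegex("caf\u{00E9}"));    // bool(false): U+00E9 is two bytes, 0xC3 0xA9
var_dump(isAsciiRegex(''));               // bool(true): nothing outside the range
```

Note that an empty string counts as ASCII under this definition, which matches the behavior of the original one-liner.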

Without regex (based on my comment):

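function isAscii($str) {
    // strlen() and $str[$i] both work on bytes, not characters
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        // any byte above 127 belongs to a multi-byte (non-ASCII) sequence
        if (ord($str[$i]) > 127) return false;
    }
    return true;
}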

But I'd have to ask, why are you so concerned about speed? Use the more readable and easier to understand version, and only worry about optimizing it once you know it's a problem...

Edit:

Then the fastest will likely be mb_check_encoding (which requires the mbstring extension):

function isAscii($str) {
    return mb_check_encoding($str, 'ASCII');
}
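Since "fastest" depends on PHP version and input, it may be worth measuring rather than guessing. The following is a minimal benchmark sketch, not a rigorous one: the helper names (`isAsciiRegex`, `isAsciiLoop`, `isAsciiMb`, `timeIt`), the input size, and the iteration count are all arbitrary choices of mine, and it assumes PHP 7+ with mbstring loaded. The input places the only non-ASCII character at the very end, which is the worst case for the early-exit approaches.

```php
<?php
// Hypothetical wrappers around the three approaches shown above.
function isAsciiRegex(string $s): bool
{
    return preg_match('/[^\x00-\x7F]/', $s) === 0;
}

function isAsciiLoop(string $s): bool
{
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        if (ord($s[$i]) > 127) {
            return false;
        }
    }
    return true;
}

function isAsciiMb(string $s): bool
{
    return mb_check_encoding($s, 'ASCII');
}

// Calls $fn on $input $iterations times and returns the elapsed seconds.
function timeIt(callable $fn, string $input, int $iterations): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn($input);
    }
    return microtime(true) - $start;
}

// 10,000 ASCII bytes followed by one non-ASCII character (U+00E9).
$input = str_repeat('a', 10000) . "\u{00E9}";

foreach (['isAsciiRegex', 'isAsciiLoop', 'isAsciiMb'] as $fn) {
    printf("%-14s %.4f s\n", $fn, timeIt($fn, $input, 200));
}
```

The timings will vary by machine and PHP build, so treat the numbers as relative only, and rerun with inputs that resemble your real data (short strings, mostly-ASCII text, early non-ASCII bytes) before drawing conclusions.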
ircmaxell answered Sep 21 '22 19:09