
Check to see if a string is encoded as UTF-8

function seems_utf8($str) {
 $length = strlen($str);
 for ($i=0; $i < $length; $i++) {
  $c = ord($str[$i]);
  if ($c < 0x80) $n = 0; # 0bbbbbbb
  elseif (($c & 0xE0) == 0xC0) $n=1; # 110bbbbb
  elseif (($c & 0xF0) == 0xE0) $n=2; # 1110bbbb
  elseif (($c & 0xF8) == 0xF0) $n=3; # 11110bbb
  elseif (($c & 0xFC) == 0xF8) $n=4; # 111110bb
  elseif (($c & 0xFE) == 0xFC) $n=5; # 1111110b
  else return false; # Does not match any model
  for ($j=0; $j<$n; $j++) { # n bytes matching 10bbbbbb follow ?
   if ((++$i == $length) || ((ord($str[$i]) & 0xC0) != 0x80))
    return false;
  }
 }
 return true;
}

I got this code from WordPress. I don't know much about it, but I would like to know exactly what is happening in that function.

If anyone knows, please help me out.

I need a clear idea of what the above code does. A line-by-line explanation would be most helpful.

coderex asked Nov 28 '22 05:11


1 Answer

I use two ways to check whether a string is UTF-8, depending on the case:

mb_internal_encoding('UTF-8'); // always needed before mb_ functions, see the note below
if (mb_strlen($string) != strlen($string)) {
 // contains at least one multi-byte character
}

-- OR --

if (preg_match('!\S!u', $string)) {
 // valid UTF-8 with at least one non-whitespace character
}
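
One detail of the preg_match check is easy to miss: with the u modifier, preg_match() returns false when the subject is not valid UTF-8, but it also returns 0 when the string is valid and simply has no non-whitespace character. A minimal sketch of a stricter variant follows. The helper name is_valid_utf8 is made up for illustration and is not a PHP built-in, and the empty pattern is a swapped-in choice, not the one used above:

```php
<?php
// Hypothetical helper (name chosen for illustration, not part of PHP).
// Returns true only when $string is well-formed UTF-8.
function is_valid_utf8($string) {
    // With the /u modifier PCRE validates the whole subject first.
    // Invalid UTF-8 makes preg_match() return false; the empty pattern
    // matches any valid subject, including '' and whitespace-only strings.
    return preg_match('//u', $string) === 1;
}

var_dump(is_valid_utf8("caf\xC3\xA9")); // "café" encoded as UTF-8
var_dump(is_valid_utf8("caf\xE9"));     // "café" encoded as ISO-8859-1
var_dump(is_valid_utf8("   "));         // whitespace only, still valid UTF-8
```

Comparing against 1 with === keeps the "invalid" case (false) apart from the "no match" case (0), which a plain if () would treat the same.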

As for mb_internal_encoding: because of a PHP bug whose cause I don't know (in versions before 5.3; I haven't tested it on 5.3), passing the encoding as a parameter to the mb_ functions doesn't work, so the internal encoding needs to be set before any mb_ function is used.
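
As for what the function in the question does: each pass of the outer loop reads one lead byte and uses bit masks to decide how many continuation bytes of the form 10bbbbbb must follow; the inner loop then checks that they are really there. Below is a rough sketch of just the lead-byte step. The helper name continuation_bytes_expected is hypothetical and is not part of WordPress. Note that the 5- and 6-byte forms come from the original UTF-8 design and are no longer allowed by RFC 3629, so the function in the question is more permissive than a strict validator:

```php
<?php
// Hypothetical helper for illustration only (not WordPress code).
// Returns how many continuation bytes (10bbbbbb) the lead byte announces,
// or -1 when the byte cannot start a sequence.
function continuation_bytes_expected($byte) {
    $c = ord($byte);
    if ($c < 0x80) return 0;            // 0bbbbbbb: plain ASCII
    if (($c & 0xE0) == 0xC0) return 1;  // 110bbbbb
    if (($c & 0xF0) == 0xE0) return 2;  // 1110bbbb
    if (($c & 0xF8) == 0xF0) return 3;  // 11110bbb
    if (($c & 0xFC) == 0xF8) return 4;  // 111110bb (obsolete form)
    if (($c & 0xFE) == 0xFC) return 5;  // 1111110b (obsolete form)
    return -1;                          // e.g. a stray 10bbbbbb byte
}

var_dump(continuation_bytes_expected("A"));    // ASCII letter
var_dump(continuation_bytes_expected("\xC3")); // lead byte of "é" (C3 A9)
var_dump(continuation_bytes_expected("\xE2")); // lead byte of "€" (E2 82 AC)
var_dump(continuation_bytes_expected("\x80")); // continuation byte used as a lead
```

In each mask test, the & keeps only the high bits of the byte and the comparison checks that they match the expected prefix, for example 110 for a two-byte sequence.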

bisko answered Dec 16 '22 02:12