Is there any way in PHP of detecting the following character �?
I'm currently fixing a number of UTF-8 encoding issues with a few different algorithms and need to be able to detect if � is present in a string. How do I do so with strpos?
Simply pasting the character into my codebase does not seem to work.
if (strpos($names['decode'], '?') !== false || strpos($names['decode'], '�') !== false)
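If the string really contains the character U+FFFD (REPLACEMENT CHARACTER), strpos() can find it because it compares bytes. Writing the needle with PHP 7's `\u{FFFD}` escape keeps the check independent of how the source file is saved. Be aware, though, that browsers also display � for bytes that are simply not valid UTF-8. In that case the string contains no U+FFFD at all, so strpos() finds nothing and the string has to be checked for invalid UTF-8 instead. A minimal sketch; `REPLACEMENT_CHAR` and `contains_replacement_char()` are hypothetical names:

```php
<?php
// U+FFFD written with the \u{...} escape (PHP 7+); in UTF-8 it is the bytes EF BF BD.
const REPLACEMENT_CHAR = "\u{FFFD}";

// Hypothetical helper: true if the literal U+FFFD character occurs in $s.
function contains_replacement_char(string $s): bool
{
    return strpos($s, REPLACEMENT_CHAR) !== false;
}

var_dump(contains_replacement_char("D\u{FFFD}sseldorf")); // bool(true)
// An ISO-8859-1 byte (0xFC) renders as � in a UTF-8 page but is not U+FFFD:
var_dump(contains_replacement_char("D\xFCsseldorf"));     // bool(false)
```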
Converting a UTF-8 string into UTF-8 using iconv() with //IGNORE appended to the target charset produces a result where invalid UTF-8 byte sequences are dropped.
Therefore, you can detect a broken character by comparing the length of the string before and after the iconv operation. If the lengths differ, the string contained a broken character.
Test case (make sure you save the file as UTF-8):
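<?php
header("Content-type: text/html; charset=utf-8");
$teststring = "Düsseldorf";
// Deliberately create a broken string by re-encoding the original as ISO-8859-1
// ("ü" becomes the single byte 0xFC, which is not valid UTF-8).
// mb_convert_encoding() replaces utf8_decode(), which is deprecated as of PHP 8.2.
$teststring_broken = mb_convert_encoding($teststring, "ISO-8859-1", "UTF-8");
echo "Broken string: " . $teststring_broken;
echo "<br>";
// //IGNORE drops invalid byte sequences. Note: with glibc, iconv() may instead
// return false and raise a notice when it encounters invalid input.
$teststring_converted = iconv("UTF-8", "UTF-8//IGNORE", $teststring_broken);
echo $teststring_converted;
echo "<br>";
// A false result is cast to "" (length 0), so it also counts as a difference.
if (strlen((string) $teststring_converted) !== strlen($teststring_broken)) {
    echo "The string contained an invalid character";
}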
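The same comparison can be packaged as a reusable check. This is a sketch under the assumption that a `false` return from iconv() on input like this means bytes had to be dropped (the glibc behaviour mentioned in the comments above); it therefore inherits the caveat that iconv() could fail for other reasons. `has_invalid_utf8()` is a hypothetical name:

```php
<?php
// Hypothetical helper wrapping the iconv //IGNORE length comparison.
function has_invalid_utf8(string $s): bool
{
    // //IGNORE drops invalid byte sequences; @ silences the notice some
    // iconv implementations (e.g. glibc) raise for illegal input.
    $cleaned = @iconv('UTF-8', 'UTF-8//IGNORE', $s);

    // Assumption: false here means iconv gave up on invalid bytes.
    if ($cleaned === false) {
        return true;
    }

    // Any length difference means bytes were dropped.
    return strlen($cleaned) !== strlen($s);
}

var_dump(has_invalid_utf8("D\xC3\xBCsseldorf")); // bool(false): valid UTF-8
var_dump(has_invalid_utf8("D\xFCsseldorf"));     // bool(true): ISO-8859-1 byte 0xFC
```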
In theory, you could drop //IGNORE and simply test for a failed iconv operation (iconv() returns false on failure), but there might be other reasons for an iconv call to fail than just invalid characters... I don't know. I would use the comparison method.
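If the mbstring extension is available, a different technique sidesteps the question of why iconv() failed: mb_check_encoding() only validates the bytes and converts nothing. This is a swapped-in alternative, not the iconv method described above; `is_valid_utf8()` is a hypothetical wrapper name:

```php
<?php
// Hypothetical wrapper: true only if $s is a well-formed UTF-8 byte sequence.
function is_valid_utf8(string $s): bool
{
    // mb_check_encoding() validates without converting, so there is
    // no conversion result or failure mode to interpret.
    return mb_check_encoding($s, 'UTF-8');
}

var_dump(is_valid_utf8("D\xC3\xBCsseldorf")); // bool(true)
var_dump(is_valid_utf8("D\xFCsseldorf"));     // bool(false)
```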
Here is what I do to detect and correct the encoding of strings not encoded in UTF-8 when that is what I am expecting:
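$encoding = mb_detect_encoding($str, 'utf-8, iso-8859-1, ascii', true);
// In strict mode, mb_detect_encoding() returns false if no listed encoding matches
if ($encoding !== false && strcasecmp($encoding, 'UTF-8') !== 0) {
    $str = iconv($encoding, 'utf-8', $str);
}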
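Wrapped as a function, the detect-and-convert snippet above can be applied to any incoming string. A minimal sketch, assuming the same candidate list and that the input is either UTF-8, ISO-8859-1 or ASCII; `to_utf8()` is a hypothetical name, and the original string is returned unchanged if detection or conversion fails:

```php
<?php
// Hypothetical helper: re-encode $str as UTF-8 if it was detected as something else.
function to_utf8(string $str): string
{
    $encoding = mb_detect_encoding($str, ['UTF-8', 'ISO-8859-1', 'ASCII'], true);
    if ($encoding === false || strcasecmp($encoding, 'UTF-8') === 0) {
        // Already UTF-8, or no candidate matched: leave the string alone.
        return $str;
    }
    $converted = iconv($encoding, 'UTF-8', $str);
    return $converted === false ? $str : $converted;
}

// "D\xFCsseldorf" is "Düsseldorf" encoded as ISO-8859-1.
echo to_utf8("D\xFCsseldorf"), "\n"; // Düsseldorf
```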