This should be simple, but I can't figure it out.
The site in question is UTF-8 encoded.
A customer has been having trouble filling out a form on our website. Here is example data they have entered.
SPICER-SMITHS LOST
It looks like a regular string, but when you copy it into an app like Notepad++, a "?" appears in the word "SMITHS" ("SMITH?S").
The script sanitizes the field and goes the extra step of removing the following characters:
"\r\n", "\n", "\r", "\t", "\0", "\x0B"
It's not catching this hidden character though.
Does anybody know what's going on here?
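EDIT: I'm using PHP. Here is the function that I use to sanitize the field:
function strip_hidden_chars($str)
{
    // Replace line breaks, tabs, NUL bytes and vertical tabs with a space.
    $chars = array("\r\n", "\n", "\r", "\t", "\0", "\x0B");
    $str = str_replace($chars, " ", $str);
    // Collapse any run of whitespace into a single space.
    return preg_replace('/\s+/', ' ', $str);
}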
EDIT 2: @thaJeztah led me to the answer. The string I was testing was the output from our support ticket after the customer had copied and pasted it from whatever application she is using. The actual input was
SPICER-SMITH’S
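For anyone hitting the same thing: the apostrophe in that input is U+2019 (RIGHT SINGLE QUOTATION MARK), which is encoded in UTF-8 as the three bytes E2 80 99. It is neither whitespace nor a control character, so the function above leaves it alone. Below is a rough sketch of how one might spot such characters and, if desired, convert curly quotes to plain ASCII ones. The helper names find_non_ascii and normalize_quotes are made up for this example, and the "\u{...}" escapes assume PHP 7.0 or later.

```php
<?php
// Hypothetical helper (not from the original post): list every non-ASCII
// character in a UTF-8 string together with its raw bytes in hex.
function find_non_ascii(string $str): array
{
    $found = array();
    // The /u modifier makes PCRE treat the subject as UTF-8, so each match
    // is one whole character rather than a single byte. If the input is not
    // valid UTF-8, preg_match_all() returns false and nothing is reported.
    if (preg_match_all('/[^\x00-\x7F]/u', $str, $matches)) {
        foreach ($matches[0] as $char) {
            $found[] = $char . ' = ' . strtoupper(bin2hex($char));
        }
    }
    return $found;
}

// Hypothetical helper: map typographic quotes to their ASCII equivalents.
function normalize_quotes(string $str): string
{
    $map = array(
        "\u{2018}" => "'",  // left single quotation mark
        "\u{2019}" => "'",  // right single quotation mark
        "\u{201C}" => '"',  // left double quotation mark
        "\u{201D}" => '"',  // right double quotation mark
    );
    return strtr($str, $map);
}

$input = "SPICER-SMITH\u{2019}S";
print_r(find_non_ascii($input));
echo normalize_quotes($input), "\n";  // prints SPICER-SMITH'S
```

Whether to convert the character or simply store it as-is depends on the application. If the site is UTF-8 end to end, keeping the original apostrophe is usually fine.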
You may want to have a look at this question about removing control characters:
Remove control characters from php String
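This also works:
// '\\n' and '\\r' match the literal two-character sequences backslash-n and backslash-r.
$chars = array("\r\n", '\\n', '\\r', "\n", "\r", "\t", "\0", "\x0B");
// str_replace() returns the new string, so assign the result back.
$data = str_replace($chars, "<br>", $data);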
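As an alternative to listing each character, the control-character approach from the linked question can be sketched with a single regular expression. This is only an illustration: strip_control_chars is a hypothetical name, and replacing with a space (rather than "<br>") and trimming the result are assumptions made here.

```php
<?php
// Hypothetical helper: replace ASCII control characters with a space,
// then collapse runs of whitespace.
function strip_control_chars(string $str): string
{
    // \x00-\x1F and \x7F are the ASCII control characters. These byte values
    // never occur inside a multi-byte UTF-8 sequence, so matching byte-wise
    // (without the /u modifier) does not damage other characters.
    $str = preg_replace('/[\x00-\x1F\x7F]/', ' ', $str);
    return trim(preg_replace('/\s+/', ' ', $str));
}
```

Note that this still leaves characters such as the curly apostrophe (U+2019) untouched, because that is punctuation and not a control character.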