
PHP URL Encoding/Decoding Pretty Quotes across form field's %u2019

For some reason, after submitting a string like this Jack’s Spindle from a text form to php, I get:

Jack%u2019s Spindle

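This is not what PHP's urlencode() would produce, which would be Jack%92s+Spindle, nor rawurlencode(), which gives Jack%92s%20Spindle (both for a Windows-1252 string; in UTF-8 the quote would become %E2%80%99 instead).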

Thus, urldecode() and rawurldecode() don't decode that string... Is there another function for such strings?

--

Also, Jack&#8217;s Spindle would be the HTML-safe way to encode the above, but urlencode() and rawurlencode() on that yield Jack%26%238217%3Bs+Spindle and Jack%26%238217%3Bs%20Spindle respectively...

Where is the %u2019 coming from? What does it represent? How do you get it back to just that innocuous apostrophe?

ina asked Dec 22 '22 00:12



1 Answer

Well, only you can tell us where that came from. Where are you getting your text from, and what transformations is it being subjected to? I confess I haven't seen that encoding strategy before.

That said, it's very similar to the way JavaScript encodes UTF-16 code units: \uXXXX, where each X represents a hexadecimal digit. To convert it to HTML entities, you could do:

preg_replace('/%u([a-fA-F0-9]{4})/', '&#x\\1;', $string)
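
If you want the actual character rather than an HTML entity, the same idea can decode straight to UTF-8. This is only a sketch: decode_percent_u() is a hypothetical helper name, it assumes the mbstring extension is loaded, and it assumes each %uXXXX is one UTF-16 code unit (which is what JavaScript's legacy escape() function emits, one plausible source of the string).

```php
<?php
// Hypothetical helper (not part of PHP): turn %uXXXX sequences into UTF-8.
// Assumes the mbstring extension is available.
function decode_percent_u(string $input): string
{
    // Match runs of consecutive %uXXXX so surrogate pairs are converted together.
    return preg_replace_callback(
        '/(?:%u[0-9a-fA-F]{4})+/',
        function (array $m): string {
            // Drop the "%u" prefixes, leaving the hex of the UTF-16BE code units.
            $hex = str_replace('%u', '', $m[0]);
            // Pack the hex into raw bytes, then convert UTF-16BE to UTF-8.
            return mb_convert_encoding(pack('H*', $hex), 'UTF-8', 'UTF-16BE');
        },
        $input
    );
}

echo decode_percent_u('Jack%u2019s Spindle'), PHP_EOL; // Jack’s Spindle
```

Matching whole runs instead of single sequences is a deliberate choice: a character outside the Basic Multilingual Plane arrives as two %uXXXX units, and converting them one at a time would give invalid output.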
Artefacto answered Dec 24 '22 14:12
