I try to eject text from Word .DOC file with PHP. All seems ok, but the only trouble is something like
СУДОВА БУХГАЛТЕРІЯ
instead of russian text. I've tried to use html_entity_decode and utf8_encode, but they didn't help. Is there any simple solution?
html_entity_decode
should work with the proper parameters (unless you’re using PHP 5.3.3 or later):
html_entity_decode($str, ENT_QUOTES, 'UTF-8')
This will convert the character references into UTF-8. Before PHP 5.3.3, the charset parameter’s default value was ISO-8859-1
. In that case the cyrillic characters can’t be converted as the ISO 8859-1 character set doesn’t contain them.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With