
UTF-8 and HTML entities

Tags:

php

utf-8

I'm trying to extract text from a Word .DOC file with PHP. Everything seems OK, except that I get something like

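&#1057;&#1059;&#1044;&#1054;&#1042;&#1040; &#1041;&#1059;&#1061;&#1043;&#1040;&#1051;&#1058;&#1045;&#1056;&#1030;&#1071;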

instead of Russian text. I've tried html_entity_decode and utf8_encode, but neither helped. Is there a simple solution?

Ximik asked Oct 11 '22 20:10


1 Answer

html_entity_decode should work if you pass the charset explicitly (only PHP 5.4.0 and later default to UTF-8):

html_entity_decode($str, ENT_QUOTES, 'UTF-8')

This converts the character references into UTF-8. Before PHP 5.4.0, the charset parameter's default value was ISO-8859-1. In that case the Cyrillic characters can't be converted, because the ISO 8859-1 character set doesn't contain them.
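A minimal sketch of the difference between the two charsets. The sample string is an assumption: it encodes the Cyrillic text from the question as decimal numeric character references, which may not be exactly what the .DOC extraction produced. The variable names are illustrative only.

```php
<?php
// Assumed input: decimal numeric character references for the Cyrillic text.
$encoded = '&#1057;&#1059;&#1044;&#1054;&#1042;&#1040; '
         . '&#1041;&#1059;&#1061;&#1043;&#1040;&#1051;&#1058;&#1045;&#1056;&#1030;&#1071;';

// Explicit UTF-8 target: every reference becomes a UTF-8 encoded character,
// regardless of which default the running PHP version uses.
$decoded = html_entity_decode($encoded, ENT_QUOTES, 'UTF-8');

// ISO-8859-1 target: the Cyrillic code points have no representation in that
// charset, so the references are left in the string untouched.
$legacy = html_entity_decode($encoded, ENT_QUOTES, 'ISO-8859-1');

echo $decoded, PHP_EOL;
echo $legacy, PHP_EOL;
```

If the output still looks wrong after decoding, it may be worth checking that the page or terminal displaying it is also set to UTF-8.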

Gumbo answered Oct 18 '22 10:10