I have an XHTML document being passed to a PHP app via Greasemonkey AJAX. The PHP app uses UTF-8. If I output the POST content straight back to a textarea in the AJAX receiving div, everything is still properly encoded in UTF-8.
When I try to parse it using XPath:
$dom = new DOMDocument();
$dom->loadHTML($raw2);
$xpath = new DOMXPath($dom);
$query = '//td/text()';
$nodes = $xpath->query($query);
foreach ($nodes as $node) {
    var_dump($node->wholeText);
}
the dumped strings are not UTF-8. How do I force DOM/XPath to use UTF-8?
I had the same problem and I couldn't use tidy on my web server. I found this solution and it worked fine:
$html = mb_convert_encoding($html, 'HTML-ENTITIES', "UTF-8");
$dom = new DomDocument();
$dom->loadHTML($html);
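Note that passing 'HTML-ENTITIES' to mb_convert_encoding is deprecated as of PHP 8.2. A minimal sketch of the same idea using mb_encode_numericentity instead, assuming the mbstring extension is available. The sample markup is a hypothetical stand-in for $raw2 from the question, and the $texts array is only there for illustration:

```php
<?php
// Hypothetical sample input standing in for $raw2 from the question.
$raw2 = '<table><tr><td>café</td><td>日本語</td></tr></table>';

// Turn every non-ASCII character into a numeric entity (e.g. é becomes &#233;),
// so the HTML parser sees pure ASCII and never has to guess the encoding.
// Map format: [first code point, last code point, offset, mask].
$html = mb_encode_numericentity($raw2, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

$dom = new DOMDocument();
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);
$texts = [];
foreach ($xpath->query('//td/text()') as $node) {
    // The DOM extension hands text back as UTF-8.
    $texts[] = $node->wholeText;
}

var_dump($texts);
```

The entities are decoded again while parsing, so the strings coming out of the XPath query should contain the original characters rather than the entity text.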
A bit late in the game, but perhaps it helps someone...
The problem might be in the output rather than in the DOM/XPath object itself.
If you output the nodeValue directly, you get corrupted characters, e.g.:
ìÂÂì ë¹Â디ì¤
ìì ë¹ë””ì¤ í°ì íì¤
You have to create your DOM object with the second parameter set to "utf-8":
new \DomDocument('1.0', 'utf-8')
Even so, when you print a value from the DOM node list or element, you still get broken characters:
echo $contentItem->item($index)->nodeValue
You have to wrap it in utf8_decode:
echo utf8_decode($contentItem->item($index)->nodeValue)
//output: 者不終朝而會,愚者可浹旬而學
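Since utf8_decode is deprecated as of PHP 8.2, here is a hedged sketch of another way to approach the same symptom: tell the parser the input encoding up front, check that the value read back is valid UTF-8, and declare the charset on the response. The sample markup and the variable names are assumptions for illustration, and the `<?xml encoding="UTF-8">` prefix is a commonly used workaround rather than anything taken from the answer above:

```php
<?php
// Hypothetical sample markup; the prefix hints the input encoding to the parser.
$html = '<?xml encoding="UTF-8"><p>智者不終朝而會</p>';

$dom = new DOMDocument('1.0', 'utf-8');
$dom->loadHTML($html);

// Read the text back; the DOM extension returns strings as UTF-8.
$value = $dom->getElementsByTagName('p')->item(0)->nodeValue;

// Confirm the bytes are well-formed UTF-8 before blaming the parser.
$isUtf8 = mb_check_encoding($value, 'UTF-8');

// Declare the charset so the browser does not fall back to a legacy encoding.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

echo $value;
```

If $isUtf8 is true but the page still shows broken characters, the likely culprit is the charset the browser assumes for the response, not the DOM object.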