I read data from Twitter's Streaming API and then write it to an XML file.
But some special character references, such as &#55357;, cause an error (when I open the XML file in Chrome, Chrome reports an error at that character).
I want to convert that encoded sequence (&#55357;) into the real character before writing to the XML file.
How can I implement this?
-------------ADDED--------------
This is the XML file content:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<text>@carlyraejepsen would be a dream if you follow me, please follow me?, I love you so much you're my inspiration</text>
<text>someone please bring me a caramel apple and a mocha from black cat. i'll love you forever</text>
<text>“@G_MartinFlyKick: Marry me Juliet.I love you and that's all I really know.”����������</text>
<text>"I need to see a picture of him cuz Im trying to imagine you guys making love and all I see is u climbing on top of a big question mark"lmao</text>
<text>@District3music hi, I LOVE YOU follow me please? &lt;3 xx 23</text>
<text>RT @syardley_: So appreciative of my family and people I love, wouldn't be where I am without them. #thankful</text>
<text>#DISTRICT3HALLOWEENFOLLOWSPREE #DISTRICT3HALLOWEENFOLLOWSPREE #3EEKERFROMTHENETHERLANDS love you! Please follow ? @District3music x42</text>
<text>Arguably my favorite electronic music producer @Kluteuk is coming back to Toronto on Dec 22nd. So stoked. Guy has made so many tunes I LOVE.</text>
<text>The stakes are high, the water's rough, but this love is ours.</text>
<text>@NiallOfficial Answer me, I love you very much. Venezuela loves. jhgj</text>
<text>Love this shit http://t.co/qSP79NKx</text>
</root>
And here is error from Chrome:
This page contains the following errors:
error on line 5 at column 91: xmlParseCharRef: invalid xmlChar value 55357
Below is a rendering of the page up to the first error.
The character reference &#55357; denotes a surrogate code point (U+D83D), so it would be wrong to try to convert it to a character. It is not a character, not even half a character.
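To make the point concrete, here is a minimal Python sketch (Python is my choice here, since the question does not name a language). It checks that 55357 falls in the surrogate block U+D800..U+DFFF and that Python refuses to encode it as UTF-8 on its own. The helper names `is_surrogate` and `can_encode_utf8` are hypothetical, introduced only for illustration.

```python
def is_surrogate(code_point: int) -> bool:
    """Return True if the code point lies in the surrogate block U+D800..U+DFFF."""
    return 0xD800 <= code_point <= 0xDFFF


def can_encode_utf8(code_point: int) -> bool:
    """Return True if the single code point can be encoded as UTF-8."""
    try:
        chr(code_point).encode("utf-8")
        return True
    except UnicodeEncodeError:
        # Lone surrogates are not characters, so the codec rejects them.
        return False


# 55357 is the value Chrome complained about.
print(hex(55357), is_surrogate(55357), can_encode_utf8(55357))
# prints: 0xd83d True False
```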
You need to track back to the point where the reference was generated. The reason might be a character encoding confusion. In UTF-16, surrogate code units may appear, but they must be handled in pairs when the data is interpreted as characters and, for example, converted to another encoding or turned into character references.
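If the references cannot be fixed at the source, one possible workaround is to recombine adjacent surrogate references into the real character before writing the file. The sketch below is only an illustration under several assumptions: it handles decimal references only (not `&#x...;`), the function name `combine_surrogate_refs` is hypothetical, and the low surrogate 56856 in the usage line is an invented example value, since the question only shows 55357. A lone surrogate reference is left untouched, because there is no character it could stand for.

```python
import re

# A single decimal character reference such as "&#55357;".
_REF = re.compile(r"&#(\d+);")


def combine_surrogate_refs(text: str) -> str:
    """Replace each high+low surrogate reference pair with the real character."""
    out = []
    pos = 0
    while True:
        m = _REF.search(text, pos)
        if m is None:
            out.append(text[pos:])  # no more references: copy the tail
            break
        out.append(text[pos:m.start()])  # copy text before the reference
        hi = int(m.group(1))
        nxt = _REF.match(text, m.end())  # reference immediately following?
        lo = int(nxt.group(1)) if nxt else None
        if 0xD800 <= hi <= 0xDBFF and lo is not None and 0xDC00 <= lo <= 0xDFFF:
            # Standard UTF-16 decoding of a surrogate pair to one code point.
            code_point = 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)
            out.append(chr(code_point))
            pos = nxt.end()
        else:
            out.append(m.group(0))  # not a valid pair: keep it unchanged
            pos = m.end()
    return "".join(out)


# Hypothetical input: 55357 followed by an assumed low surrogate 56856.
fixed = combine_surrogate_refs("all I really know.&#55357;&#56856;")
print(fixed == "all I really know.\U0001F618")  # prints: True
```

The result is an ordinary string containing the real character, so it can be written to a UTF-8 encoded XML file as is. Any lone surrogate references that remain will still be rejected by an XML parser, which is a sign that the data has to be fixed where it was generated.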