 

Decode &#55357; to real character

Tags: xml, unicode

I read data from the Twitter Streaming API and then write it to an XML file.

Some special characters, written as references like &#55357;, cause an error: when I open the XML file in Chrome, it reports an error at that character.

I want to convert that encoded sequence (&#55357;) into the real character before writing it to the XML file.

How can I implement this?

-------------ADDED--------------

This is the XMLFile content:

<?xml version="1.0" encoding="UTF-8"?>
<root>
<text>@carlyraejepsen would be a dream if you follow me, please follow me?, I love you so much you're my inspiration</text>
<text>someone please bring me a caramel apple and a mocha from black cat. i'll love you forever</text>
<text>“@G_MartinFlyKick: Marry me Juliet.I love you and that's all I really know.”&#55357;&#56834;&#55357;&#56834;&#55357;&#56834;&#55357;&#56834;&#55357;&#56834;</text>
<text>"I need to see a picture of him cuz Im trying to imagine you guys making love and all I see is u climbing on top of a big question mark"lmao</text>
<text>@District3music hi, I LOVE YOU follow me please? &amp;lt;3 xx 23</text>
<text>RT @syardley_: So appreciative of my family and people I love, wouldn't be where I am without them. #thankful</text>
<text>#DISTRICT3HALLOWEENFOLLOWSPREE #DISTRICT3HALLOWEENFOLLOWSPREE #3EEKERFROMTHENETHERLANDS love you! Please follow ? @District3music x42</text>
<text>Arguably my favorite electronic music producer @Kluteuk is coming back to Toronto on Dec 22nd. So stoked. Guy has made so many tunes I LOVE.</text>
<text>The stakes are high, the water's rough, but this love is ours.</text>
<text>@NiallOfficial Answer me, I love you very much. Venezuela loves. jhgj</text>
<text>Love this shit http://t.co/qSP79NKx</text>
</root>

And here is the error from Chrome:

This page contains the following errors:

error on line 5 at column 91: xmlParseCharRef: invalid xmlChar value 55357
Below is a rendering of the page up to the first error.
Songokute asked Oct 31 '12 18:10



1 Answer

The character reference &#55357; denotes a surrogate code point (U+D83D), so it would be wrong to try to convert it to a character. It is not a character, not even half a character.
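As a quick check, the two decimal values from the question's XML can be tested against the surrogate range U+D800 to U+DFFF. XML 1.0 excludes that range from its allowed characters, which is why the parser rejects the reference. A minimal Python sketch (the helper name `is_surrogate` is made up for illustration):

```python
def is_surrogate(code_point: int) -> bool:
    """Return True if the value lies in the UTF-16 surrogate range U+D800..U+DFFF."""
    return 0xD800 <= code_point <= 0xDFFF


# The two values that alternate in the question's third <text> element.
for value in (55357, 56834):
    print(value, hex(value), is_surrogate(value))
# 55357 0xd83d True
# 56834 0xde02 True
```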

You need to trace back to the point where the reference was generated. The cause may be a character encoding confusion. In UTF-16, surrogate code units may appear, but they must be handled in pairs when the data is interpreted as characters and, for example, converted to another encoding or turned into character references.
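If the generating code cannot be fixed, the pairing rule can still be applied afterwards to text that already contains such references. The following is only a sketch under assumptions: it handles decimal references only (as in the question), the function name `repair_surrogate_refs` is hypothetical, and it relies on Python's `surrogatepass` error handler to let the UTF-16 codec join each high/low pair into one character.

```python
import re

# Decimal numeric character references such as &#55357;
REF = re.compile(r"&#(\d+);")


def repair_surrogate_refs(text: str) -> str:
    """Replace paired surrogate references with the character they encode.

    Raises UnicodeDecodeError if a surrogate reference has no partner.
    """
    def to_code_unit(match: re.Match) -> str:
        value = int(match.group(1))
        if 0xD800 <= value <= 0xDFFF:
            # Temporarily keep the surrogate as a lone code unit.
            return chr(value)
        # Leave every other reference exactly as written.
        return match.group(0)

    with_units = REF.sub(to_code_unit, text)
    # Round-trip through UTF-16 so adjacent high/low units are
    # interpreted together as one supplementary character.
    return with_units.encode("utf-16", "surrogatepass").decode("utf-16")


fixed = repair_surrogate_refs("know.&#55357;&#56834;")
print(hex(ord(fixed[-1])))  # 0x1f602
```

The result contains the real character (U+1F602 here), which can then be written to the UTF-8 file directly or as the single reference `&#128514;`. An unpaired surrogate makes the decode step fail, which is a useful signal that the input is damaged rather than something to paper over.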

Jukka K. Korpela answered Oct 05 '22 03:10