
NSXMLParser shreds umlauts (ä, ö, ü)

I use NSXMLParser to parse XML documents from a server. They are encoded as UTF-8. My problem is that NSXMLParser breaks at umlauts (ä, ö, ü) and appears to start a new element.

For example:

Lösen -- NSXMLParser ---> L + ösen

How do I get NSXMLParser to read words containing umlauts completely, like every other word?

Regards

Stefan asked Dec 13 '22 03:12


2 Answers

Sorry, but based on your comment on the original question (foundCharacters receiving the text in two calls), the parser is behaving correctly. See the "Discussion" section for the parser:foundCharacters: method, quoted below:

The parser object may send the delegate several parser:foundCharacters: messages to report the characters of an element. Because string may be only part of the total character content for the current element, you should append it to the current accumulation of characters until the element changes.

As you can see, the parser is free to pass your delegate the characters in as many chunks as it sees fit.

imaginaryboy answered Dec 30 '22 17:12


foundCharacters: is not delimited by tags; you need to concatenate the characters passed in until the next call to didEndElement.
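As a rough illustration of that accumulation pattern, here is a minimal Swift sketch using Foundation's XMLParser (the Swift name for NSXMLParser). The class name TextCollector and the helper collectTexts(fromXML:) are hypothetical names invented for this example, and it assumes simple leaf elements with no mixed content; it is not the asker's actual code.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML  // XMLParser lives in this module on Linux
#endif

// Hypothetical delegate: gathers the complete text of each element by name.
// Assumes flat leaf elements; nested mixed content would need a stack of buffers.
final class TextCollector: NSObject, XMLParserDelegate {
    private var buffer = ""
    private(set) var texts: [String: String] = [:]

    func parser(_ parser: XMLParser, didStartElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?,
                attributes attributeDict: [String: String] = [:]) {
        // A new element begins: start with an empty accumulation.
        buffer = ""
    }

    func parser(_ parser: XMLParser, foundCharacters string: String) {
        // May be called several times per element (e.g. "L" then "ösen"),
        // so append instead of assigning.
        buffer += string
    }

    func parser(_ parser: XMLParser, didEndElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?) {
        // Only now is the element's text known to be complete.
        texts[elementName] = buffer
        buffer = ""
    }
}

// Hypothetical helper: parses an XML string and returns element name -> full text.
func collectTexts(fromXML xml: String) -> [String: String] {
    let parser = XMLParser(data: Data(xml.utf8))
    let collector = TextCollector()
    parser.delegate = collector
    parser.parse()
    return collector.texts
}

let result = collectTexts(
    fromXML: "<?xml version=\"1.0\" encoding=\"UTF-8\"?><word>Lösen</word>")
print(result["word"] ?? "")
```

However many foundCharacters calls the parser chooses to make, the text stored in didEndElement is the whole word, so the place where the chunks are split no longer matters.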

Rog answered Dec 30 '22 16:12