Why can't I parse an XML file using QXmlStreamReader from Qt?

I'm trying to figure out how QXmlStreamReader works for a C++ application I'm writing. The XML file I want to parse is a large dictionary with a convoluted structure and plenty of Unicode characters, so I decided to start with a small test case using a simpler document. Unfortunately, I hit a wall. Here's the example XML file:

<?xml version="1.0" encoding="UTF-8" ?>
<persons>
    <person>
        <firstname>John</firstname>
        <surname>Doe</surname>
        <email>[email protected]</email>
        <website>http://en.wikipedia.org/wiki/John_Doe</website>
    </person>
    <person>
        <firstname>Jane</firstname>
        <surname>Doe</surname>
        <email>[email protected]</email>
        <website>http://en.wikipedia.org/wiki/John_Doe</website>
    </person>
    <person>
        <firstname>Matti</firstname>
        <surname>Meikäläinen</surname>
        <email>[email protected]</email>
        <website>http://fi.wikipedia.org/wiki/Matti_Meikäläinen</website>
    </person>
</persons>

...and I'm trying to parse it using this code:

#include <QFile>
#include <QString>
#include <QTextStream>
#include <QXmlStreamReader>

int main(int argc, char *argv[])
{
    if (argc != 2) return 1;

    QString filename(argv[1]);
    QTextStream cout(stdout);
    cout << "Starting... filename: " << filename << endl;

    QFile file(filename);
    bool open = file.open(QIODevice::ReadOnly | QIODevice::Text);
    if (!open) 
    {
        cout << "Couldn't open file" << endl;
        return 1;
    }
    else 
    {
        cout << "File opened OK" << endl;
    }

    QXmlStreamReader xml(&file);
    cout << "Encoding: " << xml.documentEncoding().toString() << endl;

    while (!xml.atEnd() && !xml.hasError()) 
    {
        xml.readNext();
        if (xml.isStartElement())
        {
            cout << "element name: '" << xml.name().toString() << "'" 
                << ", text: '" << xml.text().toString() << "'" << endl;
        }
        else if (xml.hasError())
        {
            cout << "XML error: " << xml.errorString() << endl;
        }
        else if (xml.atEnd())
        {
            cout << "Reached end, done" << endl;
        }
    }

    return 0;
}

...then I get this output:

C:\xmltest\Debug>xmltest.exe example.xml
Starting... filename: example.xml
File opened OK
Encoding:
XML error: Encountered incorrectly encoded content.

What happened? This file couldn't be simpler and it looks consistent to me. With my original file I also get a blank entry for the encoding; the entries' name() values are displayed but, alas, text() is also empty. Any suggestions are greatly appreciated; personally, I'm thoroughly mystified.

asked Nov 17 '10 03:11 by neuviemeporte


2 Answers

I'm answering this myself as this problem was related to three issues, two of which were brought up by the responses.

  1. The file actually wasn't UTF-8 encoded. I changed the declared encoding to iso-8859-1 and the encoding error disappeared.
  2. The text() function doesn't work as I expected: on a start-element token it returns an empty string, because the character data arrives as a separate token. I have to use readElementText() to read the entries' contents.
  3. When I call readElementText() on an element that contains child elements rather than text, like the top-level <persons> in my case, the parser reports an "Expected character data" error and parsing stops. I find this behaviour strange (in my opinion, returning an empty string and continuing would be better), but as long as the behaviour is known I can work around it by not calling this function on every element.

The relevant code section that works as expected now looks like this:

while (!xml.atEnd() && !xml.hasError()) 
{
    xml.readNext();
    if (xml.isStartElement())
    {
        QString name = xml.name().toString();
        if (name == "firstname" || name == "surname" || 
            name == "email" || name == "website")
        {
            cout << "element name: '" << name  << "'" 
                         << ", text: '" << xml.readElementText() 
                         << "'" << endl;
        }
    }
}
if (xml.hasError())
{
    cout << "XML error: " << xml.errorString() << endl;
}
else if (xml.atEnd())
{
    cout << "Reached end, done" << endl;
}

answered Nov 15 '22 22:11 by neuviemeporte


The file is not UTF-8 encoded. Change the encoding declaration to iso-8859-1 and it will parse without error.

<?xml version="1.0" encoding="iso-8859-1" ?>
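
To see why a parser that trusts the UTF-8 declaration would reject this file, it may help to look at the bytes. The following is a minimal sketch, assuming the file was saved as ISO-8859-1, where 'ä' is the single byte 0xE4. The function names looksLikeUtf8 and checkExamples are hypothetical helpers for illustration, not part of Qt, and the check is deliberately simplified: it only verifies lead and continuation byte structure and is not how QXmlStreamReader decodes its input.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if `bytes` has the lead/continuation
// byte structure of UTF-8. Simplified on purpose: it does not reject
// overlong encodings or surrogate code points.
bool looksLikeUtf8(const std::string &bytes)
{
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                extra = 0;  // 0xxxxxxx: ASCII
        else if ((lead & 0xE0) == 0xC0) extra = 1;  // 110xxxxx
        else if ((lead & 0xF0) == 0xE0) extra = 2;  // 1110xxxx
        else if ((lead & 0xF8) == 0xF0) extra = 3;  // 11110xxx
        else return false;  // stray continuation byte or invalid lead

        // The sequence must not run past the end of the input.
        if (extra > bytes.size() - i - 1) return false;

        // Every continuation byte must look like 10xxxxxx.
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) return false;
        }
        i += extra + 1;
    }
    return true;
}

// Usage example built on the surname from the question's test file.
void checkExamples()
{
    // Assumed ISO-8859-1 bytes: 'ä' is the single byte 0xE4.
    const std::string latin1 = "Meik\xE4l\xE4inen";
    // The same name in UTF-8: 'ä' is the two bytes 0xC3 0xA4.
    const std::string utf8 = "Meik\xC3\xA4l\xC3\xA4inen";

    assert(!looksLikeUtf8(latin1));  // 0xE4 announces two continuation bytes, but 'l' follows
    assert(looksLikeUtf8(utf8));
    assert(looksLikeUtf8("John"));   // pure ASCII is valid in both encodings
}
```

Under that assumption, the byte 0xE4 reads as the start of a three-byte UTF-8 sequence, and the 'l' that follows it is not a continuation byte. That would explain why the first two records, which are pure ASCII, cause no trouble while the third one does.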

answered Nov 15 '22 23:11 by baysmith