 

Android: RSS parsing stops at special characters

I've searched a lot but haven't found out why my RSS reader stops at special characters like æ, ø, å, ' etc. The reader reads the feed until it runs into a special character; then it stops reading that element and continues with the next one. So when I display the news in my app, the text is cut off at the special character, which is very annoying! Surely it has something to do with encoding, but I just can't figure out what to change in my code.

This code works well with other feeds like http://www.fyens.dk/rss/sport, which is ISO-8859-1 encoded; with that feed, special characters display with no problem. But if I try a feed like http://ob.dk/forum/rss.aspx?ForumID=3&Mode=0, which is UTF-8, the problem occurs.

Any suggestions on how to solve this issue?

    try {
        // Open a URL connection, make a GET request to the server and
        // read the XML RSS data
        URL url = new URL("http://ob.dk/forum/rss.aspx?ForumID=3&Mode=0");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();

        if (conn.getResponseCode() == HttpURLConnection.HTTP_OK) {
            InputStream is = conn.getInputStream();

            //DocumentBuilderFactory, DocumentBuilder are used for 
            //xml parsing
            DocumentBuilderFactory dbf = DocumentBuilderFactory
                    .newInstance();
            DocumentBuilder db = dbf.newDocumentBuilder();

            // Use db (the DocumentBuilder) to parse the XML data and
            // get the root Element
            Document document = db.parse(is);
            Element element = document.getDocumentElement();

            // Collect the RSS <item> nodes into a NodeList
            NodeList nodeList = element.getElementsByTagName("item");

            if (nodeList.getLength() > 0) {
                for (int i = 0; i < nodeList.getLength(); i++) {

                    // Take each entry (corresponds to the <item></item> tags
                    // in the XML data)
                    Element entry = (Element) nodeList.item(i);

                    Element _titleE = (Element) entry.getElementsByTagName(
                            "title").item(0);
                    Element _descriptionE = (Element) entry
                            .getElementsByTagName("description").item(0);
                    Element _pubDateE = (Element) entry
                            .getElementsByTagName("pubDate").item(0);
                    Element _linkE = (Element) entry.getElementsByTagName(
                            "link").item(0);

                    String _title = _titleE.getFirstChild().getNodeValue();
                    String _description = _descriptionE.getFirstChild().getNodeValue();
                    Date _pubDate = new Date(_pubDateE.getFirstChild().getNodeValue());
                    String _link = _linkE.getFirstChild().getNodeValue();

                    int time = _pubDate.getHours()-2;

                    _pubDate.setHours(time);

                    RssItem rssItem = new RssItem("OB.dk: " + _title, _description,
                            _pubDate, "http://www.google.com/gwt/x?u=" + _link);

                    rssItems.add(rssItem);
                }
            }

        }
    } catch (Exception e) {
        e.printStackTrace();
    }
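A frequently reported cause of this exact symptom, though not confirmed anywhere in this thread, is that the DOM does not guarantee one text node per element: an element's text can be split into several adjacent child nodes (CDATA sections, character or entity references, and, on some Android parser versions, reportedly around non-ASCII characters). `getFirstChild().getNodeValue()`, as used for `_title`, `_description` and `_link` above, then returns only the first fragment. The sketch below joins all text children instead. `RssText` and `fullText` are hypothetical names, and the sample forces the split with a CDATA section so it reproduces on a desktop JDK; whether the ob.dk feed splits for the same reason is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// RssText and fullText are hypothetical names chosen for this sketch.
public class RssText {

    /**
     * Returns the complete text of an element by concatenating every Text and
     * CDATA child node, instead of reading only the first child.
     */
    static String fullText(Element element) {
        if (element == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                sb.append(child.getNodeValue());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // Sample <title> whose text is split over two child nodes: a CDATA
        // section followed by a plain text node containing "æ ø å".
        String xml = "<item><title><![CDATA[Part one, ]]>part two: \u00e6 \u00f8 \u00e5</title></item>";
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = db.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Element title = (Element) doc.getElementsByTagName("title").item(0);

        // The pattern used in the question: only the first fragment.
        System.out.println("getFirstChild(): " + title.getFirstChild().getNodeValue());
        // All text children joined: the whole title.
        System.out.println("fullText():      " + fullText(title));
    }
}
```

In the question's loop this would read, for example, `String _title = RssText.fullText(_titleE);`. Where DOM Level 3 is available, `Node.getTextContent()` returns the concatenated text as well, and `DocumentBuilderFactory.setCoalescing(true)` folds CDATA sections into the adjacent text nodes before you read them.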
asked Feb 02 '26 06:02 by bengaard


1 Answer

I think this will help you:

http://www.developerfeed.com/xml/common/issues/xml-parsing-failing-due-encoding-not-being-utf-8

Kind regards.
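The linked article is about the parser decoding the bytes with a different charset than the one the feed actually uses. If that is what happens here (an assumption; the thread does not confirm it), one way to stop the parser from guessing is to decode the stream yourself with the charset named in the HTTP `Content-Type` header and hand the parser a character stream. The sketch assumes the server sends a `charset` parameter (whether ob.dk does is not known) and falls back to UTF-8 otherwise; `CharsetParse`, `charsetFrom` and `parse` are hypothetical names, while `InputSource`, `InputStreamReader` and `Charset.forName` are standard Java APIs.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// CharsetParse, charsetFrom and parse are hypothetical names for this sketch.
public class CharsetParse {

    /**
     * Extracts the charset parameter from a Content-Type value such as
     * "text/xml; charset=utf-8", or returns the fallback if none is usable.
     */
    static Charset charsetFrom(String contentType, Charset fallback) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length()).replace("\"", "").trim();
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        return fallback; // unknown or malformed charset name
                    }
                }
            }
        }
        return fallback;
    }

    /** Decodes the bytes with the given charset and gives the parser characters, not bytes. */
    static Document parse(InputStream in, Charset charset) throws Exception {
        Reader reader = new InputStreamReader(in, charset);
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new InputSource(reader));
    }

    public static void main(String[] args) throws Exception {
        // Sample ISO-8859-1 bytes with no XML declaration; decoding them as
        // UTF-8 (the XML default) would fail on "æ ø å".
        byte[] latin1 = "<title>\u00e6 \u00f8 \u00e5</title>".getBytes(StandardCharsets.ISO_8859_1);
        Charset cs = charsetFrom("text/xml; charset=ISO-8859-1", StandardCharsets.UTF_8);
        Document doc = parse(new ByteArrayInputStream(latin1), cs);
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

In the question's code this would replace `db.parse(is)` with something like `CharsetParse.parse(is, CharsetParse.charsetFrom(conn.getContentType(), StandardCharsets.UTF_8))`. The trade-off: once you pass a `Reader`, the parser no longer decodes bytes itself, so a wrong header charset would now produce the garbling instead. Keeping a byte stream and calling `InputSource.setEncoding(...)` is the alternative that leaves decoding to the parser.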

answered Feb 04 '26 22:02 by gosr


