Syndication RSS Reader fails because of invalid XML?

I have written a piece of code that uses the System.ServiceModel.Syndication library to parse RSS feeds.

The problem is that for one of my feeds (provided by Facebook) the response ends with the following lines, and the Syndication library fails to parse the feed. It reports that the text is invalid XML, pointing at this part:

  ...
  </channel>
  <access:restriction relationship="deny" xmlns:access="http://www.bloglines.com/about/specs/fac-1.0" />
</rss>

I'm sure I'm missing something here, because both the feed and the parsing library come from huge companies (Facebook and Microsoft, respectively).

Can any of you help? Or, alternatively, suggest a better parser that doesn't depend on the validity of the XML?

P.S. Here is my RSS feed URL:
http://www.facebook.com/feeds/page.php?id=202296766494181&format=rss20

Here is how I'm parsing the feed response:

    var stringReader = new StringReader(resp);
    var xreader = XmlReader.Create(stringReader);
    var xfeed = System.ServiceModel.Syndication.SyndicationFeed.Load(xreader);

and the exception I get:

System.Xml.XmlException: 'Element' is an invalid XmlNodeType. Line 282, position 4.

at System.Xml.XmlReader.ReadEndElement() ...

Mo Valipour asked Feb 23 '23 16:02


1 Answer

It seems SyndicationFeed has a problem with the access:restriction element used by Facebook. See this recent thread on the MSDN forums: http://social.msdn.microsoft.com/Forums/ar/xmlandnetfx/thread/7045dc1c-1bd9-409a-9568-543e74f4578d

Michael Sun (MSFT) wrote "Just saw Martin's post! Very helpful! I also did some research about the issue. The element is from Bloglines, http://www.bloglines.com/index.html. It sounds like an extension facebook is using for its RSS 2.0 feeds, http://www.feedforall.com/access-namespace.htm. From this article, it seems Rss20FeedFormatter is not the only one which does not support the elements.

I agree with Martin to use XDocument (LINQ to XML) to parse the RSS feed. Or if you are building some large app via C#, the Facebook C# SDK can be helpful as well, http://facebooksdk.codeplex.com/"
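To illustrate the XDocument (LINQ to XML) suggestion, here is a minimal sketch, assuming the feed is plain RSS 2.0 whose `<channel>` and `<item>` elements carry no namespace. The class `SimpleRssReader`, the `RssItem` type and the `ParseItems` method are hypothetical names chosen for this example, not part of any library. The trailing `<access:restriction .../>` element is well-formed XML, so `XDocument.Parse` accepts it; the query below simply never looks at it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical minimal RSS 2.0 reader built on LINQ to XML.
// Only <rss>/<channel>/<item> is queried, so extra elements such as
// Bloglines' <access:restriction> after </channel> are ignored.
public static class SimpleRssReader
{
    // Hypothetical result type for this sketch: only title and link.
    public sealed class RssItem
    {
        public string Title { get; set; }
        public string Link { get; set; }
    }

    public static List<RssItem> ParseItems(string rssXml)
    {
        XDocument doc = XDocument.Parse(rssXml);
        XElement channel = doc.Root?.Element("channel");
        if (channel == null)
            return new List<RssItem>();

        return channel.Elements("item")
            .Select(item => new RssItem
            {
                // The explicit (string) conversion yields null if the element is missing.
                Title = (string)item.Element("title"),
                Link = (string)item.Element("link"),
            })
            .ToList();
    }
}
```

Other item fields (description, pubDate, guid) could be read the same way; this trades the convenience of SyndicationFeed for tolerance of unknown elements.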

Edit:

It seems, however, that the Atom feed does not suffer from this problem. So the easiest solution would be to use this link (http://www.facebook.com/feeds/page.php?id=202296766494181&format=atom10), i.e. to change the format parameter from rss20 to atom10:

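    using System.IO;
    using System.Net;
    using System.ServiceModel.Syndication;
    using System.Xml;

    // Request the Atom 1.0 version of the page feed instead of RSS 2.0
    var req = (HttpWebRequest)WebRequest.Create(@"http://www.facebook.com/feeds/page.php?id=202296766494181&format=atom10");
    req.UserAgent = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)";

    // Dispose the response as well as the stream and reader
    using (WebResponse response = req.GetResponse())
    using (Stream responseStream = response.GetResponseStream())
    using (XmlReader xr = XmlReader.Create(responseStream))
    {
        SyndicationFeed feed = SyndicationFeed.Load(xr);
    }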

Another alternative is to subclass XmlTextReader and override its ReadEndElement method so that it skips any element that follows the closing channel tag. (Do mind that the code below comes without any guarantee, as I still consider myself a novice C# developer. Please feel free to correct any mistakes.)

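    using System;
    using System.IO;
    using System.Xml;

    public class FaceBookReader : XmlTextReader
    {
        public FaceBookReader(Stream stream)
            : base(stream) { }

        public FaceBookReader(string url)
            : base(url) { }

        public override void ReadEndElement()
        {
            // ReadEndElement itself skips whitespace first, so move to the
            // end tag before reading LocalName; otherwise LocalName may be
            // empty when the reader is still on a whitespace node.
            MoveToContent();
            string elementTag = LocalName;

            base.ReadEndElement();

            // Once </channel> has been consumed, skip every following element
            // (e.g. <access:restriction .../>) until we reach the </rss> end tag.
            if (string.Equals(elementTag, "channel", StringComparison.OrdinalIgnoreCase))
            {
                while (IsStartElement())
                {
                    Skip();
                }
            }
        }
    }

    // Intended usage (untested sketch):
    // SyndicationFeed feed = SyndicationFeed.Load(new FaceBookReader(url));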
tazyDevel answered Mar 08 '23 07:03