
Problems Reading RSS with C# and .net 3.5

I have been trying to write some routines to read RSS and Atom feeds using the new classes in System.ServiceModel.Syndication, but the Rss20FeedFormatter fails on about half the feeds I try, with the following exception:

An error was encountered when parsing a DateTime value in the XML.

This seems to occur whenever the RSS feed expresses the publish date in the following format:

Thu, 16 Oct 08 14:23:26 -0700

If the feed expresses the publish date as GMT, things go fine:

Thu, 16 Oct 08 21:23:26 GMT

If there's some way to work around this with XmlReaderSettings, I have not found it. Can anyone assist?

dan90266 asked Oct 16 '08 21:10


1 Answer

Based on the workaround posted in the bug report to Microsoft about this, I made an XmlReader specifically for reading SyndicationFeeds that have non-standard dates.

The code below differs slightly from the code in the workaround at Microsoft's site. It also takes Oppositional's advice on using the RFC 1123 pattern.

Instead of simply calling XmlReader.Create() you need to create the XmlReader from a Stream. I use the WebClient class to get that stream:

// feedUrl is the address of the RSS or Atom feed to load.
using (WebClient client = new WebClient())
using (Stream stream = client.OpenRead(feedUrl))
using (XmlReader reader = new SyndicationFeedXmlReader(stream))
{
    SyndicationFeed feed = SyndicationFeed.Load(reader);
    // ... do things with the feed ...
}

Below is the code for the SyndicationFeedXmlReader:

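using System;
using System.Diagnostics;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Reflection;
using System.ServiceModel.Syndication;
using System.Xml;

public class SyndicationFeedXmlReader : XmlTextReader
{
    // Element names whose text is parsed as a date. lastBuildDate is an RSS 2.0
    // channel element, so it is listed with pubDate rather than with the Atom names.
    readonly string[] Rss20DateTimeHints = { "pubDate", "lastBuildDate" };
    readonly string[] Atom10DateTimeHints = { "updated", "published" };
    private bool isRss2DateTime = false;
    private bool isAtomDateTime = false;

    public SyndicationFeedXmlReader(Stream stream) : base(stream) { }

    // Remember whether the element the formatter is checking for holds a date.
    public override bool IsStartElement(string localname, string ns)
    {
        isRss2DateTime = false;
        isAtomDateTime = false;

        if (Rss20DateTimeHints.Contains(localname)) isRss2DateTime = true;
        if (Atom10DateTimeHints.Contains(localname)) isAtomDateTime = true;

        return base.IsStartElement(localname, ns);
    }

    public override string ReadString()
    {
        string dateVal = base.ReadString();

        try
        {
            // Run the formatter's own private date parser (via reflection) to
            // find out whether it would accept the value.
            if (isRss2DateTime)
            {
                MethodInfo objMethod = typeof(Rss20FeedFormatter).GetMethod("DateFromString", BindingFlags.NonPublic | BindingFlags.Static);
                Debug.Assert(objMethod != null);
                objMethod.Invoke(null, new object[] { dateVal, this });
            }
            if (isAtomDateTime)
            {
                MethodInfo objMethod = typeof(Atom10FeedFormatter).GetMethod("DateFromString", BindingFlags.NonPublic | BindingFlags.Instance);
                Debug.Assert(objMethod != null);
                objMethod.Invoke(new Atom10FeedFormatter(), new object[] { dateVal, this });
            }
        }
        catch (TargetInvocationException)
        {
            // The formatter rejected the date: substitute the current UTC time in
            // RFC 1123 form. The invariant culture keeps day and month names in
            // English regardless of the machine's locale.
            DateTimeFormatInfo dtfi = CultureInfo.InvariantCulture.DateTimeFormat;
            return DateTimeOffset.UtcNow.ToString(dtfi.RFC1123Pattern, dtfi);
        }

        return dateVal;
    }
}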

Again, this is copied almost exactly from the workaround posted on the Microsoft site, except that this one works for me and the one posted at Microsoft did not.
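One side effect of the reader is that a rejected date is replaced with the current time, so the feed's real timestamp is lost. If you would rather keep it, something like the sketch below might help. It is not part of the Microsoft workaround: RssDateNormalizer and its format list are names and assumptions of mine, covering only dates shaped like the two in the question (two- or four-digit year, numeric offset or a literal GMT). It parses the value itself and rewrites it in RFC 1123 form.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of System.ServiceModel or of the workaround.
public static class RssDateNormalizer
{
    // Assumed input shapes: RFC 822 style dates with a two- or four-digit
    // year and either a numeric offset or a literal "GMT".
    private static readonly string[] Formats =
    {
        "ddd, d MMM yy HH:mm:ss zzz",
        "ddd, d MMM yyyy HH:mm:ss zzz",
        "ddd, d MMM yy HH:mm:ss 'GMT'",
        "ddd, d MMM yyyy HH:mm:ss 'GMT'"
    };

    // Returns the date rewritten in RFC 1123 form (UTC, ending in "GMT"),
    // or null when none of the assumed formats match.
    public static string Normalize(string text)
    {
        DateTimeOffset parsed;
        bool ok = DateTimeOffset.TryParseExact(
            text.Trim(), Formats, CultureInfo.InvariantCulture,
            DateTimeStyles.AssumeUniversal, out parsed);

        // The "r" specifier converts to UTC before formatting.
        return ok ? parsed.ToString("r", CultureInfo.InvariantCulture) : null;
    }
}
```

With this in place, the catch block in ReadString could return RssDateNormalizer.Normalize(dateVal) and fall back to the current time only when that comes back null. For the date in the question, Normalize("Thu, 16 Oct 08 14:23:26 -0700") gives "Thu, 16 Oct 2008 21:23:26 GMT".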

NOTE: One bit of customization you may need to do is in the two arrays at the start of the class. Depending on any extraneous fields your non-standard feed might add, you may need to add more items to those arrays.
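If you expect to add names often, one option is to hold them in a small set that callers can extend, instead of editing the arrays each time. This is only a sketch: DateElementHints is a hypothetical helper of mine, and the "expires" element used below is a made-up example, not something from a real feed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the class above: holds the element names
// whose text should be treated as dates and lets callers register more.
public sealed class DateElementHints
{
    private readonly HashSet<string> names;

    public DateElementHints(params string[] defaults)
    {
        // Ordinal comparison, because XML element names are case-sensitive.
        names = new HashSet<string>(defaults, StringComparer.Ordinal);
    }

    // Registers one more element name and returns this instance for chaining.
    public DateElementHints Add(string localName)
    {
        names.Add(localName);
        return this;
    }

    public bool Contains(string localName)
    {
        return names.Contains(localName);
    }
}
```

A reader could then build its RSS hints with new DateElementHints("pubDate").Add("expires") and call Contains(localname) inside IsStartElement in place of the array lookup.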

CleverPatrick answered Sep 27 '22 21:09