I'm trying to parse RSS 2.0 and Atom feeds using SyndicationFeed objects, but I get an XmlException when a date field such as pubDate contains a value like this:
2012-01-17 08:01:06
using System.Collections.Generic;
using System.Linq;
using System.ServiceModel.Syndication;
using System.Xml;

public static List<SyndicationItem> getRssData(string url)
{
    // Dispose the reader once the items have been copied into a list.
    using (XmlReader reader = XmlReader.Create(url))
    {
        // Throws XmlException when a pubDate cannot be parsed.
        SyndicationFeed feed = SyndicationFeed.Load(reader);
        return feed.Items.ToList();
    }
}
The feed URL is http://news.163.com/special/00011K6L/rss_newstop.xml, and the item that fails looks like this:
<item id="2">
    <title>...</title>
    <link>...</link>
    <description>......</description>
    <pubDate>2012-01-17 12:09:29</pubDate> <!-- exception thrown here -->
</item>
Is there a better way to achieve this? Please help. Thanks.
This is a known issue: Rss20FeedFormatter throws an exception when it tries to read some DateTime formats, and there is a workaround.
To work around this problem, create a custom XML reader that recognizes different date formats. The following is an example of a custom XML reader:
XmlReader r = new MyXmlReader(url);
SyndicationFeed feed = SyndicationFeed.Load(r);
Rss20FeedFormatter rssFormatter = feed.GetRss20Formatter();
XmlTextWriter rssWriter = new XmlTextWriter("rss.xml", Encoding.UTF8);
rssWriter.Formatting = Formatting.Indented;
rssFormatter.WriteTo(rssWriter);
rssWriter.Close();
And here is the class used in the code above:
class MyXmlReader : XmlTextReader
{
    private bool readingDate = false;
    const string CustomUtcDateTimeFormat = "ddd MMM dd HH:mm:ss Z yyyy"; // Wed Oct 07 08:00:07 GMT 2009

    public MyXmlReader(Stream s) : base(s) { }

    public MyXmlReader(string inputUri) : base(inputUri) { }

    public override void ReadStartElement()
    {
        if (string.Equals(base.NamespaceURI, string.Empty, StringComparison.InvariantCultureIgnoreCase) &&
            (string.Equals(base.LocalName, "lastBuildDate", StringComparison.InvariantCultureIgnoreCase) ||
             string.Equals(base.LocalName, "pubDate", StringComparison.InvariantCultureIgnoreCase)))
        {
            readingDate = true;
        }
        base.ReadStartElement();
    }

    public override void ReadEndElement()
    {
        if (readingDate)
        {
            readingDate = false;
        }
        base.ReadEndElement();
    }

    public override string ReadString()
    {
        if (readingDate)
        {
            string dateString = base.ReadString();
            DateTime dt;
            if (!DateTime.TryParse(dateString, out dt))
                dt = DateTime.ParseExact(dateString, CustomUtcDateTimeFormat, CultureInfo.InvariantCulture);
            return dt.ToUniversalTime().ToString("R", CultureInfo.InvariantCulture);
        }
        else
        {
            return base.ReadString();
        }
    }
}
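If you want to try the date conversion on its own, outside the reader, here is a minimal sketch. The names RssDateNormalizer and ToRfc1123 are made up for illustration, the list of formats is only an assumption about what your feeds contain, and unlike MyXmlReader it treats timestamps without a zone as UTC, which may not be right for a given feed.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of System.ServiceModel.Syndication.
static class RssDateNormalizer
{
    // Formats this sketch accepts; extend the list for the feeds you consume.
    private static readonly string[] KnownFormats =
    {
        "yyyy-MM-dd HH:mm:ss",            // 2012-01-17 12:09:29
        "ddd MMM dd HH:mm:ss 'GMT' yyyy"  // Wed Oct 07 08:00:07 GMT 2009
    };

    // Converts a non-standard feed date into the RFC 1123 form ("R").
    public static string ToRfc1123(string raw)
    {
        DateTime dt;
        // Assumption: a timestamp without a zone is taken to be UTC.
        bool ok = DateTime.TryParseExact(
            raw.Trim(),
            KnownFormats,
            CultureInfo.InvariantCulture,
            DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal,
            out dt);

        if (!ok)
        {
            throw new FormatException("Unrecognized feed date: " + raw);
        }
        return dt.ToString("R", CultureInfo.InvariantCulture);
    }
}
```

With this, RssDateNormalizer.ToRfc1123("2012-01-17 12:09:29") gives "Tue, 17 Jan 2012 12:09:29 GMT", which is the kind of string the formatter expects to find in pubDate.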
Basically, that RSS feed is invalid. If you look at the RSS 2.0 specification it states that:
All date-times in RSS conform to the Date and Time Specification of RFC 822, with the exception that the year may be expressed with two characters or four characters (four preferred).
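The string "2012-01-17 12:09:29" doesn't comply with the "Date and Time" part of RFC 822. It should look like "Tue, 17 Jan 2012 12:09:29 GMT" (with whatever time zone actually applies): the month is a three-letter name, the day name is optional, and a time zone is required.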
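A quick way to check a pubDate value yourself is to parse it against the "R" format. This is only a rough sketch: PubDateCheck and LooksLikeRfc1123 are hypothetical names, and "R" is stricter than RFC 822 because it accepts only the four-digit-year GMT form with a day name.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration only.
static class PubDateCheck
{
    // True when the value matches the RFC 1123 pattern, e.g.
    // "Tue, 17 Jan 2012 12:09:29 GMT". Stricter than RFC 822 itself.
    public static bool LooksLikeRfc1123(string value)
    {
        DateTime ignored;
        return DateTime.TryParseExact(
            value,
            "R",
            CultureInfo.InvariantCulture,
            DateTimeStyles.None,
            out ignored);
    }
}
```

The value from the feed, "2012-01-17 12:09:29", fails this check, while "Tue, 17 Jan 2012 12:09:29 GMT" passes.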