How big is too big for an RSS feed XML file?

I'm implementing an RSS feed for a website and I don't understand certain things about the format/size/content of the XML file for the feed.

I'm initializing the site with the past data, which runs back to 1999 (there was no feed at any point before now), and only a couple hundred items will be added per year.

Is there some protocol for archiving, or can I just keep the one file and continue appending to it? I'd think that would be inefficient, as the aggregators have to download the whole thing (I assume).

So, what's the usual custom for this? Limit it to the last month? The current file with over 900 items is 1.5MB, and I'd expect 1 year's worth to be about 1/10th that in size or less.

Any pointers on what principles to use and how to implement them? I'm using PHP, but my data is complicated enough that I rolled my own script to write the file (and it validates just fine), so I can't use a canned solution -- I need to understand what to implement in my own script.

asked Mar 15 '11 22:03 by David-W-Fenton



1 Answer

Most consumers of syndication feeds expect the feed to contain relatively recent content, with previously published content 'falling off' the feed. How much content you keep in the feed usually depends on the type of content you publish, but as the feed grows, its size can affect a feed client's ability to retrieve and parse it.

If you truly want to publish a historical feed that is continually added to but never has content items removed, you may want to consider the following options (based on the needs of your consumers):

  1. Implement Feed Paging and Archiving, per RFC 5005 Section 3, as paged feeds can be useful when the number of entries is very large, infinite, or indeterminate. Clients can "page" through the feed, only accessing a subset of the feed's entries as necessary.
  2. Logically segment your content into multiple feeds, and provide auto-discovery to the feeds on your website.
  3. Implement a REST-based service interface that allows consumers to retrieve and filter your content as an Atom or RSS formatted feed, with the unfiltered request returning a reasonable default selection (for example, the most recent items).

Option 1 is a reasonable approach only if you know the type of feed clients that will be consuming your feed, as not all feed clients support pagination.
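To make the paging idea concrete, here is a minimal sketch of how the link relations for one page could be computed. It is written in Python for brevity, and the logic should port directly to a hand-rolled PHP script. The `?page=N` URL scheme, the page size of 50, and the function names are assumptions for illustration, not anything RFC 5005 prescribes. The relation names `first`, `last`, `previous` and `next` are the ones the RFC defines. Page 1 is assumed to hold the newest items, so existing subscribers keep polling the same URL.

```python
from math import ceil
from xml.sax.saxutils import quoteattr


def page_links(base_url, total_items, page, per_page=50):
    """Return the paging link relations for one page of a paged feed."""
    last_page = max(1, ceil(total_items / per_page))
    if not 1 <= page <= last_page:
        raise ValueError(f"page must be between 1 and {last_page}")

    def url(n):
        # Hypothetical URL scheme: page 1 is the plain feed URL.
        return base_url if n == 1 else f"{base_url}?page={n}"

    links = {"self": url(page), "first": url(1), "last": url(last_page)}
    if page > 1:
        links["previous"] = url(page - 1)
    if page < last_page:
        links["next"] = url(page + 1)
    return links


def page_slice(items, page, per_page=50):
    """Return the items for one page; items are assumed sorted newest first."""
    start = (page - 1) * per_page
    return items[start:start + per_page]


def render_links(links):
    """Render the links as atom:link elements for the feed's channel."""
    return "\n".join(
        f"<atom:link rel={quoteattr(rel)} href={quoteattr(href)}/>"
        for rel, href in links.items()
    )
```

In an RSS 2.0 file the rendered `atom:link` elements would go inside `<channel>`, with `xmlns:atom="http://www.w3.org/2005/Atom"` declared on the root `<rss>` element. Clients that ignore the links simply see the first page.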

Option 2 is the most common one seen on public-facing web sites, as most browsers and clients support auto-discovery, and you can provide both a full historical feed and a smaller feed of recent content (or segment in whatever way makes sense for your content).
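Auto-discovery amounts to one `<link rel="alternate">` element per feed in the page's `<head>`. The sketch below generates those tags; the feed titles and paths are made up for illustration, and the same string building is straightforward in PHP.

```python
from html import escape

# Hypothetical feed list: a short "recent" feed plus the full archive.
FEEDS = [
    ("Recent items", "/feeds/recent.xml"),
    ("Full archive (since 1999)", "/feeds/archive.xml"),
]


def discovery_tags(feeds, mime="application/rss+xml"):
    """Build one auto-discovery <link> tag per (title, href) pair."""
    return "\n".join(
        '<link rel="alternate" type="{}" title="{}" href="{}">'.format(
            mime, escape(title, quote=True), escape(href, quote=True)
        )
        for title, href in feeds
    )


print(discovery_tags(FEEDS))
```

For Atom feeds the `type` would be `application/atom+xml` instead.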

Option 3 potentially allows you to provide the benefits of both of the first two options, plus you can provide multiple feed formats and rich filtering of your content. It is a very powerful way to expose feed content, but usually is only worth the effort if your consumers indicate a desire for tailoring the feed content they wish to consume.
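As a rough illustration of the filtering side of such a service, the sketch below selects items from query parameters and falls back to a small default. The parameter names (`limit`, `since`, `category`), the item fields, and the limits are all hypothetical. A real service would also validate its input and render the result as RSS or Atom.

```python
from datetime import date

DEFAULT_LIMIT = 25   # assumed default when the client sends no limit
MAX_LIMIT = 200      # assumed cap so one request cannot pull the whole archive


def select_items(items, params):
    """Filter and trim feed items according to query parameters.

    items:  list of dicts with 'published' (datetime.date) and 'category' keys
    params: dict of query-string values, e.g. {"since": "2011-01-01"}
    """
    limit = min(int(params.get("limit", DEFAULT_LIMIT)), MAX_LIMIT)
    selected = items

    since = params.get("since")
    if since:
        cutoff = date.fromisoformat(since)
        selected = [i for i in selected if i["published"] >= cutoff]

    category = params.get("category")
    if category:
        selected = [i for i in selected if i["category"] == category]

    # Newest first, then trim to the requested size.
    selected = sorted(selected, key=lambda i: i["published"], reverse=True)
    return selected[:max(limit, 0)]
```

With no parameters this returns the 25 newest items, which matches what most feed readers expect from a plain feed URL.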

While most rich feed clients will retrieve feed content asynchronously, clients that make synchronous (and potentially frequent) requests for your feed may experience timeout issues as the size of your feed increases.

Regardless of which direction you take, consider implementing Conditional GET on your feeds, so that clients whose copy is still current receive a short 304 response instead of the whole file, and make sure you understand the potential consumers of your syndicated content so you can choose the strategy that fits best. See this answer when you consider which syndication feed format(s) you want to provide.
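A Conditional GET handler compares the validators the client sends (`If-None-Match`, `If-Modified-Since`) with the feed's current `ETag` and `Last-Modified` values and answers `304 Not Modified` when nothing has changed. The following is a minimal sketch of that decision, independent of any web framework. The function names are illustrative, and deriving the ETag from a hash of the body is just one possible choice.

```python
import hashlib
from email.utils import formatdate, parsedate_to_datetime


def feed_validators(body, mtime):
    """Derive an ETag and a Last-Modified header value for the feed."""
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    return etag, formatdate(mtime, usegmt=True)


def _strip_weak(tag):
    # If-None-Match uses weak comparison, so ignore a leading W/ marker.
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def respond(request_headers, body, mtime):
    """Return (status, headers, payload) for a feed request.

    body is the feed as bytes; mtime is its last change as a Unix timestamp.
    """
    etag, last_modified = feed_validators(body, mtime)
    headers = {"ETag": etag, "Last-Modified": last_modified}

    inm = request_headers.get("If-None-Match")
    ims = request_headers.get("If-Modified-Since")
    not_modified = False
    if inm is not None:
        # When present, If-None-Match takes precedence over If-Modified-Since.
        tags = [_strip_weak(t) for t in inm.split(",")]
        not_modified = "*" in tags or etag in tags
    elif ims is not None:
        try:
            since = parsedate_to_datetime(ims).timestamp()
            not_modified = int(mtime) <= since
        except (TypeError, ValueError):
            not_modified = False  # unparseable date: send the full feed

    if not_modified:
        return 304, headers, b""
    return 200, headers, body
```

In PHP the same checks would read `$_SERVER['HTTP_IF_NONE_MATCH']` and `$_SERVER['HTTP_IF_MODIFIED_SINCE']` and send the status with `http_response_code(304)`. If the feed is a static file, the web server will usually handle this for you.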

answered Oct 21 '22 04:10 by Oppositional
