
Policy for polling RSS

I have an application that polls several RSS sources on the web.

What is the etiquette when polling other people's web servers? How frequently should I poll, etc.?

What are the best practices?

flybywire asked Jun 02 '09 13:06




2 Answers

  1. Make use of HTTP caching. Remember the ETag and Last-Modified values from the previous response and send them back in the If-None-Match and If-Modified-Since request headers. Recognize the 304 Not Modified response. This way you can save a lot of bandwidth. Additionally, some scripts honor the If-Modified-Since header and return only partial contents (i.e. only the two or three newest items instead of all 30 or so).

  2. Don’t poll RSS from services that support RPC Ping (or another PUSH service, such as PubSubHubbub). That is, if you’re receiving PUSH notifications from a service, you don’t have to poll the data at the standard interval. Poll once a day just to check that the mechanism still works (the ping can be disabled, reconfigured, damaged, etc). This way you fetch the RSS only when a notification arrives, not every hour or so.

  3. Check the ttl element (in RSS) or the HTTP cache control headers such as Expires (e.g. for Atom), and don’t fetch again until the resource expires.

  4. Try to adapt to the frequency of new items in each individual feed. If a particular feed had only two updates in the past week, don’t fetch it more than once a day. AFAIR Google Reader does that.

  5. Lower the rate at night or at other times when the traffic on your site is low.

  6. Failing all of the above, poll once an hour. ;)
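
To illustrate point 1, here is a minimal sketch of a conditional GET using only Python's standard library. The names conditional_headers, fetch_feed and the cache dict are made up for this example, and the User-Agent string is a placeholder; a real poller would also want retries and persistent storage for the validators.

```python
import urllib.error
import urllib.request


def conditional_headers(etag=None, last_modified=None):
    """Build request headers for a conditional GET from stored validators."""
    # Placeholder User-Agent (an assumption): identify your bot and give a contact URL.
    headers = {"User-Agent": "example-feed-poller/0.1 (+https://example.com/bot)"}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_feed(url, cache, opener=urllib.request.urlopen):
    """Fetch url and return (body, changed).

    cache is a plain dict mapping url -> {"etag": ..., "last_modified": ...}.
    opener is injectable so the function can be exercised without a network.
    """
    entry = cache.get(url, {})
    request = urllib.request.Request(
        url,
        headers=conditional_headers(entry.get("etag"), entry.get("last_modified")),
    )
    try:
        with opener(request, timeout=30) as response:
            body = response.read()
            # Remember the validators the server sent for the next poll.
            cache[url] = {
                "etag": response.headers.get("ETag"),
                "last_modified": response.headers.get("Last-Modified"),
            }
            return body, True
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # Not Modified: nothing was transferred, keep the old validators.
            return None, False
        raise
```

With urllib, a 304 surfaces as an HTTPError, which is why it is handled in the except branch rather than as a normal response. Typical use would be something like body, changed = fetch_feed(url, cache), parsing the body only when changed is True.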

Maciej Łebkowski answered Sep 20 '22 03:09



Google's Feedfetcher documentation says it polls most RSS feeds less often than once per hour.

From: http://code.google.com/apis/ajaxfeeds/documentation/

Feed Crawl Frequency

As the Google AJAX Feed API uses Feedfetcher, feed data from the AJAX Feed API may not always be up to date. The Google feed crawler ("Feedfetcher") retrieves feeds from most sites less than once every hour. Some frequently updated sites may be refreshed more often.
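
The same idea (roughly hourly by default, more often for busy feeds, less often for quiet ones, never sooner than the feed's own ttl) can be sketched in a few lines. This is only an illustration, not how Feedfetcher actually works: the function names, the 15 minute floor, the one day ceiling and the "half the average gap" heuristic are all assumptions chosen for the example.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta

DEFAULT_INTERVAL = timedelta(hours=1)   # assumed baseline when nothing is known
MIN_INTERVAL = timedelta(minutes=15)    # assumed floor, even for very busy feeds
MAX_INTERVAL = timedelta(days=1)        # assumed ceiling for quiet feeds


def rss_ttl(xml_text):
    """Return the channel <ttl> of an RSS 2.0 document as a timedelta, or None."""
    root = ET.fromstring(xml_text)
    value = root.findtext("channel/ttl")
    if value is None or not value.strip().isdigit():
        return None
    return timedelta(minutes=int(value.strip()))  # ttl is given in minutes


def next_interval(item_times, now, ttl=None, window=timedelta(days=7)):
    """Choose how long to wait before polling a feed again.

    item_times: publication datetimes of the items seen so far.
    ttl: the feed's ttl as a timedelta, if it declares one.
    """
    recent = [t for t in item_times if now - window <= t <= now]
    if not item_times:
        interval = DEFAULT_INTERVAL            # no history yet
    elif not recent:
        interval = MAX_INTERVAL                # nothing new within the window
    else:
        interval = window / len(recent) / 2    # half the average gap
    interval = max(MIN_INTERVAL, min(MAX_INTERVAL, interval))
    if ttl is not None:
        interval = max(interval, ttl)          # never poll sooner than the ttl
    return interval
```

For example, a feed with only two items in the past week comes out at the one day ceiling, while a feed that publishes every few minutes is held at the 15 minute floor unless its ttl asks for longer.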

Jonathan Fingland answered Sep 19 '22 03:09
