
Why can't RSS handle the ampersand?

Tags:

xml

encoding

rss

When I come across a broken RSS feed, the usual reason it's all blown to pieces is that line 23 says "Sanford & Sons."

The most confusing thing is that if you convert the & into &amp;, all is well, even though the replacement still contains the problem character.

Why does RSS fail at rendering the ampersand (&) character by default?

Sampson asked Jun 23 '09 00:06


2 Answers

When a 'raw' & is seen, the parser expects it to start one of the valid escape sequences (such as '&amp;'), terminated by a semicolon. When what follows is not a valid sequence, it throws an error. That's all there is to it.
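To see this behavior concretely, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree` parser. The helper name `parses` and the sample `<title>` strings are made up for illustration; a real feed reader may use a different parser, but any conforming XML parser should reject the raw ampersand.

```python
import xml.etree.ElementTree as ET


def parses(xml_text: str) -> bool:
    """Hypothetical helper: True if xml_text is well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Illustrative fragments, not taken from any real feed.
raw = "<title>Sanford & Sons</title>"          # bare ampersand
escaped = "<title>Sanford &amp; Sons</title>"  # ampersand written as an entity reference

print(parses(raw))      # False: "& Sons" is not a valid reference
print(parses(escaped))  # True

# The parser turns the reference back into a plain ampersand.
print(ET.fromstring(escaped).text)  # Sanford & Sons
```

Note that the escaped form only "contains the problem character" in the source text; after parsing, the reader sees a single & again.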

Mitch Wheat answered Nov 15 '22 17:11


Because RSS is an XML-based format, and in XML the ampersand (&) signifies the start of an entity reference. The parser is expecting something else there.

You could argue that it should be smart enough to know that the ampersand in "Sanford & Sons" is just an ampersand. But what about when you really want to show an ampersand followed by text? Is "&pc;" some custom (also invalid) entity, or should it interpret that as an ampersand also? What about "&amp;"?
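Since the parser cannot guess, the side producing the feed has to escape the text. A rough sketch of that, assuming the feed is assembled as strings in Python: `escape` from the standard-library `xml.sax.saxutils` module replaces &, < and > with their entity references. The `make_item` function and the `<item>`/`<title>` layout here are illustrative only, not a complete RSS generator.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def make_item(title: str) -> str:
    """Hypothetical helper: wrap a title in a bare-bones <item> element."""
    # escape() rewrites "&" as "&amp;" (and "<", ">" as "&lt;", "&gt;").
    return f"<item><title>{escape(title)}</title></item>"


item = make_item("Sanford & Sons")
print(item)  # <item><title>Sanford &amp; Sons</title></item>

# Text that merely looks like an entity is escaped too, so it comes back
# as literal text instead of being treated as an undefined entity.
tricky = make_item("&pc;")
print(tricky)                                   # <item><title>&amp;pc;</title></item>
print(ET.fromstring(tricky).find("title").text)  # &pc;
```

Escaping everything on output removes the ambiguity: every & in the serialized feed starts a reference, and every literal ampersand the author meant is spelled &amp;.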

Joel Coehoorn answered Nov 15 '22 17:11