
utf-8 and htmlentities in RSS feeds

Tags: php, utf-8, rss

I'm writing some RSS feeds in PHP and struggling with character-encoding issues. Should I call utf8_encode() before or after htmlentities()? For example, I've got both ampersands and Chinese characters in a description element, and I'm not sure which of these is proper:

$output = utf8_encode(htmlentities($source)); or
$output = htmlentities(utf8_encode($source));

And why?

Asked Nov 21 '08 at 02:11 by Doug Kaye




2 Answers

First: The utf8_encode function converts from ISO 8859-1 to UTF-8, so you only need it if your input encoding/charset is ISO 8859-1. But why don’t you use UTF-8 in the first place?
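
To make the first point concrete, here is a small sketch of what the ISO 8859-1 to UTF-8 conversion does at the byte level, and what happens when it is applied to text that is already UTF-8. The sample string is made up, and iconv() stands in for utf8_encode() (which is deprecated as of PHP 8.2), assuming the iconv extension is available:

```php
<?php
// "café" in ISO 8859-1: the é is the single byte 0xE9.
$latin1 = "caf\xE9";

// Convert ISO 8859-1 to UTF-8, the same job utf8_encode() does.
// In UTF-8 the é becomes the two bytes 0xC3 0xA9.
$utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1);
echo bin2hex($utf8), "\n";   // 636166c3a9

// Running the same conversion on text that is already UTF-8
// double-encodes it: each byte of the é is treated as its own character.
$double = iconv('ISO-8859-1', 'UTF-8', $utf8);
echo bin2hex($double), "\n"; // 636166c383c2a9
```

So the conversion only belongs in the pipeline when the source text really is ISO 8859-1.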

Second: You don’t need htmlentities. htmlspecialchars is enough: it replaces just the special characters (&, <, > and the quotes) with character references. htmlentities would also replace many characters that can be encoded directly in UTF-8. It's important to use the ENT_QUOTES quote style so that single quotes are replaced as well.
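
A quick comparison of the two functions on the same UTF-8 input may help here. This is only an illustration with a made-up sample string (written with \u{...} escapes, so it assumes PHP 7 or later):

```php
<?php
// Assumed sample text in UTF-8: an ampersand, an accented letter (é),
// and two Chinese characters (中文).
$text = "Tom & Jerry caf\u{E9} \u{4E2D}\u{6587}";

// htmlspecialchars() touches only &, <, > and the quotes.
$special = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
echo $special, "\n";  // Tom &amp; Jerry café 中文

// htmlentities() also rewrites every character that has a named HTML entity.
$entities = htmlentities($text, ENT_QUOTES, 'UTF-8');
echo $entities, "\n"; // Tom &amp; Jerry caf&eacute; 中文
```

Named entities such as &eacute; are HTML entities. XML itself predefines only &amp;, &lt;, &gt;, &quot; and &apos;, which is another reason to prefer htmlspecialchars in a feed.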

So my proposal:

// if your input encoding is ISO 8859-1: convert to UTF-8 first, then escape
$output = htmlspecialchars(utf8_encode($string), ENT_QUOTES, 'UTF-8');

// if your input encoding is UTF-8: escape only
$output = htmlspecialchars($string, ENT_QUOTES, 'UTF-8');
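
Put together for a feed, the proposal could look roughly like the sketch below. The helper name rss_escape() and its $inputCharset parameter are hypothetical, not part of any library, and iconv() is used in place of utf8_encode() on the assumption that the iconv extension is available:

```php
<?php
// Hypothetical helper: return $text as UTF-8, escaped for use inside an
// RSS element such as <description>.
function rss_escape(string $text, string $inputCharset = 'UTF-8'): string
{
    // Convert to UTF-8 first, but only when the input is in another charset.
    if (strcasecmp($inputCharset, 'UTF-8') !== 0) {
        $converted = iconv($inputCharset, 'UTF-8', $text);
        if ($converted === false) {
            throw new RuntimeException("Cannot convert from $inputCharset");
        }
        $text = $converted;
    }

    // Then escape only the markup-significant characters.
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// UTF-8 input containing an ampersand and Chinese characters (中文).
$description = "Tom & Jerry \u{4E2D}\u{6587}";
echo '<description>', rss_escape($description), "</description>\n";
// <description>Tom &amp; Jerry 中文</description>

// ISO 8859-1 input: "café & bar" with é as the single byte 0xE9.
echo '<description>', rss_escape("caf\xE9 & bar", 'ISO-8859-1'), "</description>\n";
// <description>café &amp; bar</description>
```

The feed itself would then need to declare encoding="UTF-8" in its XML declaration so that readers decode the bytes the same way.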

Answered Oct 08 '22 at 01:10 by Gumbo


It's important to pass the character set to the htmlentities function, as the default is ISO-8859-1 (in PHP versions before 5.4; later versions default to UTF-8):

utf8_encode(htmlentities($source,ENT_COMPAT,'utf-8'));

You should apply htmlentities first, so that utf8_encode can encode the entities properly.

(EDIT: Based on the comments, I changed my earlier opinion that the order didn't matter. This code is tested and works well.)

Answered Oct 08 '22 at 01:10 by Eran Galperin