
How to convert HTML entities like &#8211; to their character equivalents?

I am creating a file that is to be saved on a local user's computer (not rendered in a web browser).

I am currently using html_entity_decode, but it isn't converting entities like &#8211; (the en dash), and I was wondering what other function I should be using.

For example, when the file is imported into the software, instead of the en dash or just a - it shows up as the literal text &#8211;. I know I could use str_replace, but if it's happening with this character, it could happen with many others since the data is dynamic.

Cofey asked Feb 02 '11 22:02


2 Answers

You need to define the target character set. &#8211; has no equivalent in ISO-8859-1, which was the default character set before PHP 5.4, so it's not decoded. Define UTF-8 as the output charset and it will decode:

echo html_entity_decode('&#8211;', ENT_NOQUOTES, 'UTF-8');
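To illustrate the same call on a slightly larger input, here is a minimal sketch. The sample string is made up for this example, and the ENT_QUOTES | ENT_HTML5 flags are an assumption about what you want decoded (quotes included, HTML5 entity table); the one-liner above works without them.

```php
<?php
// Hypothetical sample input mixing decimal, named and hexadecimal entities.
$encoded = 'n&#8211;dash, m&mdash;dash, &quot;quoted&quot; &amp; &#x2026;';

// ENT_QUOTES also decodes &quot; and &#039;; ENT_HTML5 selects the HTML5 entity table.
// The third argument is the charset the decoded string is produced in.
$decoded = html_entity_decode($encoded, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo $decoded, PHP_EOL;

// The en dash (U+2013) is three bytes in UTF-8: e2 80 93.
$enDashHex = bin2hex(html_entity_decode('&#8211;', ENT_QUOTES | ENT_HTML5, 'UTF-8'));
echo $enDashHex, PHP_EOL;
```

If the hex dump shows e28093, the entity was decoded to a real en dash and the file can be written as UTF-8.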

If at all possible, you should avoid HTML entities to begin with. I don't know where that encoded data comes from, but if you're storing it like this in the database or elsewhere, you're doing it wrong. Always store data UTF-8 encoded and only convert to HTML entities or otherwise escape for output when necessary.
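A rough sketch of that workflow, under assumptions: the input string, the temp-file location and the variable names are placeholders invented for this example, not anything from the question. It decodes once at the boundary, writes plain UTF-8 to the file, and escapes only for HTML output.

```php
<?php
// Hypothetical record that arrives with entities already baked in.
$fromSource = 'Pages 10&#8211;12 &amp; appendix';

// Decode once at the boundary so everything downstream is plain UTF-8.
$plain = html_entity_decode($fromSource, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// For a local file, write the UTF-8 text as-is, with no entities.
$path = tempnam(sys_get_temp_dir(), 'export_'); // placeholder location
file_put_contents($path, $plain . "\n");
$readBack = rtrim(file_get_contents($path), "\n");
unlink($path); // clean up the demo file

// Only when the same text goes into an HTML page, escape it at output time.
$forHtml = htmlspecialchars($plain, ENT_QUOTES, 'UTF-8');

echo $plain, PHP_EOL;
echo $forHtml, PHP_EOL;
```

htmlspecialchars only escapes &, <, > and quotes, so the en dash stays a real character in the HTML output too; the page just has to be served as UTF-8.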

deceze answered Nov 15 '22 15:11


Try mb_convert_encoding():

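// Sample string containing the named entity for the en dash
$string = "n&ndash;dash";
// 'HTML-ENTITIES' as the source encoding makes mbstring decode entities to UTF-8
// (note: this pseudo-encoding is deprecated as of PHP 8.2)
$output = mb_convert_encoding($string, 'UTF-8', 'HTML-ENTITIES');
echo $output;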

Lèse majesté answered Nov 15 '22 15:11