 

How to decode HTML Entities in C?


I'm interested in unescaping HTML entities in C, for example mapping &#92; to \. Does anyone know of a good library?

For reference, see Wikipedia's List of XML and HTML character entity references.
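A library is the safer choice for full coverage, but the core of the job is small. The following is a minimal sketch, not a complete decoder. It assumes NUL-terminated C strings, recognises only the five entities predefined by XML (&amp;, &lt;, &gt;, &quot;, &apos;) plus decimal and hexadecimal numeric references, requires the terminating semicolon, and decodes only references to ASCII characters; everything else is copied through unchanged. The names html_decode_ascii and parse_numeric are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* The five entities predefined by XML; all of them map to ASCII. */
static const struct { const char *name; char ch; } NAMED[] = {
    {"amp", '&'}, {"apos", '\''}, {"gt", '>'}, {"lt", '<'}, {"quot", '"'},
};

/* Parse a numeric reference: p points just after '#', semi at the ';'.
   Returns the code point, or 0 if the reference is malformed. */
static unsigned long parse_numeric(const char *p, const char *semi)
{
    unsigned long cp = 0, base = 10;

    if (*p == 'x' || *p == 'X') {
        base = 16;
        p++;
    }
    if (p == semi)
        return 0;                        /* "&#;" or "&#x;": no digits */
    for (; p < semi; p++) {
        unsigned long d;
        if (isdigit((unsigned char)*p))
            d = (unsigned long)(*p - '0');
        else if (base == 16 && isxdigit((unsigned char)*p))
            d = (unsigned long)(tolower((unsigned char)*p) - 'a' + 10);
        else
            return 0;
        cp = cp * base + d;
        if (cp > 0x10FFFF)
            return 0;                    /* beyond Unicode; also avoids overflow */
    }
    return cp;
}

/* Decode src into dst. dst needs strlen(src) + 1 bytes: each reference
   decoded here is longer than the single byte it becomes. Unknown names
   and references to non-ASCII code points are copied unchanged. */
void html_decode_ascii(const char *src, char *dst)
{
    assert(src != NULL && dst != NULL);
    while (*src) {
        const char *semi;
        if (*src == '&' && (semi = strchr(src + 1, ';')) != NULL) {
            const char *body = src + 1;
            size_t len = (size_t)(semi - body);
            int ch = -1;                 /* -1: not decoded */

            if (len > 0 && body[0] == '#') {
                unsigned long cp = parse_numeric(body + 1, semi);
                if (cp > 0 && cp < 0x80)
                    ch = (int)cp;
            } else {
                for (size_t i = 0; i < sizeof NAMED / sizeof NAMED[0]; i++) {
                    if (strlen(NAMED[i].name) == len &&
                        strncmp(body, NAMED[i].name, len) == 0) {
                        ch = NAMED[i].ch;
                        break;
                    }
                }
            }
            if (ch >= 0) {
                *dst++ = (char)ch;
                src = semi + 1;          /* skip the whole reference */
                continue;
            }
        }
        *dst++ = *src++;                 /* plain byte or reference left as-is */
    }
    *dst = '\0';
}
```

For example, decoding "a &#92; b" this way leaves "a \ b" in the output buffer, while "&#233;" is left alone because it needs a multi-byte UTF-8 encoding that this sketch deliberately skips.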

Asked by FelipeC on Jul 04 '09 at 12:07



People also ask

How do I decode HTML code?

Wikipedia has a good explanation of character encodings and of how certain characters should be represented in HTML.

What is HTML entity decode?

HTML encoding converts characters that are not allowed in HTML into character-entity equivalents; HTML decoding reverses the encoding. For example, when embedded in a block of text, the characters < and > are encoded as &lt; and &gt; for HTTP transmission.
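The encoding direction can be sketched in C as a two-pass function: measure the escaped length, allocate, then copy with replacements. This is a minimal sketch; the name html_escape is hypothetical, and escaping the apostrophe as &#39; (a numeric reference that works in both HTML and XML) is an assumption about what the caller needs.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return the entity for a character that must be escaped, or NULL. */
static const char *entity_for(char c)
{
    switch (c) {
    case '&':  return "&amp;";
    case '<':  return "&lt;";
    case '>':  return "&gt;";
    case '"':  return "&quot;";
    case '\'': return "&#39;";
    default:   return NULL;
    }
}

/* Return a malloc'd, escaped copy of src (caller frees), or NULL if out of memory. */
char *html_escape(const char *src)
{
    size_t len = 0;
    const char *p;
    char *out, *q;

    assert(src != NULL);
    for (p = src; *p; p++) {             /* first pass: measure */
        const char *e = entity_for(*p);
        len += e ? strlen(e) : 1;
    }
    out = malloc(len + 1);
    if (out == NULL)
        return NULL;
    for (p = src, q = out; *p; p++) {    /* second pass: copy */
        const char *e = entity_for(*p);
        if (e) {
            size_t n = strlen(e);
            memcpy(q, e, n);
            q += n;
        } else {
            *q++ = *p;
        }
    }
    *q = '\0';
    return out;
}
```

Measuring first keeps the allocation exact; a growing buffer would avoid the second scan at the cost of reallocations.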

What does HTML decode do in C#?

HtmlDecode(String, TextWriter) converts a string that has been HTML-encoded into a decoded string and sends the decoded string to a TextWriter output stream.

What is &amp; in HTML?

& is HTML for "Start of a character reference". &amp; is the character reference for "An ampersand". &current; is not a standard character reference and so is an error (browsers may try to perform error recovery but you should not depend on this).
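A decoder therefore has to look a reference name up in a table and decide what to do when the name is unknown. The sketch below assumes a small, hypothetical table sorted by name and uses the standard bsearch to find a name's code point; it returns 0 for names such as current that are not in the table, so the caller can leave the original text alone rather than guess. The real HTML5 list has over 2,000 names, and a few of them expand to two code points, which this single-field struct does not cover.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct entity {
    const char *name;         /* name without the '&' and ';' */
    unsigned long codepoint;  /* Unicode code point it stands for */
};

/* Illustrative subset; must stay sorted in strcmp order for bsearch. */
static const struct entity ENTITIES[] = {
    {"amp", 0x26}, {"apos", 0x27}, {"copy", 0xA9}, {"gt", 0x3E},
    {"lt", 0x3C},  {"nbsp", 0xA0}, {"quot", 0x22},
};

/* bsearch passes the key first and a table element second. */
static int cmp_entity(const void *key, const void *elem)
{
    return strcmp((const char *)key, ((const struct entity *)elem)->name);
}

/* Look up a name such as "amp"; returns 0 if the name is not recognised. */
unsigned long entity_lookup(const char *name)
{
    const struct entity *e;

    assert(name != NULL);
    e = bsearch(name, ENTITIES, sizeof ENTITIES / sizeof ENTITIES[0],
                sizeof ENTITIES[0], cmp_entity);
    return e ? e->codepoint : 0;
}
```

With a sorted table the lookup is logarithmic, which starts to matter once the table holds the full HTML5 list instead of a handful of names.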


1 Answer

For another open-source reference in C for decoding these HTML entities, you can check out the command-line utilities uni2ascii/ascii2uni. The relevant files are enttbl.{c,h}, which handle entity lookup, and putu8.c, which converts UTF-32 code points down to UTF-8.
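The UTF-32 to UTF-8 step that the answer attributes to putu8.c is standard UTF-8 encoding. The sketch below is not taken from uni2ascii; it is an independent, minimal version that assumes code points arrive as unsigned long values, and the name utf32_to_utf8 is hypothetical. It rejects UTF-16 surrogates and values above U+10FFFF, since those are not valid Unicode scalar values.

```c
#include <assert.h>
#include <stddef.h>

/* Write the UTF-8 encoding of cp into out (room for 4 bytes).
   Returns the number of bytes written, or 0 if cp is a UTF-16
   surrogate or lies above U+10FFFF. */
size_t utf32_to_utf8(unsigned long cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp < 0x80) {                         /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                        /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                            /* surrogates are not characters */
    if (cp < 0x10000) {                      /* 1110xxxx + 2 continuation bytes */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                    /* 11110xxx + 3 continuation bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                                /* beyond the Unicode range */
}
```

A decoder would call it with the code point obtained from a numeric reference or an entity lookup, for example 0x20AC for &euro;, and append the 1 to 4 returned bytes to its output.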

uni2ascii

Answered by Cameron Lowell Palmer on Sep 19 '22 at 13:09
