
Convert HTML character entity encoding in R

Is there a way in R to convert HTML Character Entity Encodings?

I would like to convert HTML character entities like &amp;amp; to &amp; or &amp;gt; to &gt;

Perl has the HTML::Entities package, which can do that, but I couldn't find anything similar in R.

I also tried iconv() but couldn't get satisfactory results. There may also be a way using the XML package, but I haven't figured it out yet.

asked Feb 20 '11 at 21:02 by user625626


1 Answer

Unescape XML/HTML values using the xml2 package:

    unescape_xml <- function(str) {
      xml2::xml_text(xml2::read_xml(paste0("<x>", str, "</x>")))
    }

    unescape_html <- function(str) {
      xml2::xml_text(xml2::read_html(paste0("<x>", str, "</x>")))
    }

Examples:

    unescape_xml("3 &lt; x &amp; x &gt; 9")
    # [1] "3 < x & x > 9"

    unescape_html("&euro; 2.99")
    # [1] "€ 2.99"
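
The functions in this answer wrap one string in a single `<x>` element and parse it as one document, so they are meant for one string at a time. If you need to decode a whole character vector, a minimal sketch might look like the following. The name `unescape_html_vec` is hypothetical (it is not part of xml2), and passing `NA` through unchanged is an assumption about what you would want:

```r
library(xml2)

# Hypothetical helper, not part of xml2: decode each element on its own,
# because each call to read_html() parses a single document.
unescape_html_vec <- function(strings) {
  vapply(strings, function(s) {
    # Assumption: missing values should stay missing rather than be parsed.
    if (is.na(s)) return(NA_character_)
    xml_text(read_html(paste0("<x>", s, "</x>")))
  }, character(1), USE.NAMES = FALSE)
}

decoded <- unescape_html_vec(c("&euro; 2.99", "3 &lt; x &amp; x &gt; 9", NA))
print(decoded)
```

Note that the input is parsed as markup, so a string containing a literal `<` followed by a tag-like name may be read as an element rather than as text.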
answered Oct 21 '22 at 12:10 by Jeroen Ooms