
Converting HTML to XML

Tags: html, xml

I have hundreds of HTML files that need to be converted to XML. We use these HTML files to serve content for applications, but now we have to serve this content as XML.

The HTML files contain tables, divs, images, p tags, b or strong tags, etc.

I googled and found some applications, but I haven't managed to get any of them to work yet.

Could you suggest a way to convert these file contents to XML?

bahadir arslan asked May 06 '12 20:05




1 Answer

I was successful using the tidy command-line utility. On Linux I installed it quickly with apt-get install tidy. Then the command:

tidy -q -asxml --numeric-entities yes source.html >file.xml

gave an XML file, which I was able to process with an XSLT processor. However, I needed to set up the XHTML 1 DTDs correctly.
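Since the question is about hundreds of files, the single-file command can be wrapped in a loop. The following is only a sketch, assuming tidy is on the PATH and all the .html files sit in one directory; the function name html_to_xml_dir and the directory arguments are made up for illustration. It relies on tidy's exit status (0 for success, 1 for warnings, 2 for errors) to drop output from files tidy could not repair.

```shell
# Convert every .html file in a source directory to XHTML with tidy.
# html_to_xml_dir is a hypothetical helper name, not part of tidy.
html_to_xml_dir() {
    local src_dir=$1
    local out_dir=$2
    local html_file xml_file status failed=0

    mkdir -p "$out_dir" || return 1

    for html_file in "$src_dir"/*.html; do
        # Skip the literal pattern when the glob matches nothing.
        [ -e "$html_file" ] || continue

        xml_file="$out_dir/$(basename "$html_file" .html).xml"

        # Same options as the single-file command above.
        status=0
        tidy -q -asxml --numeric-entities yes "$html_file" > "$xml_file" || status=$?

        # Exit status 1 means warnings only; 2 means tidy hit errors.
        if [ "$status" -ge 2 ]; then
            echo "tidy failed on $html_file" >&2
            rm -f "$xml_file"
            failed=1
        fi
    done

    return "$failed"
}
```

It could be called as html_to_xml_dir ./html ./xml (both directory names are placeholders). Files that fail are reported on stderr and the function returns 1, so they can be fixed by hand and rerun.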

This is their homepage: html-tidy.org (and the legacy one: HTML Tidy)

Jarekczek answered Oct 27 '22 05:10