Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to convert HTML to XHTML? [closed]

Tags:

I need to convert HTML documents into valid XML, preferably XHTML. What's the best way to do this? Does anybody know a toolkit/library/sample/...whatever that helps me to get that task done?

To be a bit more clear here, my application has to do the conversion automatically at runtime. I don't look for a tool that helps me to move some pages to XHTML manually.

like image 628
JRoppert Avatar asked Sep 26 '08 10:09

JRoppert


1 Answers

Convert from HTML to XML with HTML Tidy

Downloadable Binaries

JRoppert, For your need, i guess you might want to look at the Sources

c:\temp>tidy -help tidy [option...] [file...] [option...] [file...] Utility to clean up and pretty print HTML/XHTML/XML see http://tidy.sourceforge.net/  Options for HTML Tidy for Windows released on 14 February 2006:  File manipulation -----------------  -output <file>, -o  write output to the specified <file>  <file>  -config <file>      set configuration options from the specified <file>  -file <file>, -f    write errors to the specified <file>  <file>  -modify, -m         modify the original input files  Processing directives ---------------------  -indent, -i         indent element content  -wrap <column>, -w  wrap text at the specified <column>. 0 is assumed if  <column>            <column> is missing. When this option is omitted, the                      default of the configuration option "wrap" applies.  -upper, -u          force tags to upper case  -clean, -c          replace FONT, NOBR and CENTER tags by CSS  -bare, -b           strip out smart quotes and em dashes, etc.  -numeric, -n        output numeric rather than named entities  -errors, -e         only show errors  -quiet, -q          suppress nonessential output  -omit               omit optional end tags  -xml                specify the input is well formed XML  -asxml, -asxhtml    convert HTML to well formed XHTML  -ashtml             force XHTML to well formed HTML  -access <level>     do additional accessibility checks (<level> = 0, 1, 2, 3).                      0 is assumed if <level> is missing.  Character encodings -------------------  -raw                output values above 127 without conversion to entities  -ascii              use ISO-8859-1 for input, US-ASCII for output  -latin0             use ISO-8859-15 for input, US-ASCII for output  -latin1             use ISO-8859-1 for both input and output  -iso2022            use ISO-2022 for both input and output  -utf8               use UTF-8 for both input and output  -mac                use MacRoman for input, US-ASCII for output  -win1252            use Windows-1252 for input, US-ASCII for output  -ibm858             use IBM-858 (CP850+Euro) for input, US-ASCII for output  -utf16le            use UTF-16LE for both input and output  -utf16be            use UTF-16BE for both input and output  -utf16              use UTF-16 for both input and output  -big5               use Big5 for both input and output  -shiftjis           use Shift_JIS for both input and output  -language <lang>    set the two-letter language code <lang> (for future use)  Miscellaneous -------------  -version, -v        show the version of Tidy  -help, -h, -?       list the command line options  -xml-help           list the command line options in XML format  -help-config        list all configuration options  -xml-config         list all configuration options in XML format  -show-config        list the current configuration settings  Use --blah blarg for any configuration option "blah" with argument "blarg"  Input/Output default to stdin/stdout respectively Single letter options apart from -f may be combined as in:  tidy -f errs.txt -imu foo.html For further info on HTML see http://www.w3.org/MarkUp 
like image 86
prakash Avatar answered Sep 20 '22 13:09

prakash