I'm having trouble figuring out why HXT is replacing my DTDs. First, here is the input file to be parsed:
<!DOCTYPE html>
<html>
<head>
<title>foo</title>
</head>
<body>
<h1>foo</h1>
</body>
</html>
and this is the output that I get:
<?xml version="1.0" encoding="US-ASCII"?>
<html>
<head>
<title>foo</title>
</head>
<body>
<h1>foo</h1>
</body>
</html>
Finally, here is a simplified version of the arrows I'm using:
start (App src dest) = runX $
    readDocument [ withValidate no
                 , withSubstDTDEntities no
                 , withParseHTML yes
                 --, withTagSoup
                 ]
                 src
    >>>
    this
    >>>
    writeDocument [ withIndent yes
                  , withSubstDTDEntities no
                  , withOutputHTML
                  --, withOutputEncoding "UTF-8"
                  ]
                  dest
I apologize for the commented-out lines; I've been trying different combinations of options. I just can't get HXT to leave the DTD alone, even with withSubstDTDEntities no, withValidate no, etc. I am getting a warning saying that HXT is ignoring my doctype declaration, but that's the only insight I have. Can anyone please lend me a hand? Thank you in advance!
You have two problems.
First, HXT only accepts one of the following three HTML doctypes:
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"DTD/xhtml1-strict.dtd">
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"DTD/xhtml1-transitional.dtd">
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
"DTD/xhtml1-frameset.dtd">
Using one of these will get rid of the warning about ignoring the DTD.
Second, add the following option to writeDocument:
withAddDefaultDTD yes
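Putting both suggestions together, a minimal sketch of the pipeline might look like the following. The helper name copyWithDoctype and the file names are made up for illustration, and the input file is assumed to already use one of the three XHTML doctypes listed above. As far as I understand the option, withAddDefaultDTD makes HXT emit its own default doctype on output rather than copying the one from the input, so do not expect a bare <!DOCTYPE html> to survive.

```haskell
import Text.XML.HXT.Core

-- Hypothetical helper (not from the question): read an HTML file,
-- pass it through unchanged, and write it back out with a DOCTYPE.
copyWithDoctype :: FilePath -> FilePath -> IO [XmlTree]
copyWithDoctype src dest = runX $
    readDocument [ withValidate no        -- do not validate against the DTD
                 , withParseHTML yes      -- use the lenient HTML parser
                 ] src
    >>>
    writeDocument [ withIndent yes
                  , withOutputHTML        -- serialize as HTML
                  , withAddDefaultDTD yes -- ask HXT to add a default DOCTYPE
                  ] dest

main :: IO ()
main = do
    -- "input.html" and "output.html" are placeholder file names.
    _ <- copyWithDoctype "input.html" "output.html"
    return ()
```

The identity arrow `this` from the question is dropped here because it does nothing; put your own transformation between readDocument and writeDocument where it was.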