I'm trying to write an application that analyzes data stored in fairly big XML files (from 10 to 800 MB). Each data set is stored as a single tag, with the concrete data specified as attributes. I'm currently using saxParse from HaXml, and I'm not satisfied with its memory usage: parsing a 15 MB XML file consumes more than 1 GB of memory, although I tried not to store the data in lists and to process it immediately. I use the following code:
    importOneFile file proc ioproc = do
        xml <- readFile file
        let (sxs, res) = saxParse file $ stripUnicodeBOM xml
        case res of
          Just str -> putStrLn $ "Error: " ++ str
          Nothing  -> forM_ sxs (ioproc . proc . extractAttrs "row")
Here `proc` is a function that converts the data from the attributes into a record, and `ioproc` is a function that performs some IO action on it: printing to the screen, storing in a database, etc.
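For illustration, a minimal sketch of what `proc` and `ioproc` might look like (the `Row` record and the attribute names are hypothetical, not taken from the original code):

```haskell
-- Hypothetical record type for one <row> element
data Row = Row { rowId :: String, rowValue :: String } deriving Show

-- 'proc': turn the attribute list of one element into a record
toRow :: [(String, String)] -> Row
toRow attrs = Row (get "id") (get "value")
  where get k = maybe "" id (lookup k attrs)

-- 'ioproc': perform an IO action per record, e.g. print it
printRow :: Row -> IO ()
printRow = print
```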
How can I decrease memory consumption during XML parsing? Would switching to another XML parser help?
Update: also, which parsers support different input encodings: UTF-8, UTF-16, UTF-32, etc.?
The Haskell XML Toolbox is based on the ideas of HaXml and HXML, but introduces a more general approach for processing XML with Haskell. HXT uses a generic data model for representing XML documents, including the DTD subset, entity references, CData parts and processing instructions.
The package hxt forms the core of the toolbox. It contains a validating XML parser, an HTML parser that tries to read any text as HTML, a DSL for processing, transforming and generating XML/HTML, and so-called picklers for converting between XML and native Haskell data. HandsomeSoup adds CSS selectors to HXT.
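As a rough sketch of how the same task might look with HXT's arrow interface (combinators as exported by `Text.XML.HXT.Core`; the file name and the "row"/"id" names are placeholders):

```haskell
import Text.XML.HXT.Core

main :: IO ()
main = do
  -- Read the document without DTD validation, pick out every <row>
  -- element anywhere in the tree, and extract one attribute per row.
  ids <- runX $ readDocument [withValidate no] "data.xml"
                >>> deep (isElem >>> hasName "row")
                >>> getAttrValue "id"
  mapM_ putStrLn ids
```

Note that HXT builds a full document tree in memory, so on very large files it may not solve the memory problem by itself.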
If you're willing to assume that your inputs are valid, consider looking at TagSoup or Text.XML.Light from the Galois folks.
These take Strings as input, so you can (indirectly) feed them anything Data.Encoding understands.
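For example, a minimal TagSoup sketch (the element and attribute names are placeholders):

```haskell
import Text.HTML.TagSoup

main :: IO ()
main = do
  xml <- readFile "data.xml"
  -- parseTags produces the tag list lazily, so each <row>'s
  -- attributes can be consumed and discarded as the input is traversed.
  let rows = [attrs | TagOpen "row" attrs <- parseTags xml]
  mapM_ print rows
```

Because the tag list is produced lazily, processing each row immediately (as with `forM_` in the question) keeps memory proportional to one row rather than the whole file, provided nothing else retains a reference to the list.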