
Looping over a large XML file

Tags: coldfusion, xml

I'm having problems looping over an XML file of about 20-30 MB (650,000 rows).

This is my pseudocode:

<cffile action="READ" file="file.xml" variable="usersRaw">

<cfset usersXML = XmlParse(usersRaw)>
<cfset advsXML = XmlSearch(usersXML, "/advs/advuser")>
<cfset users = XmlSearch(usersXML, "/advs/advuser/user")>

<cfset numUsers = ArrayLen(users)>
<cfloop index="i" from="1" to="#numUsers#">
    ... some selects...
    ... insert...
    <cfset advs = advsXML[i]["vehicle"]>
    <cfset numAdvs = ArrayLen(advs)> 
    <cfloop index="k" from="1" to="#numAdvs#">        
        ... insert... or ... update...
    </cfloop>
</cfloop>

The structure of the XML file is as follows (yes, it is not very good :-):

<advs>
   <advuser>
      <user>
      </user>
      <vehicle></vehicle>
      <vehicle></vehicle>
   </advuser>
</advs>

After ~120,000 rows I get an error: "Out of memory".

How can I improve the performance of my script?

How can I diagnose where memory consumption peaks?

Roberto asked Feb 14 '11 17:02


2 Answers

@SamG is correct that ColdFusion's built-in XML parsing can't handle this: it uses a DOM parser, which loads the entire document into memory. SAX would work but is painful to use. Use a StAX parser instead, which provides a much simpler iterator interface and reads one event at a time. See the answer I provided to another question for an example of how to do this with ColdFusion.

This is roughly what you'd do for your example:

<cfset fis = createObject("java", "java.io.FileInputStream").init(
    "#getDirectoryFromPath(getCurrentTemplatePath())#/file.xml"
)>
<cfset bis = createObject("java", "java.io.BufferedInputStream").init(fis)>
<cfset XMLInputFactory = createObject("java", "javax.xml.stream.XMLInputFactory").newInstance()>
<cfset reader = XMLInputFactory.createXMLStreamReader(bis)>

<cfloop condition="reader.hasNext()">
    <cfset event = reader.next()>
    <cfif event EQ reader.START_ELEMENT>
        <cfswitch expression="#reader.getLocalName()#">
            <cfcase value="advs">
                <!--- root node, do nothing --->
            </cfcase>
            <cfcase value="advuser">
                <!--- set values used later on for inserts, selects, updates --->
            </cfcase>
            <cfcase value="user">
                <!--- some selects and insert --->
            </cfcase>
            <cfcase value="vehicle">
                <!--- insert or update --->
            </cfcase>
        </cfswitch>
    </cfif>
</cfloop>

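<cfset reader.close()>
<!--- reader.close() does not close the underlying input stream, so close it explicitly --->
<cfset bis.close()>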
orangepips answered Sep 28 '22 19:09


orangepips provides a reasonable solution. Please also take a look at Ben Nadel's solution for handling very large XML files in ColdFusion. I have tested his approach on a 50 MB XML file with 1.2 million lines. Ben uses an approach similar to the one orangepips provides here: stream the file using Java, then XmlParse each node in ColdFusion to get to the goods. Check it out. Like most of Ben Nadel's code and tutorials, it just works.

http://www.bennadel.com/blog/1345-Ask-Ben-Parsing-Very-Large-XML-Documents-In-ColdFusion.htm
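
To give a feel for the "stream with Java, then XmlParse each node" idea, here is a minimal sketch. It is not taken from the linked article; it is one possible way to do it with the standard StAX event API (XMLEventReader and XMLEventWriter), assuming the file name and the advs/advuser/user/vehicle structure from the question. All variable names are illustrative.

```cfml
<!--- Sketch only: file.xml and the element names come from the question;
      every variable name here is an assumption for illustration. --->
<cfset filePath = getDirectoryFromPath(getCurrentTemplatePath()) & "file.xml">
<cfset fis = createObject("java", "java.io.FileInputStream").init(filePath)>
<cfset bis = createObject("java", "java.io.BufferedInputStream").init(fis)>
<cfset inputFactory = createObject("java", "javax.xml.stream.XMLInputFactory").newInstance()>
<cfset outputFactory = createObject("java", "javax.xml.stream.XMLOutputFactory").newInstance()>
<cfset events = inputFactory.createXMLEventReader(bis)>

<cfloop condition="events.hasNext()">
    <cfset event = events.nextEvent()>
    <cfif event.isStartElement() AND event.asStartElement().getName().getLocalPart() EQ "advuser">
        <!--- Copy this one advuser subtree into a small string buffer --->
        <cfset buffer = createObject("java", "java.io.StringWriter").init()>
        <cfset writer = outputFactory.createXMLEventWriter(buffer)>
        <cfset writer.add(event)>
        <cfset depth = 1>
        <cfloop condition="depth GT 0">
            <cfset event = events.nextEvent()>
            <cfif event.isStartElement()><cfset depth = depth + 1></cfif>
            <cfif event.isEndElement()><cfset depth = depth - 1></cfif>
            <cfset writer.add(event)>
        </cfloop>
        <cfset writer.flush()>
        <cfset writer.close()>

        <!--- Only this small fragment is parsed into a DOM, not the whole file --->
        <cfset advuserDoc = XmlParse(buffer.toString())>
        <cfset users = XmlSearch(advuserDoc, "/advuser/user")>
        <cfset vehicles = XmlSearch(advuserDoc, "/advuser/vehicle")>
        <!--- ... selects, inserts and updates for this advuser go here ... --->
    </cfif>
</cfloop>

<cfset events.close()>
<cfset bis.close()>
```

Each pass through the outer loop holds only one advuser fragment in memory, so memory use should stay roughly flat regardless of file size, while the per-record code can still use the familiar XmlSearch and XML object syntax.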

Marty McGee answered Sep 28 '22 18:09