I'm currently parsing very large XML files (over 40 MB). I have just started developing in Scala, so I browsed the net for some good libs and stumbled upon Scala Scales, which seems to be very good at handling large files.
I have read http://scala-scales.googlecode.com/svn/sites/scales/scales-xml_2.9.1/0.2/ScalesXmlIntro.html and http://scala-scales.googlecode.com/svn/sites/scales/scales-xml_2.9.2/0.4.4/PullParsing.html
and then tested the pullXml function to make sure all libs are imported correctly.
val pull = pullXml(new FileReader("/Users/mycrazyxml/tmp/large.xml"))
while (pull.hasNext) {
  pull.next match {
    case Left(i: XmlItem) =>
      // Handle XmlItem
      Logger.info("XmlItem: " + i)
    case Left(e: Elem) =>
      // Handle Element
      Logger.info("Element: " + e)
    case Right(endElem) =>
      // Handle endElement
      Logger.info("Endelement: " + endElem)
  }
}
As a result, the entire file is printed to the console. Nice! Now it's time to create the objects and save them to the db, but I'm having trouble grasping how to do this in a good way. I would really appreciate some good examples of how to do this.
E.g. the following XML has several Enterprise elements, each of which can contain one or several LocalUnits. The idea here is to create an Enterprise object with an array of LocalUnits. When the end element is the closing tag of an Enterprise, call the save method with the Enterprise object and its LocalUnits.
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE Info SYSTEM "info.dtd">
<Info>
<Enterprise>
<RegNo>12345678</RegNo>
<Address>
<StreetInfo>
<StreetName>Infinite Loop</StreetName>
<StreetNumber>1</StreetNumber>
</StreetInfo>
</Address>
<EName>
<Legal>Crazy Company</Legal>
</EName>
<SNI>
<Code>00000</Code>
<Rank>1</Rank>
</SNI>
<LocalUnit>
<CFARNo>987654321</CFARNo>
<LUType>1</LUType>
<LUName>Crazy Company Gym</LUName>
<LUStatus>1</LUStatus>
<SNI>
<Code>46772</Code>
<Rank>1</Rank>
</SNI>
<SNI>
<Code>68203</Code>
<Rank>2</Rank>
</SNI>
<Address>
<StreetInfo>
<StreetName>Infinite Loop</StreetName>
<StreetNumber>1</StreetNumber>
</StreetInfo>
</Address>
</LocalUnit>
<LocalUnit>
<CFARNo>987654322</CFARNo>
<LUType>1</LUType>
<LUName>Crazy Company Restaurant</LUName>
<LUStatus>1</LUStatus>
<SNI>
<Code>46772</Code>
<Rank>1</Rank>
</SNI>
<SNI>
<Code>68203</Code>
<Rank>2</Rank>
</SNI>
<Address>
<StreetInfo>
<StreetName>Infinite Loop</StreetName>
<StreetNumber>1</StreetNumber>
</StreetInfo>
</Address>
</LocalUnit>
</Enterprise>
<Enterprise>
<RegNo>12345671220</RegNo>
<Address>
<StreetInfo>
<StreetName>Cupertino Road</StreetName>
<StreetNumber>2</StreetNumber>
</StreetInfo>
</Address>
<EName>
<Legal>Fun Company HQ</Legal>
</EName>
<SNI>
<Code>00000</Code>
<Rank>1</Rank>
</SNI>
<LocalUnit>
<CFARNo>987654321</CFARNo>
<LUType>1</LUType>
<LUName>Fun Company</LUName>
<LUStatus>1</LUStatus>
<SNI>
<Code>46772</Code>
<Rank>1</Rank>
</SNI>
<SNI>
<Code>68203</Code>
<Rank>2</Rank>
</SNI>
<Address>
<StreetInfo>
<StreetName>Cupertino road</StreetName>
<StreetNumber>2</StreetNumber>
</StreetInfo>
</Address>
</LocalUnit>
</Enterprise>
</Info>
To sum up: for the given XML, how should I use pullXml to create my objects and call the save method with them?
val xmlFile = resource(this, "/data/enterprise_info.xml")
val xml = pullXml(xmlFile)

val Info = NoNamespaceQName("Info")
val Enterprise = NoNamespaceQName("Enterprise")
val LocalUnit = NoNamespaceQName("LocalUnit")
val LocalUnitName = NoNamespaceQName("LUName")
val EName = NoNamespaceQName("EName")
val Legal = NoNamespaceQName("Legal")

val EnterprisePath = List(Info, Enterprise)

// iterate over each Enterprise
// only an Enterprise at a time is in memory
val itr = iterate(EnterprisePath, xml)

for {
  enterprise <- itr
  enterpriseName <- enterprise \* EName \* Legal
} {
  println("enterprise " + text(enterpriseName) + " has units:")
  for {
    localUnits <- enterprise \* LocalUnit
    localName <- localUnits \* LocalUnitName
  } {
    println("  " + text(localName))
  }
  // do a save
}
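To turn the printing loop into objects plus a save call, something along these lines might work. It is only a sketch: it sticks to the calls already used above (pullXml, iterate, \* and text), but the model classes EnterpriseRecord and LocalUnitRecord, the importAll helper and the save callback are made-up names for illustration, and the import block is assumed to match the Scales version in use.

```scala
import java.io.Reader
import scala.collection.mutable.ListBuffer

// The usual Scales imports (assumption: adjust to your Scales version).
import scales.utils._
import ScalesUtils._
import scales.xml._
import ScalesXml._

// Hypothetical model classes, named *Record so they do not clash with the QNames.
final case class LocalUnitRecord(name: String)
final case class EnterpriseRecord(legalName: String, localUnits: List[LocalUnitRecord])

object EnterpriseImport {
  val Info = NoNamespaceQName("Info")
  val Enterprise = NoNamespaceQName("Enterprise")
  val LocalUnit = NoNamespaceQName("LocalUnit")
  val LocalUnitName = NoNamespaceQName("LUName")
  val EName = NoNamespaceQName("EName")
  val Legal = NoNamespaceQName("Legal")

  val EnterprisePath = List(Info, Enterprise)

  /** Builds one EnterpriseRecord per Enterprise element and hands it to `save`. */
  def importAll(reader: Reader)(save: EnterpriseRecord => Unit): Unit = {
    val xml = pullXml(reader)
    for (enterprise <- iterate(EnterprisePath, xml)) {
      // Collect the legal name (normally a single element).
      val legalNames = ListBuffer.empty[String]
      for (legal <- enterprise \* EName \* Legal) legalNames += text(legal)

      // Collect the name of every LocalUnit under this Enterprise.
      val units = ListBuffer.empty[LocalUnitRecord]
      for {
        unit <- enterprise \* LocalUnit
        unitName <- unit \* LocalUnitName
      } units += LocalUnitRecord(text(unitName))

      // The record is complete once its subtree has been read.
      save(EnterpriseRecord(legalNames.mkString, units.toList))
    }
  }
}
```

It would be called as EnterpriseImport.importAll(new java.io.FileReader(path))(myDao.save), where myDao.save stands for whatever persists one record. Other fields such as RegNo or CFARNo could be picked up the same way with their own QNames.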
Pulling in each LocalUnit lazily is more difficult at the moment: you would need separate Paths for each subsection which isn't a LocalUnit.
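If units really have to be handled one at a time, one alternative is to go back to the raw event loop from the question and track the open elements by hand. The sketch below shows only that bookkeeping in plain Scala. The Start, Chars and End events are simplified stand-ins invented for this example, not the Scales pull types, and the record classes and the save callback are likewise hypothetical.

```scala
// Simplified stand-ins for pull events (hypothetical, not the Scales types).
sealed trait Event
final case class Start(name: String) extends Event
final case class Chars(value: String) extends Event
final case class End(name: String) extends Event

// Hypothetical model classes.
final case class LocalUnitRecord(cfarNo: String, name: String)
final case class EnterpriseRecord(regNo: String, legalName: String, localUnits: Vector[LocalUnitRecord])

object EnterpriseBuilder {
  /** Walks the events once and calls `save` each time an Enterprise closes. */
  def process(events: Iterator[Event])(save: EnterpriseRecord => Unit): Unit = {
    var path = List.empty[String] // currently open elements, innermost first
    var regNo = ""
    var legalName = ""
    var cfarNo = ""
    var unitName = ""
    var units = Vector.empty[LocalUnitRecord]

    events.foreach {
      case Start(name) =>
        path = name :: path
        // Reset the accumulators when a new record starts.
        if (name == "Enterprise") { regNo = ""; legalName = ""; units = Vector.empty }
        if (name == "LocalUnit") { cfarNo = ""; unitName = "" }

      case Chars(value) =>
        // Route text by where we are in the document; everything else is ignored.
        path match {
          case "RegNo" :: "Enterprise" :: _ => regNo += value
          case "Legal" :: "EName" :: "Enterprise" :: _ => legalName += value
          case "CFARNo" :: "LocalUnit" :: _ => cfarNo += value
          case "LUName" :: "LocalUnit" :: _ => unitName += value
          case _ => ()
        }

      case End(name) =>
        if (name == "LocalUnit") units :+= LocalUnitRecord(cfarNo, unitName)
        if (name == "Enterprise") save(EnterpriseRecord(regNo, legalName, units))
        path = path.drop(1)
    }
  }
}
```

The End("LocalUnit") branch is where a per-unit save could go instead of appending to units, so that no more than one LocalUnit is held at a time. Mapping the real pull events onto this shape is left open here.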
Hth