
Where do I find some code examples of Scales Xml?

I'm currently parsing very large XML files (> 40 MB). I have just started developing in Scala, so I browsed the net for some good libs and stumbled upon Scala Scales, which seems to be very good at handling large files.

I have read:
http://scala-scales.googlecode.com/svn/sites/scales/scales-xml_2.9.1/0.2/ScalesXmlIntro.html
http://scala-scales.googlecode.com/svn/sites/scales/scales-xml_2.9.2/0.4.4/PullParsing.html

and then tested the pullXml function to make sure all libs are imported correctly:

import java.io.FileReader
import scales.utils._
import ScalesUtils._
import scales.xml._
import ScalesXml._

val pull = pullXml(new FileReader("/Users/mycrazyxml/tmp/large.xml"))
while (pull.hasNext) {
  pull.next match {
    case Left(i: XmlItem) =>
      // Handle XmlItem (non-element content such as text)
      Logger.info("XmlItem: " + i)

    case Left(e: Elem) =>
      // Handle the start of an element
      Logger.info("Element: " + e)

    case Right(endElem) =>
      // Handle the end of an element
      Logger.info("Endelement: " + endElem)
  }
}

This prints the entire file to the console. Nice! Now it's time to create the objects and save them to the db, but I'm having trouble grasping how to do this in a good way. I would really like some good examples of how to do this.

E.g. the following XML has several Enterprise elements, each of which can contain one or more LocalUnits. The idea here is to create an Enterprise object with an array of LocalUnits. When the end element is the closing tag for an Enterprise, call the save method with the Enterprise object and its LocalUnits.

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE Info SYSTEM "info.dtd">
<Info>
  <Enterprise>
    <RegNo>12345678</RegNo>
    <Address>
      <StreetInfo>
        <StreetName>Infinite Loop</StreetName>
        <StreetNumber>1</StreetNumber>
      </StreetInfo>
    </Address>
    <EName>
      <Legal>Crazy Company</Legal>
    </EName>
    <SNI>
      <Code>00000</Code>
      <Rank>1</Rank>
    </SNI>
    <LocalUnit>
      <CFARNo>987654321</CFARNo>
      <LUType>1</LUType>
      <LUName>Crazy Company Gym</LUName>
      <LUStatus>1</LUStatus>
      <SNI>
        <Code>46772</Code>
        <Rank>1</Rank>
      </SNI>
      <SNI>
        <Code>68203</Code>
        <Rank>2</Rank>
      </SNI>
      <Address>
        <StreetInfo>
          <StreetName>Infinite Loop</StreetName>
          <StreetNumber>1</StreetNumber>
        </StreetInfo>
      </Address>
    </LocalUnit>
    <LocalUnit>
      <CFARNo>987654322</CFARNo>
      <LUType>1</LUType>
      <LUName>Crazy Company Restaurant</LUName>
      <LUStatus>1</LUStatus>
      <SNI>
        <Code>46772</Code>
        <Rank>1</Rank>
      </SNI>
      <SNI>
        <Code>68203</Code>
        <Rank>2</Rank>
      </SNI>
      <Address>
        <StreetInfo>
          <StreetName>Infinite Loop</StreetName>
          <StreetNumber>1</StreetNumber>
        </StreetInfo>
      </Address>
    </LocalUnit>
  </Enterprise>
  <Enterprise>
    <RegNo>12345671220</RegNo>
    <Address>
      <StreetInfo>
        <StreetName>Cupertino Road</StreetName>
        <StreetNumber>2</StreetNumber>
      </StreetInfo>
    </Address>
    <EName>
      <Legal>Fun Company HQ</Legal>
    </EName>
    <SNI>
      <Code>00000</Code>
      <Rank>1</Rank>
    </SNI>
    <LocalUnit>
      <CFARNo>987654321</CFARNo>
      <LUType>1</LUType>
      <LUName>Fun Company</LUName>
      <LUStatus>1</LUStatus>
      <SNI>
        <Code>46772</Code>
        <Rank>1</Rank>
      </SNI>
      <SNI>
        <Code>68203</Code>
        <Rank>2</Rank>
      </SNI>
      <Address>
        <StreetInfo>
          <StreetName>Cupertino road</StreetName>
          <StreetNumber>2</StreetNumber>
        </StreetInfo>
      </Address>
    </LocalUnit>
  </Enterprise>
</Info>

To sum up: for the given XML, how should I use pullXml to create my objects and call the save method with them?

jakob asked Nov 12 '22 07:11



1 Answer

val xmlFile = resource(this, "/data/enterprise_info.xml")
val xml = pullXml(xmlFile)

val Info = NoNamespaceQName("Info")
val Enterprise = NoNamespaceQName("Enterprise")
val LocalUnit = NoNamespaceQName("LocalUnit")
val LocalUnitName = NoNamespaceQName("LUName")
val EName = NoNamespaceQName("EName")
val Legal = NoNamespaceQName("Legal")

val EnterprisePath = List(Info, Enterprise)

// Iterate over each Enterprise;
// only one Enterprise at a time is held in memory.
val itr = iterate(EnterprisePath, xml)

for {
  enterprise <- itr
  enterpriseName <- enterprise \* EName \* Legal
} {
  println("enterprise " + text(enterpriseName) + " has units:")
  for {
    localUnit <- enterprise \* LocalUnit
    localName <- localUnit \* LocalUnitName
  } {
    println("  " + text(localName))
  }
  // do a save here
}
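
To go from printing to building objects, the same loop can fill plain case classes and hand each one to a save function. The following is only a sketch: EnterpriseRecord, LocalUnitRecord, ScalesEnterprises and the save callback are hypothetical names made up for illustration, and the Scales calls are limited to the ones already used above (pullXml, iterate, NoNamespaceQName, \* and text). It assumes the usual Scales imports and that pullXml accepts a Reader, as in the question.

```scala
import java.io.Reader
import scala.collection.mutable.ListBuffer
import scales.utils._
import ScalesUtils._
import scales.xml._
import ScalesXml._

// Hypothetical domain classes, not part of Scales.
case class LocalUnitRecord(cfarNo: String, name: String)
case class EnterpriseRecord(regNo: String, legalName: String, localUnits: List[LocalUnitRecord])

object ScalesEnterprises {
  private val InfoQ       = NoNamespaceQName("Info")
  private val EnterpriseQ = NoNamespaceQName("Enterprise")
  private val RegNoQ      = NoNamespaceQName("RegNo")
  private val ENameQ      = NoNamespaceQName("EName")
  private val LegalQ      = NoNamespaceQName("Legal")
  private val LocalUnitQ  = NoNamespaceQName("LocalUnit")
  private val CfarNoQ     = NoNamespaceQName("CFARNo")
  private val LuNameQ     = NoNamespaceQName("LUName")

  // Builds one EnterpriseRecord per Enterprise element and passes it to `save`.
  def saveAll(source: Reader)(save: EnterpriseRecord => Unit): Unit = {
    val xml = pullXml(source)
    for (enterprise <- iterate(List(InfoQ, EnterpriseQ), xml)) {
      var regNo = ""
      var legalName = ""
      val units = ListBuffer.empty[LocalUnitRecord]

      for (r <- enterprise \* RegNoQ) regNo = text(r)
      for (n <- enterprise \* ENameQ \* LegalQ) legalName = text(n)

      for (localUnit <- enterprise \* LocalUnitQ) {
        var cfarNo = ""
        var name = ""
        for (c <- localUnit \* CfarNoQ) cfarNo = text(c)
        for (n <- localUnit \* LuNameQ) name = text(n)
        units += LocalUnitRecord(cfarNo, name)
      }

      // The whole Enterprise subtree has been read at this point.
      save(EnterpriseRecord(regNo, legalName, units.toList))
    }
  }
}
```

Something like ScalesEnterprises.saveAll(new FileReader(path))(enterprise => db.save(enterprise)) would then persist one Enterprise at a time, where db.save stands in for whatever persistence call you have.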

Pulling in each LocalUnit lazily is more difficult at the moment: you would need separate paths for each subsection that isn't a LocalUnit.
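
If you would rather react to each closing tag yourself, as the question describes, a hand-written event loop is another option. The sketch below deliberately uses the JDK's StAX reader (javax.xml.stream) instead of Scales, so it is not a Scales example; it only illustrates the state tracking. ParsedEnterprise, ParsedLocalUnit, StaxEnterprises and the save callback are hypothetical names. It keeps a stack of open element names, collects text, and calls save when the closing Enterprise tag arrives.

```scala
import java.io.Reader
import javax.xml.stream.{XMLInputFactory, XMLStreamConstants}
import scala.collection.mutable.ListBuffer

// Hypothetical domain classes for illustration only.
case class ParsedLocalUnit(cfarNo: String, name: String)
case class ParsedEnterprise(regNo: String, legalName: String, localUnits: List[ParsedLocalUnit])

object StaxEnterprises {
  def saveAll(source: Reader)(save: ParsedEnterprise => Unit): Unit = {
    val factory = XMLInputFactory.newInstance()
    // The sample declares an external DTD (info.dtd); ask the parser not to process it.
    factory.setProperty(XMLInputFactory.SUPPORT_DTD, java.lang.Boolean.FALSE)
    val xml = factory.createXMLStreamReader(source)

    var path = List.empty[String] // names of open elements, innermost first
    val chars = new StringBuilder // text seen since the last tag
    var regNo, legal, cfarNo, luName = ""
    val units = ListBuffer.empty[ParsedLocalUnit]

    while (xml.hasNext) {
      xml.next() match {
        case XMLStreamConstants.START_ELEMENT =>
          path = xml.getLocalName :: path
          chars.clear()
        case XMLStreamConstants.CHARACTERS =>
          chars.append(xml.getText)
        case XMLStreamConstants.END_ELEMENT =>
          val value = chars.toString.trim
          path match {
            case "RegNo" :: "Enterprise" :: _            => regNo = value
            case "Legal" :: "EName" :: "Enterprise" :: _ => legal = value
            case "CFARNo" :: "LocalUnit" :: _            => cfarNo = value
            case "LUName" :: "LocalUnit" :: _            => luName = value
            case "LocalUnit" :: _ =>
              // A LocalUnit just closed: keep it for the enclosing Enterprise.
              units += ParsedLocalUnit(cfarNo, luName)
              cfarNo = ""; luName = ""
            case "Enterprise" :: _ =>
              // The Enterprise just closed: save it and reset the state.
              save(ParsedEnterprise(regNo, legal, units.toList))
              units.clear(); regNo = ""; legal = ""
            case _ => // other elements (Address, SNI, ...) are ignored here
          }
          path = path.tail
          chars.clear()
        case _ => // comments, DTD events and the like
      }
    }
    xml.close()
  }
}
```

Only the fields of the Enterprise currently being read are held in memory. If a single Enterprise can itself be very large, the LocalUnit case could call a separate handler instead of adding to the buffer.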

Hth

Chris answered Nov 15 '22 13:11
