parsing XML in clojure

Tags: xml, clojure

I am new to Clojure, so please bear with me. I have an XML file that looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<XVar Id="cdx9" Type="Dictionary">
  <XVar Id="Base.AccruedPremium" Type="Multi" Value="" Rows="1" Columns="1">
    <Row Id="0">
      <Col Id="0" Type="Num" Value="0"/>
    </Row>
  </XVar>
  <XVar Id="TrancheAnalysis.IndexDuration" Type="Multi" Value="" Rows="1" Columns="1">
    <Row Id="0">
      <Col Id="0" Type="Num" Value="3.4380728252313069"/>
    </Row>
  </XVar>
  <XVar Id="TrancheAnalysis.IndexLevel01" Type="Multi" Value="" Rows="1" Columns="1">
    <Row Id="0">
      <Col Id="0" Type="Num" Value="30693.926279941188"/>
    </Row>
  </XVar>
  <XVar Id="TrancheAnalysis.TrancheDelta" Type="Multi" Value="" Rows="1" Columns="1">
    <Row Id="0">
      <Col Id="0" Type="Num" Value="8.9304387917502073"/>
    </Row>
  </XVar>
  <XVar Id="TrancheAnalysis.TrancheDuration" Type="Multi" Value="" Rows="1" Columns="1">
    <Row Id="0">
      <Col Id="0" Type="Num" Value="3.0775955481964035"/>
    </Row>
  </XVar>
</XVar>

And this structure repeats. From it I want to produce a CSV file with these columns:

IndexName,TrancheAnalysis.IndexDuration,TrancheAnalysis.TrancheDuration
cdx9,3.4380728252313069,3.0775955481964035
.........................................
.........................................

I am able to parse a simple XML file like

<?xml version="1.0" encoding="UTF-8"?>
<CalibrationData>
  <IndexList>
    <Index>
      <Calibrate>Y</Calibrate>
      <UseClientIndexQuotes>Y</UseClientIndexQuotes>
      <IndexName>HYCDX10</IndexName>
      <Tenor>06/20/2013</Tenor>
      <TenorName>3Y</TenorName>
      <IndexLevels>219.6</IndexLevels>
      <Tranche>Equity0To0.15</Tranche>
      <TrancheStart>0</TrancheStart>
      <TrancheEnd>0.15</TrancheEnd>
      <UseBreakEvenSpread>1</UseBreakEvenSpread>
      <UseTlet>0</UseTlet>
      <IsTlet>0</IsTlet>
      <PctExpectedLoss>0</PctExpectedLoss>
      <UpfrontFee>52.125</UpfrontFee>
      <RunningFee>0</RunningFee>
      <DeltaFee>5.3</DeltaFee>
      <CentralCorrelation>0.1</CentralCorrelation>
      <Currency>USD</Currency>
      <RescalingMethod>PTIndexRescaling</RescalingMethod>
      <EffectiveDate>06/17/2011</EffectiveDate>
    </Index>
  </IndexList>
</CalibrationData>

with this code

(ns DynamicProgramming
  (:require [clojure.xml :as xml]))
;Get the Input Files
(def calibrationFile "C:/ashwani/Eclipse/HistoricalTrancheAnalysis/src/CalibrationQuotes.xml")
(def mktdataFile "C:/ashwani/Eclipse/HistoricalTrancheAnalysis/src/MarketData.xml")
(def sample "C:/ashwani/Eclipse/HistoricalTrancheAnalysis/src/Sample.xml")

;Parse the Calibration Input File and keep the text of the selected tags
(def CalibOp
  (for [x (xml-seq (xml/parse (java.io.File. calibrationFile)))
        :when (#{:IndexName :Tenor :UpfrontFee :RunningFee :DeltaFee
                 :IndexLevels :TrancheStart :TrancheEnd}
               (:tag x))]
    (first (:content x))))

(println CalibOp)

The second XML is flat, though. I don't know how to iterate through the nested structure of the first XML example and extract the information I want.

Any help will be great.

Ash asked Jun 24 '11 15:06


1 Answer

I would use data.zip (formerly clojure.contrib.zip-filter). It provides a lot of XML-parsing power and can easily evaluate XPath-like expressions. The README describes it as a "System for filtering trees, and XML trees in particular".

Below I have some sample code for creating a "row" for the CSV file. The row is a map of the column name to the attribute value.

(ns work 
    (:require [clojure.xml :as xml]
              [clojure.zip :as zip]
              [clojure.data.zip.xml :as zf])) ; was clojure.contrib.zip-filter.xml in old contrib

; create a zip from the xml file
(def zip (zip/xml-zip (xml/parse "data.xml")))

; pulls out a list of all of the root "Id" attribute values
(zf/xml-> zip (zf/attr :Id))

(defn value
  "Finds the id and value for a particular element"
  [xvar-zip] ; the docstring must come before the argument vector
  (let [id (-> xvar-zip zip/node :attrs :Id) ; manual access
        value (zf/xml1-> xvar-zip ; use xpath like expression to pull value out
                         :Row ; need the row element
                         :Col ; then the column element
                         (zf/attr :Value))] ; and finally pull the Value out
    {id value}))

; gets the "column-value" pair for a single column
(zf/xml1-> zip
           (zf/attr= :Id "cdx9") ; filter on id "cdx9" 
           :XVar ; filter on XVars under it 
           (zf/attr= :Id "TrancheAnalysis.IndexDuration") ; filter on id
           value) ; apply the value function on the result of above

; creates a map of every column key to its corresponding value
(apply merge (zf/xml-> zip (zf/attr= :Id "cdx9") :XVar value))

I'm not sure how the XML would work with multiple Dictionary XVars, since the Dictionary XVar is the root element. If you need to handle several of them, another function that is useful for this type of work is mapcat, which concatenates the sequences returned by the mapping function.
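As a rough sketch of the multi-dictionary case: well-formed XML needs a single root, so this assumes the Dictionary XVars sit under a hypothetical wrapper element named Results, and that org.clojure/data.zip is on the classpath. The names parse-str, cell-value, dictionary-row and dictionary-rows are made up for illustration, and the second dictionary (cdx10) and its values are invented sample data. It uses map rather than mapcat because each dictionary yields exactly one row.

```clojure
(ns work.multi
  (:require [clojure.xml :as xml]
            [clojure.zip :as zip]
            [clojure.data.zip.xml :as zf]))

;; Assumed input shape: Dictionary XVars under a hypothetical <Results> root.
;; The cdx10 entry is invented sample data.
(def sample-xml
  "<Results>
     <XVar Id='cdx9' Type='Dictionary'>
       <XVar Id='TrancheAnalysis.IndexDuration' Type='Multi'>
         <Row Id='0'><Col Id='0' Type='Num' Value='3.4380728252313069'/></Row>
       </XVar>
       <XVar Id='TrancheAnalysis.TrancheDuration' Type='Multi'>
         <Row Id='0'><Col Id='0' Type='Num' Value='3.0775955481964035'/></Row>
       </XVar>
     </XVar>
     <XVar Id='cdx10' Type='Dictionary'>
       <XVar Id='TrancheAnalysis.IndexDuration' Type='Multi'>
         <Row Id='0'><Col Id='0' Type='Num' Value='1.5'/></Row>
       </XVar>
       <XVar Id='TrancheAnalysis.TrancheDuration' Type='Multi'>
         <Row Id='0'><Col Id='0' Type='Num' Value='2.5'/></Row>
       </XVar>
     </XVar>
   </Results>")

(def wanted-columns
  ["TrancheAnalysis.IndexDuration" "TrancheAnalysis.TrancheDuration"])

(defn parse-str
  "Parses an XML string into an xml-zip location."
  [s]
  (zip/xml-zip (xml/parse (java.io.ByteArrayInputStream. (.getBytes s "UTF-8")))))

(defn cell-value
  "Returns the Value attribute of the first Row/Col under a Multi XVar."
  [xvar-loc]
  (zf/xml1-> xvar-loc :Row :Col (zf/attr :Value)))

(defn dictionary-row
  "Builds one row map for a Dictionary XVar: its Id plus the requested columns.
   A column that is missing from the dictionary maps to nil."
  [columns dict-loc]
  (into {"IndexName" (zf/attr dict-loc :Id)}
        (for [col columns]
          [col (some-> (zf/xml1-> dict-loc :XVar (zf/attr= :Id col))
                       cell-value)])))

(defn dictionary-rows
  "Returns one row map per Dictionary XVar directly under the root."
  [columns root-loc]
  (map #(dictionary-row columns %)
       (zf/xml-> root-loc :XVar (zf/attr= :Type "Dictionary"))))

;; usage
(dictionary-rows wanted-columns (parse-str sample-xml))
```

If the real file uses a different wrapper, or nests the dictionaries deeper, only the path inside dictionary-rows should need to change.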

There are some more examples in the test source as well.

One other big recommendation I have is to make sure you use a lot of small functions. You'll find things much easier to debug, test, and work with.
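To illustrate the small-functions advice for the CSV step, here is a minimal sketch that uses only clojure.string. It assumes each row is a map from column name to string value (like the merged map above, plus an "IndexName" key) and that no value contains a comma or a quote, so it does no escaping. The names csv-columns, row->line and rows->csv are hypothetical.

```clojure
(ns work.csv
  (:require [clojure.string :as string]))

;; Column order for the output file, taken from the question.
(def csv-columns
  ["IndexName"
   "TrancheAnalysis.IndexDuration"
   "TrancheAnalysis.TrancheDuration"])

(defn row->line
  "Joins the values of one row map in column order; missing values become empty."
  [columns row]
  (string/join "," (map #(get row % "") columns)))

(defn rows->csv
  "Returns the CSV text: a header line followed by one line per row."
  [columns rows]
  (string/join "\n" (cons (string/join "," columns)
                          (map #(row->line columns %) rows))))

;; usage, with the values from the question's cdx9 example
(println
  (rows->csv csv-columns
             [{"IndexName" "cdx9"
               "TrancheAnalysis.IndexDuration" "3.4380728252313069"
               "TrancheAnalysis.TrancheDuration" "3.0775955481964035"}]))
```

Each function can be tried at the REPL on its own, and the result of rows->csv can be passed to spit to write the file.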

deterb answered Sep 18 '22 10:09