Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Update XML with an SQL query

Tags:

python

sql

xml

Let's say we have the following XML file:

from lxml import etree
root = etree.fromstring("""
    <a>
        <b>
            <c>Hello</c>
            <d>1234</d>
        </b>
        <b>
            <c>Bonjour</c>
            <d>5678</d>
        </b>        
    </a>
    """)

How could it be possible to update the XML with a SQL-like query language?

Example (pseudo-code):

etree.query("""UPDATE a.b SET c.text = 'foo' WHERE d.text = '5678'""")

or even more complex chained-queries, such as:

UPDATE a.b SET c.text = (SELECT ... FROM ... WHERE ...) WHERE d.text = '5678'

The goal is to be able to modify complex XML documents with just one or two lines of code, with a SQL-like syntax. Sidenote: As a comparison, doing this manually with lxml, find, etc. works, but is much longer.

like image 536
Basj Avatar asked Feb 13 '21 12:02

Basj


People also ask

How edit XML in SQL?

modify() Method (xml Data Type) Use this method to modify the content of an xml type variable or column. This method takes an XML DML statement to insert, update, or delete nodes from XML data. The modify() method of the xml data type can only be used in the SET clause of an UPDATE statement.

How do I query XML data in SQL?

SQL Server provides the XQuery feature to querying XML data type or querying with the XML column with the XPATH. Using XQuery, users can Insert, Update and Delete with the XML nodes and node values in an XML column.

Can you use SQL in XML?

SQL Server lets you retrieve data as XML by supporting the FOR XML clause, which can be included as part of your query. You can use the FOR XML clause in the main (outer) query as well as in subqueries. The clause supports numerous options that let you define the format of the XML data.


2 Answers

Consider XSLT, the special purpose language designed to transform XML documents, which shares a few attributes of SQL:

  • Special purpose language that can be called by general purpose application layer.
  • Industry language not restricted to any language like Python and can be run by standalone processors like Saxon and Xalan.
  • Declarative style and semantics and can receive parameters from application layer.

Python's lxml can run XSLT 1.0 scripts which are simply special XML files. To answer your first query, use the conditional XSLT counterpart of <xsl:choose>.

Load Documents

from lxml import etree

root = etree.fromstring("""\
    <a>
        <b>
            <c>Hello</c>
            <d>1234</d>
        </b>
        <b>
            <c>Bonjour</c>
            <d>5678</d>
        </b>        
    </a>""")

xsl = etree.fromstring("""\
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <!-- INITIALIZE PARAMETERS -->
    <xsl:param name="c_val"/>
    <xsl:param name="d_val"/>

    <!-- IDENTITY TRANSFORM -->
    <xsl:template match="node()|@*">
       <xsl:copy>
         <xsl:apply-templates select="node()|@*"/>
       </xsl:copy>
    </xsl:template>

    <!-- CONDITIONALLY UPDATE <c> -->
    <xsl:template match="c">
       <xsl:copy>
         <xsl:choose>
             <xsl:when test="following-sibling::d = $d_val">
                   <xsl:value-of select="$c_val"/>
             </xsl:when>
             <xsl:otherwise>
                   <xsl:value-of select="text()"/>
             </xsl:otherwise>  
         </xsl:choose>
       </xsl:copy>
    </xsl:template>    
</xsl:stylesheet>""")

Run Transformation

# INITIALIZE TRANSFORMER
transformer = etree.XSLT(xsl)

# TRANSFORM WITH PARAMETERS
result = transformer(
            root, 
            c_val=etree.XSLT.strparam("foo"),
            d_val=etree.XSLT.strparam("5678")
         )

# OUTPUT RESULT TO CONSOLE
print(result)
# <a>
#   <b>
#     <c>Hello</c>
#     <d>1234</d>
#   </b>
#   <b>
#     <c>foo</c>
#     <d>5678</d>
#   </b>
#</a>

Not quite clear about your second query. But if needing to query from a different XML document (counterpart to different SQL table), XSLT maintains the document() function and can be called in most places including parameter. Adjust XPath below.

<xsl:param name="c_val">
   <xsl:value-of select="document('other.xml')/a/b/c[1]">
</xsl:param>

If not passing c_val from app layer, be sure to remove that parameter in Python, otherwise you will overwrite above.

like image 163
Parfait Avatar answered Oct 14 '22 00:10

Parfait


At the first look, it seems this is an easy job to do but, in fact, isn't.

The hardest work will be "how to parse the XML recursively", to identify each (node -> sub-node -> sub-sub-node ... ), after that, the building of the query will be a little easier.

I believe that lxml library will give you the nodes as iterable objects, so they all contain iter() method which can be executed by next, so I believe that standard python with lxml library is sufficient for the recursive work.

but, thinking again about the idea (from XML to SQL query), here the SQL query has its own properties, and managing them is a need :

  • Database.
  • Table.
  • Column.
  • Statements.

To manage this! You end up building a medium-size library (and maybe an ORM) to convert each node to it's correct SQL statement.

let consider this :

<a>
   <b>
      <c>Hello</c>
      <d>1234</d>
   </b>
</a>

Nodes must be identified as:

  • a node is a database.
  • b node is a table.
  • c and d are columns.

And that what makes the idea unique and must be self-programmed.

like image 1
bguernouti Avatar answered Oct 13 '22 22:10

bguernouti