Search and replace multiple lines in xml/text files using python

---Update 3: The script now writes the required data into the XML files, but the following XML declaration and stylesheet processing instruction are being dropped from the written file. Why is this, and how can I restore them?

<?xml version="1.0" encoding="utf-8"?><?xml-stylesheet type='text/xsl' href='ANZMeta.xsl'?>
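A likely explanation, offered as a sketch rather than a definitive fix: ElementTree's default parser discards processing instructions such as the `<?xml-stylesheet ...?>` line, and `tree.write()` only emits the `<?xml ...?>` declaration when asked to. One workaround, assuming the header can simply be rewritten verbatim for every file, is to prepend it yourself when serializing. `HEADER`, `serialize_with_header` and `write_with_header` are hypothetical names, and the code is Python 3 (`encoding="unicode"` is not available in the Python 2.7 that arcpy uses).

```python
import xml.etree.ElementTree as et

# Hypothetical constant: the header lines that ElementTree drops on parse/write.
HEADER = ('<?xml version="1.0" encoding="utf-8"?>'
          "<?xml-stylesheet type='text/xsl' href='ANZMeta.xsl'?>\n")


def serialize_with_header(tree, header=HEADER):
    # tostring(..., encoding="unicode") returns a str with no XML declaration
    # of its own, so the header appears exactly once.
    return header + et.tostring(tree.getroot(), encoding="unicode")


def write_with_header(tree, filename, header=HEADER):
    # Write the header plus the serialized tree as UTF-8 text.
    with open(filename, "w", encoding="utf-8") as f:
        f.write(serialize_with_header(tree, header))
```

In the script, `tree.write(newMetaFile)` would then become `write_with_header(tree, newMetaFile)`.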

Current working code (apart from the issue mentioned above):

import os, xml, arcpy, shutil
from xml.etree import ElementTree as et 

path=os.getcwd()
arcpy.env.workspace = path

FileList = arcpy.ListFeatureClasses()
FileCount = len(FileList)
zone="_Zone"

for File in FileList:
    FileDesc_obj = arcpy.Describe(File)
    FileNm=FileDesc_obj.file
    newMetaFile=FileNm+"_BaseMetadata.xml"

    check_meta=os.listdir(path)
    if FileNm+'.xml' in check_meta:
        shutil.copy2(FileNm+'.xml', newMetaFile)
    else:
        shutil.copy2(r'L:\Data_Admin\QA\Metadata_python_toolset\Master_Metadata.xml', newMetaFile)
    tree=et.parse(newMetaFile)

    print "Processing: "+str(File)

    for node in tree.findall('.//title'):
        node.text = str(FileNm)
    for node in tree.findall('.//northbc'):
        node.text = str(FileDesc_obj.extent.YMax)
    for node in tree.findall('.//southbc'):
        node.text = str(FileDesc_obj.extent.YMin)
    for node in tree.findall('.//westbc'):
        node.text = str(FileDesc_obj.extent.XMin)
    for node in tree.findall('.//eastbc'):
        node.text = str(FileDesc_obj.extent.XMax)        
    for node in tree.findall('.//native/nondig/formname'):
        node.text = str(os.getcwd()+"\\"+File)
    for node in tree.findall('.//native/digform/formname'):
        node.text = str(FileDesc_obj.featureType)
    for node in tree.findall('.//avlform/nondig/formname'):
        node.text = str(FileDesc_obj.extension)
    for node in tree.findall('.//avlform/digform/formname'):
        node.text = str(os.path.getsize(File) / 1024.0) + " KB"
    for node in tree.findall('.//theme'):
        node.text = str(FileDesc_obj.spatialReference.name +" ; EPSG: "+str(FileDesc_obj.spatialReference.factoryCode))
    print node.text
    projection_info=[]
    Zone=FileDesc_obj.spatialReference.name

    if "GCS" in str(FileDesc_obj.spatialReference.name):
        projection_info=[FileDesc_obj.spatialReference.GCSName, FileDesc_obj.spatialReference.angularUnitName, FileDesc_obj.spatialReference.datumName, FileDesc_obj.spatialReference.spheroidName]
        print "Geographic Coordinate system"
    else:
        projection_info=[FileDesc_obj.spatialReference.datumName, FileDesc_obj.spatialReference.spheroidName, FileDesc_obj.spatialReference.angularUnitName, Zone[Zone.rfind(zone)-3:]]
        print "Projected Coordinate system"
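    # Fill the non-unique <keyword> elements in document order; zip() stops at
    # the shorter list, so a template with extra keywords cannot raise IndexError
    keywords = [kw for sd in tree.findall('.//spdom') for kw in sd.findall('.//keyword')]
    for node2, value in zip(keywords, projection_info):
        print node2.text
        node2.text = str(value)
        print node2.text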


    tree.write(newMetaFile)
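The repeated `for node in tree.findall(...)` loops above could be driven by a lookup table of XPath to value. Below is a minimal Python 3 sketch, assuming every field simply takes `str(value)`; `fill_fields` is a hypothetical helper, and the sample template and values only stand in for the real metadata file and what `arcpy.Describe()` would return.

```python
import xml.etree.ElementTree as et


def fill_fields(tree, values):
    """Set the text of every element matched by each XPath key to str(value).

    Returns how many elements were updated, so a path that matched nothing
    in the template is easy to spot.
    """
    updated = 0
    for xpath, value in values.items():
        for node in tree.findall(xpath):
            node.text = str(value)
            updated += 1
    return updated


template = et.ElementTree(et.fromstring(
    "<metadata><title/><bounding><northbc>0</northbc>"
    "<southbc>0</southbc></bounding></metadata>"))

# Hypothetical values; in the script above they would come from
# arcpy.Describe(File), e.g. FileDesc_obj.extent.YMax.
values = {
    ".//title": "roads",
    ".//northbc": 8097970,
    ".//southbc": 8078568,
}
print(fill_fields(template, values))  # 3
```

Keeping the paths in one dictionary makes it easier to see which template elements are touched, and the returned count flags a misspelled path.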

---Update 1&2: Thanks to Aleyna, I have the following basic code that works:

import os, xml, arcpy, shutil
from xml.etree import ElementTree as et 

CodeString=['northbc','southbc', '<nondig><formname>']

nondig='nondigital'
path=os.getcwd()
arcpy.env.workspace = path
xmlfile = path+"\\test.xml"

FileList = arcpy.ListFeatureClasses()
FileCount = len(FileList)

for File in FileList:
    FileDesc_obj = arcpy.Describe(File)
    FileNm=FileDesc_obj.file
    newMetaFile=FileNm+"_Metadata.xml"
    shutil.copy2(r'L:\Data_Admin\QA\Metadata_python_toolset\Master_Metadata.xml', newMetaFile)
    tree=et.parse(newMetaFile)

    for node in tree.findall('.//northbc'):
        node.text = str(FileDesc_obj.extent.YMax)
    for node in tree.findall('.//southbc'):
        node.text = str(FileDesc_obj.extent.YMin)
    for node in tree.findall('.//westbc'):
        node.text = str(FileDesc_obj.extent.XMin)
    for node in tree.findall('.//eastbc'):
        node.text = str(FileDesc_obj.extent.XMax)        
    for node in tree.findall('.//native/nondig/formname'):
        node.text = nondig

    tree.write(newMetaFile)

The remaining issue is XML like this:

<spdom>
  <keyword thesaurus="">GDA94</keyword>
  <keyword thesaurus="">GRS80</keyword>
  <keyword thesaurus="">Transverse Mercator</keyword>
  <keyword thesaurus="">Zone 55 (144E - 150E)</keyword>
</spdom>

Because the keyword element is not unique within <spdom>, can these be updated in order using the values that come from

FileDesc_obj.spatialReference.name

(which returns, e.g., u'GCS_GDA_1994')?
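Since `findall()` returns matches in document order, the non-unique `<keyword>` elements can be paired positionally with a list of values. A Python 3 sketch follows; `fill_in_order` is a hypothetical helper, and the replacement strings are placeholders, not values taken from arcpy.

```python
import xml.etree.ElementTree as et

SPDOM = """<spdom>
  <keyword thesaurus="">GDA94</keyword>
  <keyword thesaurus="">GRS80</keyword>
  <keyword thesaurus="">Transverse Mercator</keyword>
  <keyword thesaurus="">Zone 55 (144E - 150E)</keyword>
</spdom>"""


def fill_in_order(parent, tag, values):
    """Assign values to the <tag> children of parent, in document order.

    zip() stops at the shorter sequence, so surplus keywords keep their old
    text and surplus values are ignored instead of raising IndexError.
    """
    for node, value in zip(parent.findall(tag), values):
        node.text = str(value)


spdom = et.fromstring(SPDOM)
# Placeholder values, e.g. datum, spheroid, projection and zone.
fill_in_order(spdom, "keyword", ["GDA_1994", "GRS_1980", "Transverse_Mercator", "MGA_Zone_55"])
print([k.text for k in spdom.findall("keyword")])
```

Only the element text changes; the `thesaurus=""` attribute is left in place.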

---ORIGINAL POST---

I am building a program to generate XML metadata files from the spatial files in our library. I have already written scripts that extract the required spatial and attribute data from the files and create a shapefile and text-file index of them. Now I want to write this information into a base metadata XML file, written to the ANZLIC standard, by replacing the values held in its common/static elements.

For example, I want to replace the following XML:

<northbc>8097970</northbc>
<southbc>8078568</southbc>

with

<northbc>GeneratedValue_[desc.extent.YMax]</northbc>
<southbc>GeneratedValue_[desc.extent.YMin]</southbc>

The problem is that the number/value between the opening and closing tags will obviously not be the same in every file.

The same applies to tags such as <title> and <nondig><formname>. In the latter case both tags must be searched for together, because formname appears multiple times (it is not unique).
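ElementTree's path syntax lets the parent elements be part of the match, so a non-unique tag can be pinned down without a separate search. The document below is a cut-down, hypothetical fragment modelled on the paths used in Update 3 (Python 3).

```python
import xml.etree.ElementTree as et

DOC = """<metadata>
  <native><nondig><formname>original path</formname></nondig></native>
  <avlform><nondig><formname>original extension</formname></nondig></avlform>
</metadata>"""

root = et.fromstring(DOC)
print(len(root.findall(".//formname")))  # 2: the tag name alone is ambiguous

targets = root.findall(".//native/nondig/formname")
print(len(targets))  # 1: including the parents narrows it down
for node in targets:
    node.text = "new value"
```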

I am working from the Python Regular Expression manual [here][1].
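For comparison, a regular-expression version of "replace whatever value sits between the tags" might look like the Python 3 sketch below. It is fragile: it assumes the element has no attributes and contains only text, which is why the answers below recommend a real XML parser instead. `replace_tag_value` is a hypothetical name.

```python
import re


def replace_tag_value(text, tag, new_value):
    """Replace the text between <tag> and </tag>, whatever it currently is.

    Assumes the element has no attributes and no child elements; anything
    else is left unchanged because the pattern will not match it.
    """
    pattern = r"(<{0}>)[^<]*(</{0}>)".format(re.escape(tag))
    # A function replacement avoids backslash-escaping issues in new_value.
    return re.sub(pattern, lambda m: m.group(1) + str(new_value) + m.group(2), text)


xml_text = "<northbc>8097970</northbc>\n<southbc>8078568</southbc>"
print(replace_tag_value(xml_text, "northbc", 1234.5))
```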

asked Oct 24 '22 at 11:10 by GeorgeC


2 Answers

Using the given tag(s) above:

import os
import xml
from xml.etree import ElementTree as et 
path = r"/your/path/to/xml.file" 
tree = et.parse(path)
for node in tree.findall('.//northbc'):
    node.text = "New Value"
tree.write(path)

Here, the XPath expression .//northbc returns all the northbc elements in the XML document (ElementTree supports a limited subset of XPath). You can easily tailor the code to your needs.

answered Oct 27 '22 at 09:10 by Aleyna


If you're dealing with valid XML, use XPath to find the nodes of interest and the ElementTree API to manipulate them.

For instance, your XPath might be something like '//northbc', and you would just replace the text inside it.
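One caveat with the standard-library ElementTree: its limited XPath support rejects an absolute path such as `//northbc` when called on an element (it raises `SyntaxError`), so the relative form `.//northbc`, or `iter()`, is used instead. lxml's `xpath()` method accepts full XPath 1.0. A small Python 3 sketch with a made-up document:

```python
import xml.etree.ElementTree as et

root = et.fromstring("<metadata><bounding><northbc>8097970</northbc></bounding></metadata>")

try:
    root.findall("//northbc")  # absolute path on an Element
except SyntaxError as exc:
    print("rejected:", exc)

# The relative path finds the element anywhere below root...
for node in root.findall(".//northbc"):
    node.text = "8100000"

# ...and iter() walks the same subtree by tag name.
print([n.text for n in root.iter("northbc")])
```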

See http://docs.python.org/library/xml.etree.elementtree.html as well as http://pypi.python.org/pypi/lxml/2.2.8 for two different libraries that will help you get this done. Search Google for XPath and see the W3C tutorial for a decent introduction to XPath (I apparently can't post more than two links in a post, or I'd link it too).

answered Oct 27 '22 at 11:10 by gfortune