Goal:
Get the values inside <Name>
tags and print them out. Simplified XML below.
<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<soap:Body>
<GetStartEndPointResponse xmlns="http://www.etis.fskab.se/v1.0/ETISws">
<GetStartEndPointResult>
<Code>0</Code>
<Message />
<StartPoints>
<Point>
<Id>545</Id>
<Name>Get Me</Name>
<Type>sometype</Type>
<X>333</X>
<Y>222</Y>
</Point>
<Point>
<Id>634</Id>
<Name>Get me too</Name>
<Type>sometype</Type>
<X>555</X>
<Y>777</Y>
</Point>
</StartPoints>
</GetStartEndPointResult>
</GetStartEndPointResponse>
</soap:Body>
</soap:Envelope>
Attempt:
import requests
from xml.etree import ElementTree
response = requests.get('http://www.labs.skanetrafiken.se/v2.2/querystation.asp?inpPointfr=yst')
# XML parsing here
dom = ElementTree.fromstring(response.text)
names = dom.findall('*/Name')
for name in names:
print(name.text)
I have read other people recommending zeep
to parse soap xml but I found it hard to get my head around.
Example Read XML File in Python To read an XML file, firstly, we import the ElementTree class found inside the XML library. Then, we will pass the filename of the XML file to the ElementTree. parse() method, to start parsing. Then, we will get the parent tag of the XML file using getroot() .
The issue here is dealing with the XML namespaces:
import requests
from xml.etree import ElementTree
response = requests.get('http://www.labs.skanetrafiken.se/v2.2/querystation.asp?inpPointfr=yst')
# define namespace mappings to use as shorthand below
namespaces = {
'soap': 'http://schemas.xmlsoap.org/soap/envelope/',
'a': 'http://www.etis.fskab.se/v1.0/ETISws',
}
dom = ElementTree.fromstring(response.content)
# reference the namespace mappings here by `<name>:`
names = dom.findall(
'./soap:Body'
'/a:GetStartEndPointResponse'
'/a:GetStartEndPointResult'
'/a:StartPoints'
'/a:Point'
'/a:Name',
namespaces,
)
for name in names:
print(name.text)
The namespaces come from the xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
and xmlns="http://www.etis.fskab.se/v1.0/ETISws"
attributes on the Envelope
and GetStartEndPointResponse
nodes respectively.
Keep in mind, a namespace is inherited by all children nodes of a parent even if the namespace isn't explicitly specified on the child's tag as <namespace:tag>
.
Note: I had to use response.content
rather than response.body
.
Again, replying to an old question but I think this solution is worth sharing. Using BeautifulSoup was piece of cake for me. You can install BeautifulSoup form here.
from bs4 import BeautifulSoup
xml = BeautifulSoup(xml_string, 'xml')
xml.find('soap:Body') # to get the soup:Body tag.
xml.find('X') # for X tag
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With