I'm trying to understand an XPath that was sent to me for use with ACORD XML forms (a common format in insurance). The XPath they sent me is (truncated for brevity):
./PersApplicationInfo/InsuredOrPrincipal[InsuredOrPrincipalInfo/InsuredOrPrincipalRoleCd="AN"]/GeneralPartyInfo
Where I'm running into trouble is that Python's lxml library is telling me that [InsuredOrPrincipalInfo/InsuredOrPrincipalRoleCd="AN"] is an invalid predicate. I can't find anything in the XPath spec's section on predicates that describes this syntax, so I don't know how to modify the predicate to make it work.
Is there any documentation on what exactly this predicate is selecting? Also, is this even a valid predicate, or has something been mangled somewhere?
Possibly related:
I believe the company I am working with is an MS shop, so this XPath may be valid in C# or some other language in that stack? I'm not entirely sure.
Updates:
As requested in the comments, here is some additional info.
XML sample:
<ACORD>
<InsuranceSvcRq>
<HomePolicyQuoteInqRq>
<PersPolicy>
<PersApplicationInfo>
<InsuredOrPrincipal>
<InsuredOrPrincipalInfo>
<InsuredOrPrincipalRoleCd>AN</InsuredOrPrincipalRoleCd>
</InsuredOrPrincipalInfo>
<GeneralPartyInfo>
<Addr>
<Addr1></Addr1>
</Addr>
</GeneralPartyInfo>
</InsuredOrPrincipal>
</PersApplicationInfo>
</PersPolicy>
</HomePolicyQuoteInqRq>
</InsuranceSvcRq>
</ACORD>
Code sample (with full XPath instead of snippet):
>>> from lxml import etree
>>> tree = etree.fromstring(raw)
>>> tree.find('./InsuranceSvcRq/HomePolicyQuoteInqRq/PersPolicy/PersApplicationInfo/InsuredOrPrincipal[InsuredOrPrincipalInfo/InsuredOrPrincipalRoleCd="AN"]/GeneralPartyInfo/Addr/Addr1')
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "lxml.etree.pyx", line 1409, in lxml.etree._Element.find (src/lxml/lxml.etree.c:39972)
File "/Library/Python/2.5/site-packages/lxml-2.3-py2.5-macosx-10.3-i386.egg/lxml/_elementpath.py", line 271, in find
it = iterfind(elem, path, namespaces)
File "/Library/Python/2.5/site-packages/lxml-2.3-py2.5-macosx-10.3-i386.egg/lxml/_elementpath.py", line 261, in iterfind
selector = _build_path_iterator(path, namespaces)
File "/Library/Python/2.5/site-packages/lxml-2.3-py2.5-macosx-10.3-i386.egg/lxml/_elementpath.py", line 245, in _build_path_iterator
selector.append(ops[token[0]](_next, token))
File "/Library/Python/2.5/site-packages/lxml-2.3-py2.5-macosx-10.3-i386.egg/lxml/_elementpath.py", line 207, in prepare_predicate
raise SyntaxError("invalid predicate")
SyntaxError: invalid predicate
Change tree.find to tree.xpath. find and findall are present in lxml to provide compatibility with other implementations of ElementTree. These methods do not implement the entire XPath language. To employ XPath expressions containing more advanced features, use the xpath method, the XPath class, or XPathEvaluator.
For example:
import io
import lxml.etree as ET
content='''\
<ACORD>
<InsuranceSvcRq>
<HomePolicyQuoteInqRq>
<PersPolicy>
<PersApplicationInfo>
<InsuredOrPrincipal>
<InsuredOrPrincipalInfo>
<InsuredOrPrincipalRoleCd>AN</InsuredOrPrincipalRoleCd>
</InsuredOrPrincipalInfo>
<GeneralPartyInfo>
<Addr>
<Addr1></Addr1>
</Addr>
</GeneralPartyInfo>
</InsuredOrPrincipal>
</PersApplicationInfo>
</PersPolicy>
</HomePolicyQuoteInqRq>
</InsuranceSvcRq>
</ACORD>
'''
# io.BytesIO needs bytes, so encode the string first (required on Python 3)
tree=ET.parse(io.BytesIO(content.encode('utf-8')))
path='//PersApplicationInfo/InsuredOrPrincipal[InsuredOrPrincipalInfo/InsuredOrPrincipalRoleCd="AN"]/GeneralPartyInfo'
result=tree.xpath(path)
print(result)
yields
[<Element GeneralPartyInfo at b75a8194>]
while tree.find yields
SyntaxError: invalid node predicate
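The XPath class and XPathEvaluator mentioned above can be used in much the same way as the xpath method. Here is a minimal sketch, assuming a trimmed-down document: the second InsuredOrPrincipal entry, its "XX" role code, and both street addresses are made up for illustration and are not from the ACORD sample in the question.

```python
from lxml import etree

# Trimmed sample; the second InsuredOrPrincipal, the "XX" role code and the
# Addr1 values are hypothetical, added only to show that the predicate filters.
xml = b"""<ACORD>
  <PersApplicationInfo>
    <InsuredOrPrincipal>
      <InsuredOrPrincipalInfo>
        <InsuredOrPrincipalRoleCd>AN</InsuredOrPrincipalRoleCd>
      </InsuredOrPrincipalInfo>
      <GeneralPartyInfo><Addr><Addr1>1 Main St</Addr1></Addr></GeneralPartyInfo>
    </InsuredOrPrincipal>
    <InsuredOrPrincipal>
      <InsuredOrPrincipalInfo>
        <InsuredOrPrincipalRoleCd>XX</InsuredOrPrincipalRoleCd>
      </InsuredOrPrincipalInfo>
      <GeneralPartyInfo><Addr><Addr1>2 Other St</Addr1></Addr></GeneralPartyInfo>
    </InsuredOrPrincipal>
  </PersApplicationInfo>
</ACORD>"""
root = etree.fromstring(xml)

path = ('./PersApplicationInfo/InsuredOrPrincipal'
        '[InsuredOrPrincipalInfo/InsuredOrPrincipalRoleCd="AN"]'
        '/GeneralPartyInfo/Addr/Addr1')

# Option 1: compile the expression once and call it on any element or tree.
find_addr1 = etree.XPath(path)
compiled_hits = find_addr1(root)

# Option 2: bind an evaluator to one element and pass it expressions.
evaluator = etree.XPathEvaluator(root)
evaluator_hits = evaluator(path)

addresses = [el.text for el in compiled_hits]
print(addresses)
```

Only the Addr1 under the InsuredOrPrincipal whose role code is "AN" should be selected. A compiled XPath object is mainly useful when the same expression is applied to many documents.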
Your example is perfectly fine in my opinion. I would check whether lxml's XPath implementation has any documented limitations.
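If you need to stay with find and findall (for instance, to keep the code portable to other ElementTree implementations), one possible workaround is to do the predicate's test in Python instead. This is only a sketch: the helper name general_party_info_for_role is hypothetical, and the second InsuredOrPrincipal entry, the "XX" role code, and the addresses are invented sample data.

```python
from lxml import etree

# Hypothetical sample data; only the element names come from the question.
xml = b"""<ACORD>
  <PersApplicationInfo>
    <InsuredOrPrincipal>
      <InsuredOrPrincipalInfo>
        <InsuredOrPrincipalRoleCd>AN</InsuredOrPrincipalRoleCd>
      </InsuredOrPrincipalInfo>
      <GeneralPartyInfo><Addr><Addr1>1 Main St</Addr1></Addr></GeneralPartyInfo>
    </InsuredOrPrincipal>
    <InsuredOrPrincipal>
      <InsuredOrPrincipalInfo>
        <InsuredOrPrincipalRoleCd>XX</InsuredOrPrincipalRoleCd>
      </InsuredOrPrincipalInfo>
      <GeneralPartyInfo><Addr><Addr1>2 Other St</Addr1></Addr></GeneralPartyInfo>
    </InsuredOrPrincipal>
  </PersApplicationInfo>
</ACORD>"""
root = etree.fromstring(xml)


def general_party_info_for_role(root, role_cd):
    """Return GeneralPartyInfo elements whose parent has the given role code.

    Mirrors the predicate
    [InsuredOrPrincipalInfo/InsuredOrPrincipalRoleCd="..."] using only the
    simple paths that find/findall accept.
    """
    matches = []
    for principal in root.findall('./PersApplicationInfo/InsuredOrPrincipal'):
        roles = principal.findall('InsuredOrPrincipalInfo/InsuredOrPrincipalRoleCd')
        # Like the XPath predicate, match if any role code node equals role_cd.
        if any(role.text == role_cd for role in roles):
            matches.extend(principal.findall('GeneralPartyInfo'))
    return matches


hits = general_party_info_for_role(root, "AN")
print([hit.findtext('Addr/Addr1') for hit in hits])
```

This is more verbose than the single XPath expression, so tree.xpath is usually the simpler choice when lxml is available.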