I'm trying to use lxml to get an array of tags that are formatted as
<TEXT1>TEXT</TEXT1>
<TEXT2>TEXT</TEXT2>
<TEXT3>TEXT</TEXT3>
I tried using
xml_file.findall("TEXT*")
but this searches for a literal asterisk.
I've also try to use ETXPath but it seems to not work. Is there any API function to work with that, because assuming that TEXT is append by integers isn't the prettiest solution.
Yes, you can use regular expressions in lxml xpath.
Here is one example:
results = root.xpath(
"//*[re:test(local-name(), '^TEXT.*')]",
namespaces={'re': "http://exslt.org/regular-expressions"})
Of course, in the example you mention you don't really need a regular expression. You could use the starts-with()
xpath function:
results = root.xpath("//*[starts-with(local-name(), 'TEXT')]")
Complete program:
from lxml import etree
root = etree.XML('''
<root>
<TEXT1>one</TEXT1>
<TEXT2>two</TEXT2>
<TEXT3>three</TEXT3>
<x-TEXT4>but never four</x-TEXT4>
</root>''')
result1 = root.xpath(
"//*[re:test(local-name(), '^TEXT.*')]",
namespaces={'re': "http://exslt.org/regular-expressions"})
result2 = root.xpath("//*[starts-with(local-name(), 'TEXT')]")
assert(result1 == result2)
for result in result1:
print result.text, result.tag
Addressing a new requirement, consider this XML:
<root>
<tag>
<TEXT1>one</TEXT1>
<TEXT2>two</TEXT2>
<TEXT3>three</TEXT3>
</tag>
<other_tag>
<TEXT1>do not want to found one</TEXT1>
<TEXT2>do not want to found two</TEXT2>
<TEXT3>do not want to found three</TEXT3>
</other_tag>
</root>
If one wants to find all TEXT
elements that are immediate children of a <tag>
element:
result = root.xpath("//tag/*[starts-with(local-name(), 'TEXT')]")
assert(' '.join(e.text for e in result) == 'one two three')
Or, if one wants to all TEXT
elements that are immediate children of only the first tag
element:
result = root.xpath("//tag[1]/*[starts-with(local-name(), 'TEXT')]")
assert(' '.join(e.text for e in result) == 'one two three')
Or, if one wants to find only the first TEXT
element of each tag
element:
result = root.xpath("//tag/*[starts-with(local-name(), 'TEXT')][1]")
assert(' '.join(e.text for e in result) == 'one')
Resorources:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With