Retrieve attribute names and values with Python / lxml and XPath

I am using XPath with Python lxml (Python 2). I make two passes over the data: one to select the records of interest, and one to extract values from them. Here is a sample of the kind of code I am using:

from lxml import etree

xml = """
  <records>
    <row id="1" height="160" weight="80" />
    <row id="2" weight="70" />
    <row id="3" height="140" />
  </records>
"""

parsed = etree.fromstring(xml)
nodes = parsed.xpath('/records/row')
for node in nodes:
    print node.xpath("@id|@height|@weight")

When I run this script the output is:

['1', '160', '80']
['2', '70']
['3', '140']

As you can see from the result, where an attribute is missing, the positions of the other attributes shift, so in rows 2 and 3 I cannot tell whether the second value is the height or the weight.

Is there a way to get the names of the attributes returned from etree/lxml? Ideally, I would like a result in this format:

[('@id', '1'), ('@height', '160'), ('@weight', '80')]

I recognise that I can solve this specific case using ElementTree and Python. However, I want to solve it with XPath (and relatively simple XPath expressions), rather than by processing the data in Python.

You should try the following:

for node in nodes:
    print node.attrib

This prints all of the node's attributes as a dict-like mapping, for example {'id': '1', 'weight': '80', 'height': '160'}.

If you want to get something like [('@id', '1'), ('@height', '160'), ('@weight', '80')]:

list_of_attributes = []
for node in nodes:
    attrs = []
    for att in node.attrib:
        attrs.append(("@" + att, node.attrib[att]))
    list_of_attributes.append(attrs)

Output:

[[('@id', '1'), ('@height', '160'), ('@weight', '80')], [('@id', '2'), ('@weight', '70')], [('@id', '3'), ('@height', '140')]]
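
If the attribute selection should stay in the XPath expression, as the question asks, one possible sketch relies on lxml's "smart string" results: the strings returned by xpath() for attribute nodes carry is_attribute and attrname properties. The helper name named_attributes below is made up for illustration, and the sketch assumes smart strings are left enabled (the lxml default). It uses print() as a function so that it also runs on Python 3.

```python
from lxml import etree

xml = """
<records>
  <row id="1" height="160" weight="80" />
  <row id="2" weight="70" />
  <row id="3" height="140" />
</records>
"""

parsed = etree.fromstring(xml)


def named_attributes(node, expression):
    """Evaluate an XPath on node and pair each attribute value with its name.

    Hypothetical helper, not part of lxml.
    """
    pairs = []
    for value in node.xpath(expression):
        # Only results that come from attributes have a name to report.
        if getattr(value, "is_attribute", False):
            pairs.append(("@" + value.attrname, str(value)))
    return pairs


rows = [named_attributes(row, "@id|@height|@weight")
        for row in parsed.xpath("/records/row")]
for pairs in rows:
    print(pairs)
```

Each row then yields (name, value) tuples, so a missing attribute no longer shifts the meaning of the remaining values. For namespaced attributes, attrname would include the namespace in Clark notation, which this sketch does not treat specially.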

I was wrong in my assertion that I was not going to use Python. I found that the lxml/etree implementation is easily extended, so that I can keep using the XPath DSL with modifications.

I registered a function named "dictify" and changed the XPath expression to:

dictify('@id|@height|@weight|weight|height')

The new code is:

from lxml import etree

xml = """
<records>
    <row id="1" height="160" weight="80" />
    <row id="2" weight="70" ><height>150</height></row>
    <row id="3" height="140" />
</records>
"""

def dictify(context, names):
    node = context.context_node
    rv = []
    rv.append('__dictify_start_marker__')
    names = names.split('|')
    for n in names:
        if n.startswith('@'):
            val = node.attrib.get(n[1:])
            if val is not None:
                rv.append(n)
                rv.append(val)
        else:
            children = node.findall(n)
            for child_node in children:
                rv.append(n)
                rv.append(child_node.text)
    rv.append('__dictify_end_marker__')
    return rv

etree_functions = etree.FunctionNamespace(None)
etree_functions['dictify'] = dictify


parsed = etree.fromstring(xml)
nodes = parsed.xpath('/records/row')
for node in nodes:
    print node.xpath("dictify('@id|@height|@weight|weight|height')")

This produces the following output:

['__dictify_start_marker__', '@id', '1', '@height', '160', '@weight', '80', '__dictify_end_marker__']
['__dictify_start_marker__', '@id', '2', '@weight', '70', 'height', '150', '__dictify_end_marker__']
['__dictify_start_marker__', '@id', '3', '@height', '140', '__dictify_end_marker__']
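
The flat list between the two markers alternates names and values, so it can be folded back into the (name, value) pairs the question asked for. The following is a minimal sketch of that post-processing step; the helper name pairs_from_dictify is an assumption made for illustration, and the sample list is copied from the second row of the output above rather than produced by lxml here.

```python
START = '__dictify_start_marker__'
END = '__dictify_end_marker__'


def pairs_from_dictify(flat):
    """Turn [START, name, value, name, value, ..., END] into (name, value) tuples.

    Hypothetical helper for the list returned by the dictify() XPath function.
    """
    if len(flat) < 2 or flat[0] != START or flat[-1] != END:
        raise ValueError("not a dictify() result")
    body = flat[1:-1]
    if len(body) % 2:
        raise ValueError("odd number of items between the markers")
    # Even positions hold names, odd positions hold the matching values.
    return list(zip(body[0::2], body[1::2]))


sample = ['__dictify_start_marker__', '@id', '2', '@weight', '70',
          'height', '150', '__dictify_end_marker__']
pairs = pairs_from_dictify(sample)
print(pairs)  # [('@id', '2'), ('@weight', '70'), ('height', '150')]
```

Keeping the result as a list of tuples rather than a dict preserves repeated child elements, which dictify can emit when a row has several children with the same name.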

Tags: python, xpath, lxml

Asked by: Kevin Gill

2 Answers: Andersson, Kevin Gill
