
lxml._ElementTree.getpath(element) returns "*" instead of tag names for elements in a non-default namespace

Please help me make getpath() return full tag names in the XPath it generates, or suggest a workaround.

I'm trying to generate an XPath to an element in an lxml.etree._ElementTree. The tree is built by parsing a 600 KB response from a production web service.

print(elem.getroottree().getpath(elem))

Here is the result I get:

'/S:Envelope/S:Body/ns5:getPhysicalResponse/*[18]/*[12]/*[6]/*[2]'

Unfortunately, I cannot post the original XML because it contains proprietary customer information. I also tried to reproduce this result with an automatically generated simple element tree with 100 nested levels, each level having 100 children, but without luck: getpath() returned an XPath with full tag names.

Update: Looking into the lxml source code, getpath() points to the xmlGetNodePath function (tree.h) from the libxml2 library. So this is actually libxml2 behavior.

Update: After more tests, I figured out that this happens every time a tag has a non-default namespace.

vvladymyrov asked Nov 13 '22 09:11


1 Answer

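Use getelementpath() instead of getpath(). It returns the path with full tag names in {namespace-uri}tag form, and you can post-process the namespaces as you like.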
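A rough sketch of what that post-processing could look like. As far as I know, tree.getelementpath(elem) gives a path relative to the root element, with namespaced tags written in Clark notation ({uri}tag). The helper below only rewrites such a string into prefix:tag form using the standard library. The function name clark_to_prefixed, the sample path, and the urn:example:... URIs are all made up for illustration, since the real XML was not posted.

```python
import re


def clark_to_prefixed(path, nsmap):
    """Rewrite '{uri}tag' steps as 'prefix:tag' using a prefix -> uri mapping.

    Hypothetical helper, not part of lxml. URIs without a known prefix are
    left in Clark notation so the gap is easy to spot.
    """
    # Invert the mapping; skip the default namespace (prefix None or ''),
    # because XPath 1.0 cannot address it without an explicit prefix.
    uri_to_prefix = {uri: prefix for prefix, uri in nsmap.items() if prefix}

    def replace(match):
        uri = match.group(1)
        prefix = uri_to_prefix.get(uri)
        return prefix + ":" if prefix else match.group(0)

    return re.sub(r"\{([^}]*)\}", replace, path)


# Made-up path in the shape getelementpath() is expected to return.
sample_path = (
    "{http://schemas.xmlsoap.org/soap/envelope/}Body"
    "/{urn:example:ns5}getPhysicalResponse"
    "/{urn:example:data}item[18]"
    "/{urn:example:data}field[2]"
)

# Made-up prefix mapping; in practice you choose the prefixes yourself.
sample_nsmap = {
    "S": "http://schemas.xmlsoap.org/soap/envelope/",
    "ns5": "urn:example:ns5",
    "d": "urn:example:data",
}

print(clark_to_prefixed(sample_path, sample_nsmap))
# S:Body/ns5:getPhysicalResponse/d:item[18]/d:field[2]
```

With lxml you would pass tree.getelementpath(elem) as the path. Because the result is relative to the root, it should be usable as root.xpath(result, namespaces=sample_nsmap). For elements in a default namespace you have to invent a prefix and add it to the mapping yourself.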

Tay Cho answered Nov 14 '22 21:11