
XPath in Nokogiri returning empty array [] whereas I am expecting to have results

I am trying to parse XML files using Nokogiri, Ruby and XPath. I usually have no problems, but with the following file none of my XPath queries return anything:

doc = Nokogiri::HTML(open("myfile.xml"))
doc.xpath("//Meta").count
# result ==> 0

doc.xpath("//Meta") 
# result ==> []

doc.xpath(".").count
# result => 1

Here is a simplified version of my XML file:

<Answer xmlns="test:com.test.search" context="hf%3D10%26target%3Dst0" last="0" estimated="false" nmatches="1" nslices="0" nhits="1" start="0">
  <time>
    ...
  </time>
  <promoted>
    ...
  </promoted>
  <hits>
    <Hit url="http://www.test.com/" source="test" collapsed="false" preferred="false" score="1254772" sort="0" mask="272" contentFp="4294967295" did="1287" slice="1">
      <groups>
        ...
      </groups>
      <metas>
        <Meta name="enligne">
          <MetaString name="value">
          </MetaString>
        </Meta>

        <Meta name="language">
          <MetaString name="value">
            fr
          </MetaString>
        </Meta>
        <Meta name="text">
          <MetaText name="value">
            <TextSeg highlighted="false" highlightClass="0">
              La
            </TextSeg>
          </MetaText>
        </Meta>
      </metas>
    </Hit>
  </hits>
  <keywords>
    ...
  </keywords>
  <groups>
    ...
  </groups>
</Answer>

How can I get all children of <Hit> from this XML?

jaouad asked Jun 22 '12 13:06


3 Answers

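The root element declares a default namespace (xmlns="test:com.test.search"), so every element in the file, including Meta, belongs to it, and an unprefixed //Meta matches nothing. Include the namespace information when calling xpath: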

doc.xpath("//x:Meta", "x" => "test:com.test.search")
dusan answered Nov 08 '22 06:11


You can call the remove_namespaces! method on the parsed document. It strips all namespaces, so unprefixed queries such as //Meta match afterwards.

rizidoro answered Nov 08 '22 06:11


This is one of the most frequently asked XPath questions; search for "XPath default namespace".

If your API offers no way to register a prefix for the default namespace and use it in the expression (say "x", as in //x:Meta), then use:

//*[name() = 'Meta' and namespace-uri() = 'test:com.test.search']

If it is known that Meta can only belong to the default namespace, then the above can be shortened to:

//*[name() = 'Meta']
Dimitre Novatchev answered Nov 08 '22 06:11