How to extract text outside tags in XML

I want to extract text outside tags. For example,

<body>
    This is an example
    <p>
        blablabla
    </p>
    <references>
        refer 1
        refer 2
    </references>
</body>

I want to get only the text "This is an example", without the text inside the other tags (p or references). I tried several methods, but none of them worked. Can anyone help? Many thanks.

Jun Hou asked Jul 29 '11 09:07


2 Answers

Think of the text inside an element as a node of its own. A text node is selected with the node test text(). For example, given:

<body>
    This is an example
    <p>
        blablabla
    </p>
    <references>
        refer 1
        refer 2
    </references>
    another example
</body>

XPath:

"/body/text()"

will retrieve all the child text nodes of body, such as "This is an example" and "another example" (each with its surrounding whitespace), while:

"/body/text()[1]"

will retrieve only the first one, "This is an example". If you want all the descendant text nodes, you can use:

"/body//text()"

or, if you want all the text nodes inside the first p:

"/body/p[1]//text()"
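
If no XPath engine is at hand, the same distinction between child and descendant text nodes can be sketched with Python's standard xml.dom.minidom. This is only an illustration, not part of the original answer: the helper names child_texts and descendant_texts are made up here, and the sketch assumes you want whitespace-only text nodes (the indentation between elements) dropped and the rest trimmed, which a plain text() does not do for you.

```python
from xml.dom import minidom

XML = """<body>
    This is an example
    <p>
        blablabla
    </p>
    <references>
        refer 1
        refer 2
    </references>
    another example
</body>"""


def child_texts(element):
    """Rough analogue of text(): direct child text nodes only.

    Whitespace-only nodes are skipped and the rest are stripped
    (an assumption of this sketch, not XPath behaviour).
    """
    return [
        node.data.strip()
        for node in element.childNodes
        if node.nodeType == node.TEXT_NODE and node.data.strip()
    ]


def descendant_texts(element):
    """Rough analogue of .//text(): text nodes at any depth, in document order."""
    texts = []
    for node in element.childNodes:
        if node.nodeType == node.TEXT_NODE:
            if node.data.strip():
                texts.append(node.data.strip())
        elif node.nodeType == node.ELEMENT_NODE:
            texts.extend(descendant_texts(node))
    return texts


body = minidom.parseString(XML).documentElement

direct = child_texts(body)
print(direct)                  # like /body/text()
print(direct[0])               # like /body/text()[1] for this sample
print(descendant_texts(body))  # like /body//text()
```

Because whitespace-only nodes are filtered out first, direct[0] matches text()[1] only when the first text node actually carries content, as it does in this sample.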

Emiliano Poggi answered Oct 21 '22 01:10


Use this XPath: /body/text(). It will select "This is an example".
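
For anyone working in Python with only the standard library: as far as I know, the limited XPath subset in xml.etree.ElementTree does not include the text() node test, so the expression above cannot be passed to it directly. A minimal sketch of the same idea, assuming the XML from the question, reads the element's .text (the text before its first child) and each child's .tail (the text that follows that child). The variable names are just for illustration.

```python
import xml.etree.ElementTree as ET

XML = """<body>
    This is an example
    <p>
        blablabla
    </p>
    <references>
        refer 1
        refer 2
    </references>
</body>"""

root = ET.fromstring(XML)

# Text before the first child element is stored in root.text;
# text after each child element is stored in that child's .tail.
pieces = [root.text] + [child.tail for child in root]

# Drop missing or whitespace-only pieces and trim the rest.
outside = [piece.strip() for piece in pieces if piece and piece.strip()]
print(outside)
```

With a full XPath 1.0 engine, normalize-space(/body/text()[1]) would give the trimmed first text node without any extra code.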

Kirill Polishchuk answered Oct 21 '22 03:10