How do you extract text matching a pattern in XPath?

I have data that looks like this:

<value>v13772   @FBst0451145:w&lt;up&gt;1118&lt;/up&gt;; P{GD3649}v13772@
v13773  @FBst0451146:w&lt;up&gt;1118&lt;/up&gt;; P{GD3649}v13773@</value>

How can I process this string in XPath to extract any and all @FBst####### numbers?

I know about the XPath matches() function, but it only returns true or false, which is no good when I want the matching strings themselves. I've searched around but can't find a satisfactory answer to this problem, which is probably really common.

Thanks!

JD. asked Aug 01 '12 20:08


3 Answers

In addition to the good answer by Michael Kay, if you want to use only the replace() function, then use:

replace(.,'.*?(@FBst\d+).*','$1')

The result is:

@FBst0451145
@FBst0451146

And if you only want the numbers from the above result, use:

replace(replace(.,'.*?(@FBst\d+).*','$1'),
          '[^0-9]+', ' ')

This produces:

 0451145 0451146
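The nested call can be sketched the same way. This is again a Python analogue under the same assumption about the decoded string value, and `fbst_digits` is a hypothetical helper name. It also shows where the leading space in the output comes from: the `@FBst` prefix at the very start of the intermediate string is itself a run of non-digits, so it is turned into a space as well.

```python
import re

# String value of the question's <value> element, entities already decoded.
VALUE = (
    "v13772   @FBst0451145:w<up>1118</up>; P{GD3649}v13772@\n"
    "v13773  @FBst0451146:w<up>1118</up>; P{GD3649}v13773@"
)


def fbst_digits(s: str) -> str:
    # Inner replace(): reduce each line to its @FBst... token.
    tokens = re.sub(r".*?(@FBst\d+).*", r"\1", s)
    # Outer replace(..., '[^0-9]+', ' '): every run of non-digits
    # (the "@FBst" prefixes and the newline) becomes a single space.
    return re.sub(r"[^0-9]+", " ", tokens)


result = fbst_digits(VALUE)
print(repr(result))
# Splitting on whitespace drops the empty piece before the leading space.
print(result.split())
```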
Dimitre Novatchev answered Nov 22 '22 07:11


I assume you can also use XQuery. The get-matches() function from the FunctX module should work for you: download the file that supports your version of XQuery, then import the module whenever you need its functionality.

import module namespace functx = "http://www.functx.com" at "functx-1.0-doc-2007-01.xq";

functx:get-matches(string-join(//text(), ''), '@FBst\d+')
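For comparison outside XQuery, the idea behind this answer (join all text nodes, then collect every substring that matches a pattern) looks roughly like this with Python's standard library. This is not FunctX itself: `get_matches` is a hypothetical stand-in for `functx:get-matches`, `re.findall` does the matching, and ElementTree's `itertext()` plays the role of `string-join(//text(), '')`. The exact shape of what `functx:get-matches` returns is best checked in the FunctX documentation.

```python
import re
import xml.etree.ElementTree as ET

# The question's <value> element, with entities still escaped as in the source.
XML = (
    "<value>v13772   @FBst0451145:w&lt;up&gt;1118&lt;/up&gt;; P{GD3649}v13772@\n"
    "v13773  @FBst0451146:w&lt;up&gt;1118&lt;/up&gt;; P{GD3649}v13773@</value>"
)


def get_matches(xml_text: str, pattern: str) -> list:
    root = ET.fromstring(xml_text)
    # Roughly string-join(//text(), ''): concatenate all text nodes in order.
    text = "".join(root.itertext())
    # Every non-overlapping match of the pattern, in document order.
    return re.findall(pattern, text)


print(get_matches(XML, r"@FBst\d+"))
```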
Sicco answered Nov 22 '22 07:11


Try

tokenize(value, '[^0-9]+')

which should return the sequence of tokens separated by runs of non-digits, that is, every run of digits in the string.
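A quick way to see what this `tokenize()` call returns is Python's `re.split`, which, like XPath's `tokenize()`, yields a zero-length string when the separator matches at the very start or end of the input. The sketch assumes the same decoded string value as the question; `xpath_style_tokenize` is an illustrative name.

```python
import re

# String value of the question's <value> element, entities already decoded.
VALUE = (
    "v13772   @FBst0451145:w<up>1118</up>; P{GD3649}v13772@\n"
    "v13773  @FBst0451146:w<up>1118</up>; P{GD3649}v13773@"
)


def xpath_style_tokenize(s: str, separator: str) -> list:
    # Split wherever the separator regex matches. A match at the very
    # start or end produces an empty first or last token, as in XPath.
    return re.split(separator, s)


tokens = xpath_style_tokenize(VALUE, r"[^0-9]+")
print(tokens)
```

On the question's data this returns every digit run, so the two @FBst numbers (`0451145`, `0451146`) come back alongside `13772`, `1118`, `3649` and the rest, plus an empty string at each end; isolating the @FBst numbers would need a further filter.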

Michael Kay answered Nov 22 '22 06:11