Regex - Extract a substring from a given string

I have the following string: "This is a string: AAA123456789."

The idea is to extract the substring AAA123456789 using a regex.

I am using this within XPath.

Note: if there is an existing post on this, kindly point me to it.

I think I should do something like substring(myNode, [^AAA\d+{9}]), but I am not really sure about the regex part.

The goal is to extract the substring that starts with "AAA" followed by exactly 9 consecutive digits.
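
For reference: square brackets in a regex define a character class, so [^AAA\d+{9}] matches one single character that is not A, a digit, "+", "{", "9" or "}". "AAA followed by exactly 9 digits" is written AAA\d{9}. The sketch below tests that pattern with Python's re module, used here only as a stand-in for an XPath processor; the function name extract_code is hypothetical. The (?!\d) lookahead rejects a tenth digit, but note that XPath 2.0 regexes do not support lookahead, so inside XPath you would use AAA\d{9} on its own or add a separate check.

```python
import re

# "AAA" followed by exactly nine digits; the negative lookahead (?!\d)
# rejects a tenth digit (Python-only: XPath regexes have no lookahead).
CODE_PATTERN = re.compile(r"AAA\d{9}(?!\d)")


def extract_code(text):
    """Return the first AAA + 9-digit code in text, or None if there is none."""
    match = CODE_PATTERN.search(text)
    return match.group(0) if match else None


print(extract_code("This is a string: AAA123456789."))   # AAA123456789
print(extract_code("This is a string: AAA12345678."))    # None (only 8 digits)
print(extract_code("This is a string: AAA1234567890."))  # None (10 digits)
```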

asked Sep 20 '12 at 06:09 by Vincent

2 Answers

Pure XPath solution:

substring-after('This is a string: AAA123456789', ': ')

produces:

AAA123456789
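
substring-after($s, $sep) returns the part of $s after the first occurrence of $sep, or an empty string when $sep does not occur (and $s itself when $sep is empty). A rough Python equivalent, handy for checking that behaviour outside an XPath engine; the helper name substring_after is hypothetical, not a library function:

```python
def substring_after(s, sep):
    """Mimic XPath substring-after(): the text after the first sep, else ""."""
    if sep == "":
        return s  # every string contains "", so XPath returns s unchanged
    _, found, tail = s.partition(sep)
    return tail if found else ""


print(substring_after("This is a string: AAA123456789", ": "))  # AAA123456789
print(repr(substring_after("no separator here", ": ")))        # ''
```

This relies on ": " appearing directly before the code; it does not check that what follows is actually "AAA" plus digits.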

XPath 2.0 solutions:

tokenize('This is a string: AAA123456789 but not an double',
              ' '
              )[starts-with(., 'AAA')]

or:

tokenize('This is a string: AAA123456789 but not an double',
              ' '
              )[matches(., 'AAA\d+')]

or:

replace('This is a string: AAA123456789 but not an double',
              '^.*?(A+\d+).*$',
              '$1'
              )
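
The three XPath 2.0 variants can be checked the same way. The sketch below approximates them with Python's str.split and re (XPath matches() is unanchored, like re.search). It also shows a subtlety of the replace() approach: a greedy leading .* gives back only as many characters as it must, so ^.*(A+\d+).*$ captures just A123456789, while a reluctant .*? (which XPath 2.0 regexes support) keeps all three letters. Python writes the back-reference as \1 where XPath uses $1.

```python
import re

TEXT = "This is a string: AAA123456789 but not an double"

# tokenize(TEXT, ' ')[starts-with(., 'AAA')]
by_prefix = [tok for tok in TEXT.split(" ") if tok.startswith("AAA")]

# tokenize(TEXT, ' ')[matches(., 'AAA\d+')]; matches() is unanchored, like re.search
by_regex = [tok for tok in TEXT.split(" ") if re.search(r"AAA\d+", tok)]

# replace(TEXT, '^.*(A+\d+).*$', '$1'): the greedy .* keeps "AA" for itself
greedy = re.sub(r"^.*(A+\d+).*$", r"\1", TEXT)

# Reluctant .*? stops at the first 'A', so the capture starts at "AAA"
reluctant = re.sub(r"^.*?(A+\d+).*$", r"\1", TEXT)

print(by_prefix)  # ['AAA123456789']
print(by_regex)   # ['AAA123456789']
print(greedy)     # A123456789
print(reluctant)  # AAA123456789
```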
answered Oct 10 '22 at 11:10 by Dimitre Novatchev


Alright, after going through the answers and comments from the helpful people here, I summarized my findings in the solution I opted for. Here it is:

concat("AAA", substring(substring-after(., "AAA"), 1, 9))

First, substring-after() takes everything after the first occurrence of "AAA". Then substring(..., 1, 9) keeps the 9 characters starting at position 1; anything beyond that is ignored. Because substring-after() drops the "AAA" it searched for, I concatenate "AAA" back onto the front, which is safe since that prefix is static. The result is "AAA" followed by the first 9 digits after it.

This returns the correct value no matter what other text surrounds the code, as long as "AAA" is followed by the 9 digits.
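
To see what this expression returns on different inputs, it can be mirrored in Python. XPath substring() is 1-based, so substring($s, 1, 9) corresponds to the slice s[0:9]. This is a sketch, and the function name concat_extract is hypothetical. It also makes the limits visible: the nine characters after "AAA" are taken without checking that they are digits, and when "AAA" is missing, substring-after() yields an empty string, so the result is just "AAA".

```python
def concat_extract(s):
    """Mirror concat("AAA", substring(substring-after(s, "AAA"), 1, 9))."""
    _, found, tail = s.partition("AAA")
    after = tail if found else ""  # substring-after() returns "" when "AAA" is missing
    return "AAA" + after[:9]       # XPath substring(x, 1, 9) == Python x[0:9]


print(concat_extract("This is a string: AAA123456789."))  # AAA123456789
print(concat_extract("AAA1234567890"))                    # AAA123456789 (10th digit dropped)
print(concat_extract("AAA12x"))                           # AAA12x (no digit check)
print(concat_extract("no code here"))                     # AAA
```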

That said, I like @Dimitre's regex approach, especially the replace() version; a regex replace is a neat solution. I'm less keen on tokenize(), because it depends on the code being separated from the surrounding text by a space. Thanks.
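
The concern about tokenize() is easy to reproduce: when the code is not separated by a space, the token containing it also carries its neighbours, so a starts-with() filter finds nothing, while a regex replace does not depend on separators. A small Python sketch, with a made-up input string for illustration:

```python
import re

text = "This is a string:AAA123456789."

# Splitting on spaces, as tokenize(text, ' ') would, leaves "string:AAA123456789."
# as one token, so filtering on a prefix of "AAA" finds nothing.
by_prefix = [tok for tok in text.split(" ") if tok.startswith("AAA")]

# A regex replace that keeps only the captured code works wherever it sits.
extracted = re.sub(r"^.*?(AAA\d+).*$", r"\1", text)

print(by_prefix)  # []
print(extracted)  # AAA123456789
```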

And thanks to everyone else out there, too.

answered Oct 10 '22 at 12:10 by Vincent