I have a string, <code>This is a string: AAA123456789</code>, and I want to extract <code>AAA123456789</code> from it using a regex inside an XPath expression. If an existing post already covers this, kindly point me to it. My guess is something like <code>substring(myNode, [^AAA\d+{9}])</code>, but I am not sure about the regex part. The goal is to extract the substring that starts with "AAA" and is followed by exactly 9 consecutive digits.

Regex - Extract a substring from a given string

2 Answers

Pure XPath solution:

substring-after('This is a string: AAA123456789', ': ')

produces:

AAA123456789

XPath 2.0 solutions:

tokenize('This is a string: AAA123456789 but not an double',
              ' '
              )[starts-with(., 'AAA')]

or:

tokenize('This is a string: AAA123456789 but not an double',
              ' '
              )[matches(., 'AAA\d+')]

or:

(: reluctant .*? so the group captures all of AAA; a greedy .* would leave only the last A :)
replace('This is a string: AAA123456789 but not an double',
              '^.*?(A+\d+).*$',
              '$1'
              )
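
The question asks for "AAA" followed by exactly nine digits, so the patterns above can be tightened. A minimal sketch, assuming an XPath 2.0 processor and that the context node's string value holds the text. The `if` guard and the `'s'` flag are additions for illustration: `replace` returns its input unchanged when the pattern does not match, and `'s'` lets `.` cross line breaks.

```xpath
(: Return AAA plus the nine digits that follow it, or '' if there is no such run. :)
if (matches(., 'AAA\d{9}'))
then replace(., '^.*?(AAA\d{9}).*$', '$1', 's')
else ''
```

If more than nine digits follow "AAA", this keeps only the first nine. Note that `\d` matches any Unicode decimal digit; `[0-9]` can be substituted if only ASCII digits should count.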

answered Oct 10 '22 11:10

Dimitre Novatchev

After going through the answers and comments from the helpful people here, this is the solution I opted for:

concat("AAA", substring(substring-after(., "AAA"), 1, 9))

First, substring-after returns everything that follows "AAA". Then substring(..., 1, 9) keeps 9 characters starting at position 1 and ignores anything beyond that. Since "AAA" was used as the delimiter, it is not part of that result, so I concatenate it back onto the front. In short, I take the first 9 characters after "AAA" (the digits, in my data) and prepend "AAA", which is static.

This keeps the result correct no matter what else appears in the string.
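
One caveat: substring takes whatever nine characters follow "AAA", whether or not they are digits. A hedged XPath 1.0 sketch of a check for that, using the common translate idiom to strip digits and see whether anything is left. It assumes the context node holds the text and is not part of the solution above.

```xpath
string-length(substring(substring-after(., "AAA"), 1, 9)) = 9
and
translate(substring(substring-after(., "AAA"), 1, 9), "0123456789", "") = ""
```

The expression is true only when nine digits immediately follow the first "AAA", so it can serve as a condition or predicate around the concat expression. It looks only at the first occurrence of "AAA"; a string such as AAAA123456789 would fail the check.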

That said, I like @Dimitre's regex approach, the replace in particular. I am less keen on tokenize, since it breaks down if the separator is not a space.

Thanks also to everyone else who helped.

answered Oct 10 '22 12:10

Vincent

Tags:

substring

regex

xpath