
Extract an 8-digit number from a string with additional conditions

I need to extract a number from a string with several conditions.

  1. It must start with 1-9, never with 0, and have exactly 8 digits, e.g. 23242526 or 65478932.
  2. It is preceded by either a space or other text, e.g. MMX: 23242526 or bgr65478932.
  3. In rare cases it contains comma separators: 23,242,526.
  4. It is followed by either a space or other text.

Here are several examples:

  • From "RE: Markitwire: 120432889: Mx: 24,693,059" I need to get 24693059
  • From "Automatic reply: Auftrag zur Übertragung IRD Ref-Nr. MMX_23497152" I need to get 23497152
  • From "FW: CGMSE 2019-2X A1AN XS2022418672 Contract 24663537" I need to get 24663537
  • From "RE: BBVA-MAD MMX_24644644 + MMX_24644645" I need to get 24644644, 24644645

Right now I'm using the RegexExtract function listed at the end of this question (found on this site). With the pattern below it extracts any 8-digit number starting with 2. However, it also extracts a number from, say, TGF00023242526, which is incorrect. Moreover, I don't know how to add the additional conditions to the pattern.

=RegexExtract(A11, "(2\d{7})\b", ", ")

Thank you in advance.

Function RegexExtract(ByVal text As String, _
                      ByVal extract_what As String, _
                      Optional seperator As String = "") As String
    Dim i As Long, j As Long
    Dim result As String
    Dim allMatches As Object
    Dim RE As Object

    ' Late-bound VBScript regex object, so no library reference is needed
    Set RE = CreateObject("vbscript.regexp")
    RE.Pattern = extract_what
    RE.Global = True        ' find every match, not only the first
    RE.IgnoreCase = True
    Set allMatches = RE.Execute(text)

    ' Collect every capture group of every match, each prefixed by the separator
    For i = 0 To allMatches.Count - 1
        For j = 0 To allMatches.Item(i).SubMatches.Count - 1
            result = result & seperator & allMatches.Item(i).SubMatches.Item(j)
        Next
    Next

    ' Drop the leading separator
    If Len(result) <> 0 Then
        result = Right(result, Len(result) - Len(seperator))
    End If
    RegexExtract = result
End Function
levanski asked Nov 15 '19 15:11



1 Answer

You may create a custom boundary using a non-capturing group before the pattern you have:

(?:[\D0]|^)(2\d{7})\b
^^^^^^^^^^^

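The (?:[\D0]|^) part matches either a non-digit (\D), a 0, or (via the | alternation) the start of the string (^). This boundary character is consumed but not captured, so RegexExtract still returns only the 8-digit number in group 1.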
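
The same custom-boundary idea can probably be stretched to cover the other conditions in the question. What follows is only a rough sketch, not the answer's pattern: ExtractIds is a hypothetical helper name, and the pattern rests on a few assumptions. It assumes the first digit may be 1-9, that the comma-grouped form is always NN,NNN,NNN, that a number directly followed by another digit should be rejected, and that zero-padded values such as TGF00023242526 should not match (so 0 is left out of the boundary class). VBScript regex has no lookbehind, hence the consumed (?:^|\D) on the left, while the right side uses a negative lookahead.

```vba
' Hypothetical helper (not from the question or the answer above).
' Returns every 8-digit ID found in text, joined by separator,
' with any thousands commas stripped.
Function ExtractIds(ByVal text As String, _
                    Optional ByVal separator As String = ", ") As String
    Dim re As Object
    Dim m As Object
    Dim result As String

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    ' (?:^|\D)              left boundary: start of string or one non-digit
    ' [1-9]\d{7}            plain form, e.g. 23242526
    ' [1-9]\d(?:,\d{3}){2}  comma-grouped form, e.g. 23,242,526
    ' (?!\d)                right boundary: no further digit may follow
    re.Pattern = "(?:^|\D)([1-9]\d{7}|[1-9]\d(?:,\d{3}){2})(?!\d)"

    For Each m In re.Execute(text)
        If Len(result) > 0 Then result = result & separator
        ' SubMatches(0) is the number without the boundary character
        result = result & Replace(m.SubMatches(0), ",", "")
    Next m

    ExtractIds = result
End Function
```

Called from a worksheet as =ExtractIds(A11), this should give 24693059 for "RE: Markitwire: 120432889: Mx: 24,693,059" (the 9-digit 120432889 is skipped), 24644644, 24644645 for "RE: BBVA-MAD MMX_24644644 + MMX_24644645", and an empty string for TGF00023242526. If leading zeros should be tolerated as in the pattern above, put 0 back into the boundary, i.e. (?:^|[\D0]).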

Wiktor Stribiżew answered Sep 28 '22 10:09
