Extract fixed length number from Excel cell

There are several similarly named threads, but none of them solved my problem. I need to extract a fixed-length NUMBER value from an Excel string (8 digits in my scenario). The following Excel formula was suggested for this purpose:

=MID(A1,FIND("--------",SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,"0","-"),"1","-"),"2","-"),"3","-"),"4","-"),"5","-"),"6","-"),"7","-"),"8","-"),"9","-")),8)
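For illustration only, here is a rough Python model of what that formula does (`original_formula` is a hypothetical name, not anything from Excel). Every digit is replaced by "-", FIND locates the first run of eight dashes, and MID returns the 8 characters at that spot. Modelling it this way makes both issues below visible, plus one more: a "-" that is already in the text cannot be told apart from a masked digit.

```python
def original_formula(a1: str) -> str:
    """Rough model of =MID(A1,FIND("--------",<ten nested SUBSTITUTEs>),8)."""
    masked = a1
    for digit in "0123456789":
        # Each SUBSTITUTE(...,digit,"-") masks one digit character
        masked = masked.replace(digit, "-")
    pos = masked.find("--------")  # 0-based here; Excel's FIND is 1-based
    if pos == -1:
        # FIND fails when there is no run of 8 dashes, so Excel shows #VALUE!
        return "#VALUE!"
    return a1[pos:pos + 8]  # MID(A1, pos + 1, 8) in Excel terms


print(original_formula("a12891212a"))   # 12891212
print(original_formula("a128912120a"))  # 12891212 (first 8 digits of a 9-digit number)
print(original_formula("23456789"))     # 23456789 (does not start with 1)
print(original_formula("a-1234567"))    # -1234567 (the existing "-" counts as a digit)
```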

It does the job, but I have two issues with it:

  1. Most crucially, I'm looking for an exact match. While the formula extracts the first 8-digit sequence it finds, I'm really only after 8-digit numbers, meaning that 9-digit (or longer) numbers should be ignored (as 7-digit numbers already are). Currently the formula also extracts the first 8 digits of a longer number.

  2. Less important, but it would be great to only look for numbers starting with 1. So really I'm just trying to extract 1??????? as a numeric value: something like "a12891212a" or "a 12891212 a" should be extracted, whereas 128912120a or 23456789 should not.
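The two requirements can be pinned down as a small reference check. This Python sketch (hypothetical name `wanted`, standard library only) splits the text into maximal runs of ASCII digits and keeps the first run that is exactly 8 long and starts with 1; the loop runs the four examples from the question.

```python
from itertools import groupby
from typing import Optional

DIGITS = "0123456789"


def wanted(s: str) -> Optional[str]:
    """First maximal run of ASCII digits that is exactly 8 long and starts with '1'."""
    for is_digit, chars in groupby(s, key=lambda c: c in DIGITS):
        run = "".join(chars)
        if is_digit and len(run) == 8 and run.startswith("1"):
            return run
    return None  # no qualifying number


# The examples from the question
for text in ["a12891212a", "a 12891212 a", "128912120a", "23456789"]:
    print(repr(text), "->", wanted(text))
```

Wrapping the result in int() turns the extracted text into a number, much as -- or VALUE() would on the Excel side.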

If reasonably doable, I'd prefer an Excel formula-based approach compared to VBA. Any help is much appreciated!

asked Dec 13 '22 at 08:12 by dotsent12


2 Answers

This can be done with a formula, but it all depends on your Excel version:



1) Excel 2016: you can still use a formula:

Formula in B1:

=IFERROR(MID(A1,MAX((MID(A1,ROW(A$1:INDEX(A:A,LEN(A1))),1)="1")*(ROW(A$1:INDEX(A:A,LEN(A1)))<=LEN(A1)-7)*(ISNUMBER(--MID(A1,ROW(A$1:INDEX(A:A,LEN(A1))),8)))*(NOT(ISNUMBER(--MID(A1,ROW(A$1:INDEX(A:A,LEN(A1)))+8,1))))*(NOT(ISNUMBER(--MID(A1,ROW(A$1:INDEX(A:A,LEN(A1)))-1,1))))*(ROW(A$1:INDEX(A:A,LEN(A1))))),8),"Nothing found")

Note: This is an array formula and needs to be confirmed with Ctrl+Shift+Enter.
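To check what the array formula evaluates, here is a rough Python model of it (hypothetical names; `is_num` only accepts plain digits, whereas Excel's --MID(...) would also accept text such as "1.234567", depending on locale). Each position p is tested the way the formula tests it, and because MAX keeps the largest qualifying position, the last match in the cell wins. The sketch also requires a full 8 characters after p, since MID returns fewer characters near the end of the text and a short trailing number would otherwise slip through.

```python
from typing import Optional


def excel2016_style(a1: str) -> str:
    """Rough model of the Excel 2016 array formula (last qualifying match wins)."""

    def mid(start: int, length: int) -> Optional[str]:
        # Excel's MID is 1-based; a start below 1 is an error, modelled as None
        if start < 1:
            return None
        return a1[start - 1:start - 1 + length]

    def is_num(text: Optional[str]) -> bool:
        # Simplified ISNUMBER(--text): non-empty and ASCII digits only
        return bool(text) and all(c in "0123456789" for c in text)

    best = 0
    for p in range(1, len(a1) + 1):          # ROW(A$1:INDEX(A:A,LEN(A1)))
        if (mid(p, 1) == "1"
                and p <= len(a1) - 7          # a full 8 characters remain
                and is_num(mid(p, 8))
                and not is_num(mid(p + 8, 1))
                and not is_num(mid(p - 1, 1))):
            best = p                          # MAX(...) keeps the largest position
    # MID(A1,0,8) errors in Excel, which IFERROR turns into "Nothing found"
    return mid(best, 8) if best else "Nothing found"


print(excel2016_style("abc12345678x18765432y"))  # 18765432 (the last match)
```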


2) Excel 2019, using CONCAT() and FILTERXML():

Formula in B1:

=IFERROR(FILTERXML("<t><s>"&CONCAT(IF(ISNUMBER(--MID(A1,ROW(A$1:INDEX(A:A,LEN(A1))),1)),MID(A1,ROW(A$1:INDEX(A:A,LEN(A1))),1),"</s><s>"))&"</s></t>","//s[starts-with(., '1') and string-length(.) =8]"),"Nothing Found")

Note: This is an array formula and needs to be confirmed with Ctrl+Shift+Enter.


3) Excel 365, using the previously mentioned functions together with LET() and SEQUENCE():

Formula in B1:

=IFERROR(FILTERXML("<t><s>"&LET(X,MID(A1,SEQUENCE(LEN(A1)),1),CONCAT(IF(ISNUMBER(--X),X,"</s><s>")))&"</s></t>","//s[starts-with(., '1') and string-length(.) =8]"),"Nothing Found")

The XPath part of the formulas takes care of the actual query, selecting strings that start with a '1' and have a total length of 8. This even works with strings like 'abc123456789abc12345678abc29876543', returning '12345678'.
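The same construction can be modelled in Python with the standard library's xml.etree.ElementTree (an illustration; `filterxml_style` is a hypothetical name). ElementTree's limited XPath support has no starts-with(), so that predicate is applied in Python instead. A side effect of the construction is that every non-digit character, including < or &, is replaced by a tag boundary rather than copied, so the text never needs XML escaping.

```python
import xml.etree.ElementTree as ET
from typing import List


def filterxml_style(a1: str) -> List[str]:
    """Model of FILTERXML("<t><s>"&CONCAT(...)&"</s></t>", "//s[...]")."""
    # CONCAT(IF(ISNUMBER(--char), char, "</s><s>")): digits are kept,
    # every other character becomes a node boundary
    body = "".join(c if c in "0123456789" else "</s><s>" for c in a1)
    root = ET.fromstring("<t><s>" + body + "</s></t>")
    # //s[starts-with(., '1') and string-length(.) = 8]
    return [s.text for s in root.iter("s")
            if s.text and s.text.startswith("1") and len(s.text) == 8]


print(filterxml_style("abc123456789abc12345678abc29876543"))  # ['12345678']
```

The model returns every matching node; in Excel 365 several matches would spill into neighbouring cells.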

If you enjoy FILTERXML and XPATH, then you might find this interesting.


4) Excel 365, Insider edition (at the time of writing), using TEXTSPLIT():

=LET(X,MID(A1,SEQUENCE(LEN(A1)),1),Y,TEXTSPLIT(A1,IF(ISNUMBER(--X)," ",X),,1),FILTER(Y,(--LEFT(Y)=1)*(LEN(Y)=8),"Nothing Found"))
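A Python analogue of this TEXTSPLIT version, as a sketch (hypothetical name `textsplit_style`): re.split on runs of non-digits plays the role of TEXTSPLIT with every non-digit character as a delimiter and ignore_empty set to 1, and a list comprehension plays the role of FILTER, including its "Nothing Found" fallback.

```python
import re
from typing import List, Union


def textsplit_style(a1: str) -> Union[List[str], str]:
    """Model of TEXTSPLIT(...) followed by FILTER(Y,(--LEFT(Y)=1)*(LEN(Y)=8),"Nothing Found")."""
    # Split on every non-digit character and drop empty pieces (ignore_empty = 1)
    pieces = [p for p in re.split(r"[^0-9]+", a1) if p]
    # Keep pieces whose first digit is 1 and whose length is exactly 8
    hits = [p for p in pieces if p[0] == "1" and len(p) == 8]
    return hits if hits else "Nothing Found"


print(textsplit_style("x12345678y18765432"))  # ['12345678', '18765432']
```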

5) VBA: If you must use VBA, I guess a UDF is a good option. Something like:

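Function GetStr(str As String, pat As String) As String
' Returns the first capture group of pattern pat found in str, or "Nothing found"

With CreateObject("vbscript.regexp")
    .Pattern = pat
    .Global = True
    If .Test(str) = True Then
        ' Only the first match is used; Submatches(0) is its first capture group
        GetStr = .Execute(str)(0).Submatches(0)
    Else
        GetStr = "Nothing found"
    End If
End With

End Function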

You can call this in B1 as =GetStr(A1,"(?:^|\D)(1\d{7})(?:\D|$)"). This makes use of a regular expression. If you are interested and want to learn more, this is an interesting read for you.

I left the pattern outside the UDF on purpose, in case you ever want to change it. The current pattern can be seen in this online demo, where from left to right the engine looks for:

  • (?: - 1st non-capturing group.
    • ^|\D - Either a start-of-string anchor or any character other than a digit.
    • ) - Close 1st non-capturing group.
  • ( - 1st capture group.
    • 1\d{7} - Search for a literal 1 followed by 7 digits.
    • ) - Close 1st capture group.
  • (?: - 2nd non-capturing group.
    • \D|$ - Either any character other than a digit or an end-of-string anchor.
    • ) - Close 2nd non-capturing group.
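The same pattern can be tried outside Excel with Python's re module, which accepts this syntax unchanged (`get_str` is a hypothetical counterpart of GetStr; re.ASCII keeps \d limited to 0-9). One consequence of the design shows up when collecting every match: the non-capturing groups consume the neighbouring character, so two numbers separated by a single non-digit cannot both be matched. GetStr only returns the first match, so it is not affected. In Python, lookarounds avoid the issue, while VBScript's engine is generally reported to support lookahead but not lookbehind.

```python
import re

# Same pattern as in =GetStr(A1,"(?:^|\D)(1\d{7})(?:\D|$)")
PATTERN = re.compile(r"(?:^|\D)(1\d{7})(?:\D|$)", re.ASCII)


def get_str(s: str) -> str:
    m = PATTERN.search(s)
    # Group 1 is the capture group, like Submatches(0) in VBScript
    return m.group(1) if m else "Nothing found"


# Collecting all matches: the "a" is consumed by the first match, so the second number is missed
print(PATTERN.findall("12345678a18765432"))  # ['12345678']
# Lookarounds check the neighbours without consuming them
print(re.findall(r"(?<!\d)1\d{7}(?!\d)", "12345678a18765432", re.ASCII))  # ['12345678', '18765432']
```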


answered Dec 31 '22 at 11:12 by JvdV


Here is a simple User Defined Function that looks for substrings made up of digits. It creates an array of these substrings and then returns the first element that has length 8 and a leading character of 1:

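Option Explicit

Public Function NineD(s As String) As String
    Dim L As Long, temp As String, wf As WorksheetFunction
    Dim i As Long, arr, a
    
    Set wf = Application.WorksheetFunction
    temp = s
    L = Len(s)
    
    ' Replace every non-digit character with a space (length stays the same)
    For i = 1 To L
        If Not (Mid(temp, i, 1) Like "[0-9]") Then
            temp = wf.Replace(temp, i, 1, " ")
        End If
    Next i
    
    ' Trim collapses runs of spaces, so Split yields one element per digit run
    arr = Split(wf.Trim(temp), " ")
    For Each a In arr
        ' Return the first run that is exactly 8 digits long and starts with 1
        If Len(a) = 8 And Left(a, 1) = "1" Then
            NineD = a
            Exit Function
        End If
    Next a
    ' If nothing matches, the function returns an empty string
End Function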


User Defined Functions (UDFs) are very easy to install and use:

  1. ALT-F11 brings up the VBE window
  2. ALT-I ALT-M opens a fresh module
  3. paste the stuff in and close the VBE window

If you save the workbook, the UDF will be saved with it. If you are using a version of Excel later than 2003, you must save the file as .xlsm rather than .xlsx.

To remove the UDF:

  1. bring up the VBE window as above
  2. clear the code out
  3. close the VBE window

To use the UDF from Excel:

=NineD(A1)

To learn more about macros in general, see:

http://www.mvps.org/dmcritchie/excel/getstarted.htm

and

http://msdn.microsoft.com/en-us/library/ee814735(v=office.14).aspx

and for specifics on UDFs, see:

http://www.cpearson.com/excel/WritingFunctionsInVBA.aspx

Macros must be enabled for this to work!

answered Dec 31 '22 at 12:12 by Gary's Student