I have the following test file names:
abc001_20111104_summary_123.txt
abc008_200700953_timeline.txt
abc008_20080402_summary200201573unitf.txt
123456.txt
100101-100102 test.txt
abc008_20110902_summary200110254.txt
abcd 200601141 summary.txt
abc008_summary_200502169_xyz.txt
I need to extract a number from each file name.
The number must be 6, 7, 9 or 10 digits long (so, excluding 8-digit numbers).
I want the first such number if more than one is found, or an empty string if none is found.
I managed to do this in a two-step process: first removing the 8-digit numbers, then extracting the 6- to 10-digit numbers from what remains.
step 1
regex: ([^0-9])([0-9]{8})([^0-9])
replacement: \1\3
step 2
regex: (.*?)([1-9]([0-9]{5,6}|[0-9]{8,9}))([^0-9].*)
replacement: \2
The numbers I get after this two-step process are exactly what I'm looking for:
[]
[200700953]
[200201573]
[123456]
[100101]
[200110254]
[200601141]
[200502169]
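For readers who want to reproduce the two-step process outside The Regex Coach, here is a minimal Python sketch. The function name `two_step` is hypothetical, and returning an empty string when step 2 finds nothing is an assumption (a plain search-and-replace would leave the name unchanged instead).

```python
import re

# Step 1: drop any 8-digit run that has a non-digit on both sides.
STEP1 = re.compile(r"([^0-9])([0-9]{8})([^0-9])")
# Step 2: capture the first 6-, 7-, 9- or 10-digit run that is followed by a non-digit.
STEP2 = re.compile(r"(.*?)([1-9]([0-9]{5,6}|[0-9]{8,9}))([^0-9].*)")

def two_step(name):
    """Return the number found by the two-step process, or "" if none."""
    stripped = STEP1.sub(r"\1\3", name)   # replacement: \1\3
    m = STEP2.match(stripped)             # group 2 corresponds to replacement \2
    return m.group(2) if m else ""        # assumption: "" when nothing matches

names = [
    "abc001_20111104_summary_123.txt",
    "abc008_200700953_timeline.txt",
    "abc008_20080402_summary200201573unitf.txt",
    "123456.txt",
    "100101-100102 test.txt",
    "abc008_20110902_summary200110254.txt",
    "abcd 200601141 summary.txt",
    "abc008_summary_200502169_xyz.txt",
]
results = [two_step(n) for n in names]
print(results)
```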
Now, the question is: Is there a way to do this in a one step process?
I've seen this nice solution to a similar question; however, it gives me the last number if more than one is found.
Note: Testing with The Regex Coach.
match(/(\d{5})/g);
For the shorthand character classes in regex, the uppercase metacharacter is the inverse of its lowercase counterpart. \d (digit) matches any single digit (same as [0-9]). The uppercase counterpart \D (non-digit) matches any single character that is not a digit (same as [^0-9]).
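A quick way to see the \d / \D relationship is to split a string into the characters each class matches. This is only an illustrative sketch in Python; the sample string is made up for the demonstration.

```python
import re

sample = "abc008_2007"                   # hypothetical sample string
digits = re.findall(r"\d", sample)       # every character that is a digit
non_digits = re.findall(r"\D", sample)   # every character that is not a digit

print("".join(digits))       # 0082007
print("".join(non_digits))   # abc_
```

Every character of the sample lands in exactly one of the two lists, which is what "inverse" means here.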
Assuming your regex engine supports lookbehind assertions:
(?<!\d)\d{6}(?:\d?|\d{3,4})(?!\d)
Explanation:
(?<!\d) # Assert that the previous character (if any) isn't a digit
\d{6} # Match 6 digits
(?: # Either match
\d? # 0 or 1 digits
| # or
\d{3,4} # 3 or 4 digits
) # End of alternation
(?!\d) # Assert that the next character (if any) isn't a digit
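The one-step pattern can be checked against the file names from the question. The sketch below uses Python's `re` module, which supports the fixed-width lookbehind used here; the helper name `first_number` is hypothetical. Note that, unlike step 2 in the question, this pattern does not require the first digit to be 1-9.

```python
import re

# 6 digits, then either 0-1 more (6 or 7 total) or 3-4 more (9 or 10 total),
# with no digit immediately before or after the run.
PATTERN = re.compile(r"(?<!\d)\d{6}(?:\d?|\d{3,4})(?!\d)")

def first_number(name):
    """Return the first 6-, 7-, 9- or 10-digit number in name, or ""."""
    m = PATTERN.search(name)   # search() stops at the leftmost match
    return m.group(0) if m else ""

names = [
    "abc001_20111104_summary_123.txt",
    "abc008_200700953_timeline.txt",
    "abc008_20080402_summary200201573unitf.txt",
    "123456.txt",
    "100101-100102 test.txt",
    "abc008_20110902_summary200110254.txt",
    "abcd 200601141 summary.txt",
    "abc008_summary_200502169_xyz.txt",
]
found = [first_number(n) for n in names]
for name, number in zip(names, found):
    print(f"[{number}]  {name}")
```

Because `search` returns the leftmost match, a name such as `100101-100102 test.txt` yields the first number rather than the last one.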