RegEx to extract the first 6 to 10 digit number, excluding 8 digit numbers

Tags:

c#

regex

I have the following test file names:

abc001_20111104_summary_123.txt
abc008_200700953_timeline.txt
abc008_20080402_summary200201573unitf.txt
123456.txt
100101-100102 test.txt
abc008_20110902_summary200110254.txt
abcd 200601141 summary.txt
abc008_summary_200502169_xyz.txt

I need to extract a number from each file name.

The number must be 6, 7, 9 or 10 digits long (so, excluding 8-digit numbers).

I want to get the first number if more than one is found, or an empty string if none is found.
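To pin the rule down before writing any regex, here is a minimal non-regex sketch in Python (used only for illustration; the question is tagged C#). It assumes "number" means a maximal run of ASCII digits, and `first_valid_number` is a hypothetical helper name.

```python
import itertools

# Lengths allowed by the requirement: 6, 7, 9 or 10 digits (8 is excluded).
ALLOWED_LENGTHS = {6, 7, 9, 10}

def first_valid_number(name: str) -> str:
    """Return the first maximal digit run whose length is allowed, else ''."""
    # groupby splits the name into alternating runs of digit and non-digit characters.
    for is_digit, run in itertools.groupby(name, key=lambda ch: ch in "0123456789"):
        if is_digit:
            digits = "".join(run)
            if len(digits) in ALLOWED_LENGTHS:
                return digits
    return ""
```

For example, `first_valid_number("abc008_20080402_summary200201573unitf.txt")` returns `"200201573"`: `008` is too short and `20080402` has the excluded length of 8.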

I managed to do this in a two-step process: first removing the 8-digit numbers, then extracting the 6- to 10-digit numbers from my list.

step 1 
  regex:  ([^0-9])([0-9]{8})([^0-9])
  replacement:  \1\3

step 2
  regex: (.*?)([1-9]([0-9]{5,6}|[0-9]{8,9}))([^0-9].*)
  replacement:  \2
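The two steps above can be reproduced with Python's `re` module as a rough sketch. The patterns are copied from the question; the one assumption is that step 2 is applied as a match (taking group 2) rather than as a replacement, so a name with no qualifying number yields an empty string instead of the unchanged text. `two_step_extract` is a hypothetical name.

```python
import re

# Step 1 (from the question): drop an 8-digit number that has a non-digit on
# both sides, keeping those neighbouring characters via \1 and \3.
STEP1 = re.compile(r"([^0-9])([0-9]{8})([^0-9])")

# Step 2 (from the question): lazily skip a prefix, then capture a number that
# starts with 1-9 and has 6, 7, 9 or 10 digits in total, followed by a non-digit.
STEP2 = re.compile(r"(.*?)([1-9]([0-9]{5,6}|[0-9]{8,9}))([^0-9].*)")

def two_step_extract(name: str) -> str:
    without_8_digit = STEP1.sub(r"\1\3", name)
    # Assumption: match and take group 2 instead of substituting, so that
    # a name with no qualifying number gives '' rather than the original text.
    m = STEP2.match(without_8_digit)
    return m.group(2) if m else ""
```

For example, step 1 turns `abc008_20080402_summary200201573unitf.txt` into `abc008__summary200201573unitf.txt`, and step 2 then captures `200201573`.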

The numbers I get after this two-step process are exactly what I'm looking for:

[]
[200700953]
[200201573]
[123456]
[100101]
[200110254]
[200601141]
[200502169]

Now, the question is: is there a way to do this in a single step?

I've seen this nice solution to a similar question; however, it gives me the last number when more than one is found.

Note: I'm testing with The Regex Coach.


asked Jul 30 '12 13:07

leoinfo

1 Answer

Assuming your regex engine supports lookbehind assertions:

(?<!\d)\d{6}(?:\d?|\d{3,4})(?!\d)
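A minimal sketch of the one-step pattern in use, written with Python's `re` module for a runnable illustration (the question itself is tagged C#); `extract_number` is a hypothetical helper name. Python supports the same lookbehind and lookahead syntax. In .NET the equivalent call would be `Regex.Match(name, pattern)`, reading `Value` only when `Success` is true. Both engines treat `\d` as any Unicode decimal digit by default, so `[0-9]` may be preferable if only ASCII digits should count.

```python
import re

# Single-step pattern from the answer: a run of 6, 7, 9 or 10 digits with
# no digit immediately before or after it, so 8-digit runs never match.
PATTERN = re.compile(r"(?<!\d)\d{6}(?:\d?|\d{3,4})(?!\d)")

def extract_number(name: str) -> str:
    # search() scans left to right and stops at the first match, so the
    # first qualifying number wins; '' is returned when nothing qualifies.
    m = PATTERN.search(name)
    return m.group(0) if m else ""

file_names = [
    "abc001_20111104_summary_123.txt",
    "abc008_200700953_timeline.txt",
    "abc008_20080402_summary200201573unitf.txt",
    "123456.txt",
    "100101-100102 test.txt",
    "abc008_20110902_summary200110254.txt",
    "abcd 200601141 summary.txt",
    "abc008_summary_200502169_xyz.txt",
]

for name in file_names:
    print(f"[{extract_number(name)}]")
```

Run against the question's file names, this prints the same eight bracketed values as the two-step process.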

Explanation:

(?<!\d)   # Assert that the previous character (if any) isn't a digit
\d{6}     # Match 6 digits
(?:       # Either match
 \d?      # 0 or 1 more digit (6 or 7 digits in total)
|         # or
 \d{3,4}  # 3 or 4 more digits (9 or 10 digits in total)
)         # End of alternation
(?!\d)    # Assert that the next character (if any) isn't a digit
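To check the explanation, the sketch below (Python `re`, with hypothetical helper and variable names) tests which run lengths the pattern accepts in full, and shows what goes wrong if the leading `(?<!\d)` is dropped: the engine can then start inside an 8-digit run and return its last 7 digits.

```python
import re

WITH_LOOKBEHIND = re.compile(r"(?<!\d)\d{6}(?:\d?|\d{3,4})(?!\d)")
# Same pattern minus the leading (?<!\d), to show why that assertion matters.
WITHOUT_LOOKBEHIND = re.compile(r"\d{6}(?:\d?|\d{3,4})(?!\d)")

def accepted_lengths(pattern, lengths=range(5, 12)):
    """Lengths n for which a run of n digits between two letters matches in full."""
    result = []
    for n in lengths:
        digits = "1" * n
        m = pattern.search("a" + digits + "b")
        if m and m.group(0) == digits:
            result.append(n)
    return result

print(accepted_lengths(WITH_LOOKBEHIND))                 # [6, 7, 9, 10]
# Without the lookbehind, a match can begin inside an 8-digit run.
print(WITHOUT_LOOKBEHIND.search("a12345678b").group(0))  # 2345678
```

Because `(?!\d)` rejects any candidate that is followed by a digit, the engine backtracks out of the `\d?` branch into `\d{3,4}` for 9- and 10-digit runs, so the order of the two alternatives does not change which numbers are found.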

answered Nov 14 '22 20:11

Tim Pietzcker
