
C# - Regex - Matching file names according to a specific naming pattern

Tags: c#, regex

I have an application that needs to find and then process files that follow a very specific naming convention:

IABC_12345-0_YYYYMMDD_YYYYMMDD_HHMMSS.zip

I can't see an easy way of doing this with a search pattern alone, so I'm assuming I'll have to do something like this after generating a list of files with a simpler wildcard pattern.

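Regex re = new Regex("blah"); // placeholder pattern

// DirectoryInfo.GetFiles returns FileInfo objects; Directory.GetFiles returns strings.
foreach (FileInfo fi in new DirectoryInfo(path).GetFiles("I*.zip"))
{
    if (re.IsMatch(fi.Name))
    {
        // blah blah blah
    }
}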

Is this the best way of doing this, and if so, how would I form a regular expression to match this file format?

Andrew asked Oct 21 '09 14:10




3 Answers

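    // Requires: using System.IO; using System.Linq; using System.Text.RegularExpressions;
    string pattern = @"I[A-Z]{3}_\d{5}-\d_\d{8}_\d{8}_\d{6}\.zip";
    var matches = Directory.GetFiles(@"c:\temp")
        // GetFiles returns full paths, so test only the file name part.
        .Where(path => Regex.IsMatch(Path.GetFileName(path), pattern));

    foreach (string file in matches)
        Console.WriteLine(file); // do something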
xcud answered Oct 03 '22 19:10


It depends on how strictly you want to match those names. Is this specific enough:

I[A-Z]{3}_\d{5}-\d_\d{8}_\d{8}_\d{6}\.zip

?

Explanation:

I             // match an 'I'
[A-Z]{3}      // followed by three upper case letters
_             // followed by an underscore
\d{5}         // followed by five digits
-             // followed by a hyphen
\d            // followed by a single digit
_             // followed by an underscore
\d{8}         // followed by eight digits
_             // followed by an underscore
\d{8}         // followed by eight digits
_             // followed by an underscore
\d{6}         // followed by six digits
\.zip         // followed by '.zip'

But if you also need to reject file names that contain invalid dates or times, that cannot practically be done with a regex alone, especially if the DATE_DATE part specifies a date range. You will have to match all file names as I (and others) have shown, and then apply some "regular" programming logic to filter out the invalid ones.
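
A minimal sketch of that two-step approach: match the shape with the regex, then validate the captured parts in ordinary code. It assumes the first date must not be later than the second and that the last part is a 24-hour time; neither is stated in the question. The names `ArchiveName`, `IsValid`, and the group names are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class ArchiveName
{
    // Same shape as the pattern above, anchored, with named groups
    // around the two dates and the time so they can be checked afterwards.
    private static readonly Regex Pattern = new Regex(
        @"^I[A-Z]{3}_\d{5}-\d_(?<start>\d{8})_(?<end>\d{8})_(?<time>\d{6})\.zip$");

    // True only if the name has the right shape AND its dates and time are real.
    public static bool IsValid(string fileName)
    {
        Match m = Pattern.Match(fileName);
        if (!m.Success)
            return false;

        DateTime start, end, time;
        bool parsed =
            TryParse(m.Groups["start"].Value, "yyyyMMdd", out start) &&
            TryParse(m.Groups["end"].Value, "yyyyMMdd", out end) &&
            TryParse(m.Groups["time"].Value, "HHmmss", out time);
        if (!parsed)
            return false;

        // Assumption: the two dates form a range, so start must not follow end.
        return start <= end;
    }

    // Strict, culture-independent parse against one exact format.
    private static bool TryParse(string text, string format, out DateTime value)
    {
        return DateTime.TryParseExact(text, format, CultureInfo.InvariantCulture,
                                      DateTimeStyles.None, out value);
    }
}
```

With this, `ArchiveName.IsValid(fi.Name)` could replace the plain `IsMatch` call in the loop from the question. A name with month 13 or hour 73 passes the regex but is rejected by the parse step.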

Bart Kiers answered Oct 03 '22 17:10


For a simple regular expression that will also match invalid time specifications (i.e. hours=73 etc.), you could use something like this:

^I[A-Z]{3}_\d{5}-\d_\d{8}_\d{8}_\d{6}\.zip$
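
The `^` and `$` anchors are the difference from the unanchored pattern: without them, a longer name that merely contains a valid name also matches. A small sketch to illustrate; the class name `AnchorDemo`, its method names, and the sample file name in the usage note are made up for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    private const string Unanchored = @"I[A-Z]{3}_\d{5}-\d_\d{8}_\d{8}_\d{6}\.zip";
    private const string Anchored = "^" + Unanchored + "$";

    // True if a valid-looking name appears anywhere inside the input.
    public static bool MatchesAnywhere(string name)
    {
        return Regex.IsMatch(name, Unanchored);
    }

    // True only if the input starts with 'I' and ends with '.zip',
    // with nothing before or after the pattern.
    public static bool MatchesWhole(string name)
    {
        return Regex.IsMatch(name, Anchored);
    }
}
```

For a name such as `old_IABC_12345-0_20090101_20090131_235959.zip.bak`, `MatchesAnywhere` returns true while `MatchesWhole` returns false.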
Lasse V. Karlsen answered Oct 03 '22 17:10