
How might I complete this example using LINQ and string parsing?

I'm trying to write a simple program that will compare the files in separate folders. I'm currently using LINQ to Objects to parse the folder and would like to include information extracted from the file name in my result set as well.

Here's what I have so far:

FileInfo[] fileList = new DirectoryInfo(@"G:\Norton Backups").GetFiles();

var results = from file in fileList
              orderby file.CreationTime
              select new { file.Name, file.CreationTime, file.Length };

foreach (var x in results)
    Console.WriteLine(x.Name);

This produces:

AWS025.sv2i
AWS025_C_Drive038.v2i
AWS025_C_Drive038_i001.iv2i
AWS025_C_Drive038_i002.iv2i
AWS025_C_Drive038_i003.iv2i
AWS025_C_Drive038_i004.iv2i
AWS025_C_Drive038_i005.iv2i    
...

I would like to modify the LINQ query so that:

  • It only includes actual "backup" files (you can tell the backup files because of the _C_Drive038 in the examples above, though 038 and possibly the drive letter could change).
  • I want to include a field if the file is the "main" backup file (i.e., it doesn't have _i0XX at the end of the file name).
  • I want to include the "image number" of the file (e.g. in this case it's 038).
  • I want to include the increment number if it's an increment of a base file (e.g. 001 would be an increment number).

I believe the basic layout of the query would look like the following, but I'm not sure how best to complete it (I've got some ideas for how some of this might be done, but I'm interested to hear how others might do it):

var results = from file in fileList
              let IsMainBackup = // ??
              let ImageNumber = // ??
              let IncrementNumber = // ??
              where // it is a backup file.
              orderby file.CreationTime
              select new { file.Name, file.CreationTime, file.Length, 
                           IsMainBackup, ImageNumber, IncrementNumber };

When looking for the ImageNumber and IncrementNumber, I would like to assume that the location of this data is not always fixed, so I'd like to know a good way to parse it (if this requires regular expressions, please explain how I might use them).

NOTE: Most of my past experience in parsing text involved using location-based string functions, such as LEFT, RIGHT, or MID. I'd rather not fall back on those if there is a better way.

Ben McCormack asked Dec 23 '09 21:12




2 Answers

Using regular expressions:

    Regex regex = new Regex(@"^.*(?<Backup>_\w_Drive(?<ImageNumber>\d+)(?<Increment>_i(?<IncrementNumber>\d+))?)\.[^.]+$");
    var results = from file in fileList
                  let match = regex.Match(file.Name)
                  let IsMainBackup = !match.Groups["Increment"].Success
                  let ImageNumber = match.Groups["ImageNumber"].Value
                  let IncrementNumber = match.Groups["IncrementNumber"].Value
                  where match.Groups["Backup"].Success
                  orderby file.CreationTime
                  select new { file.Name, file.CreationTime, file.Length,
                               IsMainBackup, ImageNumber, IncrementNumber };

Here is a description of the regular expression:

^                   Start of string.
.*                  Allow anything at the start.
(?<Backup>...)      Match a backup description (explained below).
\.                  Match a literal period.
[^.]+               Match the extension (anything except periods).
$                   End of string.

Backup is:

_\w_Drive           A literal underscore, any word character (here the drive letter), another underscore, then the string "Drive".
(?<ImageNumber>\d+) At least one digit, saved as ImageNumber.
(?<Increment>...)?  An optional increment description.

Increment is:

_i                      A literal underscore, then the letter i.
(?<IncrementNumber>\d+) At least one digit, saved as IncrementNumber.
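The same breakdown can be kept next to the pattern itself. The following is a minimal sketch, assuming you are happy to use RegexOptions.IgnorePatternWhitespace, which makes the engine skip unescaped whitespace in the pattern and treat # to end of line as a comment. The class and field names (BackupPattern, Verbose) are made up for illustration; the pattern is the one described above.

```csharp
using System;
using System.Text.RegularExpressions;

static class BackupPattern
{
    // Same pattern as in the query, laid out one piece per line.
    // With IgnorePatternWhitespace, unescaped whitespace is ignored
    // and '#' starts a comment that runs to the end of the line.
    public static readonly Regex Verbose = new Regex(@"
        ^ .*                                  # anything before the backup part
        (?<Backup>
            _ \w _Drive                       # underscore, drive letter, _Drive
            (?<ImageNumber> \d+ )             # image number
            (?<Increment>
                _i (?<IncrementNumber> \d+ )  # optional increment suffix
            )?
        )
        \. [^.]+ $                            # the extension, then end of string
        ", RegexOptions.IgnorePatternWhitespace);
}
```

It can be dropped into the query in place of the single-line regex, for example `let match = BackupPattern.Verbose.Match(file.Name)`; the group names are unchanged.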

Here is the test code I used:

using System;
using System.IO;
using System.Text.RegularExpressions;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        FileInfo[] fileList = new FileInfo[] {
            new FileInfo("AWS025.sv2i"),
            new FileInfo("AWS025_C_Drive038.v2i"),
            new FileInfo("AWS025_C_Drive038_i001.iv2i"),
            new FileInfo("AWS025_C_Drive038_i002.iv2i"),
            new FileInfo("AWS025_C_Drive038_i003.iv2i"),
            new FileInfo("AWS025_C_Drive038_i004.iv2i"),
            new FileInfo("AWS025_C_Drive038_i005.iv2i")
        };

        Regex regex = new Regex(@"^.*(?<Backup>_\w_Drive(?<ImageNumber>\d+)(?<Increment>_i(?<IncrementNumber>\d+))?)\.[^.]+$");
        var results = from file in fileList
                      let match = regex.Match(file.Name)
                      let IsMainBackup = !match.Groups["Increment"].Success
                      let ImageNumber = match.Groups["ImageNumber"].Value
                      let IncrementNumber = match.Groups["IncrementNumber"].Value
                      where match.Groups["Backup"].Success
                      orderby file.CreationTime
                      select new { file.Name, file.CreationTime,
                                   IsMainBackup, ImageNumber, IncrementNumber };

        foreach (var x in results)
        {
            Console.WriteLine("Name: {0}, Main: {1}, Image: {2}, Increment: {3}",
                x.Name, x.IsMainBackup, x.ImageNumber, x.IncrementNumber);
        }
    }
}

And here is the output I get:

Name: AWS025_C_Drive038.v2i, Main: True, Image: 038, Increment:
Name: AWS025_C_Drive038_i001.iv2i, Main: False, Image: 038, Increment: 001
Name: AWS025_C_Drive038_i002.iv2i, Main: False, Image: 038, Increment: 002
Name: AWS025_C_Drive038_i003.iv2i, Main: False, Image: 038, Increment: 003
Name: AWS025_C_Drive038_i004.iv2i, Main: False, Image: 038, Increment: 004
Name: AWS025_C_Drive038_i005.iv2i, Main: False, Image: 038, Increment: 005
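
In the output above, ImageNumber and IncrementNumber are strings, and the increment is empty for the main backup. If numeric values are more convenient, one option is a small extension method on Group. This is only a sketch: ToNullableInt and GroupExtensions are hypothetical names, not part of the .NET regex API.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

static class GroupExtensions
{
    // Hypothetical helper: returns the captured digits as an int,
    // or null when the group did not take part in the match.
    public static int? ToNullableInt(this Group group)
    {
        int value;
        // NumberStyles.None accepts plain digits only (no sign, no whitespace).
        if (group.Success &&
            int.TryParse(group.Value, NumberStyles.None, CultureInfo.InvariantCulture, out value))
        {
            return value;
        }
        return null;
    }
}
```

In the query this would read `let IncrementNumber = match.Groups["IncrementNumber"].ToNullableInt()`. Note that leading zeros are lost in the conversion, so "038" becomes 38.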
Mark Byers answered Nov 14 '22 23:11



It was a bit of fun working out a good answer for this one :)

The code below gives you what you need. Note the use of the search pattern when retrieving the files: there is no point retrieving more files than necessary. Also notice the parseNumber() function, which is there just to show how to convert the string result from the regex to a number should you need it in that format.

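using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
using System.Windows.Forms;

static class Program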
{
    [STAThread]
    static void Main()
    {
        Application.EnableVisualStyles();
        Application.SetCompatibleTextRenderingDefault(false);
        //Application.Run(new Form1());

        GetBackupFiles(@"c:\temp\backup files");
    }

    static void GetBackupFiles(string path)
    {
        FileInfo[] fileList = new DirectoryInfo(path).GetFiles("*_Drive*.*v2i");

        var results = from file in fileList
                      orderby file.CreationTime
                      select new 
                      {  file.Name
                        ,file.CreationTime
                        ,file.Length 
                        ,IsMainBackup = file.Extension.ToLower() == ".v2i"
                        ,ImageNumber = Regex.Match(file.Name, @"drive([\d]{0,5})", RegexOptions.IgnoreCase).Groups[1].Value
                        ,IncrementNumber = parseNumber( Regex.Match(file.Name, @"_i([\d]{0,5})\.iv2i", RegexOptions.IgnoreCase).Groups[1])
                      };

        foreach (var x in results)
            Console.WriteLine(x.Name);
    }

    static int? parseNumber(object num)
    {
        int temp;
        if (num != null && int.TryParse(num.ToString(), out temp))
            return temp;
        return null;
    }
}

Note that with the regexes I am assuming some consistency in the file names; if they deviate from the format you mentioned, you will have to adjust the patterns.
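
As one example of such an adjustment, the ImageNumber pattern in the code above uses {0,5}, so it also matches a name where "Drive" is not followed by any digits. The sketch below compares it with a stricter variant; the Strict pattern and the ImageNumberCheck class are assumptions for illustration, not part of the original code.

```csharp
using System.Text.RegularExpressions;

static class ImageNumberCheck
{
    // Pattern from the code above: {0,5} also accepts zero digits.
    static readonly Regex Loose =
        new Regex(@"drive([\d]{0,5})", RegexOptions.IgnoreCase);

    // Hypothetical stricter variant: needs the leading underscore
    // and at least one digit after "Drive".
    static readonly Regex Strict =
        new Regex(@"_drive(\d+)", RegexOptions.IgnoreCase);

    public static bool LooseMatches(string name) => Loose.IsMatch(name);

    public static bool StrictMatches(string name) => Strict.IsMatch(name);
}
```

Running both checks over a directory listing is a quick way to spot names the looser pattern accepts but which carry no image number.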

slugster answered Nov 14 '22 21:11
