
C# Looping through two lists, removing duplicates from list #2

Tags: arrays, c#, list

I have an array of file names (aryFileNames) of a directory. I have a list of file names (lstKeepers) from a CSV file. This list is a listing of the files that are SUPPOSED to be in the directory. What I'm essentially doing is looking for orphan files in the directory.

I have redone this logic 3 times now and every time I eventually hit a bump that has me needing to rework it, so I'm just going to ask flat out how I should handle this.

My current logic is this:

List<string> lstKeepers = new List<string>(aryKeepers);
DirectoryInfo dir = new DirectoryInfo(txtMSALoc.Text);
FileInfo[] attFiles = dir.GetFiles();
//variable for testing if a keeper was found.
bool bolTest = false;
//Loop through the directory's files
foreach (FileInfo attFile in attFiles)
{
    //Loop through the list of keepers
    foreach (string lstKeeper in lstKeepers){
        if (lstKeeper == attFile.Name)
        {
            //This file is a keeper not an orphan, remove it from the list.
            // This line doesn't actually work.  Is a List the right way to go?
            lstKeepers(lstKeeper).remove;
            bolTest = true;
            break;
        }
    }
    // Code fell out of the loop, see if it found a keeper.
    if (bolTest)
    {
        bolTest=false;
    }
    else
    {
        //CODE TO MOVE FILE INTO ORPHAN DIRECTORY
    }
}

I'm dealing with potential directories (and keeper lists) of up to 2 million files, so that's the reason I want to keep shrinking the keeper list with every file it finds, so things should go faster the longer it runs.

So my first question is, is there a better way of doing this?

My next question is, are arrays and lists the best things to use? I saw something about linkedlist being better when you need to remove stuff.

Here's briefly what I tried before:

1) Looping through the directory list and keeper list at the same time: since the file names are mostly numeric (123, 124, and so on), I compared the current pair of names and chose an action based on whether one was <, >, or = the other. But because the names aren't naturally sorted, this didn't work.

2) Using just two arrays, but removing items from an array isn't practical (as I'd have to keep recreating the array).

3) (current) this seemed the way to go as I could remove items but then someone said to use LinkedLists for removing items and because I'm tired of restarting this project, that was the straw that broke the coder's back :)

Any advice is appreciated!

UPDATE: This is the final version, thanks much to everyone for your help!

            string[] aryKeepers;
            if (File.Exists("Keepers.csv"))
            {
                aryKeepers = File.ReadAllLines("Keepers.csv");
            }
            else
            {
                MessageBox.Show("Cannot find 'Keepers.csv' file.", "Missing CSV File Error", MessageBoxButtons.OK, MessageBoxIcon.Exclamation);
                aryKeepers = null;
                return;
            }
            List<string> lstKeepers = new List<string>(aryKeepers);
            DirectoryInfo dir = new DirectoryInfo(txtMSALoc.Text);
            FileInfo[] attFiles = dir.GetFiles();
            List<string> lstOrphans = attFiles
                                        .Where(x => !lstKeepers.Contains(x.Name))
                                        .Select(x => x.Name)
                                        .ToList();
            if (lstOrphans.Count > 0){
                intOrphan = lstOrphans.Count;
                lstOrphans.ForEach (lstOrphan => {
                    string strOldFile = txtMSALoc.Text + @"\" + lstOrphan;
                    string strNewFile = dirOrphan + lstOrphan;
                    File.Move(strOldFile, strNewFile);
                });
            }
Charles B Hamlyn asked Dec 21 '25 02:12


1 Answer

Why not just

List<string> orphans = new List<string>();

// Enumerate files in directory
foreach (FileInfo file in attFiles)
{
    // If the filename isn't in the keepers list it must be 
    // an orphan, add it to the orphans list
    if(!lstKeepers.Contains(file.Name))
        orphans.Add(file.Name);
}

Then afterwards

foreach(string orphanedFile in orphans)
{ 
    // Move the file
}

I don't think it will be amazingly performant, since List<string>.Contains is a linear search over the keepers for every file, but you didn't mention performance issues, just that you couldn't get the logic right.
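If the lookups do turn out to be the bottleneck with lists in the millions, one option is to put the keepers in a HashSet<string>, where Contains is a hash lookup rather than a scan. The following is only a minimal sketch: the class and method names (OrphanFinder, FindOrphans) are made up for illustration, and it assumes the same case-sensitive comparison that == gives you in the question's code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the original code.
public static class OrphanFinder
{
    // Returns the names in fileNames that do not appear in keeperNames.
    public static List<string> FindOrphans(
        IEnumerable<string> fileNames,
        IEnumerable<string> keeperNames)
    {
        // Building the set costs one pass over the keepers; each Contains
        // afterwards is an average O(1) hash lookup instead of a list scan.
        // Assumption: default (ordinal, case-sensitive) comparison. Pass
        // StringComparer.OrdinalIgnoreCase as a second constructor argument
        // if the names should match regardless of case.
        var keepers = new HashSet<string>(keeperNames);

        return fileNames
            .Where(name => !keepers.Contains(name))
            .ToList();
    }
}
```

With the variables from the question, this could be called as OrphanFinder.FindOrphans(attFiles.Select(f => f.Name), lstKeepers). There is no need to shrink the keeper collection as matches are found, because the lookup cost no longer grows with its size.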

I might also add that removing items from a list whilst enumerating it (i.e. inside the foreach loop) will cause a runtime exception (an InvalidOperationException) as soon as the enumeration continues, which might be one of the issues you are encountering.
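If you do still want to shrink the keepers list, a safer pattern is to let the list do the removal itself rather than calling Remove inside a foreach over the same list. A rough sketch, assuming you have already collected the names that were found into a set (SafeRemoval and RemoveFound are hypothetical names used only for this example):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original code.
public static class SafeRemoval
{
    // Removes every name contained in 'found' from 'keepers' and returns
    // how many entries were removed. The list is never modified while a
    // foreach enumerator over it is active.
    public static int RemoveFound(List<string> keepers, ISet<string> found)
    {
        // List<T>.RemoveAll takes a predicate and removes all matching
        // elements in a single pass.
        return keepers.RemoveAll(name => found.Contains(name));
    }
}
```

For example, RemoveFound(lstKeepers, new HashSet<string> { "123", "124" }) would drop those two entries in one call.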

Edit: Just for fun and because everyone loves LINQ (and because gunr2171 suggested it):

List<string> orphans = attFiles
                            .Where(x => !lstKeepers.Contains(x.Name))
                            .Select(x => x.Name)
                            .ToList();

Then you could

orphans.ForEach(orphan => { /* Do something */ });
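As for the "do something" part, here is one way the move itself might look. This is a sketch under assumptions: OrphanMover, MoveOrphans, sourceDir and orphanDir are illustrative names, and it uses Path.Combine so you don't have to worry about whether the directory strings end with a separator.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of the original code.
public static class OrphanMover
{
    // Moves each named file from sourceDir into orphanDir.
    public static void MoveOrphans(
        string sourceDir,
        string orphanDir,
        IEnumerable<string> orphans)
    {
        // Creates the target directory if needed; does nothing if it exists.
        Directory.CreateDirectory(orphanDir);

        foreach (string name in orphans)
        {
            // File.Move throws an IOException if the destination file
            // already exists, so a rerun will not silently overwrite.
            File.Move(
                Path.Combine(sourceDir, name),
                Path.Combine(orphanDir, name));
        }
    }
}
```

With the question's variables that would be something like OrphanMover.MoveOrphans(txtMSALoc.Text, dirOrphan, orphans).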
Charleh answered Dec 22 '25 16:12


