Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Loop optimization within a nested loop c#

I am comparing two lists of data that were generated from a binary file. I have a good idea on why it's running slow, when there's a significant amount of records, it does un-needed redundant work.

For example, if a1 = a1, condition is true. Since 2a != 1a so why even bother checking it? I need to eliminate 1a from being checked again. If I don't, it will check the first record when it goes to check the 400,000th record. I thought about making the second for loop a foreach, but I can't remove 1a while iterating through the nested loop

The amount of items that can be in either 'for loop' can vary. I don't think a single for loop using 'i' will work since the match can be anywhere. I'm reading from a binary file

This is my current code. Program has been running for over an hour, and it's still going. I removed a lot of my iterating code for readability reasons.

   for (int i = 0; i < origItemList.Count; i++)
    {
        int modFoundIndex = 0;
        Boolean foundIt = false;
        for (int g = 0; g < modItemList.Count; g++)
        {
            if ((origItemList[i].X == modItemList[g].X)
                && (origItemList[i].Y == modItemList[g].Y)
                && (origItemList[i].Z == modItemList[g].Z)
                && (origItemList[i].M == modItemList[g].M))
            {

            foundIt = true;
            modFoundIndex = g;
            break;

            }
            else
            {
                foundIt = false;
            }

        }
        if (foundIt)                         
        {
            /*
             * This is run assumming it finds an x,y,z,m 
             coordinate. It thenchecks the database file.
             * 
             */
            //grab the rows where the coordinates match 
            DataRow origRow = origDbfFile.dataset.Tables[0].Rows[i];
            DataRow modRow = modDbfFile.dataset.Tables[0].Rows[modFoundIndex];

            //number matched indicates how many columns were matched
            int numberMatched = 0;

            //get the number of columns to match in order to detect all changes
            int numOfColumnsToMatch = origDbfFile.datatable.Columns.Count;

            List<String> mismatchedColumns = new List<String>();

            //check each column name for a change
            foreach (String columnName in columnNames)
            {
                //this grabs whatever value is in that field                            
                String origRowValue = "" + origRow.Field<Object>(columnName);
                String modRowValue = "" + modRow.Field<Object>(columnName);

                //check if they are the same
                if (origRowValue.Equals(modRowValue))
                {
                    //if they aren the same, increase the number matched by one
                    numberMatched++;
                    //add the column to the list of columns that don't match

                }
                else
                {
                    mismatchedColumns.Add(columnName);
                }

            }
            /* In the event it matches 15/16 columns, show the change */
            if (numberMatched != numOfColumnsToMatch)
            {
                //Grab the shapeFile in question
                Item differentAttrShpFile = origItemList[i];
                //start blue highlighting
                result += "<div class='turnBlue'>";
                //show where the change was made at
                result += "Change Detected at<br/> point X: " +
                differentAttrShpFile.X + ",<br/> point Y: " +
                    differentAttrShpFile.Y + ",<br/>";
                result += "</div>"; //end turnblue div
                foreach (String mismatchedColumn in mismatchedColumns)
                {
                    //iterate changes here

                }

            }

        }

    }
like image 827
Evan Parsons Avatar asked Jun 02 '26 08:06

Evan Parsons


2 Answers

You're coming at this in a totally wrong way. The loop you have is O(n^2), breaking when you find the match will on average cut the time in half for a hit, that's not enough. If you have a quarter million items in the list then this loop executes 62 billion times and even if the compiler optimizes out the extra array lookups you're still looking at at least a trillion instructions. You don't do O(n^2) for large n if you can possibly help it!

What you need to do is get rid of the O(n^2) aspect of this. My suggestion:

1) Define a hashing function that looks at the x, y, z & m and comes up with an integer value, my inclination would be to use one that's the wordsize of your target platform.

2) Iterate over both lists, compute hashes for everything.

3) Build an index to one of the tables, hash and the object. I suspect a dictionary is the best data structure here but a simple sorted array would also do.

4) Iterate over the list you didn't build the index over, compare the hashes to the entries in the index. If it's a hash that's an O(n) task, if it's a sorted array it's O(n log n).

5) When the hashes match do a full comparison to confirm the hit is real as you will get the occasional collision with a good 64-bit hash and you'll get a decent number of them if your hashes are 32-bit.

like image 199
Loren Pechtel Avatar answered Jun 05 '26 00:06

Loren Pechtel


This is something similar to Loren said but below is in language of .NET :)
1. Override GetHashCode method to return sum of x,y,z and m. Override Equals method to check for this sum.
2. Iterate and create HashSet from modItemList (List) before loop.
3. In inner loop, first check if origItemList[i] exists in HashSet using YourModHashSet.Contains(MyObject) method.
4. If .Contains return you false, carry one, no match.
5. If .Contains return you true, iterate thru entire modItemList and apply your current logic of checking for x,y,z and m for entire list. Note that here you should use List as hash table might eat up many objects for which hash code is same.

Also, I would use Foreach instead of For because I've seen Foreach giving little better results (5 to 30% faster) in such case.

Update:

I created MyObject class like below:

 public class MyObject
 {
    public int X, Y, Z, M;
    public override int GetHashCode()
    {
          return X*10000 + Y*100 + Z*10 + M;
    }

    public override bool Equals(object obj)
    {
        return (obj.GetHashCode() == this.GetHashCode());
    }
}

GetHashCode method is important here. We don't want many false positives. False positive occurs when Hash matches for some other combination of X, Y, Z and M. Best way to prevent false positive is to multiply each member such that each will impact one decimal place in HashCode. Note that you should consider not exceeding Int.Max value. If the expected value of X,Y,Z and M are small you should be good.

set2.Clear();
s1 = DateTime.Now;
MyObject matchingElement;
totalmatch = 0;

foreach (MyObject elem in list2)
set2.Add(elem);
foreach (MyObject t1 in list1)
{
if (set2.Contains(t1))
{
    matchingElement = null;
    foreach (MyObject t2 in list2)
    {
        if (t1.X == t2.X && t1.Y == t2.Y && t1.Z == t2.Z && t1.M == t2.M)
        {
            totalmatch++;
            matchingElement = t2;
            break;
        }
    }
    //Do Something on matchingElement if not null
}
}
Console.WriteLine("set foreach with contains: " + (DateTime.Now - s1).TotalSeconds + "\t Total Match: " + totalmatch);

Above is sample code that I was trying to describe in my answer. This code should work super fast if matches are expected to be less.

like image 21
Yogee Avatar answered Jun 04 '26 23:06

Yogee