Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Linq in large lists

Tags:

c#

linq

wpf

I have two custom classes Grid, and Element:

public class Grid
{
    public double ID { get; set; }

    public double X { get; set; }
    public double Y { get; set; }
    public double Z { get; set; }

    public MyClass MoreProperties {get; set;}

    public Grid(int id, double x, double y, double z)
    {
        this.ID = id;
        this.X = x;
        this.Y = y;
        this.Z = z;
    }
}

Element:

public abstract class Element
{
    public  int ID { get; set; }

    public int NumberOfGrids { get; set; }

    public List<Grid> Grids { get; set; } //4 Grids in this case

    public Element()
    {
        Grids = new List<Grid>();              
    }
}

To illustrate the situation please see this picture:

enter image description here

There is a container for that classes called Data:

class Data : ModelBase
{
    public List<Grid> Grids{ get; set; }

    public List<Element> Elements { get; set; }
}

I read text files in which there are a lot of data: Grids and Elements This is the format for grids (simplified):

GRID ID X Y Z

And for the element

ELEMENT ID GRID1 GRID2 GRID3 GRID4

So, the GRID entry provides the position and ID of a grid point and ELEMENT provides the ID of grids of that element and it's own ID.

What I want is to associate for each element all 4 grids, this way I will have the coordinates of each grid inside of the element object.

For that I read the file two times (because the element entry comes before grid and to simplify things): the first time I read it I fill the Grids list (from the Data class). The second time I fill the Elements list and do more stuff. When I fill the Elements list I can only fill the ID of the associated Grid.

If you've read till here we have this class Data that contains two lists of Grid and Elements.

For the association I've come up with this method:

public void AsociateGridsToElements()
{
    foreach (Element elem in Elements)
    {
        for (int i = 0; i < elem.Grids.Count; i++)
        {
            elem.Grids[i] = Grids.Where(g => g.ID == elem.Grids[i].ID).FirstOrDefault();
        }
    }
}

It loops through each element and then through each grid of that element (4 in this case), then it looks in the whole list of grids what grid has the same ID. When it founds the first, it assigns that grid, this way the element have the "complete" Grid object, not the one with only ID filled (because that it's the only thing I can get when I read the file).

Here comes the problem: these files are pretty big: around 20 000 grid points and 10 000 elements, if I loop for each element looking each time the whole collection of grids (4 times) it is: 20 000 x 10 000 = 200 000 000 operations. So the computer can't handle it, and I think it must be improved.

Could anyone give a hint or help me to optimize this problem? Thank you.

like image 698
Sturm Avatar asked Feb 27 '14 15:02

Sturm


1 Answers

If the IDs of each Grid object is guaranteed to be unique, I would start by creating a dictionary of Grid objects, with the ID as the key in the dictionary. Then the lookup of a populated Grid during the enumeration of Elements will only require a dictionary lookup instead of a new enumeration of the Grids list.

public void AsociateGridsToElements()
{
    var gridLookup = Grids.ToDictionary(grid => grid.ID);

    foreach (Element elem in Elements)
    {
        for (int i = 0; i < elem.Grids.Count; i++)
        {
            Grid fullyPopulatedGrid;
            if (gridLookup.TryGetValue(elem.Grids[i].ID, out fullyPopulatedGrid))
            {
                elem.Grids[i] = fullyPopulatedGrid;
            }
            else
            {
                // Unable to locate Grid Element
            }
        }
    }
}

Creating the dictionary lookup dramatically improves performance in this case because it prevents additional enumerations of the Grids list.

The code above performs the following operations (based on your estimates):

  1. Enumerate all Grid items and create a key value pair for each item. (about 20,000 steps)
  2. Enumerate all Element items (about 10,000 steps)
  3. Enumerate each partial Grid in that Element (call it 4 steps)
  4. Perform lookup on dictionary to find properly populated Grid (1 Hash lookup)

The total steps here is about 20,000 + (10,000 * 4) * 2 (1 Hash lookup per Element / Grid) = 100,000 steps

Your original code performs the following operations:

  1. Enumerate all Element items (about 10,000 steps)
  2. Enumerate each partial Grid in that Element (call it 4 steps)
  3. Enumerate all populated Grid items (about 20,000 steps) to find the first match (this requires a separate iteration for each Element / Grid combination)

The total steps here is about 10,000 * 4 * 20,000 = 800,000,000 steps

like image 115
davisoa Avatar answered Oct 06 '22 01:10

davisoa