Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to create an efficient data structure for two-dimensional LINQ operations?

Problem: I have 2 kinds of objects, lets call them Building and Improvement. There are roughly 30 Improvement instances, while there can be 1-1000 Buildings. For each combination of Building and Improvement, I have to perform some heavy calculation, and store the result in a Result object.

Both Buildings and Improvements can be represented by an integer ID.

I then need to be able to:

  • Access the Result for a given Building and Improvement efficiently (EDIT: see comment further down)
  • Perform aggregations on the Results for all Improvements for a given Building, like .Sum() and .Average()
  • Perform the same aggregations on the Results for all Buildings for a given Improvement

This will happen on a web-server back-end, so memory may be a concern, but speed is most important.

Thoughts so far:

  1. Use a Dictionary<Tuple<int, int>, Result> with <BuildingID, ImprovementID> as key. This should give me speedy inserts and single lookups, but I am concerned about .Where() and .Sum() performance.
  2. Use a two-dimensional array, with one dimension for BuildingIDs and one for ImprovementIDs, and the Result as value. In addition, build two Dictionary<int, int> that map BuildingIDs and ImprovementIDs to their respective array row/column indexes. This could potentially mean max 1000+ Dictionarys, will this be a problem?
  3. Use a List<Tuple<int, int, Result>>. I think this may be the least efficient, with O(n) inserts, though I could be wrong.

Am I missing an obvious better option here?

EDIT: Turns out it is only the aggregated values (per Building and per Improvement) I am interested in; see my answer.

like image 355
Geir Sagberg Avatar asked Jun 03 '13 07:06

Geir Sagberg


2 Answers

Generally, the Dictionary is most lookup efficent. The both lookup efficency and manipulation efficency is constant O(1), when accessed via key. This will help for access, the first point.

In the second and third you need to walk through all of the items O(n), so there is no way to speed it except you want to walk them through in specified order O(n*n) - then you can use SortedDictionray O(n), but you compromise the lookup and manipulation efficency (O(log n)).

So I would go with the 1st solution you post.

like image 198
eMko Avatar answered Nov 17 '22 13:11

eMko


You could use a "dictionary of dictionaries" to hold the Result data, for example:

//             Building ID ↓               ↓ Improvement ID
var data = new Dictionary<int, Dictionary<int, Result>>();

This would let you quickly find the improvements for a particular building.

However, finding the buildings that contain a particular improvement would require iterating over all the buildings. Here's some sample code:

using System;
using System.Linq;
using System.Collections.Generic;

namespace Demo
{
    sealed class Result
    {
        public double Data;
    }

    sealed class Building
    {
        public int Id;
        public int Value;
    }

    sealed class Improvement
    {
        public int Id;
        public int Value;
    }

    class Program
    {
        void run()
        {
            //             Building ID ↓               ↓ Improvement ID
            var data = new Dictionary<int, Dictionary<int, Result>>();

            for (int buildingKey = 1000; buildingKey < 2000; ++buildingKey)
            {
                var improvements = new Dictionary<int, Result>();

                for (int improvementKey = 5000; improvementKey < 5030; ++improvementKey)
                    improvements.Add(improvementKey, new Result{ Data = buildingKey + improvementKey/1000.0 });

                data.Add(buildingKey, improvements);
            }

            // Aggregate data for all improvements for building with ID == 1500:

            int buildingId = 1500;
            var sum = data[buildingId].Sum(result => result.Value.Data);
            Console.WriteLine(sum);

            // Aggregate data for all buildings with a given improvement.

            int improvementId = 5010;

            sum = data.Sum(improvements =>
            {
                Result result;
                return improvements.Value.TryGetValue(improvementId, out result) ? result.Data : 0.0;
            });

            Console.WriteLine(sum);
        }

        static void Main()
        {
            new Program().run();
        }
    }
}

To speed up the second aggregation (for summing data for all improvements with a given ID) we can use a second dictionary:

//                      Improvment ID ↓               ↓ Building ID
var byImprovementId = new Dictionary<int, Dictionary<int, Result>>();

You would have an extra dictionary to maintain, but it's not too complicated. Having a few nested dictionaries like this might take too much memory though - but it's worth considering.

As noted in the comments below, it would be better to define types for the IDs and also for the dictionaries themselves. Putting that together gives:

using System;
using System.Linq;
using System.Collections.Generic;

namespace Demo
{
    sealed class Result
    {
        public double Data;
    }

    sealed class BuildingId
    {
        public BuildingId(int id)
        {
            Id = id;
        }

        public readonly int Id;

        public override int GetHashCode()
        {
            return Id.GetHashCode();
        }

        public override bool Equals(object obj)
        {
            var other = obj as BuildingId;

            if (other == null)
                return false;

            return this.Id == other.Id;
        }
    }

    sealed class ImprovementId
    {
        public ImprovementId(int id)
        {
            Id = id;
        }

        public readonly int Id;

        public override int GetHashCode()
        {
            return Id.GetHashCode();
        }

        public override bool Equals(object obj)
        {
            var other = obj as ImprovementId;

            if (other == null)
                return false;

            return this.Id == other.Id;
        }
    }

    sealed class Building
    {
        public BuildingId Id;
        public int Value;
    }

    sealed class Improvement
    {
        public ImprovementId Id;
        public int Value;
    }

    sealed class BuildingResults : Dictionary<BuildingId, Result>{}

    sealed class ImprovementResults: Dictionary<ImprovementId, Result>{}

    sealed class BuildingsById: Dictionary<BuildingId, ImprovementResults>{}

    sealed class ImprovementsById: Dictionary<ImprovementId, BuildingResults>{}

    class Program
    {
        void run()
        {
            var byBuildingId    = CreateTestBuildingsById();            // Create some test data.
            var byImprovementId = CreateImprovementsById(byBuildingId); // Create the alternative lookup dictionaries.

            // Aggregate data for all improvements for building with ID == 1500:

            BuildingId buildingId = new BuildingId(1500);

            var sum = byBuildingId[buildingId].Sum(result => result.Value.Data);
            Console.WriteLine(sum);

            // Aggregate data for all buildings with a given improvement.

            ImprovementId improvementId = new ImprovementId(5010);

            sum = byBuildingId.Sum(improvements =>
            {
                Result result;
                return improvements.Value.TryGetValue(improvementId, out result) ? result.Data : 0.0;
            });

            Console.WriteLine(sum);

            // Aggregate data for all buildings with a given improvement using byImprovementId.
            // This will be much faster than the above Linq.

            sum = byImprovementId[improvementId].Sum(result => result.Value.Data);
            Console.WriteLine(sum);
        }

        static BuildingsById CreateTestBuildingsById()
        {
            var byBuildingId = new BuildingsById();

            for (int buildingKey = 1000; buildingKey < 2000; ++buildingKey)
            {
                var improvements = new ImprovementResults();

                for (int improvementKey = 5000; improvementKey < 5030; ++improvementKey)
                {
                    improvements.Add
                    (
                        new ImprovementId(improvementKey),
                        new Result
                        {
                            Data = buildingKey + improvementKey/1000.0
                        }
                    );
                }

                byBuildingId.Add(new BuildingId(buildingKey), improvements);
            }

            return byBuildingId;
        }

        static ImprovementsById CreateImprovementsById(BuildingsById byBuildingId)
        {
            var byImprovementId = new ImprovementsById();

            foreach (var improvements in byBuildingId)
            {
                foreach (var improvement in improvements.Value)
                {
                    if (!byImprovementId.ContainsKey(improvement.Key))
                        byImprovementId[improvement.Key] = new BuildingResults();

                    byImprovementId[improvement.Key].Add(improvements.Key, improvement.Value);
                }
            }

            return byImprovementId;
        }

        static void Main()
        {
            new Program().run();
        }
    }
}

Finally, here's a modified version which determines the time it takes to aggregate data for all instances of a building/improvement combination for a particular improvement and compares the results for dictionary of tuples with dictionary of dictionaries.

My results for a RELEASE build run outside any debugger:

Dictionary of dictionaries took 00:00:00.2967741
Dictionary of tuples took 00:00:07.8164672

It's significantly faster to use a dictionary of dictionaries, but this is only of importance if you intend to do many of these aggregations.

using System;
using System.Diagnostics;
using System.Linq;
using System.Collections.Generic;

namespace Demo
{
    sealed class Result
    {
        public double Data;
    }

    sealed class BuildingId
    {
        public BuildingId(int id)
        {
            Id = id;
        }

        public readonly int Id;

        public override int GetHashCode()
        {
            return Id.GetHashCode();
        }

        public override bool Equals(object obj)
        {
            var other = obj as BuildingId;

            if (other == null)
                return false;

            return this.Id == other.Id;
        }
    }

    sealed class ImprovementId
    {
        public ImprovementId(int id)
        {
            Id = id;
        }

        public readonly int Id;

        public override int GetHashCode()
        {
            return Id.GetHashCode();
        }

        public override bool Equals(object obj)
        {
            var other = obj as ImprovementId;

            if (other == null)
                return false;

            return this.Id == other.Id;
        }
    }

    sealed class Building
    {
        public BuildingId Id;
        public int Value;
    }

    sealed class Improvement
    {
        public ImprovementId Id;
        public int Value;
    }

    sealed class BuildingResults : Dictionary<BuildingId, Result>{}

    sealed class ImprovementResults: Dictionary<ImprovementId, Result>{}

    sealed class BuildingsById: Dictionary<BuildingId, ImprovementResults>{}

    sealed class ImprovementsById: Dictionary<ImprovementId, BuildingResults>{}

    class Program
    {
        void run()
        {
            var byBuildingId    = CreateTestBuildingsById();            // Create some test data.
            var byImprovementId = CreateImprovementsById(byBuildingId); // Create the alternative lookup dictionaries.
            var testTuples      = CreateTestTuples();

            ImprovementId improvementId = new ImprovementId(5010);

            int count = 10000;

            Stopwatch sw = Stopwatch.StartNew();

            for (int i = 0; i < count; ++i)
                byImprovementId[improvementId].Sum(result => result.Value.Data);

            Console.WriteLine("Dictionary of dictionaries took " + sw.Elapsed);

            sw.Restart();

            for (int i = 0; i < count; ++i)
                testTuples.Where(result => result.Key.Item2.Equals(improvementId)).Sum(item => item.Value.Data);

            Console.WriteLine("Dictionary of tuples took " + sw.Elapsed);
        }

        static Dictionary<Tuple<BuildingId, ImprovementId>, Result> CreateTestTuples()
        {
            var result = new Dictionary<Tuple<BuildingId, ImprovementId>, Result>();

            for (int buildingKey = 1000; buildingKey < 2000; ++buildingKey)
                for (int improvementKey = 5000; improvementKey < 5030; ++improvementKey)
                    result.Add(
                        new Tuple<BuildingId, ImprovementId>(new BuildingId(buildingKey), new ImprovementId(improvementKey)),
                        new Result
                        {
                            Data = buildingKey + improvementKey/1000.0
                        });

            return result;
        }

        static BuildingsById CreateTestBuildingsById()
        {
            var byBuildingId = new BuildingsById();

            for (int buildingKey = 1000; buildingKey < 2000; ++buildingKey)
            {
                var improvements = new ImprovementResults();

                for (int improvementKey = 5000; improvementKey < 5030; ++improvementKey)
                {
                    improvements.Add
                    (
                        new ImprovementId(improvementKey),
                        new Result
                        {
                            Data = buildingKey + improvementKey/1000.0
                        }
                    );
                }

                byBuildingId.Add(new BuildingId(buildingKey), improvements);
            }

            return byBuildingId;
        }

        static ImprovementsById CreateImprovementsById(BuildingsById byBuildingId)
        {
            var byImprovementId = new ImprovementsById();

            foreach (var improvements in byBuildingId)
            {
                foreach (var improvement in improvements.Value)
                {
                    if (!byImprovementId.ContainsKey(improvement.Key))
                        byImprovementId[improvement.Key] = new BuildingResults();

                    byImprovementId[improvement.Key].Add(improvements.Key, improvement.Value);
                }
            }

            return byImprovementId;
        }

        static void Main()
        {
            new Program().run();
        }
    }
}
like image 2
Matthew Watson Avatar answered Nov 17 '22 14:11

Matthew Watson