
Advanced: How to optimize my complex O(n²) algorithm

Tags:

iterator

c#

big-o

I have people and places data as:

  • Person entity has
    • IList<DateRangePlaces> each having
      • IList<Place> of possible places
    • Schedule day pattern, e.g. 10 days available, 4 unavailable

Within a particular DateRangePlaces date range, the Schedule pattern determines whether the person can go to a particular place or not.
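A minimal sketch of such a pattern check, assuming the cycle repeats from a fixed start date and is described by a count of available days followed by a count of unavailable days. The `SchedulePattern` type and all of its members are hypothetical names, not part of the model described above.

```csharp
using System;

// Hypothetical type: a repeating "N days available, M days unavailable" cycle.
public sealed class SchedulePattern
{
    private readonly DateTime _start;   // assumed anchor: first day of the cycle
    private readonly int _daysOn;
    private readonly int _daysOff;

    public SchedulePattern(DateTime start, int daysOn, int daysOff)
    {
        if (daysOn < 0 || daysOff < 0 || daysOn + daysOff == 0)
            throw new ArgumentException("Cycle length must be positive.");
        _start = start.Date;
        _daysOn = daysOn;
        _daysOff = daysOff;
    }

    // Constant time per day: the position within the cycle decides availability.
    public bool IsAvailable(DateTime day)
    {
        int offset = (day.Date - _start).Days;
        if (offset < 0) return false; // assumption: nothing is available before the anchor
        return offset % (_daysOn + _daysOff) < _daysOn;
    }
}
```

Because the check is plain arithmetic, it adds no extra iteration inside the per-day loop.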

  • Place entity has
    • IList<DateRangeTiming> each defining opening/closing times within each date range

Overlapping date ranges work as LIFO: for each day that has already been defined, the most recent timing definition takes precedence.
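One way to picture the LIFO rule is to flatten the definitions into a per-day lookup once, up front. The sketch below assumes the list is ordered oldest definition first and that both range ends are inclusive; the members shown on `DateRangeTiming` and the `TimingResolver` helper are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical shape of one timing definition; the real DateRangeTiming may differ.
public sealed class DateRangeTiming
{
    public DateTime From { get; set; }   // inclusive (assumption)
    public DateTime To { get; set; }     // inclusive (assumption)
    public TimeSpan Opens { get; set; }
    public TimeSpan Closes { get; set; }
}

public static class TimingResolver
{
    // Walks the definitions in list order and lets later ones overwrite
    // earlier ones, so each day ends up with the most recently defined timing.
    public static Dictionary<DateTime, DateRangeTiming> ResolvePerDay(
        IEnumerable<DateRangeTiming> definitions)
    {
        var perDay = new Dictionary<DateTime, DateRangeTiming>();
        foreach (var def in definitions)
        {
            for (var day = def.From.Date; day <= def.To.Date; day = day.AddDays(1))
            {
                perDay[day] = def; // last writer wins
            }
        }
        return perDay;
    }
}
```

After this pass, finding the timing for a given day is a single dictionary lookup rather than a scan over all overlapping ranges.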

The problem

Now I need to do something like this (in pseudo code):

for each Place
{
    for each Day between minimum and maximum date in IList<DateRangeTiming>
    {
        get a set of People applicable for Place and on Day
    }
}

This means that the number of steps to execute my task is approximately:

(places)( ∑(days) × ∑(people) )

This to my understanding is

O(x × yₓ × z)

and likely approximates to this algorithm complexity:

O(n³)

I'm not an expert in theory, so feel free to correct my assumptions. What is true is that this kind of complexity is definitely not acceptable, especially given that I will be operating over long date ranges with many places and people.

From the formula approximation we can see that the people set would be iterated many times. Hence I would like to optimize at least this part. To ease things a bit I changed

Person.IList<DateRangePlaces>.IList<Place>

to

Person.IList<DateRangePlaces>.IDictionary<int, Place>

which gives me a faster answer to whether a person can go to some place on a particular date, because I only check whether Place.Id is present in the dictionary, whereas an IList.Where() LINQ clause would have to scan the whole list every time.
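A minimal sketch of that keyed lookup, under simplifying assumptions: the property names are hypothetical, range ends are taken as inclusive, and a `HashSet<int>` of place IDs stands in for the `IDictionary<int, Place>` since only membership is tested here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified model; property names are assumptions.
public sealed class DateRangePlaces
{
    public DateTime From { get; set; }  // inclusive (assumption)
    public DateTime To { get; set; }    // inclusive (assumption)

    // Keyed by Place.Id; a HashSet<int> suffices when only membership matters.
    public HashSet<int> PlaceIds { get; } = new HashSet<int>();
}

public static class PlaceLookup
{
    // Still scans the person's date ranges, but the place test inside each
    // range is a hash lookup instead of a linear scan over a list.
    public static bool CanVisit(IEnumerable<DateRangePlaces> ranges, int placeId, DateTime day)
    {
        foreach (var range in ranges)
        {
            if (range.From.Date <= day.Date && day.Date <= range.To.Date
                && range.PlaceIds.Contains(placeId))
            {
                return true;
            }
        }
        return false;
    }
}
```

This removes one linear factor from the innermost check; the outer loops over places and days are unchanged.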

Question

  1. Can you suggest any additional optimizations I could implement into my algorithm to make it faster or even make it less complex in terms of the big O notation?

  2. Which memory structure types would you use where and why (lists, dictionaries, stacks, queues...) to improve performance?

Addendum: The whole problem is even more complex

There are also additional complexities that I didn't mention, since I wanted to simplify my question to make it clearer. There are also:

Place.IList<Permission>
Person.IList<DateRangePermission>

So places require particular permissions, and people have time-limited permission grants that expire.

In addition to that, there's also

Person.IList<DateRangeTimingRestriction>

which specifies the only times at which a person can go somewhere during a particular date range. And

Person.IList<DateRangePlacePriorities>

which defines place prioritization for a particular date range.

And during this process of getting applicable people I also have to calculate a certain factor for every person and every place, related to the:

  • number of places that a person can visit on a particular day
  • person's place priority factor on that particular day

All these are the reasons why I decided to manipulate this data in memory rather than use a very complex stored procedure that would also be doing multiple table scans to get factors per person, place and day.

I think such a stored procedure would be way too complex to handle and maintain. So I would rather get all the data first (put it into appropriate memory structures to aid performance) and then work with it in memory.

Robert Koritnik asked Sep 20 '11 12:09


2 Answers

I suggest using a relational database and writing a stored procedure to retrieve the "set of People applicable for Place and on Day".

The stored procedure approach would be neither complex nor difficult to maintain if the model is architected properly. Additionally, relational databases have primary keys and indexing to avoid table scans.

The only way to speed things up using collections would be:

  1. change the collection type. You could use a KeyedCollection, IDictionary<> or even a disconnected recordset. Disconnected recordsets also give you the ability to set foreign keys to child recordsets; however, I think this would be a fairly complex pattern to use.

  2. maintain a collection within a collection - basically the same concept as a parent / child relationship with a foreign key. The object references will only be pointers to the original object's memory space or, if you're using a keyed collection, you could simply store the index of the other collection.

  3. maintain boolean properties that allow you to skip iterations if true or false. For example, as you build your entities, set a boolean such as "HasPlaceXPermission". If the value is false, you know not to retrieve information related to place X.

  4. maintain flags - flags can be a very good optimization technique when used properly. Similar to #3, flags can be used to determine permissions very quickly, for example: if ((person.PlacePermissions & (Place.Colorado | Place.Florida)) > 0) // do the date/time scan on Colorado and Florida; otherwise skip it.
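The flag test in #4 can be sketched as follows. This is only an illustration under assumptions: the `PlaceFlags` enum, its members and the `PermissionCheck` helper are hypothetical names, and a `long`-backed enum limits the scheme to 64 distinct flags.

```csharp
using System;

// Hypothetical flags; a [Flags] enum backed by long allows up to 64 distinct places.
[Flags]
public enum PlaceFlags : long
{
    None = 0,
    Colorado = 1L << 0,
    Florida = 1L << 1,
    Texas = 1L << 2
}

public static class PermissionCheck
{
    // True when the person holds at least one of the requested flags.
    public static bool HasAny(PlaceFlags granted, PlaceFlags wanted)
    {
        return (granted & wanted) != PlaceFlags.None;
    }
}
```

A single bitwise AND then decides whether the more expensive date/time scan is needed at all for that person.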

It's difficult to know which collection types I would use based on the information you have provided; I would need a larger view of the application to determine that architecturally. For example, where is the data stored, how is it retrieved, how is it prepared and how is it presented? Knowing how the application is architected would help to determine its optimization points.

Chris Gessler answered Nov 10 '22 10:11


You can't avoid O(n^2), as the minimal iteration you need is to pass over every Place and every Date element to find a match for a given Person.

I think the best way is to use a database such as SQL Server and run your query in SQL as a stored procedure.

NirMH answered Nov 10 '22 09:11