I have people and places data as follows:

- `Person`: the entity has an `IList<DateRangePlaces>`, each having an `IList<Place>` of possible places and a `Schedule` day pattern (e.g. 10 days available, 4 unavailable). Within a particular `DateRangePlaces` date range, one has to obey the `Schedule` pattern to determine whether the person can go to a particular place or not.
- `Place`: the entity has an `IList<DateRangeTiming>`, each defining opening/closing times within its date range. Overlapping date ranges work as LIFO: for each day that has already been defined previously, the new timing definition takes precedence.
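The two rules above (the repeating on/off `Schedule` and the LIFO resolution of overlapping timings) can be sketched as follows. This is a minimal illustration under assumptions: the record shapes and the names `SchedulePattern`, `DaysOn`, `DaysOff`, `IsAvailable` and `TimingFor` are hypothetical, and the cycle is assumed to restart at the first day of the `DateRangePlaces` range.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical shapes: the question does not show the real class definitions.
public record SchedulePattern(int DaysOn, int DaysOff);

public record DateRangeTiming(DateTime From, DateTime To, TimeSpan Opens, TimeSpan Closes);

public static class ScheduleRules
{
    // Assumes the on/off cycle restarts at rangeStart and DaysOn + DaysOff > 0.
    // With 10 on / 4 off: offsets 0-9 are available, 10-13 are not, then it repeats.
    public static bool IsAvailable(SchedulePattern pattern, DateTime rangeStart, DateTime day)
    {
        int offset = (day.Date - rangeStart.Date).Days;
        if (offset < 0) return false; // day is before the range starts
        int cycle = pattern.DaysOn + pattern.DaysOff;
        return offset % cycle < pattern.DaysOn;
    }

    // LIFO resolution: timings are kept in definition order, so the most recently
    // defined timing that covers the day wins. Scanning backwards finds it first.
    public static DateRangeTiming? TimingFor(IList<DateRangeTiming> timings, DateTime day)
    {
        for (int i = timings.Count - 1; i >= 0; i--)
        {
            var t = timings[i];
            if (t.From.Date <= day.Date && day.Date <= t.To.Date) return t;
        }
        return null; // no timing defined for that day
    }
}
```

Scanning backwards keeps the "newest definition wins" rule in one place; if the same lookups are repeated for every day, resolving the timings once into a per-day table would avoid rescanning the list.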
Now I need to do something like this (in pseudocode):

    for each Place
    {
        for each Day between minimum and maximum date in IList<DateRangeTiming>
        {
            get a set of People applicable for Place and on Day
        }
    }
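For reference, a direct C# translation of this pseudocode could look like the sketch below. The `canGo` delegate is a hypothetical stand-in for all of the real rules (schedule, timings, permissions), and the `checks` counter exists only to make the cost visible: it grows as places × days × people.

```csharp
using System;
using System.Collections.Generic;

public static class NaiveScan
{
    // Brute-force baseline: every person is tested for every (place, day) pair.
    // dateRangeOf returns the min/max date of a place's timings; canGo is a
    // hypothetical placeholder for the real eligibility rules.
    public static Dictionary<(TPlace Place, DateTime Day), List<TPerson>> Run<TPlace, TPerson>(
        IEnumerable<TPlace> places,
        IList<TPerson> people,
        Func<TPlace, (DateTime Min, DateTime Max)> dateRangeOf,
        Func<TPerson, TPlace, DateTime, bool> canGo,
        out long checks)
    {
        var result = new Dictionary<(TPlace Place, DateTime Day), List<TPerson>>();
        checks = 0;
        foreach (var place in places)
        {
            var (min, max) = dateRangeOf(place);
            for (var day = min.Date; day <= max.Date; day = day.AddDays(1))
            {
                var applicable = new List<TPerson>();
                foreach (var person in people)
                {
                    checks++; // one rule evaluation per (place, day, person)
                    if (canGo(person, place, day)) applicable.Add(person);
                }
                result[(place, day)] = applicable;
            }
        }
        return result;
    }
}
```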
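This means that the number of steps needed to execute my task is approximately:

$$\sum_{\text{places}} \left( |\text{days}| \times |\text{people}| \right)$$

To my understanding this is $O(x \times y \times z)$, where $x$ is the number of places, $y$ the number of days and $z$ the number of people, and it likely approximates to this algorithm complexity:

$$O(n^3)$$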
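To put the cost in perspective, with purely hypothetical sizes of 50 places, a one-year range (365 days) and 2,000 people, the innermost check runs:

$$
\underbrace{50}_{\text{places}} \times \underbrace{365}_{\text{days}} \times \underbrace{2\,000}_{\text{people}} = 36\,500\,000 \ \text{checks}
$$

The cost is linear in each factor on its own (doubling only the number of people doubles the work); it behaves like $O(n^3)$ only when all three grow together. Each "check" may itself evaluate several rules, so the constant factor is larger still.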
I'm not an expert in theory, so feel free to correct my assumptions. What is certain is that this kind of complexity is not acceptable, especially since I will be operating over long date ranges with many places and people.
From the formula we can see that the people set would be iterated many times, so I would like to optimize at least this part. To ease things a bit I changed `Person.IList<DateRangePlaces>.IList<Place>` to `Person.IList<DateRangePlaces>.IDictionary<int, Place>`, which gives me a faster answer to whether a person can go to some place on a particular date, because I only check whether `Place.Id` is present in the dictionary, instead of an `IList.Where()` LINQ clause that has to scan the whole list each and every time.
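The change described here can be sketched like this. If only membership matters, a `HashSet<int>` of place ids is enough; `IDictionary<int, Place>` gives the same O(1) average lookup when the `Place` object itself is also needed. The record shapes and the member names `PlaceIds`, `AllowsByScan` and `AllowsByHash` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal shapes for illustration.
public record Place(int Id, string Name);

public class DateRangePlaces
{
    public DateTime From { get; }
    public DateTime To { get; }
    public IList<Place> Places { get; }

    // Built once from Places; membership tests are then O(1) on average.
    public HashSet<int> PlaceIds { get; }

    public DateRangePlaces(DateTime from, DateTime to, IList<Place> places)
    {
        From = from;
        To = to;
        Places = places;
        PlaceIds = new HashSet<int>(places.Select(p => p.Id));
    }

    public bool Covers(DateTime day) => From.Date <= day.Date && day.Date <= To.Date;

    // Linear scan, equivalent to the IList.Where()/Any() approach: O(n) per call.
    public bool AllowsByScan(int placeId) => Places.Any(p => p.Id == placeId);

    // Hash lookup: O(1) average per call.
    public bool AllowsByHash(int placeId) => PlaceIds.Contains(placeId);
}
```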
Can you suggest any additional optimizations I could implement in my algorithm to make it faster, or even to make it less complex in terms of big O notation?
Which data structures would you use where, and why (lists, dictionaries, stacks, queues...), to improve performance?
There are also additional complexities that I didn't mention because I wanted to keep the question simple and clear. There's also:

- `Place.IList<Permission>` and `Person.IList<DateRangePermission>`: places require particular permissions, and people have limited-time permission grants that expire.
- `Person.IList<DateRangeTimingRestriction>`, which specifies the only times a person can go somewhere during a particular date range.
- `Person.IList<DateRangePlacePriorities>`, which defines place prioritization for a particular date range.
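The permission rule (a place requires a set of permissions, and each grant a person holds expires) can be sketched as below. This is an illustration under assumptions: permissions are identified by an `int` id, and `DateRangePermission`'s shape, `Index` and `HasAll` are hypothetical names. Grants are grouped by permission id once, so each check only looks at the grants for the permission in question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape: one grant of one permission, valid over [From, To].
public record DateRangePermission(int PermissionId, DateTime From, DateTime To);

public static class PermissionRules
{
    // Group a person's grants by permission id once, up front.
    public static Dictionary<int, List<DateRangePermission>> Index(IEnumerable<DateRangePermission> grants) =>
        grants.GroupBy(g => g.PermissionId).ToDictionary(g => g.Key, g => g.ToList());

    // True when every permission the place requires has an unexpired grant on that day.
    public static bool HasAll(
        Dictionary<int, List<DateRangePermission>> grantsById,
        IEnumerable<int> requiredPermissionIds,
        DateTime day)
    {
        foreach (var id in requiredPermissionIds)
        {
            if (!grantsById.TryGetValue(id, out var grants)) return false; // never granted
            if (!grants.Any(g => g.From.Date <= day.Date && day.Date <= g.To.Date)) return false; // expired or not yet valid
        }
        return true;
    }
}
```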
During this process of getting applicable people, I also have to calculate a certain factor per person per place that's related to the:
All of these are the reasons why I decided to manipulate this data in memory rather than use a very complex stored procedure that would also be doing multiple table scans to get factors per person, place and day.
I think such a stored procedure would be way too complex to handle and maintain, so I'd rather get all the data first, put it into appropriate memory structures to aid performance, and then process it in memory.
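One in-memory structure that targets the repeated iteration over all people is an inverted index: built once, it maps each place id to the people who listed that place, so the per-(place, day) loop only visits those candidates instead of everyone. This is a swapped-in technique, sketched under assumptions: `PlaceGrant` is a hypothetical flattened row of `Person.IList<DateRangePlaces>`, and the remaining rules (schedule, timings, permissions) would still be checked on the candidates.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical flattened row: person PersonId may go to PlaceId during [From, To].
public record PlaceGrant(int PersonId, int PlaceId, DateTime From, DateTime To);

public static class CandidateIndex
{
    // Inverted index: place id -> rows for that place. Built once in O(total rows).
    public static Dictionary<int, List<PlaceGrant>> Build(IEnumerable<PlaceGrant> rows)
    {
        var index = new Dictionary<int, List<PlaceGrant>>();
        foreach (var row in rows)
        {
            if (!index.TryGetValue(row.PlaceId, out var list))
            {
                list = new List<PlaceGrant>();
                index[row.PlaceId] = list;
            }
            list.Add(row);
        }
        return index;
    }

    // Candidate people for (place, day): only those whose range for this place
    // contains the day. Other rules still need to be applied to this smaller set.
    public static HashSet<int> Candidates(Dictionary<int, List<PlaceGrant>> index, int placeId, DateTime day)
    {
        var result = new HashSet<int>();
        if (index.TryGetValue(placeId, out var grants))
        {
            foreach (var g in grants)
                if (g.From.Date <= day.Date && day.Date <= g.To.Date) result.Add(g.PersonId);
        }
        return result;
    }
}
```

In the worst case (everyone can go everywhere) this does not change the big O, but when each person lists only a few places the inner loop shrinks from all people to the people relevant to that place.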
I suggest using a relational database and writing a stored procedure to retrieve the "set of People applicable for Place and on Day".
The stored procedure approach would be neither complex nor difficult to maintain if the model is architected properly. Additionally, relational databases have primary keys and indexes to avoid table scans.
The only ways to speed things up using collections would be:

1. Change the collection type. You could use a `KeyedCollection`, `IDictionary<>` or even a disconnected recordset. Disconnected recordsets also give you the ability to set foreign keys to child recordsets; however, I think this would be a fairly complex pattern to use.
2. Maintain a collection within a collection, which is basically the same concept as a parent/child relationship with a foreign key. The object references will only be pointers to the original object's memory space or, if you're using a keyed collection, you could simply store the index of the other collection.
3. Maintain boolean properties that allow you to skip iterations if true or false. For example, as you build your entities, set a boolean such as `HasPlaceXPermission`. If the value is false, you know not to retrieve information related to place X.
4. Maintain flags. Flags can be a very good optimization technique when used properly. Similar to #3, flags can be used to determine permissions very quickly, for example: `if ((person.PlacePermissions & (Place.Colorado | Place.Florida)) != 0)`, do the date/time scan on Colorado and Florida; otherwise, don't.
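The flags idea from #4 can be sketched with a `[Flags]` enum. The enum here is renamed to the hypothetical `PlaceFlags` to avoid clashing with the `Place` entity, and its members are illustrative. Note the parentheses around the `&`: in C#, `&` has lower precedence than `!=` and `>`, so they are required.

```csharp
using System;

// Hypothetical permission flags, one bit per place. [Flags] values must be
// distinct powers of two so they can be combined with |.
[Flags]
public enum PlaceFlags
{
    None = 0,
    Colorado = 1 << 0,
    Florida = 1 << 1,
    Texas = 1 << 2,
}

public static class FlagChecks
{
    // True if the person holds at least one of the wanted places.
    public static bool HasAny(PlaceFlags held, PlaceFlags wanted) => (held & wanted) != 0;

    // True only if the person holds every wanted place (same result as held.HasFlag(wanted)).
    public static bool HasAll(PlaceFlags held, PlaceFlags wanted) => (held & wanted) == wanted;
}
```

An enum is limited to at most 64 bits, so this only fits a small, fixed set of places; a larger or data-driven set of places would need something like a `HashSet<int>` instead.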
It's difficult to know which collection types I would use based on the information you have provided; I would need a wider view of the application to decide that architecturally. For example: where is the data stored, how is it retrieved, how is it prepared, and how is it presented? Knowing how the application is architected would help to determine its optimization points.
You can't avoid O(n^2), as the minimal iteration you need is to pass over every `Place` and every `Date` element to find a match for a given `Person`.
I think the best way is to use a database such as SQL Server and run your query in SQL as a stored procedure.