I have a large array of objects ~10000 in memory (not in db). Object has five fields: id, type, status, dateFrom and dateTo. Field type has ~5 unique values, field status has ~5 unique values, dateFrom and dateTo may have different unique values.
Instead of use brute force search first thought was to build index by each field. For type and status we can group objects by each field, ok. For dateTo and dateFrom we can use kd-tree, ok.
Is this search strategy good enough? How I can merge the result of different indexes? Have any ideas?
Because Type and Status have only ~5 unique values each, searching on one of them (even when indexed) will only cut down your search to 1/5 of ~10000 (~=2000). In your search algorithm I would prioritize your dateTo and dateFrom since you can search for a unique date using binary search Log_2 of ~10000 (~=13).
For your indexes:
EDIT For date range searches (with 2 fields) the best you can do is pick the date index with the least results and filter those for the second date, Log N + Log N + N time.
Index1 = BSFirst(dateFromTuples, fromVal)
Index2 = BSLast(dateToTuples, toVal)
If Index1 = 0 AndAlso Index2 = (N - 1) Then
Result = AllObjects
Else If (N - 1) - Index1 < Index2 Then
Result = Filter(dateFromTuples, Index1, n - 1, obj => obj.dateTo <= toVal)
Else
Result = Filter(dateToTuples, 0, Index2, obj => obj.dateFrom >= fromVal)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With