I need to implement an in-memory tuple-of-strings matching feature in C. There will be large list of tuples associated with different actions and a high volume of events to be matched against the list.
List of tuples:
("one", "four")
("one")
("three")
("four", "five")
("six")
event ("one", "two", "three", "four") should match list item ("one", "four") and ("one") and ("three") but not ("four", "five") and not ("six")
my current approach uses a map of all tuple field values as keys for lists of each tuple using that value. there is a lot of redundant hashing and list insertion.
is there a right or classic way to do this?
If you only have a small number of possible tuple values it would make sense to write some sort of hashing function which could turn them into integer indexes for quick searching.
If there are < 32 values you could do something with bitmasks:
unsigned int hash(char *value){...}
typedef struct _tuple {
unsigned int bitvalues;
void * data
} tuple;
tuple a,b,c,d;
a.bitvalues = hash("one");
a.bitvalues |= hash("four");
//a.data = something;
unsigned int event = 0;
//foreach value in event;
event |= hash(string_val);
// foreach tuple
if(x->bitvalues & test == test)
{
//matches
}
If there are too many values to do a bitmask solution you could have an array of linked lists. Go through each item in the event. If the item matches key_one, walk through the tuples with that first key and check the event for the second key:
typedef struct _tuple {
unsigned int key_one;
unsigned int key_two;
_tuple *next;
void * data;
} tuple;
tuple a,b,c,d;
a.key_one = hash("one");
a.key_two = hash("four");
tuple * list = malloc(/*big enough for all hash indexes*/
memset(/*clear list*/);
//foreach touple item
if(list[item->key_one])
put item on the end of the list;
else
list[item->key_one] = item;
//foreach event
//foreach key
if(item_ptr = list[key])
while(item_ptr.next)
if(!item_ptr.key_two || /*item has key_two*/)
//match
item_ptr = item_ptr.next;
This code is in no way tested and probably has many small errors but you should get the idea. (one error that was corrected was the test condition for tuple match)
If event processing speed is of utmost importance it would make sense to iterate through all of your constructed tuples, count the number of occurrences and go through possibly re-ordering the key one/key two of each tuple so the most unique value is listed first.
A possible solution would be to assign a unique prime number to each of the words.
Then if you multiply the words together in each tuple, then you have a number that represents the words in the list.
Divide one list by another, and if you get an integer remainder, then the one list is contained in the other.
I don't know of any classical or right way to do this, so here is what I would do :P
It looks like you want to decide if A is a superset of B, using set theory jargon. One way you can do it is to sort A and B, and do a merge sort-esque operation on A and B, in that you try to find where in A a value in B goes. Those elements of B which are also in A, will have duplicates, and the other elements won't. Because both A and B are sorted, this shouldn't be too horrible.
For example, you take the first value of B, and walk A until you find its duplicate in A. Then you take the second value of B, and start walking A from where you left off previously. If you get to end of A without finding a match, then A is not a superset of B, and you return false.
If these tuples can stay sorted, then the sorting cost is only incurred once.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With