Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

what index can be applied to this condition with high efficiency?


I met such a problem when I tried to finish my job.

Given a data set, for each item, there are D dimensionalities and C values can be set to each dimensionality.
for example, a data set THINGS(ID,owner, color, weight), ID is the primary key
the owner attribute can be alice, jack, zuck;
the color attribute can be red, yellow, green;
the weight attribute can be high, medium, low;
in this data set, D=3, C=3

now I want to do many queries many times like :
"is there any data with owner=red and color=red"?
"is there any data with weight=low"?
"is there any data with owner=red and color=red and weight=high"?
I only need "Yes or No" to answer this query.

I need to do this originally, I mean without database.
In a PC, I tried Bitmap and inverted index to accomplish the requirement, but the size of the data set will be million and Dimensionality will be 8~18, Cardinality will be 5~15. As a result, the efficiency is not good enough.

could you give me any suggestion to make it much efficient?
Thanks in advance!

like image 201
BigPotato Avatar asked Nov 04 '22 03:11

BigPotato


1 Answers

You'd probably want a sorted dictionary for each dimension where the KEY is the possible elements for the dimension and the VALUE is the list of IDs.

OWNER_DICTIONARY = {
    Bob: [1,5],
    Jim: [2],
    Sally: [3,4],
    Will: []
}
COLOR_DICTIONARY = {
    Blue: [5],
    Green: [2],
    Red: [],
    Yellow: [1,3,4]
}
WEIGHT_DICTIONARY = {
    Low: [1,2,4],
    High: [3,5]
}

Then you simple use a INTERSECT on the VALUES (list of IDs) of your dictionaries. If the intersection size is greater than 0 you have a positive match.

Owner=Bob AND Weight=High

([1,5] UNION [3,5]) = [5]

If one of the VALUES for your criteria (or one of the previous INTERSECTIONs) is [] empty you can short circuit (return false) right away without having to evaluate further.

In database terms you'd be putting a NON-CLUSTERED INDEX on each field/column. and doing

EXISTS(SELECT ID FROM Table WHERE Col1=@Val1 AND Col2=@Val2 AND Col3=@Val3)

EDIT UNION -> INTERSECTION good catch @ElKamina

like image 98
Louis Ricci Avatar answered Nov 15 '22 05:11

Louis Ricci