Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

data structure to speed in-memory tagged object search on boolean function of tags?

If I have a set of tags (<100), and a set of objects (~25000), where each object has some sub-set of the tags, do you know of an existing data-structure that would allow for fast retrieval of those objects that satisfy some boolean function of the tags?

Addition/deletion of tags and objects need not be particularly fast, but selection of those objects with tags that satisfy the boolean function should be.

Now that I have written my question down, it looks as if I'm describing an in-memory database, but originally I was thinking of some binary tree like structure for the objects where, for each branch, taking the left/right branch would be equivalent to deciding on have/have-not some tag. But that would not allow don't-care tags? i am asking as I wondered if this has been done before and find it hard to google for data structures.

  • Thanks in advance - Paddy.
like image 882
Paddy3118 Avatar asked Aug 22 '10 10:08

Paddy3118


1 Answers

Here's a suggestion: Use a bit-array for each tag, with as many elements as there are objects; each index of which represents one object. The value at each index is 1 if that object has that tag.

Boolean functions on tags are then fast set operations on this bit-array. And the resulting bit-array gives you the documents that satisfy the criteria.

This not very efficient if the tags or objects keep changing frequently but is perhaps applicable for you.

like image 82
Miserable Variable Avatar answered Oct 10 '22 03:10

Miserable Variable