Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Data structure with O(1) insertion time and O(log m) lookup?

Backstory (skip to second-to-last paragraph for data structure part): I'm working on a compression algorithm (of the LZ77 variety). The algorithm boils down to finding the longest match between a given string and all strings that have already been seen.

To do this quickly, I've used a hash table (with separate chaining) as recommended in the DEFLATE spec: I insert every string seen so far one at a time (one per input byte) with m slots in the chain for each hash code. Insertions are fast (constant-time with no conditional logic), but searches are slow because I have to look at O(m) strings to find the longest match. Because I do hundreds of thousands of insertions and tens of thousands of lookups in a typical example, I need a highly efficient data structure if I want my algorithm to run quickly (currently it's too slow for m > 4; I'd like an m closer to 128).

I've implemented a special case where m is 1, which runs very fast buts offers only so-so compression. Now I'm working on an algorithm for those who'd prefer improved compression ratio over speed, where the larger m is, the better the compression gets (to a point, obviously). Unfortunately, my attempts so far are too slow for the modest gains in compression ratio as m increases.

So, I'm looking for a data structure that allows very fast insertion (since I do more insertions than searches), but still fairly fast searches (better than O(m)). Does an O(1) insertion and O(log m) search data structure exist? Failing that, what would be the best data structure to use? I'm willing to sacrifice memory for speed. I should add that on my target platform, jumps (ifs, loops, and function calls) are very slow, as are heap allocations (I have to implement everything myself using a raw byte array in order to get acceptable performance).

So far, I've thought of storing the m strings in sorted order, which would allow O(log m) searches using a binary search, but then the insertions also become O(log m).

Thanks!

like image 823
Cameron Avatar asked Oct 14 '11 06:10

Cameron


People also ask

What data structure has a runtime of O 1?

Only a hash table with a perfect hash function will have a worst-case runtime of O(1). The ideal hash function is not practical, so some collisions and workarounds lead to a worst-case runtime of O(n).

Which data structure inserts and deletes in O 1 time?

The Insertion, deletion and searching will take O(1) amount of time. To solve this problem, we will use one Boolean array.

Can we perform insertion in O 1 time complexity?

Best Case - O(1) We can directly place the element in arr[size]. So, best case time complexity will be O(1). O(1) indicates that the insertion operation is not depended on the size of the array. Whatever the array size it will always take constant time.

Which data structure is best for insertion and deletion?

A linked list provides efficient insertion and deletion of arbitrary elements.


1 Answers

You might be interested in this match-finding structure :

http://encode.ru/threads/1393-A-proposed-new-fast-match-searching-structure

It's O(1) insertion time and O(m) lookup. But (m) is many times lower than a standard Hash Table for an equivalent match finding result. As en example, with m=4, this structure gets equivalent results than an 80-probes hash table.

like image 80
Cyan Avatar answered Sep 18 '22 20:09

Cyan