Data structure with O(1) insertion time and O(log m) lookup?

Tags: performance, data-structures

Backstory (skip to the paragraph starting "So, I'm looking for a data structure" for the data-structure part): I'm working on a compression algorithm (of the LZ77 variety). The algorithm boils down to finding the longest match between a given string and all strings that have already been seen.

To do this quickly, I've used a hash table (with separate chaining) as recommended in the DEFLATE spec: I insert every string seen so far one at a time (one per input byte) with m slots in the chain for each hash code. Insertions are fast (constant-time with no conditional logic), but searches are slow because I have to look at O(m) strings to find the longest match. Because I do hundreds of thousands of insertions and tens of thousands of lookups in a typical example, I need a highly efficient data structure if I want my algorithm to run quickly (currently it's too slow for m > 4; I'd like an m closer to 128).
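For reference, here is a minimal C sketch of the hash-chain scheme described above: each hash code owns a small ring buffer of m slots in a static array, so an insertion is O(1) and branch-free, while a lookup has to compare against all m candidates. The 3-byte hash, `HASH_BITS`, `M`, and the names `hash3`, `insert` and `longest_match` are illustrative assumptions, not the actual implementation; `M` must be a power of two (and at most 256, since the ring index is a byte).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical parameters, chosen for illustration only. */
#define HASH_BITS 12
#define HASH_SIZE (1u << HASH_BITS)
#define M 4            /* slots per hash code; power of two, at most 256 */
#define MIN_MATCH 3    /* DEFLATE's minimum match length */

/* All state lives in static arrays: no heap allocation. */
static uint32_t slots[HASH_SIZE][M]; /* stored position + 1; 0 means empty */
static uint8_t  head[HASH_SIZE];     /* next slot to overwrite in each bucket */

/* Multiplicative hash of the next 3 bytes. */
static uint32_t hash3(const uint8_t *p) {
    uint32_t v = (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16);
    return (v * 2654435761u) >> (32 - HASH_BITS);
}

/* O(1) and branch-free: overwrite the oldest slot of the bucket. */
static void insert(const uint8_t *buf, uint32_t pos) {
    uint32_t h = hash3(buf + pos);
    slots[h][head[h]] = pos + 1;
    head[h] = (uint8_t)((head[h] + 1) & (M - 1));
}

/* O(m): every candidate in the bucket is compared byte by byte. */
static uint32_t longest_match(const uint8_t *buf, uint32_t pos, uint32_t end,
                              uint32_t *match_pos) {
    assert(pos + MIN_MATCH <= end); /* hash3 reads 3 bytes */
    uint32_t h = hash3(buf + pos), best = 0;
    for (int i = 0; i < M; i++) {
        uint32_t s = slots[h][i];
        if (s == 0) continue; /* empty slot */
        uint32_t cand = s - 1, len = 0;
        while (pos + len < end && buf[cand + len] == buf[pos + len]) len++;
        if (len > best) { best = len; *match_pos = cand; }
    }
    return best;
}
```

Setting `M` to 1 corresponds to the fast special case; the cost of `longest_match` grows linearly with `M`, which is the bottleneck described above.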

I've implemented a special case where m is 1, which runs very fast but offers only so-so compression. Now I'm working on an algorithm for those who'd prefer a better compression ratio over speed, where the larger m is, the better the compression gets (to a point, obviously). Unfortunately, my attempts so far are too slow for the modest gains in compression ratio as m increases.

So, I'm looking for a data structure that allows very fast insertion (since I do more insertions than searches) but still fairly fast searches (better than O(m)). Does a data structure with O(1) insertion and O(log m) search exist? Failing that, what would be the best data structure to use? I'm willing to sacrifice memory for speed. I should add that on my target platform, jumps (ifs, loops, and function calls) are very slow, as are heap allocations (I have to implement everything myself using a raw byte array in order to get acceptable performance).

So far, I've thought of storing the m strings in sorted order, which would allow O(log m) searches using a binary search, but then the insertions also become O(log m).
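A minimal C sketch of that sorted-order idea, assuming a hypothetical fixed capacity `M` and hypothetical names (`SortedCandidates`, `insert_sorted`, `longest_match`). Because the candidates are kept in lexicographic order, the longest common prefix with the current string is always shared with one of the two neighbors of its binary-search position, so a lookup needs O(log m) string comparisons (each comparison itself costs up to the match length). In a flat array the insertion also has to shift entries with `memmove`, which is O(m) element moves; a balanced tree would bring it to O(log m). Evicting old positions is not handled here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define M 8 /* hypothetical capacity */

typedef struct {
    uint32_t pos[M]; /* positions, sorted by the suffix starting there */
    uint32_t count;
} SortedCandidates;

/* Lexicographic comparison of the suffixes at a and b (up to end). */
static int cmp_suffix(const uint8_t *buf, uint32_t end, uint32_t a, uint32_t b) {
    while (a < end && b < end) {
        if (buf[a] != buf[b]) return buf[a] < buf[b] ? -1 : 1;
        a++; b++;
    }
    return (a < end) - (b < end); /* an exhausted suffix sorts first */
}

/* Length of the common prefix of the suffixes at a and b. */
static uint32_t match_len(const uint8_t *buf, uint32_t end, uint32_t a, uint32_t b) {
    uint32_t len = 0;
    while (a + len < end && b + len < end && buf[a + len] == buf[b + len]) len++;
    return len;
}

/* Binary search: first index whose suffix sorts >= the suffix at pos. */
static uint32_t lower_bound(const SortedCandidates *s, const uint8_t *buf,
                            uint32_t end, uint32_t pos) {
    uint32_t lo = 0, hi = s->count;
    while (lo < hi) {
        uint32_t mid = lo + (hi - lo) / 2;
        if (cmp_suffix(buf, end, s->pos[mid], pos) < 0) lo = mid + 1;
        else hi = mid;
    }
    return lo;
}

/* Only the two neighbors of the search position need to be checked. */
static uint32_t longest_match(const SortedCandidates *s, const uint8_t *buf,
                              uint32_t end, uint32_t pos, uint32_t *match_pos) {
    uint32_t i = lower_bound(s, buf, end, pos), best = 0;
    if (i < s->count) {
        best = match_len(buf, end, s->pos[i], pos);
        *match_pos = s->pos[i];
    }
    if (i > 0) {
        uint32_t len = match_len(buf, end, s->pos[i - 1], pos);
        if (len > best) { best = len; *match_pos = s->pos[i - 1]; }
    }
    return best;
}

/* O(log m) search for the slot, then an O(m) shift to open it. */
static void insert_sorted(SortedCandidates *s, const uint8_t *buf,
                          uint32_t end, uint32_t pos) {
    assert(s->count < M); /* eviction is not handled in this sketch */
    uint32_t i = lower_bound(s, buf, end, pos);
    memmove(&s->pos[i + 1], &s->pos[i], (s->count - i) * sizeof s->pos[0]);
    s->pos[i] = pos;
    s->count++;
}
```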

Thanks!


asked Oct 14 '11 06:10

Cameron

1 Answer

You might be interested in this match-finding structure:

http://encode.ru/threads/1393-A-proposed-new-fast-match-searching-structure

It has O(1) insertion time and O(m) lookup, but m can be many times smaller than with a standard hash table for an equivalent match-finding result. As an example, with m=4, this structure gets results equivalent to those of an 80-probe hash table.

answered Sep 18 '22 20:09

Cyan
