Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Appropriate data structure for faster retrieval process (data size: around 200,000 values all string)

I have a large data set of around 200, 000 values, all of them are strings. Which data structure should i use so that the searching and retrieval process is fast. Insertion is one time, so even if the insertion is slow it wouldn't matter much.

Hash Map could be one solution, but what are the other choices?? Thanks

Edit: some pointers 1. I am looking for exact matches and not the partial ones. 2. I have to accomplish this in PHP. 3. Is there any way i can keep such amount of data in cache in form of tree or in some other format?

like image 382
Elvis Avatar asked Feb 22 '26 00:02

Elvis


1 Answers

You really should consider not using maps or hash dictionaries if all you need is a string lookup. When using those, your complexity guaranties for N items in a lookup of string size M are O(M x log(N)) or, best amortised for the hash, O(M) with a large constant multiplier. It is much more efficient to use an acyclic deterministic finite automaton (ADFA) for basic lookups, or a Trie if there is a need to associate data. These will walk the data structure one character at a time, giving O(M) with very small multiplier complexity.

Basically, you want a data structure that parses your string as it is consumed by the data structure, not one that must do full string compares at each node of the lookup. The common orders of complexity you see thrown around around for red-black trees and such assume O(1) compare, which is not true for strings. Strings are O(M), and that propagates to all compares used.

like image 68
ex0du5 Avatar answered Feb 24 '26 15:02

ex0du5



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!