I need to store millions of string with common prefixes (they don't correspond to file system paths) in a Set like structure in memory, and query the Collection to see if a path exists.
e.g.
/path
/path/1
/path/2
/path/1/a
/path/1/b
I want to store these as efficiently as possible (they will be in memory) , given that there will be many common prefixes for all the strings involved would a Trie be a reasonable candidate?
I'm looking for a recommendation for an implementation of a suitable data structure in Java.
The longest common prefix for an array of strings is the common prefix between 2 most dissimilar strings. For example, in the given array {“apple”, “ape”, “zebra”}, there is no common prefix because the 2 most dissimilar strings of the array “ape” and “zebra” do not share any starting characters.
Given the array of strings S[], you need to find the longest string S which is the prefix of ALL the strings in the array. Longest common prefix (LCP) for a pair of strings S1 and S2 is the longest string S which is the prefix of both S1 and S2.
A Trie looks like the structure you need. Also similar structures are Radix Tries, that unlike tries, use sequence of characters to label edges. In plain tries edges are labeled with single characters, I am sure they behave better in your case where strings share quite a lot of prefixes.
see also ...
http://code.google.com/p/trie/
http://code.google.com/p/radixtree/
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With