Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Space efficient collection for strings with common prefixes - Java implementation

I need to store millions of string with common prefixes (they don't correspond to file system paths) in a Set like structure in memory, and query the Collection to see if a path exists.

e.g.

/path
/path/1
/path/2
/path/1/a
/path/1/b

I want to store these as efficiently as possible (they will be in memory) , given that there will be many common prefixes for all the strings involved would a Trie be a reasonable candidate?

I'm looking for a recommendation for an implementation of a suitable data structure in Java.

like image 879
Joel Avatar asked Apr 08 '11 13:04

Joel


People also ask

What is common prefix string?

The longest common prefix for an array of strings is the common prefix between 2 most dissimilar strings. For example, in the given array {“apple”, “ape”, “zebra”}, there is no common prefix because the 2 most dissimilar strings of the array “ape” and “zebra” do not share any starting characters.

What is LCP in Java?

Given the array of strings S[], you need to find the longest string S which is the prefix of ALL the strings in the array. Longest common prefix (LCP) for a pair of strings S1 and S2 is the longest string S which is the prefix of both S1 and S2.


1 Answers

A Trie looks like the structure you need. Also similar structures are Radix Tries, that unlike tries, use sequence of characters to label edges. In plain tries edges are labeled with single characters, I am sure they behave better in your case where strings share quite a lot of prefixes.

see also ...

http://code.google.com/p/trie/

http://code.google.com/p/radixtree/

like image 55
Manuel Salvadores Avatar answered Sep 20 '22 01:09

Manuel Salvadores