What Kind of algorithms + data structures that would help me to do that?
Having a file contains like 10000~ lines loaded in memory in a ordered set. With a given search string I want to be able to get all the lines that have words prefixed with words found in search string. Well let me give an example to clarify this:
"A brow Fox flies."
"Dogs hates eagles", "Dolphins have eyes and teath"
(fast enough for me it took about 30ms~(including sorting the final result) on my pc on a set of 10k lines 3 words each line)
I think what you're probably wanting is a trie. Construct one for the set of all words in your document, and have each leaf point to a hashset containing the indices of the lines in which the key of the leaf appears.
To execute a search, you'd use each fragment of the search string to navigate to a node in the tree and take the union over the hashsets of all leaves in that node's subtree. Then you'd take the intersection of those unions over the set of fragments to get the list of lines satisfying the search string.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With