Search String By SubWords

Question

What Kind of algorithms + data structures that would help me to do that?

Having a file contains like 10000~ lines loaded in memory in a ordered set. With a given search string I want to be able to get all the lines that have words prefixed with words found in search string. Well let me give an example to clarify this:

Lines:

"A brow Fox flies."
"Boxes are full of food."
"Cats runs slow"
"Dogs hates eagles"
"Dolphins have eyes and teath"

Cases 1:

search string = "fl b a"

"A brow Fox flies."

Explanation: search string have three words "fl", "b", and "a" and the only string that have some words that are prefixed with words from the search string is line 1.

Case 2:

search string "e do ha"

"Dogs hates eagles", "Dolphins have eyes and teath"

Solution

(fast enough for me it took about 30ms~(including sorting the final result) on my pc on a set of 10k lines 3 words each line)

I used a trie suggested in answer.
And some other hacky methods to be able to filter out duplicate and false positive results (mainly used hash sets for this).

Andy Jones · Accepted Answer

I think what you're probably wanting is a trie. Construct one for the set of all words in your document, and have each leaf point to a hashset containing the indices of the lines in which the key of the leaf appears.

To execute a search, you'd use each fragment of the search string to navigate to a node in the tree and take the union over the hashsets of all leaves in that node's subtree. Then you'd take the intersection of those unions over the set of fragments to get the list of lines satisfying the search string.

Search String By SubWords

Tags:

algorithm

search

Lines:

Cases 1:

search string = "fl b a"

Case 2:

search string "e do ha"

Solution

ToddlerXxX

1 Answers

Andy Jones

Recent Activity

Donate For Us

Search String By SubWords

Tags:

algorithm

search

Lines:

Cases 1:

search string = "fl b a"

Case 2:

search string "e do ha"

Solution

ToddlerXxX

1 Answers

Andy Jones

Related questions

Recent Activity

Donate For Us