Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

implementing a basic search engine with prefix tree

The problem is the implementing a prefix tree (Trie) in functional language without using any storage and iterative method.

I am trying to solve this problem. How should I approach this problem ? Can you give me exact algorithm or link which shows already implemented one in any functional language?

Why I am trying to do => creating a simple search engine with an feature of

  • adding word to tree
  • searching a word in tree
  • deleting a word in tree

Why I want to use functional language => I want improve my problem-solving ability a bit further.

NOTE : Since it is my hobby project, I will first implement basic features.

EDIT:

i.) What I mean about "without using storage" => I don't want use variable storage ( ex int a ), reference to a variable, array . I want calculate the result by recursively then showing result to the screen.

ii.) I have wrote some line but then I have erased because what I wrote is made me angry. Sorry for not showing my effort.

like image 829
john Avatar asked Apr 08 '12 07:04

john


2 Answers

Take a look at haskell's Data.IntMap. It is purely functional implementation of Patricia trie and it's source is quite readable.
bytestring-trie package extends this approach to ByteStrings

There is accompanying paper Fast Mergeable Integer Maps which is also readable and through. It describes implementation step-by-step: from binary tries to big-endian patricia trees.
Here is little extract from the paper.

At its simplest, a binary trie is a complete binary tree of depth equal to the number of bits in the keys, where each leaf is either empty, indicating that the corresponding key is unbound, or full, in which case it contains the data to which the corresponding key is bound. This style of trie might be represented in Standard ML as

datatype 'a Dict =
    Empty
  | Lf of 'a
  | Br of 'a Dict * 'a Dict

To lookup a value in a binary trie, we simply read the bits of the key, going left or right as directed, until we reach a leaf.

fun lookup (k, Empty) = NONE
  | lookup (k, Lf x) = SOME x
  | lookup (k, Br (t0,t1)) =
      if even k then lookup (k div 2, t0)
                else lookup (k div 2, t1)
like image 76
max taldykin Avatar answered Sep 18 '22 12:09

max taldykin


The key point in immutable data structure implementations is sharing of both data and structure. To update an object you should create new version of it with the most possible number of shared nodes. Concretely for tries following approach may be used.

Consider such a trie (from Wikipedia):

enter image description here

Imagine that you haven't added word "inn" yet, but you already have word "in". To add "inn" you have to create new instance of the whole trie with "inn" added. However, you are not forced to copy the whole thing - you can create only new instance of the root node (this without label) and the right banch. New root node will point to new right banch, but to old other branches, so with each update most of the structure is shared with the previous state.

However, your keys may be quite long, so recreating the whole branch each time is still both time and space consuming. To lessen this effect, you may share structure inside one node too. Normally each node is a vector or map of all possible outcomes (e.g. in a picture node with label "te" has 3 outcomes - "a", "d" and "n"). There are plenty of implementations for immutable maps (Scala, Clojure, see their repositories for more examples) and Clojure also has excellent implementation of an immutable vector (which is actually a tree).

All operations on creating, updating and searching resulting tries may be implemented recursively without any mutable state.

like image 22
ffriend Avatar answered Sep 22 '22 12:09

ffriend