Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Important data structures in search

I'm interested in teaching myself different data structures, something I currently know very little about. My plan is to implement a few key structures so I understand how they work. I'm looking for suggestions on important data structures to start with.

I'm primarily interested in data structures that are relevant to search applications (e.g. Google / Lucene) and the general trade-off between delayed computation and precomputation. I'm also interested in distributed data structures -- data structures that can scale across hundreds / thousands of servers -- and probabilistic data structures -- data structures that help finding an approximate answer but do not need to always be correct.

Wikipedia has a list of data structures. I am currently considering:

  • Hash table
  • B+-Tree
  • R-Tree
  • KD-Tree
  • Radix-Tree
  • Bloom filter

Are there better choices?

Finally, is there any (major) problem with implementing these structures in a language like F#?

like image 537
Tristan Avatar asked Dec 30 '09 14:12

Tristan


2 Answers

Very ambitious. I voted your question up just for its scope.

MIT has an on-line algorithms and data structures course. The companion book is a classic. I'm not sure if it addresses the distributed and probabilistic features, but they'll give you an excellent grounding in the fundamentals.

I'd add red-black tree, hash tables, patricia trie, and skip lists to your agenda.

Good luck.

like image 149
duffymo Avatar answered Sep 27 '22 20:09

duffymo


If you are going to tackle this sort of thing with a functional language you should have a look at Purely Functional Data Structures by Chris Okasaki. Basic lesson is: the data structures you are familiar with for imperative programming may not be the best choices for functional programming. I expect there's a lot of similar material to be Googled for.

like image 24
High Performance Mark Avatar answered Sep 27 '22 21:09

High Performance Mark