
Building a full text search engine: where to start [closed]

I want to write a web application using Google App Engine (so the reference language would be Python). My application needs a simple search engine, so that users can find data by specifying keywords.

For example, if I have one table with these rows:

1 Office space
2 2001: A space odyssey
3 Brazil

and the user queries for "space", rows 1 and 2 would be returned. If the user queries for "office space", the result should be rows 1 and 2 too (row 1 first).

What are the technical guidelines/algorithms to do this in a simple way?
Can you give me good pointers to the theory behind this?

Thanks.

Edit: I'm not looking for anything complex here (say, indexing tons of data).

friol asked Oct 06 '08 21:10


1 Answer

Read Tim Bray's series of posts on the subject.

  • Background
  • Usage of search engines
  • Basics
  • Precision and recall
  • Search engine intelligence
  • Tricky search terms
  • Stopwords
  • Metadata
  • Internationalization
  • Ranking results
  • XML
  • Robots
  • Requirements list
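
For a small data set like the one in the question, the core idea is usually an inverted index: a map from each term to the set of rows that contain it. Below is a minimal sketch of that idea, not something taken from the posts above. The function names (`tokenize`, `build_index`, `search`) and the ranking rule (count how many distinct query terms a row matches) are my own assumptions for illustration, and it uses plain in-memory dictionaries rather than any App Engine API.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(rows):
    """Build an inverted index: term -> set of row ids containing it."""
    index = defaultdict(set)
    for row_id, text in rows.items():
        for term in tokenize(text):
            index[term].add(row_id)
    return index


def search(index, query):
    """Return ids of rows matching any query term, best match first.

    Assumed ranking: rows matching more distinct query terms come first;
    ties are broken by row id.
    """
    scores = defaultdict(int)
    for term in set(tokenize(query)):
        for row_id in index.get(term, ()):
            scores[row_id] += 1
    return sorted(scores, key=lambda row_id: (-scores[row_id], row_id))


# Sample data from the question.
rows = {1: "Office space", 2: "2001: A space odyssey", 3: "Brazil"}
index = build_index(rows)

print(search(index, "space"))
print(search(index, "office space"))
```

With the sample rows, both queries return rows 1 and 2, and for "office space" row 1 comes first because it matches both terms. Stopwords, stemming and better ranking (term frequency, for instance) are refinements you can add on top of this once the basic version works.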
Mark Cidade answered Oct 20 '22 16:10