Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to implement Tag search?

I've designed a news hub system which read Rss links and stores whole news in the database. Now I want to implement a search system using tags. Each news has it's own tags. There are lots of algorithms to implement this but I don't know what is the most common to have the best performance. Currently I'm using Elastic search database and I use multiple keyword search. Which one of these are the best?
1- to store tags in a list or a string with a separator and search among them? 2- work like a relational system and have a table of tags, and a table of news tags to have a record for each news tag. and 5 records for 5 tags of one news 3- another algorithm which I don't know

like image 361
ehsan shirzadi Avatar asked Jul 28 '14 07:07

ehsan shirzadi


1 Answers

Seems like you want something like the inverted index

This is an index, that for each term (hashtag in your case) holds a list of document ids which contain this hashtag.

For example, if you have 3 documents: d1,d2,d3 with the hash tags:

d1: #tag1, #tag2
d2: #tag3
d3: tag3, #tag2

The inverted index will be:

#tag1: d1
#tag2: d1,d3
#tag3: d2,d3

It is fairly easy using the inverted index to find all documents that contain a certain term (hashtag in your case), by simply going over the list the is attached to this term.
This datastructure is also very efficient for union (or queries) and intersection (and queries).

This DS is very popular for information retrieval for full text search and also is often used in semi-structured search.

For more information, you can read about Information Retrieval in general. Mannings Introduction to Information Retrieval represents this Data structure in the book's first chapter.

like image 121
amit Avatar answered Oct 17 '22 15:10

amit