 

Meta Search Engine Architecture

I don't think my original question was clear enough, so here is an updated, more direct version:

What are the common architectures used to build a meta search engine, and are there any libraries available for building that type of search engine?

I'm looking at building an "enterprise" type of search engine where the indexed data could come from proprietary search engines (like Autonomy or a Google Box) or public ones (like Google Web or Yahoo Web).

asked May 17 '10 15:05 by Loki


2 Answers

If you look at Garlic (pdf), you'll notice that its architecture is generic enough to be adapted to a meta-search engine.

UPDATE:

A rough architectural sketch looks something like this:

   +---------------------------+
   |                           |
   |    Meta-Search Engine     |         +---------------+
   |                           |         |               |
   |   +-------------------+   |---------| Configuration |
   |   | Query Processor   |   |         |               |
   |   |                   |   |         +---------------+
   |   +-------------------+   |
   +-------------+-------------+
                 |
      +----------+---------------+
   +--+----------+-------------+ |
   |             |             | |
   |     +-------+-------+     | |
   |     |    Wrapper    |     | |
   |     |               |     | |
   |     +-------+-------+     | |
   |             |             | |
   |             |             | |
   |     +-------+--------+    | |
   |     |                |    | |
   |     | Search Engine  |    | |
   |     |                |    +-+
   |     +----------------+    |
   +---------------------------+

The parts depicted are:

  • Meta-Search Engine - the engine itself; it orchestrates the whole process.
  • Query Processor - part of the engine; it resolves the capabilities of each search engine, sends requests to the appropriate ones, and aggregates their results (through the wrappers).
  • Wrapper - bridges the meta-search engine API to one specific search engine. Each wrapper exposes its search engine's capabilities to the meta-search engine and accepts and responds to search requests.
  • Search engine - the external search engines being queried; they are exposed to the meta-search engine through the wrappers.
  • Configuration - data that configures the meta-search engine, e.g., which wrappers to use and where to find more wrappers. It can also configure the wrappers themselves.
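A minimal Python sketch of how the parts listed above could fit together. Everything in it is hypothetical: `Wrapper`, `InMemoryWrapper`, and `QueryProcessor` are illustrative names, the in-memory wrapper stands in for a real engine client, and configuration is reduced to the list of wrappers handed to the query processor. The results are merged with reciprocal rank fusion, a common rank-merging heuristic that this answer does not itself prescribe.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    url: str
    title: str
    source: str  # name of the first engine that returned this hit


class Wrapper(ABC):
    """Bridges the meta-search engine API to one specific search engine."""

    name: str

    @abstractmethod
    def capabilities(self) -> set[str]:
        """Features the underlying engine supports, e.g. {"web"}."""

    @abstractmethod
    def search(self, query: str, limit: int) -> list[Result]:
        """Run the query against the external engine; return ranked hits."""


class InMemoryWrapper(Wrapper):
    """Hypothetical stand-in for a real engine client (web, intranet, ...)."""

    def __init__(self, name: str, caps: set[str], docs: dict[str, str]):
        self.name = name
        self._caps = caps
        self._docs = docs  # url -> title, in this engine's own ranking order

    def capabilities(self) -> set[str]:
        return self._caps

    def search(self, query: str, limit: int) -> list[Result]:
        terms = query.lower().split()
        hits = [Result(url, title, self.name)
                for url, title in self._docs.items()
                if all(t in title.lower() for t in terms)]
        return hits[:limit]


class QueryProcessor:
    """Resolves capabilities, fans the query out, and merges the results."""

    def __init__(self, wrappers: list[Wrapper], k: int = 60):
        self.wrappers = wrappers  # the "configuration" in this sketch
        self.k = k  # reciprocal rank fusion damping constant

    def search(self, query: str, needs: str, limit: int = 10) -> list[Result]:
        # 1. Capability resolution: only ask engines that can serve the query.
        targets = [w for w in self.wrappers if needs in w.capabilities()]
        # 2. Fan out to the selected engines in parallel (order is preserved).
        with ThreadPoolExecutor() as pool:
            ranked_lists = list(pool.map(lambda w: w.search(query, limit), targets))
        # 3. Aggregate: sum 1 / (k + rank) per URL, de-duplicating by URL.
        scores: dict[str, float] = {}
        first_seen: dict[str, Result] = {}
        for hits in ranked_lists:
            for rank, hit in enumerate(hits, start=1):
                scores[hit.url] = scores.get(hit.url, 0.0) + 1.0 / (self.k + rank)
                first_seen.setdefault(hit.url, hit)
        # Highest fused score first; ties broken by URL for determinism.
        ordered = sorted(scores, key=lambda u: (-scores[u], u))
        return [first_seen[u] for u in ordered[:limit]]


if __name__ == "__main__":
    web = InMemoryWrapper("web", {"web"}, {
        "http://a.example/lucene": "Lucene search library",
        "http://b.example/garlic": "Garlic search architecture",
    })
    intranet = InMemoryWrapper("intranet", {"intranet", "web"}, {
        "http://b.example/garlic": "Garlic search architecture",
        "http://c.example/wiki": "Internal search wiki",
    })
    for r in QueryProcessor([web, intranet]).search("search", needs="web"):
        print(r.url, r.source)
```

In this sketch a page returned by several engines (the Garlic URL above) is boosted because its reciprocal-rank scores add up. Because every wrapper hides its engine behind the same two methods, adding another source would mean writing one more wrapper and listing it in the configuration, without touching the query processor.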
answered Sep 26 '22 04:09 by Jordão


Have a look at Lucene.

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.
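Lucene's central data structure is an inverted index, which maps each term to the documents that contain it. The toy Python version below only illustrates that idea; it is not Lucene's API, and it leaves out what makes Lucene useful in practice (configurable analyzers, relevance scoring, on-disk segments and merging). The `InvertedIndex` class and `tokenize` function are hypothetical names.

```python
from __future__ import annotations

import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Crude analyzer: lowercase, then split on non-alphanumeric characters.
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    def __init__(self) -> None:
        self.postings: dict[str, set[int]] = defaultdict(set)  # term -> doc ids
        self.docs: list[str] = []

    def add(self, text: str) -> int:
        # Store the document and record its id under every term it contains.
        doc_id = len(self.docs)
        self.docs.append(text)
        for term in tokenize(text):
            self.postings[term].add(doc_id)
        return doc_id

    def search(self, query: str) -> list[int]:
        # Boolean AND: intersect the posting lists of all query terms.
        terms = tokenize(query)
        if not terms:
            return []
        ids = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(ids)


if __name__ == "__main__":
    idx = InvertedIndex()
    idx.add("Meta search engine architecture")
    idx.add("Lucene full-text search library")
    idx.add("Garlic data integration")
    print(idx.search("search"))         # [0, 1]
    print(idx.search("Lucene SEARCH"))  # [1]
```

Lookups only touch the posting lists of the query terms rather than scanning every document, which is why this structure scales to large collections.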

answered Sep 22 '22 04:09 by bobah