
How to build a conceptual search engine?

I would like to build an internal search engine (I have a very large collection of thousands of XML files) that is able to map queries to concepts. For example, if I search for "big cats", I would want highly ranked results to return documents with "large cats" as well. But I may also be interested in having it return "huge animals", albeit at a much lower relevancy score.

I'm currently reading through the Natural Language Processing with Python book, and it seems WordNet has some word mappings that might prove useful, though I'm not sure how to integrate that into a search engine. Could I use Lucene to do this? How?

From further research, it seems "latent semantic analysis" is relevant to what I'm looking for but I'm not sure how to implement it.

Any advice on how to get this done?

DevX asked Oct 23 '10 11:10



1 Answer

I'm not sure how to integrate that into a search engine. Could I use Lucene to do this? How?

Step 1. Stop.

Step 2. Get something to work.

Step 3. By then, you'll understand more about Python and Lucene and other tools and ways you might integrate them.

Don't start by trying to solve integration problems. Software can always be integrated. That's what an Operating System does. It integrates software. Sometimes you want "tighter" integration, but that's never the first problem to solve.

The first problem to solve is getting your search or concept matching (or whatever it turns out to be) to work as a dumb old command-line application. Or as a pair of applications knit together by passing files around, or with OS pipes, or something similar.
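To make "dumb old command-line application" concrete, here is a minimal sketch in Python. It is only an illustration, not a recommended design. The `RELATED` table is a hand-written, hypothetical stand-in for whatever concept source you end up using (WordNet, latent semantic analysis, something else), and the weights in it are made up. It also treats each file as plain text, so XML markup would need to be stripped first.

```python
import re
import sys

# Hypothetical concept table: query term -> {related term: weight}.
# Hand-written for illustration; a real one might be derived from WordNet or LSA.
RELATED = {
    "big": {"large": 0.8, "huge": 0.3},
    "cats": {"cat": 0.9, "animals": 0.3},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def expand(query):
    """Return {term: weight} for the query terms plus their related terms."""
    weights = {}
    for term in tokenize(query):
        weights[term] = 1.0  # exact query terms always get full weight
        for related, weight in RELATED.get(term, {}).items():
            weights[related] = max(weights.get(related, 0.0), weight)
    return weights


def score(query, document):
    """Sum the weights of all expanded query terms that occur in the document."""
    weights = expand(query)
    tokens = set(tokenize(document))
    return sum(w for term, w in weights.items() if term in tokens)


def search(query, documents):
    """Return (score, name) pairs for matching documents, best first."""
    scored = [(score(query, text), name) for name, text in documents.items()]
    return sorted([pair for pair in scored if pair[0] > 0], reverse=True)


def main(argv):
    if len(argv) < 3:
        print("usage: search.py QUERY FILE...", file=sys.stderr)
        return
    documents = {}
    for path in argv[2:]:
        with open(path, encoding="utf-8") as handle:
            documents[path] = handle.read()
    for value, name in search(argv[1], documents):
        print(f"{value:.2f}\t{name}")


if __name__ == "__main__":
    main(sys.argv)
```

Saved under a name such as `search.py` (the name is arbitrary), it would be run as `python search.py "big cats" file1.txt file2.txt`. Because it prints tab-separated lines to standard output, it can be piped into other tools later, and the hand-written table can be swapped for something smarter without touching the rest.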

Later, you can try to figure out how to make the user experience seamless.

But don't start with integration and don't stall because of integration questions. Set integration aside and get something to work.

S.Lott answered Nov 12 '22 20:11