
Lucene.Net Best Practices

What are the best practices for using Lucene.Net? Where can I find a good Lucene.Net usage sample?

Elias Haileselassie asked Jun 16 '09 15:06


5 Answers

If you're going to work with Lucene, I'd buy a good book that covers it from A to Z. Lucene has a very steep learning curve (in my opinion). It's not only knowing how to search your data that's important; it's also knowing how to index it. Doing a basic search is easy, but creating an index that consists of millions of records and still being able to run a lightning-fast search over it is possible but pretty hard. There's no tutorial that teaches you that.

I'd recommend Lucene in Action, Second Edition by Michael McCandless, Erik Hatcher, and Otis Gospodnetić. Though it is written for Lucene and not Lucene.NET, that shouldn't be a problem, as the terminology and APIs are basically the same.

However, if you're just going to give it a quick try, you could read this site. The name says it all :-)

Razzie answered Oct 19 '22 23:10


We frequently use Lucene.NET when the data set is huge and reads need very fast response times. We generally store the data we need to search, along with a key that lets us map results back to the database table holding the remaining details.

In our case, this lets us search for a user to check their past participation. This is not just a username search but a search that iterates over various details to find any other instances of that user (albeit in a different form). For example, we look for the user's ID from one system, their ID from another system, perhaps an ID from a supplier's system, a Flash cookie GUID, a site's cookie GUID, etc. As we find one identifier, we look for other user records that share that identifier. This allows us to dedupe the user's entry into one of many systems (as participation in any system is only allowed once per 24 hours). In SQL this algorithm (which I was vague about) would take forever! In Lucene.NET it takes less than a second.

Lucene has many more search possibilities than SQL Server does. The thing that it sucks at is writing to or updating your index. This is usually done as a batch job, all at once. However, if you need to update the index in real time, you need to write some clever code to ensure that writes are serialized (think queueing with a singleton), or your writes will overlap and explode!
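The "queueing with a singleton" idea can be sketched roughly as follows. This is a minimal illustration in Java (the language of the original Lucene), not the code described in this answer: `IndexSink` and `IndexWriteQueue` are hypothetical names, and `IndexSink` merely stands in for whatever actually writes to the index. The point is only that every caller hands its document to one queue, and a single worker thread applies the writes one at a time, in order.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the code that really writes to the index. */
interface IndexSink {
    void write(String document);
}

/** Funnels all index writes through one queue drained by one worker thread. */
final class IndexWriteQueue {
    // A single-thread executor runs tasks sequentially in submission order,
    // so two writes can never overlap.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final IndexSink sink;

    IndexWriteQueue(IndexSink sink) {
        this.sink = sink;
    }

    /** Safe to call from any thread; returns without waiting for the write. */
    void enqueue(String document) {
        worker.execute(() -> sink.write(document));
    }

    /**
     * Stops accepting new writes and waits for queued ones to finish.
     * Returns true if everything was written within the timeout.
     */
    boolean shutdown(long timeout, TimeUnit unit) {
        worker.shutdown();
        try {
            return worker.awaitTermination(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        }
    }
}
```

In an application you would create one `IndexWriteQueue` at startup and share that single instance everywhere, which plays the role of the singleton. Searches do not need to go through the queue; only writes are serialized.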

I cover the usage of Lucene.NET in my book (ASP.NET Social Networking) and you can find lots of help here.

Andrew Siemer answered Oct 19 '22 21:10


The trouble with Lucene.NET is that it does not have an active community like standard (Java) Lucene, so you are effectively always running an old version of Lucene. Although we prefer .NET, we decided to use the Java version of Lucene for this reason. If you use Solr as well, it is very easy to integrate.

Hugh answered Oct 19 '22 21:10


'Lucene in Action' is the best book for learning how to index and how to search. It even covers advanced search techniques and writing custom analyzers. Even though the book is written for Java, I have implemented searching and indexing in .NET using it.

devson answered Oct 19 '22 23:10


Simon Green has a nice three-part series about how he set up Lucene.Net to work with his NHibernate implementation. Part one introduces the series. Parts two and three discuss the technical details.

I found the Lucene.Net code samples very useful, even though my project isn't using NHibernate.

dthrasher answered Oct 19 '22 22:10