
How to repair corrupted lucene index?

Tags: lucene

My server lost power and my Lucene index was corrupted. I ran CheckIndex, but it failed:

java -cp /home/dthoai/programs/paesia/checker/lucene-core-3.5.0.jar -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex /mnt/peda/paesia/index -fix


Opening index @ /mnt/peda/paesia/index

ERROR: could not read any segments file in directory
java.io.IOException: read past EOF: MMapIndexInput(path="/mnt/peda/paesia/index/segments_ls0l")
at org.apache.lucene.store.MMapDirectory$MMapIndexInput.readByte(MMapDirectory.java:279)
at org.apache.lucene.store.ChecksumIndexInput.readByte(ChecksumIndexInput.java:41)
at org.apache.lucene.store.DataInput.readInt(DataInput.java:84)
at org.apache.lucene.store.DataInput.readLong(DataInput.java:126)
at org.apache.lucene.index.SegmentInfo.<init>(SegmentInfo.java:202)
at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:286)
at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:363)
at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:754)
at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:593)
at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:359)
at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:327)
at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:1007)

How can I repair my Lucene index?

Tran Dinh Thoai asked Mar 29 '12 23:03





1 Answer

It looks like the main directory file, segments_N, is corrupted: the "read past EOF" error on segments_ls0l suggests the file is truncated. This probably means that the power loss happened while a commit was running.

If so, there is some chance that an older segments_N file is still present in your directory, and that the segments it references are still present and valid. If there is such a file, try removing the corrupted segments_ls0l file and see:

  • whether Lucene manages to open the index,
  • what data you are missing.
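To check whether an older segments_N file exists, it can help to list the commit files ordered by generation. The following is a minimal sketch, not part of Lucene: the class name SegmentsFileLister and its methods are hypothetical, and it assumes Java 8 or later. It relies only on the fact that Lucene writes the generation N in base 36, and it reads nothing but file names and sizes, so it does not modify the index.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper (not a Lucene class): lists segments_N files by generation.
class SegmentsFileLister {

    // Lucene names commit files "segments_N", where N is the commit
    // generation written in base 36 (Character.MAX_RADIX).
    static long generationOf(String fileName) {
        String suffix = fileName.substring("segments_".length());
        return Long.parseLong(suffix, Character.MAX_RADIX);
    }

    // Returns every segments_N file in the directory, newest generation first.
    // Other files (segments.gen, segment data files) are ignored.
    static List<Path> listCommits(Path indexDir) throws IOException {
        List<Path> commits = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(indexDir)) {
            for (Path p : stream) {
                if (p.getFileName().toString().matches("segments_[0-9a-z]+")) {
                    commits.add(p);
                }
            }
        }
        commits.sort(Comparator.comparingLong(
                (Path p) -> generationOf(p.getFileName().toString())).reversed());
        return commits;
    }

    public static void main(String[] args) throws IOException {
        // Expects the index directory as the only argument,
        // e.g. /mnt/peda/paesia/index
        for (Path p : listCommits(Paths.get(args[0]))) {
            String name = p.getFileName().toString();
            System.out.println(name + "  generation=" + generationOf(name)
                    + "  size=" + Files.size(p) + " bytes");
        }
    }
}
```

If the output shows only segments_ls0l, there is no older commit point to fall back on. If it shows a second file with a lower generation, that is the candidate Lucene may open once the corrupted file is removed.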

Otherwise, there are some threads on the Lucene user mailing list that discuss regenerating the segments_N file:

  • http://www.gossamer-threads.com/lists/lucene/java-user/102493
  • http://www.gossamer-threads.com/lists/lucene/java-user/39744

Make sure to back up your directory before making any modification.
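Any ordinary file copy will do for the backup, as long as nothing is writing to the index at the time. As one possible sketch in plain Java (the class name IndexBackup is hypothetical, and the code assumes Java 8 or later and a target directory that does not exist yet and is not inside the index directory):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

// Hypothetical helper (not a Lucene class): copies an index directory
// to a new location before any repair attempt.
class IndexBackup {

    // Copies source to target recursively. Refuses to run if target
    // already exists, so an earlier backup is never overwritten.
    static void copyDirectory(Path source, Path target) throws IOException {
        if (Files.exists(target)) {
            throw new IOException("Backup target already exists: " + target);
        }
        try (Stream<Path> paths = Files.walk(source)) {
            // Files.walk visits each directory before its contents.
            for (Path p : (Iterable<Path>) paths::iterator) {
                Path dest = target.resolve(source.relativize(p));
                if (Files.isDirectory(p)) {
                    Files.createDirectories(dest);
                } else {
                    Files.copy(p, dest, StandardCopyOption.COPY_ATTRIBUTES);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Expects two arguments: the index directory and the backup location.
        copyDirectory(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Work on the original only after confirming that the copy contains the same files, including the corrupted segments_ls0l, in case a later recovery attempt needs it.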

jpountz answered Oct 06 '22 00:10
