How do I estimate the size of a Lucene index?

Question

Is there a known math formula that I can use to estimate the size of a new Lucene index? I know how many fields I want to have indexed, and the size of each field. And, I know how many items will be indexed. So, once these are processed by Lucene, how does it translate into bytes?

Yuval F · Accepted Answer

See the Lucene index file format documentation. The major file is the compound index (.cfs file). If you have term statistics, you can probably get an estimate of the .cfs file size. Note that this varies greatly with the Analyzer you use and the field types you define.
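To make the "term statistics" idea concrete, here is a back-of-envelope sketch. It is not Lucene's actual accounting: the three-part split (postings, term dictionary, stored fields) is a simplification, and every per-item byte cost is a hypothetical parameter that you would have to calibrate against a real index built with your own Analyzer and field types.

```python
def rough_index_size_bytes(num_docs, avg_tokens_per_doc, unique_terms,
                           bytes_per_posting, bytes_per_term,
                           stored_bytes_per_doc):
    """Very rough index size model (an assumption, not Lucene's formula).

    bytes_per_posting, bytes_per_term and stored_bytes_per_doc are
    hypothetical calibration constants; measure them on a sample index.
    """
    # Postings grow with the total number of indexed token occurrences.
    postings = num_docs * avg_tokens_per_doc * bytes_per_posting
    # The term dictionary grows with the number of distinct terms.
    dictionary = unique_terms * bytes_per_term
    # Stored fields grow linearly with the number of documents.
    stored = num_docs * stored_bytes_per_doc
    return postings + dictionary + stored
```

Treat the result as an order-of-magnitude figure only. Compression, norms, term vectors and deleted documents are all ignored here.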

alchemical · Answer

The index stores each distinct token (term) of a text field only once, so the size depends on the nature of the material being indexed. Add to that whatever you keep in stored fields. One good approach is to index a sample and extrapolate from it to the complete source collection. However, the ratio of index size to source size decreases as the collection grows, because many of the words are already in the index, so make the sample a decent percentage of the original.
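The sampling approach could be sketched roughly as follows. You build indexes from a few samples of increasing size, measure each index directory on disk, and fit a curve through the measurements. The power-law form size = a * docs^b is an assumption chosen here because it can capture the falling ratio described above (b below 1); it is not something Lucene guarantees. The function names are made up for this illustration.

```python
import math
import os


def directory_size_bytes(path):
    """Sum the sizes of all files under an index directory."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def fit_power_law(samples):
    """Least-squares fit of size = a * docs**b in log-log space.

    samples: list of (doc_count, index_size_bytes) pairs; needs at
    least two pairs with distinct, positive doc counts and sizes.
    """
    xs = [math.log(n) for n, _ in samples]
    ys = [math.log(s) for _, s in samples]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


def extrapolate_index_size(samples, target_docs):
    """Predict the index size in bytes for target_docs documents."""
    a, b = fit_power_law(samples)
    return a * target_docs ** b
```

In use, you would index, say, 1%, 5% and 10% of the collection, call directory_size_bytes on each resulting index directory, and pass the (document count, size) pairs to extrapolate_index_size together with the full document count. The further the target lies beyond the largest sample, the less the estimate should be trusted.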

Tags:

lucene

bpapa
