I am using Lucene to store (as well as index) various documents. Each document needs a persistent unique identifier (to be used as part of a URL). If I were using a SQL database, I could use an integer primary key auto_increment (or similar) field to automatically generate a unique id for every record that was added. Is there any way of doing this with Lucene? I am aware that documents in Lucene are numbered, but have noted that these numbers are reallocated over time. (I'm using the Java version of Lucene 3.0.3.)


How do I generate a unique id using Lucene?

2 Answers

As larsmans said, you need to store this in a separate field. I suggest that you make the field indexed as well as stored, and index it using a KeywordAnalyzer. You can keep a counter in memory and update it for each new document.

What remains is the problem of persistence - how to store the maximal id when the Lucene process stops. One possibility is to use a text file which saves the maximal id.
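A minimal sketch of the text-file idea, assuming Java 7 or later for java.nio.file. The class name PersistentIdCounter and its methods are hypothetical, not part of Lucene. Unlike the suggestion above, this version writes the id on every call rather than only at shutdown, which is slower but should survive an unclean stop:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: a counter whose last issued id is kept in a text file. */
final class PersistentIdCounter {
    private final Path file;
    private long lastId;

    PersistentIdCounter(Path file) {
        this.file = file;
        this.lastId = load(file);
    }

    /** Reads the last saved id, or returns 0 if the file does not exist yet. */
    private static long load(Path file) {
        if (!Files.exists(file)) {
            return 0L;
        }
        try {
            String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            return Long.parseLong(text.trim());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Increments the counter and saves the new value before returning it. */
    synchronized long nextId() {
        long id = lastId + 1;
        save(id);
        lastId = id;
        return id;
    }

    /** Writes to a temporary sibling file first, then replaces the real file. */
    private void save(long id) {
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        try {
            Files.write(tmp, Long.toString(id).getBytes(StandardCharsets.UTF_8));
            Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The value returned by nextId() would then be converted to a string and stored in the document's id field. If one disk write per document is too costly, a variation is to save only every N ids and skip ahead by N on startup.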

I believe Flexible Indexing will allow you to add the maximal id to the index as a "global" field. If you are willing to work with Lucene's trunk, you can try flexible indexing to see whether it fits the bill.

124

answered Oct 11 '22 12:10

Yuval F

For similar situations, I use the following algorithm (it has nothing to do with Lucene, but you can use it anyway).

Create a new AtomicLong, initialized with the value of System.currentTimeMillis() or System.nanoTime().
Generate each subsequent ID by calling .incrementAndGet() or .getAndIncrement() on that AtomicLong.
If the system is restarted, the AtomicLong is initialized again with the current timestamp during startup.
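The steps above can be sketched roughly as follows. The class name TimestampSeededIds and the method nextId are hypothetical names chosen for illustration; the constructor taking an explicit seed exists only to make the behaviour easy to check:

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: ids come from a counter seeded with the startup time. */
final class TimestampSeededIds {
    private final AtomicLong counter;

    /** Seeds the counter with the current wall-clock time in milliseconds. */
    TimestampSeededIds() {
        this(System.currentTimeMillis());
    }

    /** Seeds the counter with an explicit value (useful for testing). */
    TimestampSeededIds(long seed) {
        this.counter = new AtomicLong(seed);
    }

    /** Returns the next id; thread-safe and non-blocking. */
    long nextId() {
        return counter.incrementAndGet();
    }
}
```

A single instance would be created at startup and shared by all threads that add documents.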

Pros: simple, effective, thread-safe, non-blocking. If you need clustered id support, just add space for hi/lo algorithm on top of existing long or sacrifice some high bytes.
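One possible reading of "sacrifice some high bytes" is to reserve the top bits of the long for a node number. This is only a sketch under that assumption, not a hi/lo implementation; the class name ClusteredIds, the 8-bit node width, and the explicit seed parameter are all hypothetical choices:

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: node number in the high bits, local counter in the low bits. */
final class ClusteredIds {
    private static final int NODE_BITS = 8;
    // Leave the sign bit clear so ids stay positive: 64 - 1 - 8 = 55 counter bits.
    private static final int COUNTER_BITS = Long.SIZE - 1 - NODE_BITS;
    private static final long COUNTER_MASK = (1L << COUNTER_BITS) - 1;

    private final long nodePrefix;
    private final AtomicLong counter;

    /** nodeId must be unique per cluster member; seed is e.g. System.currentTimeMillis(). */
    ClusteredIds(int nodeId, long seed) {
        if (nodeId < 0 || nodeId >= (1 << NODE_BITS)) {
            throw new IllegalArgumentException("nodeId must be in [0, 255]");
        }
        this.nodePrefix = ((long) nodeId) << COUNTER_BITS;
        this.counter = new AtomicLong(seed);
    }

    /** Combines the fixed node prefix with the next local counter value. */
    long nextId() {
        return nodePrefix | (counter.incrementAndGet() & COUNTER_MASK);
    }
}
```

Two nodes with different node numbers can then never produce the same id, whatever their local counters do.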

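Cons: duplicates become possible after a restart if new entities have been added at an average rate above 1/ms (for System.currentTimeMillis()) or 1/ns (for System.nanoTime()), because the counter will then have run ahead of the clock. Does not tolerate clock abnormalities, such as the clock being set back. Note also that System.nanoTime() has an arbitrary origin that may differ between JVM runs, so it is a risky seed across restarts.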

You can also consider using a UUID as another alternative. The probability of a duplicate UUID is virtually non-existent.
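For the UUID route, a minimal sketch using the standard java.util.UUID class; the wrapper class UuidIds is a hypothetical name:

```java
import java.util.UUID;

/** Hypothetical sketch: random (version 4) UUIDs rendered as strings. */
final class UuidIds {
    /** Returns a new id such as "123e4567-e89b-42d3-a456-426614174000" (36 characters). */
    static String newId() {
        return UUID.randomUUID().toString();
    }
}
```

The resulting string needs no persisted state and no coordination between processes, at the cost of a longer, less readable URL component than a numeric id.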

answered Oct 11 '22 10:10

mindas


Tags:

java

lucene

dave4420
