
ORM mapping to Lucene

Tags: java, orm, pojo

I would like to create an ORM for Lucene. Here is what I am trying to do: I have to make a POJO that maps to a Lucene index.

Let's say I have a class:

public class Users {

    public String username;
    public String password;

    public String getUsername() {
        return username;
    }

    public void setUsername(String username) {
        this.username = username;
    }

    public String getPassword() {
        return password;
    }

    public void setPassword(String password) {
        this.password = password;
    }
}

I need to map this class to a Lucene index. I have used ORMLite for SQL, but here the data source is a custom class that creates the index, updates it, etc. Is there an existing solution, or what is the best way to achieve this?

Ramesh asked Jan 09 '23 13:01


2 Answers

I am not an expert in Lucene, but I can answer on a conceptual level.

A Lucene index stores documents. A document consists of multiple fields. For each field you can tell Lucene what to do with it, e.g. just store the value, or index it, which makes the field searchable.
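
To make the stored/indexed distinction concrete, here is a toy in-memory model in plain Java. It is only a sketch and not Lucene's API: the ToyIndex class and its add and search methods are made up for illustration, and matching is exact-value only, whereas Lucene normally analyzes text before indexing it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory model of "stored" versus "indexed" fields; this is not Lucene's API. */
final class ToyIndex {

    // Stored values: position in the list is the document id; each entry maps field name to value.
    private final List<Map<String, String>> stored = new ArrayList<>();
    // Indexed values: "field:value" maps to the ids of documents holding that exact value.
    private final Map<String, List<Integer>> postings = new HashMap<>();

    /** Adds a document; only the fields named in indexedFields become searchable. */
    int add(Map<String, String> fields, List<String> indexedFields) {
        int docId = stored.size();
        stored.add(new HashMap<>(fields));
        for (String name : indexedFields) {
            String value = fields.get(name);
            if (value != null) {
                postings.computeIfAbsent(name + ":" + value, k -> new ArrayList<>()).add(docId);
            }
        }
        return docId;
    }

    /** Exact-match lookup on an indexed field; returns the stored fields of each hit. */
    List<Map<String, String>> search(String field, String value) {
        List<Map<String, String>> hits = new ArrayList<>();
        List<Integer> docIds = postings.getOrDefault(field + ":" + value, Collections.emptyList());
        for (int docId : docIds) {
            hits.add(stored.get(docId));
        }
        return hits;
    }
}
```

With username indexed and password only stored, a search on username finds the document and can return its password, while a search on password finds nothing.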

What you need to do:

Data conversion: Select a framework that converts your object into data you can store in Lucene. You can use standard Java serialization or an improved serialization library (e.g. Kryo) and store the object as binary data in Lucene. You can also convert it to JSON or XML.

You add the serialized object value to the document and store it:

 document.add(new BinaryDocValuesField(name, new BytesRef(byteData)));
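
The byteData above has to come from somewhere. A minimal sketch using standard Java serialization follows; PojoBytes and its User class are hypothetical names, and User stands in for the question's Users class with Serializable added (the original class does not implement it). Kryo or a JSON library could replace the bodies of the same two methods.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Hypothetical helper: turns a serializable POJO into bytes and back. */
final class PojoBytes {

    /** Stand-in for the Users class from the question; it must implement Serializable. */
    static final class User implements Serializable {
        private static final long serialVersionUID = 1L;
        String username;
        String password;

        User(String username, String password) {
            this.username = username;
            this.password = password;
        }
    }

    /** Serializes the object into a byte array using built-in Java serialization. */
    static byte[] toBytes(Serializable value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Restores an object of the expected type from bytes produced by toBytes. */
    static <T> T fromBytes(byte[] data, Class<T> type) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return type.cast(in.readObject());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The array returned by toBytes is what would be wrapped in the BytesRef above; fromBytes is the reverse step when reading a document back.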

Now it is possible to store objects in Lucene. However, you can only access them by iterating over everything or by the unique document ID that Lucene assigns. What you cannot do yet is search for a user and return the password.

Extract index fields: To be able to search for a user or for other contents of the object, you need to select the relevant properties, e.g. the username, and add each one to the document as an additional indexed field alongside the binary data.
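
One way to select the properties, sketched under assumptions: mark the searchable fields with a marker annotation and collect them by reflection. The @Indexed annotation, IndexFieldExtractor, and User are hypothetical names invented for this example; they are not part of Lucene or any existing library.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical mapper: collects the annotated properties of a POJO as name/value pairs. */
final class IndexFieldExtractor {

    /** Hypothetical marker for properties that should become searchable. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Indexed {}

    /** Stand-in for the Users class from the question. */
    static final class User {
        @Indexed String username; // searchable
        String password;          // kept only in the serialized object

        User(String username, String password) {
            this.username = username;
            this.password = password;
        }
    }

    /** Returns the non-null values of all @Indexed fields, keyed by field name. */
    static Map<String, String> extract(Object pojo) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (Field field : pojo.getClass().getDeclaredFields()) {
            if (!field.isAnnotationPresent(Indexed.class)) {
                continue;
            }
            field.setAccessible(true);
            try {
                Object value = field.get(pojo);
                if (value != null) {
                    fields.put(field.getName(), value.toString());
                }
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return fields;
    }
}
```

Each entry of the returned map would then be added to the Lucene document as its own indexed field (for instance a StringField for exact-match values), next to the binary field that holds the whole object.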

A remark:

Lucene is designed as an indexing framework, not as durable data storage. Typically you use Lucene to search data that is stored somewhere else, e.g. in a traditional transactional database.

You can, of course, store the original data within Lucene. But there will be "challenges". For example, a new Lucene version may or may not be backwards compatible and able to read the index format of the old version. Your index is also more likely to end up corrupt and unreadable after a power outage. In short: the Lucene index does not replace a robust and durable database storage technology.

cruftex answered Jan 17 '23 18:01


I have worked with Solr/Lucene before, and I wouldn't try to write an ORM for Lucene, because it's just not great for storing and manipulating data. You get only a single 'table': no real data types, no foreign keys, no unique constraints. You do get excellent search capabilities, but usually just a fraction of my data needs to be searchable. So it's simpler to store all the data in a place better suited to that purpose, for example an RDBMS. Then you can use an existing ORM to manipulate your data, and configure Solr to build its index from that database by specifying appropriate queries in a configuration file. Hibernate Search can do similar things if you don't care about the Solr wrapper for Lucene.

FelixM answered Jan 17 '23 19:01