
Storing and retrieving JSON objects to/from Lucene indexes

I need to store a set of JSON objects in a Lucene index and also retrieve them from the index. I am using Lucene 3.4.

Is there any library or easy mechanism to make this happen in Lucene?

Sample JSON object:

{
    "BOOKNAME1": {
        "id": 1,
        "name": "bname1",
        "price": "p1"
    },
    "BOOKNAME2": {
        "id": 2,
        "name": "bname2",
        "price": "p2"
    },
    "BOOKNAME3": {
        "id": 3,
        "name": "bname3",
        "price": "p3"
    }
}

Any sort of help will be appreciated. Thanks in advance.

Mahesh More asked Mar 19 '13 13:03





2 Answers

I would recommend indexing your JSON objects as follows:

1) Parse your JSON file. I usually use json-simple.

2) Open the index with an IndexWriter, configured through IndexWriterConfig.

3) Add documents to the index.

4) Commit the changes and close the index writer.

5) Run your queries.
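As a rough illustration of what steps 1 and 3 amount to for the sample object in the question, the sketch below uses only the Java standard library. It takes an already-parsed object (a hand-built Map standing in for a JSON parser's output) and flattens it into one record per book. In Lucene, each record would then become one Document and each entry one field. The class name BookFlattener and the bookKey field name are hypothetical, and no Lucene or JSON-library calls are shown.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy sketch, not Lucene API: shapes the sample data into per-book records.
final class BookFlattener {

    // Turns {"BOOKNAME1": {"id": "1", ...}, ...} into one flat record per book.
    static List<Map<String, String>> flatten(Map<String, Map<String, String>> parsed) {
        List<Map<String, String>> records = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> book : parsed.entrySet()) {
            Map<String, String> record = new LinkedHashMap<>();
            record.put("bookKey", book.getKey()); // hypothetical field holding the outer key
            record.putAll(book.getValue());       // id, name, price
            records.add(record);
        }
        return records;
    }

    // Hand-built stand-in for the parsed sample object from the question.
    static Map<String, Map<String, String>> sample() {
        Map<String, Map<String, String>> parsed = new LinkedHashMap<>();
        for (int i = 1; i <= 3; i++) {
            Map<String, String> book = new LinkedHashMap<>();
            book.put("id", String.valueOf(i));
            book.put("name", "bname" + i);
            book.put("price", "p" + i);
            parsed.put("BOOKNAME" + i, book);
        }
        return parsed;
    }

    public static void main(String[] args) {
        for (Map<String, String> record : flatten(sample())) {
            System.out.println(record);
        }
    }
}
```

Keeping one record per book, rather than one for the whole object, is what would let a query return an individual book.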

If you would like to use Lucene Core instead of Elasticsearch, I have created a sample project that takes a file of JSON objects as input and builds an index from it. I have also added a test that queries the index.

It uses Lucene 4.8, the latest version at the time of writing. Please have a look here:

http://ignaciosuay.com/getting-started-with-lucene-and-json-indexing/

If you have time, I think it is worth reading "Lucene in Action".

Hope it helps.

ignacio.suay answered Sep 18 '22 12:09



If you don't want to search within the JSON but only store it, you just need to extract the id, which will hopefully be unique. Your Lucene document would then have two fields:

  • the id (indexed, not necessarily stored)
  • the JSON itself, as it is (only stored)

Once you have stored your JSON in Lucene, you can retrieve it by filtering on the id.
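To make the two-field idea concrete, here is a minimal toy model using only the Java standard library. It is not Lucene code: the class ToyJsonStore and its methods are hypothetical. It only mimics the split between an indexed id (used to find a document) and a stored JSON string (returned verbatim, never searched).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of "id indexed, JSON stored"; not Lucene API.
final class ToyJsonStore {
    // Plays the role of the indexed id field: id term -> internal document number.
    private final Map<String, Integer> idTerms = new HashMap<>();
    // Plays the role of the stored field: document number -> raw JSON, kept as-is.
    private final List<String> storedJson = new ArrayList<>();

    void add(String id, String json) {
        // Toy choice: reject duplicates. Lucene itself does not enforce unique ids.
        if (idTerms.containsKey(id)) {
            throw new IllegalArgumentException("duplicate id: " + id);
        }
        storedJson.add(json);
        idTerms.put(id, storedJson.size() - 1);
    }

    // Looks the id up in the "index", then loads the stored JSON for that document.
    String getById(String id) {
        Integer docNumber = idTerms.get(id);
        return docNumber == null ? null : storedJson.get(docNumber);
    }

    public static void main(String[] args) {
        ToyJsonStore store = new ToyJsonStore();
        store.add("1", "{\"id\":1,\"name\":\"bname1\",\"price\":\"p1\"}");
        System.out.println(store.getById("1"));
    }
}
```

Because only the id is indexed in this model, nothing inside the JSON (name, price) can be used to find a document, which is the trade-off the answer describes.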

On the other hand, this is pretty much what Elasticsearch does with your documents. You just send some JSON to it via a REST API. Elasticsearch will keep the JSON as it is and also make it searchable by default. That means you can either retrieve the JSON by id or search against it out of the box, without having to write any code.

Also, with Lucene your documents are not visible to searches until you commit them and reopen the index reader, while Elasticsearch adds a handy transaction log on top, so that a GET by id is always real time.
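The visibility point can be pictured with a small toy model, again standard-library Java only and not Lucene API (ToyIndex and its methods are hypothetical names). Added documents sit in a pending buffer, a commit publishes them, and a reader is a point-in-time snapshot that never changes after it has been opened.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model of commit-then-reopen visibility; not Lucene API.
final class ToyIndex {
    private final List<String> pending = new ArrayList<>();
    private List<String> committed = Collections.emptyList();

    // Buffers a document; it is not visible to any reader yet.
    void addDocument(String doc) {
        pending.add(doc);
    }

    // Publishes buffered documents as a new immutable snapshot.
    void commit() {
        List<String> next = new ArrayList<>(committed);
        next.addAll(pending);
        pending.clear();
        committed = Collections.unmodifiableList(next);
    }

    // A "reader" is the snapshot that was committed at the time it was opened.
    List<String> openReader() {
        return committed;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.addDocument("{\"id\":1}");
        List<String> before = index.openReader();
        index.commit();
        List<String> after = index.openReader();
        System.out.println("reader opened before commit sees " + before.size() + " document(s)");
        System.out.println("reader opened after commit sees " + after.size() + " document(s)");
    }
}
```

In this model a reader opened before the commit keeps seeing the old snapshot, so both the commit and a fresh reader are needed before the new document shows up.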

Elasticsearch also offers a lot more: a nice distributed infrastructure, faceting, scripting and more. Check it out!

javanna answered Sep 19 '22 12:09
