
How to update multiple documents in Solr with JSON?

Tags: json, solr, solr4

How can I update multiple documents in Solr 4.5.1 with JSON? I tried the following, but it does not work:

POST /solr/mycore/update/json:

{
  "commit": {},
  "add": {
    "overwrite": true,
    "doc": [{
        "thumbnail": "/images/404.png",
        "url": "/404.html?1",
        "id": "demo:/404.html?1",
        "channel": "demo",
        "display_name": "One entry",
        "description": "One entry is not enough."
      }, {
        "thumbnail": "/images/404.png",
        "url": "/404.html?2",
        "id": "demo:/404.html?2",
        "channel": "demo",
        "display_name": "Another entry",
        "description": "Another entry is required."
      }
    ]
  }
}
asked Nov 27 '13 14:11 by burnersk


People also ask

What are the correct ways of updating Solr index?

When you index a document to Solr, it overwrites any existing document with the same <uniqueKey/>, which is usually the id field. So yes, it overwrites existing data. Before Solr 4.0, changing a single field meant reindexing the whole document. Since Solr 4.0, atomic updates let you send only the fields that change, but Solr still rebuilds the whole document internally, so the other fields must be stored.
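A minimal Python sketch of the two update styles described above, assuming a hypothetical schema whose uniqueKey is `id` and which has a `price` field. A full document replaces whatever is stored under the same id, while an atomic update sends only the changed field wrapped in a modifier such as `set`. The script only builds the JSON payloads; it does not contact a server.

```python
import json

def full_replace(doc):
    # A complete document: Solr overwrites any stored document with the same uniqueKey.
    return json.dumps([doc])

def atomic_set(doc_id, field, value):
    # Atomic update (Solr 4.0+): only `field` changes; "set" replaces its value.
    # Assumes the uniqueKey field is named "id" (hypothetical schema).
    return json.dumps([{"id": doc_id, field: {"set": value}}])

if __name__ == "__main__":
    print(full_replace({"id": "978-0641723445", "name": "The Lightning Thief", "price": 12.50}))
    print(atomic_set("978-0641723445", "price", 9.99))
```

Either string would be POSTed to the core's JSON update handler; the atomic form only works if the schema keeps the other fields stored.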

What is commit in Solr?

In Solr, a commit is an action which asks Solr to "commit" those changes to the Lucene index files. By default commit actions result in a "hard commit" of all the Lucene index files to stable storage (disk).
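A commit can be requested together with an update, either as a `commit=true` request parameter or as a `"commit": {}` command inside the JSON body (as in the question). A minimal sketch using Python's standard `urllib`, assuming a hypothetical local server at `http://localhost:8983` and the core name `mycore` from the question; it builds the request but does not send it.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

def update_request(docs, base="http://localhost:8983/solr/mycore", commit=True):
    # Hypothetical base URL; /update/json is the handler path used in the question.
    params = {"commit": "true"} if commit else {}
    url = base + "/update/json" + ("?" + urlencode(params) if params else "")
    body = json.dumps(docs).encode("utf-8")
    # Supplying `data` makes urllib use POST.
    return Request(url, data=body, headers={"Content-Type": "application/json"})

req = update_request([{"id": "demo:/404.html?1"}])
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```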

What is indexing in Solr?

By adding content to an index, we make it searchable by Solr. A Solr index can accept data from many different sources, including XML files, comma-separated value (CSV) files, data extracted from tables in a database, and files in common file formats such as Microsoft Word or PDF.


2 Answers

As I understand it, this has been fixed from Solr 4.0 onward: the JSON update handler also accepts a plain array of documents. See http://wiki.apache.org/solr/UpdateJSON.

./exampledocs/books.json contains an example of a JSON file with multiple documents:

[
  {
    "id" : "978-0641723445",
    "cat" : ["book","hardcover"],
    "name" : "The Lightning Thief",
    "author" : "Rick Riordan",
    "series_t" : "Percy Jackson and the Olympians",
    "sequence_i" : 1,
    "genre_s" : "fantasy",
    "inStock" : true,
    "price" : 12.50,
    "pages_i" : 384
  },
  {
    "id" : "978-1423103349",
    "cat" : ["book","paperback"],
    "name" : "The Sea of Monsters",
    "author" : "Rick Riordan",
    "series_t" : "Percy Jackson and the Olympians",
    "sequence_i" : 2,
    "genre_s" : "fantasy",
    "inStock" : true,
    "price" : 6.49,
    "pages_i" : 304
  },
  ...
]

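While @fiskfisk's answer is still valid JSON, it is not easy to serialize from a data structure, because most JSON libraries represent an object as a map and cannot emit the same key ("add") twice. This format can be produced directly from a list of documents.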
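The serialization point can be checked with Python's standard `json` module. A short sketch (the two documents are trimmed versions of the ones in the question): a list of dicts round-trips unchanged, while parsing the repeated-"add" form into an ordinary dict keeps only the last document.

```python
import json

docs = [
    {"id": "demo:/404.html?1", "display_name": "One entry"},
    {"id": "demo:/404.html?2", "display_name": "Another entry"},
]

# Array format: a plain list of dicts serializes and parses back unchanged.
array_body = json.dumps(docs)
assert json.loads(array_body) == docs

# Repeated "add" keys: a dict holds one value per key, so the last "add" wins
# and the first document is silently dropped.
dup_body = '{"add": {"doc": %s}, "add": {"doc": %s}}' % (json.dumps(docs[0]), json.dumps(docs[1]))
parsed = json.loads(dup_body)
print(len(parsed), parsed["add"]["doc"]["id"])  # 1 demo:/404.html?2
```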

answered Sep 20 '22 04:09 by elachell


Solr expects one "add" key in the JSON structure for each document. This may seem odd if you think about what a key in a JSON object normally means, but the format maps directly onto the XML update format used for indexing, and it lets you attach metadata to each document individually.

{
    "commit": {},
    "add": {
        "doc": {
            "id": "321321",
            "name": "barfoo"
        }
    },
    "add": {
        "doc": {
            "id": "123123",
            "name": "Foobar"        
        }
    }
}

This works. I think allowing an array as the value of "add" would make more sense, but I haven't dug into the source and don't know the reasoning behind this.
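Because a Python dict cannot hold the same key twice, the repeated-"add" body has to be assembled as text. A minimal sketch; `solr_add_commands` is a hypothetical helper name. It places the commit last, on the assumption that commands are applied in the order they appear, and uses `object_pairs_hook` to confirm that both "add" keys survive.

```python
import json

def solr_add_commands(docs, commit=True):
    # Build {"add": {"doc": ...}, "add": {"doc": ...}, ..., "commit": {}} by hand,
    # since json.dumps on a dict cannot emit the same key twice.
    parts = ['"add": ' + json.dumps({"doc": doc}) for doc in docs]
    if commit:
        parts.append('"commit": {}')
    return "{" + ", ".join(parts) + "}"

body = solr_add_commands([
    {"id": "321321", "name": "barfoo"},
    {"id": "123123", "name": "Foobar"},
])

# object_pairs_hook receives every key/value pair, so repeated keys are not collapsed.
pairs = json.loads(body, object_pairs_hook=lambda kv: kv)
print([key for key, _ in pairs])  # ['add', 'add', 'commit']
```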

answered Sep 18 '22 04:09 by MatsLindh