
Failed to add documents to Solr: Solr responded with an error (HTTP 400) (django + haystack + solr)

I currently have Solr 4.2.0 working in production (set up around 2012). For a new development environment I upgraded all packages (Django 1.8.10, PySolr 3.4.0, Haystack 2.4.1) and installed Solr 5.5.0.

In short

I have Solr running, my core/collection created with 'basic_configs' and it seems to work well, except that during indexing I get a lot of errors similar to these:

All documents removed.
Indexing 9604 contracts
Failed to add documents to Solr: Solr responded with an error (HTTP 400): [Reason: ERROR: [doc=accounting.contract.22] unknown field 'status']
Failed to add documents to Solr: Solr responded with an error (HTTP 400): [Reason: ERROR: [doc=accounting.contract.70556] unknown field 'date_signed']
Failed to add documents to Solr: Solr responded with an error (HTTP 400): [Reason: ERROR: [doc=accounting.contract.72059] unknown field 'date_signed']
Failed to add documents to Solr: Solr responded with an error (HTTP 400): [Reason: ERROR: [doc=accounting.contract.73458] unknown field 'date_signed']

Judging by the ids, most documents are indexed fine, but these errors appear frequently (the list goes on) across all tables/indexes.
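To see which fields Solr is actually rejecting, the indexing output can be summarised. The following is a minimal sketch, not part of Haystack or PySolr: the function name `count_unknown_fields` is made up for illustration, and it assumes the output has been captured as text in the format shown above.

```python
import re
from collections import Counter

# Matches the reason Solr reports, e.g.
# "[doc=accounting.contract.22] unknown field 'status'"
UNKNOWN_FIELD = re.compile(r"\[doc=[^\]]+\] unknown field '(?P<field>[^']+)'")


def count_unknown_fields(log_text):
    """Return a Counter mapping each rejected field name to its number of occurrences.

    Hypothetical helper for illustration only.
    """
    # Terminal wrapping can split a message mid-word ("[d" / "oc=..."),
    # so remove the line breaks before matching.
    joined = log_text.replace("\r", "").replace("\n", "")
    return Counter(m.group("field") for m in UNKNOWN_FIELD.finditer(joined))
```

If every field defined in the search indexes shows up in this summary sooner or later, that suggests Solr is not reading the generated schema at all, rather than a problem with individual documents.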

Eventually I followed this promising GitHub project guide, but unfortunately it did not solve the problem for me.

What I did, step by step

  1. Successfully installed Solr 5.5.0 (web interface working at localhost:8983), using this guide
  2. Created a collection called 'spng', using the following command: sudo su - solr -c '/opt/solr/bin/solr create -c spng -d basic_configs'
  3. Overwrote my solr.xml (/srv/spng/src/django-haystack/haystack/templates/search_configuration/solr.xml) with the solr.xml from the GitHub project guide mentioned earlier
  4. Just to be sure, I gave the solr.xml file 777 permissions.

My settings.py has the following entry:

HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.solr_backend.SolrEngine',
        'URL': 'http://localhost:8983/solr/spng',
        'DEFAULT_OPERATOR': 'AND',
        'INCLUDE_SPELLING': True,
    },
}
  5. I created a schema.xml (python manage.py build_solr_schema) and placed it in /var/solr/data/spng/conf/schema.xml
  6. Again, just to be sure, I gave the schema.xml file 777 permissions as well.
  7. I used curl to reload the core: curl 'http://localhost:8983/solr/admin/cores?action=RELOAD&core=spng&wt=json&indent=true'

The response was:

{
  "responseHeader":{
    "status":0,
    "QTime":300}}
  8. I also restarted uwsgi and solr just to make sure
  9. At this point I try to run the python manage.py rebuild_index command

I end up with the following errors, as mentioned before:

All documents removed.
Indexing 9604 contracts
Failed to add documents to Solr: Solr responded with an error (HTTP 400): [Reason: ERROR: [doc=accounting.contract.22] unknown field 'status']
Failed to add documents to Solr: Solr responded with an error (HTTP 400): [Reason: ERROR: [doc=accounting.contract.70556] unknown field 'date_signed']
Failed to add documents to Solr: Solr responded with an error (HTTP 400): [Reason: ERROR: [doc=accounting.contract.72059] unknown field 'date_signed']
Failed to add documents to Solr: Solr responded with an error (HTTP 400): [Reason: ERROR: [doc=accounting.contract.73458] unknown field 'date_signed']

Does anyone have any idea what might be wrong? The indexing works without errors on my production server, running 4.2.0. Did I miss a setting or is Solr 5.5.0 causing these errors?

Jos van Leeuwen asked Mar 16 '16 10:03


3 Answers

Special thanks to elyograg for helping me out on Solr's IRC channel (#solr on freenode).

elyograg: if you're using the stock solrconfig.xml from basic_configs, then your schema is located in a file named "managed-schema" -- ALL example configs are using the managed schema by default as of 5.5.

elyograg: put it (schema.xml contents) into managed-schema. You could potentially change the solrconfig.xml, but life will be easier for people trying to help you if you keep the defaults.

In other words: as of version 5.5, a collection created with basic_configs reads its schema from a file called 'managed-schema' instead of schema.xml (in my case located in /var/solr/data//conf/managed-schema).

After updating the file and reloading the core, indexing finished without errors.
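The fix described here (put the generated schema into managed-schema, then reload the core) could be scripted roughly as follows. This is only a sketch under assumptions: `install_schema` and `reload_url` are hypothetical helper names, the conf directory has to be passed in by the caller, and the base URL default simply mirrors the localhost:8983 setup from the question.

```python
import shutil
from pathlib import Path
from urllib.parse import urlencode


def install_schema(generated_schema, conf_dir):
    """Copy the schema produced by build_solr_schema over the core's managed-schema.

    Hypothetical helper: conf_dir is the core's conf directory and is
    supplied by the caller; nothing is guessed here.
    """
    target = Path(conf_dir) / "managed-schema"
    shutil.copyfile(str(generated_schema), str(target))
    return target


def reload_url(core, base="http://localhost:8983/solr"):
    """Build the CoreAdmin RELOAD URL, matching the curl call in the question."""
    query = urlencode({"action": "RELOAD", "core": core, "wt": "json"})
    return "{}/admin/cores?{}".format(base, query)
```

The URL returned by `reload_url("spng")` can be requested with curl or any HTTP client; a "status" of 0 in the responseHeader, as in the response shown in the question, indicates that the reload was accepted.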

Be wary in future versions, because elyograg also noted:

elyograg: It might also be a good idea to add the .xml extension. I don't think the lack of an extension is going to be much of a deterrent to hand-editing.

So in the future the file may be called managed-schema.xml.

Jos van Leeuwen answered Nov 12 '22 01:11


Updating the Solr index consists of 4 steps:

  1. Add valid fields in search_indexes.py

  2. Generate the schema by running:

    python manage.py build_solr_schema > schema.xml

  3. Update the index by running:

    python manage.py update_index

  4. Restart the server.

If all of the above steps complete without errors, your fields have been updated successfully.

Deepak Sharma answered Nov 12 '22 00:11


Check the schema file at

http://localhost:8983/solr/#/spng/files?file=schema.xml

and compare it with the schema from build_solr_schema to make sure Solr is using the right schema.
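The comparison can also be done mechanically instead of by eye. Below is a minimal sketch that assumes both schemas are available as XML text (how they are fetched is left out); `field_names` and `missing_fields` are hypothetical helper names, and only `<field>` declarations are compared, not field types or dynamic fields.

```python
import xml.etree.ElementTree as ET


def field_names(schema_xml):
    """Return the names of all <field> elements declared in a schema document."""
    root = ET.fromstring(schema_xml)
    # iter() walks the whole tree, so fields are found whether or not
    # they are wrapped in a <fields> element.
    return {f.get("name") for f in root.iter("field") if f.get("name")}


def missing_fields(generated_xml, active_xml):
    """List fields present in the generated schema but absent from the active one."""
    return sorted(field_names(generated_xml) - field_names(active_xml))
```

Any name returned by `missing_fields` is a field that Haystack will send but Solr does not know about, which is the situation the "unknown field" errors describe.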

ewianda answered Nov 12 '22 01:11