
How does Solr's schema-less feature work? How to revert it to classic schema?

Tags: solr, solr5

I just found out that Solr 5 doesn't require a predefined schema file; it generates the schema based on the documents being indexed. I would like to know how this works in the background.

Is it good practice or not? And is there any way to disable it?

Krunal asked Apr 23 '15 09:04



2 Answers

The schemaless feature has been in Solr since version 4.3, but it has probably only become stable more recently, as a concurrency issue affecting it was fixed in 4.10.

It relies on what Solr calls a managed schema. When you configure Solr to use a managed schema, Solr uses a special UpdateRequestProcessor to intercept document indexing requests, and it guesses the type of each field it has not seen before.
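To give a rough idea of what "guessing field types" means, here is a simplified Python sketch. It is not Solr's actual code: the function name guess_field_type, the returned type names and the single date format are assumptions made for illustration only.

```python
from datetime import datetime


def guess_field_type(value):
    """Return a hypothetical field-type name for one raw value."""
    # bool must be tested before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        text = value.strip()
        if text.lower() in ("true", "false"):
            return "boolean"
        # Try the narrowest numeric type first, then fall back to wider ones.
        for parser, type_name in ((int, "long"), (float, "double")):
            try:
                parser(text)
                return type_name
            except ValueError:
                pass
        try:
            # Assumed date format for this sketch: ISO 8601 in UTC.
            datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
            return "date"
        except ValueError:
            pass
    # Anything that did not parse is treated as plain text.
    return "string"


for sample in ("42", "3.14", "true", "2015-04-23T09:04:00Z", "John Doe"):
    print(sample, "->", guess_field_type(sample))
```

The point is only the order of the attempts: the most specific type that parses wins, and everything else ends up as a string.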

Solr starts with your schema.xml file and creates a new file called, by default, managed-schema to store all the inferred schema information. This file is automatically overwritten by Solr as it detects changes to the schema.

From then on you should use the Schema API, rather than editing the file by hand, if you want to make changes to the schema. See also the Schemaless Mode documentation.
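As a hedged example of a Schema API call, the Python sketch below builds (but does not send) a request that adds one field. It assumes a Solr version whose Schema API accepts the add-field command, a core named "mycore" on the default port, and a field type called "string" in your schema. The helper build_add_field_request and the field name "title" are hypothetical.

```python
import json
import urllib.request


def build_add_field_request(base_url, core, name, field_type, stored=True):
    """Build a POST request for the Schema API endpoint of one core."""
    payload = {"add-field": {"name": name, "type": field_type, "stored": stored}}
    return urllib.request.Request(
        url=f"{base_url}/{core}/schema",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_add_field_request(
    "http://localhost:8983/solr", "mycore", "title", "string"
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))

# To actually send it against a running Solr instance:
# with urllib.request.urlopen(req) as response:
#     print(response.read().decode("utf-8"))
```

The request is only constructed here so that the sketch runs without a server; uncomment the last lines to send it.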

How to change Solr managed schema to classic schema

Stop Solr: $ bin/solr stop

Go to server/solr/mycore/conf, where "mycore" is the name of your core/collection.

Edit solrconfig.xml:

  • search for <schemaFactory class="ManagedIndexSchemaFactory"> and comment out the whole element
  • search for <schemaFactory class="ClassicIndexSchemaFactory"/> and uncomment it
  • search for the <initParams> element that refers to add-unknown-fields-to-the-schema and comment out the whole <initParams>...</initParams>

Rename managed-schema to schema.xml and you are done.

You can now start Solr again: $ bin/solr start. Then go to http://localhost:8983/solr/#/mycore/documents and check that Solr now refuses to index a document containing a field that is not yet specified in schema.xml.
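If you want to double-check the solrconfig.xml edit from the steps above, a small Python sketch like this one reports which schemaFactory element is still active. It relies on the fact that Python's default XML parser skips comments, so a commented-out element is not found. The function name active_schema_factory and the sample config are made up for illustration; a real solrconfig.xml contains much more.

```python
import xml.etree.ElementTree as ET


def active_schema_factory(solrconfig_xml):
    """Return the class of the first uncommented <schemaFactory>, or None."""
    root = ET.fromstring(solrconfig_xml.strip())
    element = root.find(".//schemaFactory")
    return None if element is None else element.get("class")


# Minimal stand-in for a solrconfig.xml after the edit described above.
sample = """
<config>
  <!--
  <schemaFactory class="ManagedIndexSchemaFactory">
    <bool name="mutable">true</bool>
  </schemaFactory>
  -->
  <schemaFactory class="ClassicIndexSchemaFactory"/>
</config>
"""

print(active_schema_factory(sample))
```

For a real file you would read its contents from server/solr/mycore/conf/solrconfig.xml and pass the text to the function.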

Is it a good practice? When to use it?

It depends on what you want. If you want to enforce a specific document structure (e.g. to make sure that all docs are "well-formed" according to your definition), then you want to use the classical schema management.

If, on the other hand, you don't know the document structure up front, then you might want to use the schema-less feature.

Limits

While it is called schema-less, there are limits to the kinds of structures that you can index. This is true both for Solr and Elasticsearch, by the way. For example, if you first index this doc:

{"name":"John Doe"}

then you will get an error if you try to index a doc like this next:

{"name": {
   "first": "Daniel",
   "second": "Dennett"
   }
}

That is because in the first case the field "name" was of type string, while in the second case it is an object.
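The conflict can be mimicked with a toy model in Python. This is only an illustration of the idea that a field's shape is fixed by the first document that uses it; it is not how Solr or Elasticsearch are implemented, and the class ToySchema is hypothetical.

```python
class ToySchema:
    """Remembers whether each field was first seen as a scalar or an object."""

    def __init__(self):
        self.field_kinds = {}

    @staticmethod
    def _kind(value):
        return "object" if isinstance(value, dict) else "scalar"

    def index(self, doc):
        # First pass: reject the whole document if any field conflicts.
        for name, value in doc.items():
            known = self.field_kinds.get(name)
            if known is not None and known != self._kind(value):
                raise ValueError(
                    f"field {name!r} is already a {known}, got {self._kind(value)}"
                )
        # Second pass: record the kinds of fields seen for the first time.
        for name, value in doc.items():
            self.field_kinds.setdefault(name, self._kind(value))


schema = ToySchema()
schema.index({"name": "John Doe"})
try:
    schema.index({"name": {"first": "Daniel", "second": "Dennett"}})
except ValueError as error:
    print("rejected:", error)
```

The second document is rejected because "name" was registered as a scalar by the first one.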

If you need indexing that goes beyond these limitations, you could use SIREn. It is an open source semi-structured information retrieval engine which is implemented as a plugin for both Solr and Elasticsearch. (Disclaimer: I worked for the company that develops SIREn.)

Jakub Kotowski answered Oct 18 '22 19:10


This is the so-called schemaless mode in Solr. I don't know the internal details of how it's implemented.

bin/solr start -e schemaless

The command above starts Solr in schemaless mode. If you don't start it that way, Solr works as usual.

For more information on schemaless, take a look here - https://cwiki.apache.org/confluence/display/solr/Schemaless+Mode

Mysterion answered Oct 18 '22 19:10