
How does Sunspot modify Solr's schema.xml? Does it modify it at all?

Let me know if I am wrong, but I think Solr only accepts fields that are already declared in schema.xml. So, if I have a field called 'title', I need to declare it in the schema.

Sunspot's documentation makes no mention of modifying schema.xml. I am wondering how Sunspot modifies schema.xml so that custom fields can be added to the index.

I also know that Sunspot uses RSolr under the hood. So if there is a way to modify the schema and reload data from the DB into Solr using RSolr, please let me know.

denniss asked Aug 25 '11 19:08




1 Answer

As karmajunkie alludes to, Sunspot uses its own standard schema. I'll go into how that works in a bit more detail here.

Solr Schema 101

For the purposes of this discussion, a Solr schema is mostly made up of two things: type definitions and field definitions.

A type definition sets up a type by specifying its name, the Java class for the type, and in the case of some types (notably text), a subordinate block of XML configuring how that type is handled.

A field definition gives a field a name and names the type of value that field contains. This lets Solr associate a field name in a document with its type and a handful of other options, and so determine how that field's value should be processed in your index.

Solr also supports a dynamicField definition, which, instead of a static field name, lets you specify a pattern with a glob in it. Incoming fields can have their names matched against these patterns in order to determine their types.
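The pattern matching described above can be sketched in plain Ruby. This only illustrates the idea and is not how Solr implements it: the DYNAMIC_FIELDS table and the type_for helper are hypothetical names, the patterns are made-up examples, and Solr's own precedence rules when several patterns match may differ from the first-match rule used here.

```ruby
# Hypothetical table standing in for a schema's dynamicField entries:
# glob pattern => type name. These patterns are illustrative only.
DYNAMIC_FIELDS = {
  "*_text" => "text",
  "*_i"    => "integer",
  "*_s"    => "string"
}.freeze

# Return the type of the first pattern that matches the incoming field
# name, or nil when no pattern matches.
def type_for(field_name)
  _pattern, type = DYNAMIC_FIELDS.find do |pattern, _type|
    File.fnmatch(pattern, field_name)
  end
  type
end

type_for("body_text")  # => "text"
type_for("title")      # => nil
```

A name such as title matches none of the patterns, which corresponds to the situation the question describes: a field that the schema does not know about.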

Sunspot's conventional schema

Sunspot's schema has a handful of field definitions for internally used fields, such as the ID and model name. Additionally, Sunspot makes liberal use of dynamicField definitions to establish naming conventions based on types.

This use of field naming conventions allows Sunspot to define a configuration DSL that creates a mapping from your model into an XML document ready to be indexed by Solr.

For example, this simple configuration block in your model…

searchable do
  text :body
end

…will be used by Sunspot to create a field name of body_text. This field name is matched against the *_text pattern for the following dynamicField definition in the schema:

<dynamicField name="*_text" type="text" indexed="true" stored="false" multiValued="true"/>

This maps any field with the suffix _text to Sunspot's definition of the text type. If you take a look at Sunspot's schema.xml, you'll see many other similar conventions for other types and options. The :stored => true option, for example, will typically add an s on that type's suffix (e.g., _texts).
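As a rough sketch of that naming convention, the following Ruby builds an index field name from an attribute name, a type, and the stored flag. The TYPE_SUFFIXES table and the indexed_name helper are hypothetical and cover only a few types for illustration; Sunspot's schema.xml remains the reference for the actual patterns.

```ruby
# Illustrative suffixes only; TYPE_SUFFIXES and indexed_name are
# hypothetical names, not part of Sunspot's API.
TYPE_SUFFIXES = { text: "text", string: "s", integer: "i" }.freeze

# Build an index field name from an attribute name, a type, and the
# stored flag, appending "s" to the type suffix for stored fields.
def indexed_name(name, type, stored: false)
  suffix = TYPE_SUFFIXES.fetch(type)
  "#{name}_#{suffix}#{'s' if stored}"
end

indexed_name(:body, :text)                # => "body_text"
indexed_name(:body, :text, stored: true)  # => "body_texts"
```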

Modifying Sunspot's schema in practice

In my experience, on both client projects and my own, there are two good cases for modifying Sunspot's schema. The first is changing the text field's analyzers to suit the features your application needs. The second is creating brand new types (usually based on the text type) for a more fine-grained application of Solr analyzers.

For example, widening search matches with "fuzzy" searches can be done with matches against a special text-based field that also uses linguistic stems, or NGrams. The tokens in the original text field may be used to populate spellcheck, or to boost exact matches. And the tokens in the custom text_ngram or text_en can serve to broaden search results when the stricter matching fails.
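To make the NGram idea concrete, here is a conceptual Ruby sketch of splitting a token into character n-grams. In practice Solr does this inside the analyzer configured for the field type, so the ngrams helper and its min and max sizes are assumptions chosen purely for illustration.

```ruby
# Split a token into every contiguous substring whose length is between
# min and max. The default sizes are arbitrary example values.
def ngrams(token, min: 2, max: 3)
  (min..max).flat_map do |size|
    token.chars.each_cons(size).map(&:join)
  end
end

ngrams("solr")  # => ["so", "ol", "lr", "sol", "olr"]

# A partial term shares grams with the indexed token, so it can still match.
(ngrams("solr") & ngrams("sol")).any?  # => true
```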

Sunspot's DSL provides one final feature for mapping your fields to these custom fields. Once you have set up the type and its corresponding dynamicField definition(s), you can use Sunspot's :as option to override the convention-based name generation.

For example, adding a custom ngram type for the above, we might end up processing the body again with NGrams with the following Ruby code:

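searchable do
  # Indexed by convention as body_text
  text :body
  # Indexed as body_ngram via :as; the block re-reads body for this field
  text :body_ngram, :as => 'body_ngram' do
    body
  end
end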
Nick Zadrozny answered Oct 10 '22 03:10
