Let me know if I am wrong, but I think Solr only accepts fields that are already declared in schema.xml. So, if I have a field called 'title', I need to declare it in the schema.
There is no mention of modifying schema.xml in Sunspot's documentation. I am wondering how Sunspot modifies schema.xml to allow custom fields to be added to the index.
I also know that Sunspot uses RSolr under the hood. So if there is a way to modify the schema and reload data from the DB into Solr using RSolr, please let me know.
The Solr search engine uses a schema.xml file to describe the structure of each data index. This XML file determines how Solr will build indexes from input documents, and how to perform index-time and query-time processing. As well as describing the structure of the index, schema.
Sunspot is a Solr client written in Ruby and based on the RSolr project. Sunspot provides an interface between an application (usually Ruby on Rails) and a Solr index. This interface allows the application to send and query data very easily, using Solr as the search engine.
As karmajunkie alludes to, Sunspot uses its own standard schema. I'll go into how that works in a bit more detail here.
For the purposes of this discussion, Solr schemas consist mostly of two things: type definitions and field definitions.
A type definition sets up a type by specifying its name, the Java class for the type, and in the case of some types (notably text), a subordinate block of XML configuring how that type is handled.
A field definition allows you to define the name of a field, and the name of the type of value contained in that field. This allows Solr to correlate the name of a field in a document with its type, and a handful of other options, and thus how that field's value should be processed in your index.
Solr also supports a dynamicField definition, which, instead of a static field name, lets you specify a pattern with a glob in it. Incoming fields can have their names matched against these patterns in order to determine their types.
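To make the pattern matching concrete, here is a minimal Ruby sketch of the idea. It is not Solr's implementation: the `DYNAMIC_FIELDS` table and the `type_for` helper are hypothetical names, the `*_s` and `*_i` entries are invented for illustration, and the sketch simply takes the first matching pattern, which may not be how Solr resolves overlapping patterns.

```ruby
# Hypothetical pattern table: glob pattern => type name.
# Only *_text appears in this discussion; the other entries are
# assumptions added for illustration.
DYNAMIC_FIELDS = {
  '*_text' => 'text',
  '*_s'    => 'string',
  '*_i'    => 'integer'
}.freeze

# Return the type of the first pattern that matches the field name,
# or nil when no pattern matches.
def type_for(field_name)
  _pattern, type = DYNAMIC_FIELDS.find do |pattern, _type|
    File.fnmatch?(pattern, field_name)
  end
  type
end

puts type_for('body_text')    # matched by *_text
puts type_for('title_s')      # matched by *_s
p type_for('unknown_field')   # no pattern matches
```

The point is only that the field name itself carries enough information to pick a type, so no per-field entry is needed in the schema.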
Sunspot's schema has a handful of field definitions for internally used fields, such as the ID and model name. Additionally, Sunspot makes liberal use of dynamicField definitions to establish naming conventions based on types.
This use of field naming conventions allows Sunspot to define a configuration DSL that creates a mapping from your model into an XML document ready to be indexed by Solr.
For example, this simple configuration block in your model…
searchable do
  text :body
end
…will be used by Sunspot to create a field name of body_text. This field name is matched against the *_text pattern for the following dynamicField definition in the schema:
<dynamicField name="*_text" type="text" indexed="true" stored="false" multiValued="true"/>
This maps any field with the suffix _text to Sunspot's definition of the text type. If you take a look at Sunspot's schema.xml, you'll see many other similar conventions for other types and options. The :stored => true option, for example, will typically add an s on that type's suffix (e.g., _texts).
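The naming convention can be sketched in a few lines of plain Ruby. This is an illustration of the idea rather than Sunspot's actual code: `indexed_name` and `TYPE_SUFFIXES` are hypothetical names, and only the `text` suffix and the extra `s` for stored fields come from the description above. The `string` and `integer` suffixes are assumptions; check Sunspot's schema.xml for the real ones.

```ruby
# Assumed mapping from DSL type to field-name suffix.
# Only :text => 'text' is taken from the discussion above.
TYPE_SUFFIXES = {
  text:    'text',
  string:  's',   # assumption
  integer: 'i'    # assumption
}.freeze

# Build the field name sent to Solr from the attribute name, its type,
# and whether the value should be stored.
def indexed_name(name, type, stored: false)
  suffix = TYPE_SUFFIXES.fetch(type)
  suffix += 's' if stored   # stored fields get an extra "s"
  "#{name}_#{suffix}"
end

puts indexed_name(:body, :text)                # body_text
puts indexed_name(:body, :text, stored: true)  # body_texts
```

Each generated name is then expected to match one of the dynamicField patterns in the schema, which is why new model fields can be indexed without editing schema.xml.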
In my experience, both on client projects and on my own, there are two good reasons to modify Sunspot's schema: first, to change the text field's analyzers to suit the features your application needs; and second, to create brand new types (usually based on the text type) for a more fine-grained application of Solr analyzers.
For example, you can widen search matches with "fuzzy" searching by matching against a special text-based field that also uses linguistic stems or NGrams. The tokens in the original text field can still be used to populate spellcheck or to boost exact matches, while the tokens in the custom text_ngram or text_en fields broaden the results when the stricter matching fails.
Sunspot's DSL provides one final feature for mapping your fields to these custom fields. Once you have set up the type and its corresponding dynamicField definition(s), you can use Sunspot's :as option to override the convention-based name generation.
For example, having added a custom ngram type for the above, we might process the body a second time with NGrams using the following Ruby code:
searchable do
  text :body
  # The block supplies the value, so the model does not need a body_ngram method.
  text :body_ngram, :as => 'body_ngram' do
    body
  end
end
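To show what the :as override means for the document that ends up in the index, here is a small, self-contained Ruby sketch. It does not use Sunspot: `Post`, `FIELDS`, and `solr_document` are hypothetical stand-ins, and the sketch handles only text fields. The idea is that the same attribute is emitted twice, once under the conventional name and once under the explicit one.

```ruby
# Hypothetical stand-in for a model instance.
Post = Struct.new(:body)

# Each declaration names the attribute to read and, optionally,
# an explicit field name that overrides the convention.
FIELDS = [
  { attribute: :body },
  { attribute: :body, as: 'body_ngram' }
].freeze

# Build a hash of field name => value, using the explicit name when
# one is given and the "<attribute>_text" convention otherwise.
def solr_document(record, fields)
  fields.each_with_object({}) do |field, doc|
    name = field[:as] || "#{field[:attribute]}_text"
    doc[name] = record.public_send(field[:attribute])
  end
end

doc = solr_document(Post.new('hello world'), FIELDS)
p doc.keys   # ["body_text", "body_ngram"]
```

For the second field to be accepted, the schema still needs a dynamicField (or field) definition whose pattern matches body_ngram.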