I store different kinds of documents in a single index with a strict, predefined mapping. All of them have a common field (say, "body"), but I'd like it to be analyzed slightly differently at index time depending on the document (for example, to use different token filters for specific documents) and treated the same way at search time. As far as I know, analyzers can't be specified per document.
What I have also considered:
{"mail":"smth"} to use a specific index analyzer, then search by "query":{"body":"smth"} to use the generic search analyzer).
_all, and set copy_to to a single body field. I'm not sure, but this would add substantial index overhead due to the copying.
To create a mapping, you need the Put Mapping API, which sets a specific mapping definition for a specific type; alternatively, you can add multiple mappings when you create an index.
Text field type: these fields are analyzed, that is, they are passed through an analyzer to convert the string into a list of individual terms before being indexed. The analysis process allows Elasticsearch to search for individual words within each full-text field.
Mapping is the process of defining how a document, and the fields it contains, are stored and indexed. Each document is a collection of fields, which each have their own data type. When mapping your data, you create a mapping definition, which contains a list of fields that are pertinent to the document.
I think you can use multi-fields. With multi-fields you can define analyzers (both index and search) for each sub-field, and then search the corresponding fields based on your application's requirements. In general, the index analyzer can differ from field to field, and the same goes for the search analyzer.
{
  "your_type": {
    "properties": {
      "body": {
        "type": "string",
        "index": "analyzed",
        "index_analyzer": "index_body_analyzer",
        "search_analyzer": "search_body_analyzer",
        "fields": {
          "mail": {
            "type": "string",
            "index": "analyzed",
            "index_analyzer": "index_bodymail_analyzer",
            "search_analyzer": "search_bodymail_analyzer"
          },
          "html": {
            "type": "string",
            "index": "analyzed",
            "index_analyzer": "index_bodyhtml_analyzer",
            "search_analyzer": "search_bodyhtml_analyzer"
          }
        }
      }
    }
  }
}
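With a mapping like that, the application picks which sub-field to query for a given kind of document. A minimal sketch in Python that only builds the request body as a dict; the kind-to-field table and the helper name `match_for_kind` are hypothetical, and sending the body to Elasticsearch is left to whatever client you use.

```python
import json

# Hypothetical table: which sub-field of "body" serves which document kind.
# "body" is the main field; "body.mail" and "body.html" are the
# multi-field sub-fields from the mapping above.
FIELD_FOR_KIND = {
    "default": "body",
    "mail": "body.mail",
    "html": "body.html",
}


def match_for_kind(kind: str, text: str) -> dict:
    """Build a match query against the sub-field for this document kind.

    Unknown kinds fall back to the main "body" field.
    """
    field = FIELD_FOR_KIND.get(kind, FIELD_FOR_KIND["default"])
    return {"query": {"match": {field: text}}}


if __name__ == "__main__":
    print(json.dumps(match_for_kind("mail", "smth")))
```

The catch, as the next answer explains, is that the query now names body.mail instead of body, so the application has to change.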
As I mentioned in the comments, what you want is not possible. Your requirement, in one sentence, is: have the same data analyzed in multiple ways, but searched as a single field, because changing the queries would break the existing application.
             -- body.html
             -- body.email
body field ---- body.content --- all searched as "body"
                ...
             -- body.destination
             -- body.whatever
Your first option is multi-fields, which has exactly this purpose in mind: having the same data analyzed multiple ways. The problem is that you cannot search for "body" and expect ES to search body.html, body.email, and so on. Even if this were possible, you want the data searched with different analyzers. Again, not possible. This option requires you to change the application and search each field in a multi_match or in a query_string.
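For the query_string route, the change amounts to listing the sub-fields explicitly. A sketch that only builds the request body; the field list mirrors the multi-field mapping earlier in this thread and is an assumption about your own mapping.

```python
from __future__ import annotations

import json

# Sub-fields from the multi-field mapping earlier in the thread;
# adjust this list to match your own mapping.
BODY_SUBFIELDS = ["body", "body.mail", "body.html"]


def query_string_over(fields: list[str], text: str) -> dict:
    """Build a query_string query that searches several fields at once."""
    return {"query": {"query_string": {"query": text, "fields": list(fields)}}}


if __name__ == "__main__":
    print(json.dumps(query_string_over(BODY_SUBFIELDS, "smth"), indent=2))
```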
Your second option, which is multi-fields in another form, will not work either, because you cannot refer to body and have ES match mail, content, etc. in the background.
The third option, using copy_to, will not work because data copied into another field X is analyzed with X's analyzer, and this breaks your requirement of having the same data analyzed differently.
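To make the copy_to behaviour concrete, here is a toy model in plain Python, not Elasticsearch itself: copy_to copies the original value, and the target field then analyzes it with its own analyzer. The two analyzer functions are simplified stand-ins (not ES's real whitespace or standard analyzers), and the field names body_mail and body_html are hypothetical.

```python
import re
from typing import Callable, Dict, List

Analyzer = Callable[[str], List[str]]


def whitespace_analyzer(text: str) -> List[str]:
    # Stand-in for a whitespace tokenizer: split on whitespace, keep case.
    return text.split()


def lowercase_word_analyzer(text: str) -> List[str]:
    # Stand-in for a "standard-like" analyzer: word characters, lowercased.
    return [t.lower() for t in re.findall(r"\w+", text)]


# Toy mapping: two source fields with their own analyzers, both copied
# into one "body" field that has a single analyzer of its own.
ANALYZERS: Dict[str, Analyzer] = {
    "body_mail": whitespace_analyzer,
    "body_html": lowercase_word_analyzer,
    "body": lowercase_word_analyzer,
}
COPY_TO = {"body_mail": "body", "body_html": "body"}


def index_document(doc: Dict[str, str]) -> Dict[str, List[str]]:
    """Return the terms each field would hold after indexing `doc`."""
    terms: Dict[str, List[str]] = {}
    for field, value in doc.items():
        terms.setdefault(field, []).extend(ANALYZERS[field](value))
        target = COPY_TO.get(field)
        if target is not None:
            # The raw value is copied, then analyzed with the TARGET's analyzer.
            terms.setdefault(target, []).extend(ANALYZERS[target](value))
    return terms
```

For {"body_mail": "Hello World"}, body_mail keeps ["Hello", "World"], while the copy in body holds ["hello", "world"]: whatever body_mail's analyzer did is lost in the copy.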
There could be a fourth option, "path": "just_name" from multi_fields, which at first look should work. You would have three multi-fields (email, content, html), each with a body sub-field. Having "path": "just_name" allows you to search just for body even if body is a sub-field of multiple other fields. But this is not possible either, because this type of multi-field will not accept different analyzers for the same body sub-field.
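The clash can be pictured with another toy model, plain Python rather than Elasticsearch internals: with full paths the three sub-fields get distinct names, but when each is exposed only by its own name they all collapse to body, and one name can carry only one analyzer. The analyzer names are made up.

```python
from typing import Dict, List, Tuple

# (parent multi-field, sub-field name, index analyzer): hypothetical values.
SUBFIELDS: List[Tuple[str, str, str]] = [
    ("email", "body", "email_analyzer"),
    ("content", "body", "content_analyzer"),
    ("html", "body", "html_analyzer"),
]


def resolve(path_mode: str) -> Dict[str, List[str]]:
    """Group analyzers by the name each sub-field is searchable under.

    "full" exposes a sub-field as "parent.sub"; "just_name" as "sub" only.
    """
    names: Dict[str, List[str]] = {}
    for parent, sub, analyzer in SUBFIELDS:
        name = f"{parent}.{sub}" if path_mode == "full" else sub
        names.setdefault(name, []).append(analyzer)
    return names


def conflicts(names: Dict[str, List[str]]) -> List[str]:
    """Names that would need more than one distinct analyzer."""
    return [n for n, analyzers in names.items() if len(set(analyzers)) > 1]
```

conflicts(resolve("full")) is empty, while conflicts(resolve("just_name")) is ["body"]: three analyzers competing for one field name.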
Either way, you need to change something in your requirements, because they will not work the way you want.
That being said, I'm curious to see what queries you are using in your application. It would be a simple change (yes, you will need to change your app) from querying the body field to querying body.* in a multi_match.
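A sketch of that multi_match body, built as a Python dict. multi_match accepts wildcards in field names, so "body.*" covers the sub-fields; "body" is listed as well so the main field is still searched. The helper name multi_match_body is hypothetical.

```python
from __future__ import annotations

import json


def multi_match_body(text: str, fields: list[str] | None = None) -> dict:
    """Build a multi_match query over "body" and its sub-fields.

    Defaults to ["body", "body.*"]; pass an explicit list to narrow it.
    """
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": list(fields) if fields is not None else ["body", "body.*"],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(multi_match_body("smth"), indent=2))
```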
And I have another solution for you: create multiple indices, one for each analyzer of your body field. For example, for mail, content and html you define three indices:
PUT /multi_fields1
{
  "mappings": {
    "test": {
      "properties": {
        "body": {
          "type": "string",
          "index_analyzer": "whitespace",
          "search_analyzer": "standard"
        }
      }
    }
  }
}
PUT /multi_fields2
{
  "mappings": {
    "test": {
      "properties": {
        "body": {
          "type": "string",
          "index_analyzer": "standard",
          "search_analyzer": "standard"
        }
      }
    }
  }
}
PUT /multi_fields3
{
  "mappings": {
    "test": {
      "properties": {
        "body": {
          "type": "string",
          "index_analyzer": "keyword",
          "search_analyzer": "standard"
        }
      }
    }
  }
}
All of them have the same type and the same field name, body, but different index_analyzers. Then you define an alias:
POST _aliases
{
  "actions": [
    { "add": { "index": "multi_fields1", "alias": "multi" } },
    { "add": { "index": "multi_fields2", "alias": "multi" } },
    { "add": { "index": "multi_fields3", "alias": "multi" } }
  ]
}
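With more than a few analyzers, writing these requests by hand gets repetitive. A sketch in Python that generates the three PUT bodies and the _aliases body from one table; it only builds and prints JSON, keeps the pre-2.x "string"/"index_analyzer" mapping style used in this answer, and leaves sending the requests to your client of choice. The table and helper names are illustrative.

```python
import json
from typing import Dict, Iterable

# Index name -> index-time analyzer for "body", as in the example above.
INDEX_ANALYZERS: Dict[str, str] = {
    "multi_fields1": "whitespace",
    "multi_fields2": "standard",
    "multi_fields3": "keyword",
}
ALIAS = "multi"
DOC_TYPE = "test"


def create_index_body(index_analyzer: str, search_analyzer: str = "standard") -> dict:
    """Body for PUT /<index>, in the same pre-2.x mapping style as above."""
    return {
        "mappings": {
            DOC_TYPE: {
                "properties": {
                    "body": {
                        "type": "string",
                        "index_analyzer": index_analyzer,
                        "search_analyzer": search_analyzer,
                    }
                }
            }
        }
    }


def alias_actions_body(indices: Iterable[str], alias: str = ALIAS) -> dict:
    """Body for POST _aliases that points one alias at every index."""
    return {"actions": [{"add": {"index": i, "alias": alias}} for i in indices]}


if __name__ == "__main__":
    for name, analyzer in INDEX_ANALYZERS.items():
        print(f"PUT /{name}")
        print(json.dumps(create_index_body(analyzer), indent=2))
    print("POST _aliases")
    print(json.dumps(alias_actions_body(INDEX_ANALYZERS), indent=2))
```

Adding a fourth analyzer then means adding one line to INDEX_ANALYZERS.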
Name your alias the same as your current index. The application doesn't need to change: it keeps using the same name for searches, but that name now points to an alias, which in turn refers to your multiple indices (an alias cannot share its name with an existing index, so the current index has to be deleted before the alias is created). What needs to change is how you index the documents: an html document needs to go into the multi_fields1 index, for example, an email document into the multi_fields2 index, and so on.
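On the indexing side, the routing can live in one small table. A sketch that produces a bulk API body (an action line plus a source line per document, newline-terminated). Documents go to concrete indices rather than the alias, since an alias that points to several indices cannot take single-document writes. html and email follow the example above; sending "content" to multi_fields3 is an assumption, and bulk_body is a hypothetical helper.

```python
import json
from typing import Dict, Iterable, Tuple

# Document kind -> concrete index. html and email follow the example in the
# text; "content" -> multi_fields3 is an assumption.
INDEX_FOR_KIND: Dict[str, str] = {
    "html": "multi_fields1",
    "email": "multi_fields2",
    "content": "multi_fields3",
}
DOC_TYPE = "test"


def bulk_body(docs: Iterable[Tuple[str, str]]) -> str:
    """Build an NDJSON bulk body from (kind, body_text) pairs.

    Each document gets an action line naming its concrete index (not the
    alias), then its source line; the bulk API expects a trailing newline.
    """
    lines = []
    for kind, text in docs:
        index = INDEX_FOR_KIND[kind]  # raises KeyError for unknown kinds, on purpose
        lines.append(json.dumps({"index": {"_index": index, "_type": DOC_TYPE}}))
        lines.append(json.dumps({"body": text}))
    return "\n".join(lines) + "\n"
```

Searches keep going through the alias; only this write path needs to know which index a document belongs to.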
Whatever solution you find/choose, your requirements need to change because the way you want it is not possible.