I am trying to search for the word "blue" in the below list of text:
"BlueSaphire","Bluo","alue","blue", "BLUE", "Blue","Blue Black","Bluo","Saphire Blue", "black" , "green","bloo" , "Saphireblue"
SearchQuery searchQuery = new NativeSearchQueryBuilder()
        .withIndices("color")
        .withQuery(matchQuery("colorDescriptionCode", "blue")
                .fuzziness(Fuzziness.ONE))
        .build();
This works fine, and the search returns the below records along with their scores:
alue 2.8718023
Bluo 1.7804208
Bluo 1.7804208
BLUE 1.2270637
blue 1.2270637
Blue 1.2270637
Blue Black 1.1082436
Saphire Blue 0.7669148
But I am not able to make wildcards work. "SaphireBlue" and "BlueSaphire" are also expected to be part of the results.
I tried the below setting, but it does not work:
SearchQuery searchQuery = new NativeSearchQueryBuilder()
        .withIndices("color")
        .withQuery(matchQuery("colorDescriptionCode", "(.*?)blue")
                .fuzziness(Fuzziness.ONE))
        .build();
On Stack Overflow, I observed a solution that specifies analyzeWildcard:
QueryBuilder queryBuilder = boolQuery().should(
        queryString("blue").analyzeWildcard(true)
                .field("colorDescriptionCode", 2.0f));
I don't find the queryString static method. I am using spring-data-elasticsearch 2.0.0.RELEASE.
Let me know how I can specify the wildcard so that all words containing "blue" will also be returned in the search results.
I know that working examples are always better than theory, but still, I would first like to tell a little theory. The heart of Elasticsearch is Lucene, so before a document is written to the Lucene index, it goes through the analysis stage. The analysis stage can be divided into 3 parts:
In the first stage, we can throw away unwanted characters, for example, HTML tags. More information about character filters can be found on the official site. The next stage is far more interesting: here we split the input text into tokens, which will be used later for searching. A few very useful tokenizers:
- the standard tokenizer, which splits text on word boundaries;
- the ngram tokenizer, which emits n-grams from a sliding window, so for the text "for example" it produces tokens like
"fo", "or", "r ", " e", "ex", "for", "or ex"
etc. The length of the n-grams is variable and can be configured by the min_gram and max_gram params;
- the edge_ngram tokenizer, which only emits n-grams anchored to the beginning of the text, so for "for example" it produces
"fo", "for", "for ", "for e", "for ex", "for exa"
etc.
More information about tokenizers can be found on the official site (unfortunately, I can't post more links because of low reputation). The next stage is also damn interesting. After we split the text into tokens, we can do a lot of interesting things with them. Again, a few very useful examples of token filters:
- the lowercase filter, which converts all tokens to lowercase;
- the stop filter, which throws away common stop words such as "the" or "and";
- the stemmer filter, which reduces words to their base form (for example, "cars" to "car").
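If you want to experiment with tokenizers and filters without creating an index, the analyze API accepts them ad hoc. A minimal sketch, assuming access to the underlying Elasticsearch 2.x Client (the sample text and the filter chain here are just illustrations):

import org.elasticsearch.action.admin.indices.analyze.AnalyzeResponse;
import org.elasticsearch.client.Client;

// Sketch: run "The Cars" through the standard tokenizer plus a filter chain.
AnalyzeResponse response = client.admin().indices()
        .prepareAnalyze("The Cars")
        .setTokenizer("standard")
        .setTokenFilters("lowercase", "stop", "porter_stem")
        .execute().actionGet();

for (AnalyzeResponse.AnalyzeToken token : response.getTokens()) {
    // prints "car": "The" is lowercased and then dropped as a stop word,
    // "Cars" is lowercased and stemmed
    System.out.println(token.getTerm());
}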
If you are interested in looking at the results of the analysis for an indexed document, you can use the _termvectors endpoint:
curl [ELASTIC_URL]:9200/[INDEX_NAME]/[TYPE_NAME]/[DOCUMENT_ID]/_termvectors?pretty
Now let's talk about queries. Queries are divided into 2 large groups. These groups have 2 significant differences:
- whether the query goes through the analysis stage or not;
- whether the query requires an exact answer (yes or no) or returns a relevance score.
Examples are the match query and the term query. The first passes through the analysis stage, the second does not. The first does not give us a definite answer (but gives us a score), the second does. When creating mappings for a document, we can specify both the index analyzer and the search analyzer separately for each field.
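To make the difference concrete, here is a small sketch using the asker's field name (the field and the value are just illustrations):

import static org.elasticsearch.index.query.QueryBuilders.matchQuery;
import static org.elasticsearch.index.query.QueryBuilders.termQuery;

import org.elasticsearch.index.query.QueryBuilder;

// matchQuery is analyzed: "Blue" is run through the search analyzer
// (e.g. lowercased) before being compared with the indexed tokens.
QueryBuilder analyzed = matchQuery("colorDescriptionCode", "Blue");

// termQuery is not analyzed: it looks up the literal token "Blue",
// which will not match if the index only contains the lowercased "blue".
QueryBuilder literal = termQuery("colorDescriptionCode", "Blue");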
Now for the information regarding Spring Data Elasticsearch. Here it makes sense to talk about concrete examples. Suppose we have a document with a title field, and we want to search for information in this field. First, create a file with settings for Elasticsearch:
{
  "analysis": {
    "analyzer": {
      "ngram_analyzer": {
        "tokenizer": "ngram_tokenizer",
        "filter": [
          "lowercase"
        ]
      },
      "edge_ngram_analyzer": {
        "tokenizer": "edge_ngram_tokenizer",
        "filter": [
          "lowercase"
        ]
      },
      "english_analyzer": {
        "tokenizer": "standard",
        "filter": [
          "lowercase",
          "english_stop",
          "unique",
          "english_possessive_stemmer",
          "english_stemmer"
        ]
      },
      "keyword_analyzer": {
        "tokenizer": "keyword",
        "filter": [
          "lowercase"
        ]
      }
    },
    "tokenizer": {
      "ngram_tokenizer": {
        "type": "ngram",
        "min_gram": 2,
        "max_gram": 20
      },
      "edge_ngram_tokenizer": {
        "type": "edge_ngram",
        "min_gram": 2,
        "max_gram": 20
      }
    },
    "filter": {
      "english_stop": {
        "type": "stop",
        "stopwords": "_english_"
      },
      "english_stemmer": {
        "type": "stemmer",
        "language": "english"
      },
      "english_possessive_stemmer": {
        "type": "stemmer",
        "language": "possessive_english"
      }
    }
  }
}
You can save these settings to your resources folder. Now let's look at our document class:
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldIndex;
import org.springframework.data.elasticsearch.annotations.FieldType;
import org.springframework.data.elasticsearch.annotations.InnerField;
import org.springframework.data.elasticsearch.annotations.MultiField;
import org.springframework.data.elasticsearch.annotations.Setting;

// the @Document annotation is fully qualified to avoid a clash with the class name
@org.springframework.data.elasticsearch.annotations.Document(indexName = "document", type = "document")
@Setting(settingPath = "document_index_setting.json")
public class Document {

    @Id
    private String id;

    @MultiField(
        mainField = @Field(type = FieldType.String,
                           index = FieldIndex.not_analyzed),
        otherFields = {
            @InnerField(suffix = "edge_ngram",
                        type = FieldType.String,
                        indexAnalyzer = "edge_ngram_analyzer",
                        searchAnalyzer = "keyword_analyzer"),
            @InnerField(suffix = "ngram",
                        type = FieldType.String,
                        indexAnalyzer = "ngram_analyzer",
                        searchAnalyzer = "keyword_analyzer"),
            @InnerField(suffix = "english",
                        type = FieldType.String,
                        indexAnalyzer = "english_analyzer")
        }
    )
    private String title;

    // getters and setters omitted
}
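If the index isn't created automatically by the repository infrastructure, it can be created explicitly. A minimal sketch, assuming an injected ElasticsearchTemplate bean:

// Sketch: create the index with the settings above and apply the mapping.
elasticsearchTemplate.createIndex(Document.class); // picks up @Setting
elasticsearchTemplate.putMapping(Document.class);  // picks up @Field / @MultiField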
So here we have the field title with three inner fields:
- title.edge_ngram for searching by edge n-grams with the keyword search analyzer. We need this because we don't want our query itself to be split into edge n-grams;
- title.ngram for searching by n-grams;
- title.english for searching with the nuances of a natural language;
and the main field title. We don't analyze this because sometimes we want to sort by this field.
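You can verify why the ngram field helps with queries like the one in the question by asking the index which tokens ngram_analyzer produces. A sketch, assuming access to the underlying Elasticsearch 2.x Client and the index created with the settings above:

import org.elasticsearch.action.admin.indices.analyze.AnalyzeResponse;

// Sketch: list the tokens ngram_analyzer emits for "BlueSaphire".
AnalyzeResponse response = client.admin().indices()
        .prepareAnalyze("document", "BlueSaphire")
        .setAnalyzer("ngram_analyzer")
        .execute().actionGet();

for (AnalyzeResponse.AnalyzeToken token : response.getTokens()) {
    // among others: "bl", "blu", "blue", ..., "saphire", "bluesaphire"
    System.out.println(token.getTerm());
}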
Let's use a simple multi match query for searching through all these fields:

String searchQuery = "blablabla";

MultiMatchQueryBuilder queryBuilder = multiMatchQuery(searchQuery)
        .field("title.edge_ngram", 2)
        .field("title.ngram")
        .field("title.english");

NativeSearchQueryBuilder searchBuilder = new NativeSearchQueryBuilder()
        .withIndices("document")
        .withTypes("document")
        .withQuery(queryBuilder)
        .withPageable(new PageRequest(page, pageSize));

elasticsearchTemplate.queryForPage(searchBuilder.build(), Document.class,
        new SearchResultMapper() {
            // realisation omitted
        });
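Coming back to the original question: if colorDescriptionCode were mapped with the same kind of inner fields (an assumption, not the asker's current mapping), "BlueSaphire" and "SaphireBlue" would match a plain "blue" query without any wildcard:

import static org.elasticsearch.index.query.QueryBuilders.multiMatchQuery;

import org.springframework.data.elasticsearch.core.query.NativeSearchQueryBuilder;
import org.springframework.data.elasticsearch.core.query.SearchQuery;

// Hypothetical adaptation: assumes colorDescriptionCode has .edge_ngram and
// .ngram inner fields analogous to the title field above.
SearchQuery searchQuery = new NativeSearchQueryBuilder()
        .withIndices("color")
        .withQuery(multiMatchQuery("blue")
                .field("colorDescriptionCode.edge_ngram", 2)
                .field("colorDescriptionCode.ngram"))
        .build();
// "BlueSaphire" and "SaphireBlue" are both indexed with the 4-gram token "blue",
// so the analyzed query token "blue" matches them directly.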
Search is a very interesting and voluminous topic. I tried to answer as briefly as possible; it is possible that because of this there are some confusing moments - do not hesitate to ask.