The first error I encountered when indexing my data in ES 5.1 came from my completion suggester mapping, which still contained an output field:
message [MapperParsingException[failed to parse]; nested: IllegalArgumentException[unknown field name [output], must be one of [input, weight, contexts]];]
So I removed it, but now many of my autocompletions are wrong because the suggester returns the input it matched instead of the single output string.
After some googling I found this article from Elasticsearch, which mentions the following:
As suggestions are document-oriented, suggestion metadata (e.g. output) should now be specified as a field in the document. The support for specifying output when indexing suggestion entries has been removed. Now suggestion result entry’s text is always the un-analyzed value of the suggestion’s input (same as not specifying output while indexing suggestions in pre-5.0 indices).
I've found that the original value is within the _source field that is returned with the suggestion, but that's not really a solution for me because the key and structure change based on the original object it comes from.
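For illustration, a 5.x suggest response looks roughly like this (the suggestion name and the exact _source contents here are examples): the matched permutation comes back as text, while the original value is buried somewhere in _source:

```json
{
  "suggest": {
    "autocomplete-suggest": [
      {
        "text": "colours available all",
        "options": [
          {
            "text": "colours available all",
            "_source": {
              "id": "c2358e0c-7399-4665-ac2c-0bdd44597ac0",
              "synonyms": ["All available colours", "Colors"]
            }
          }
        ]
      }
    ]
  }
}
```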
I could add an extra 'output' field to the original object, but this isn't a solution for me either, because in some cases I have a structure like this:
{
  "id": "c2358e0c-7399-4665-ac2c-0bdd44597ac0",
  "synonyms": ["All available colours", "Colors"],
  "autoComplete": [{
    "input": ["colours available all", "available colours all", "available all colours", "colours all available", "all available colours", "all colours available"]
  }, {
    "input": ["colors"]
  }]
}
In ES 2.4 the structure was like this:
{
  "id": "c2358e0c-7399-4665-ac2c-0bdd44597ac0",
  "synonyms": ["All available colours", "Colors"],
  "SmartSynonym": [{
    "input": ["colours available all", "available colours all", "available all colours", "colours all available", "all available colours", "all colours available"],
    "output": ["All available colours"]
  }, {
    "input": ["colors"],
    "output": ["Colors"]
  }]
}
This wasn't a problem when the 'output' field was present in every autocomplete object.
How can I return the original value in ES 5.1 (e.g. "All available colours") when asking for "colours available all", in an easy way, without doing too many manual lookups?
Related Question from other user: Output field in autocomplete suggestion
We ended up removing the custom plugin from the original answer because it was hard to get it working in Elastic Cloud. Instead we just created a separate document for the autocompletions and removed them from all our other documents.
The object:
public class Suggest {

    /*
     * Contains the actual value that needs to be returned.
     * "iphone 8 plus", "plus iphone 8", "8 plus iphone", ...
     * will all resolve to "iphone 8 plus", for example.
     */
    private String autocompleteOutput;

    /*
     * Contains the field and all the values of that field to autocomplete.
     */
    private Map<String, AutoComplete> autoComplete;

    @JsonCreator
    Suggest() {
    }

    public Suggest(String autocompleteOutput, Map<String, AutoComplete> autoComplete) {
        this.autocompleteOutput = autocompleteOutput;
        this.autoComplete = autoComplete;
    }

    public String getAutocompleteOutput() {
        return autocompleteOutput;
    }

    public void setAutocompleteOutput(String autocompleteOutput) {
        this.autocompleteOutput = autocompleteOutput;
    }

    public Map<String, AutoComplete> getAutoComplete() {
        return autoComplete;
    }

    public void setAutoComplete(Map<String, AutoComplete> autoComplete) {
        this.autoComplete = autoComplete;
    }
}
public class AutoComplete {

    /*
     * Contains the permutation values from the Lucene filter (see the original answer).
     */
    private String[] input;

    @JsonCreator
    AutoComplete() {
    }

    public AutoComplete(String[] input) {
        this.input = input;
    }

    public String[] getInput() {
        return input;
    }
}
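For illustration, a Suggest instance serializes to JSON roughly like this (the autoComplete map key "name" is just a hypothetical field name):

```json
{
  "autocompleteOutput": "iphone 8 plus",
  "autoComplete": {
    "name": {
      "input": ["iphone 8 plus", "plus iphone 8", "8 plus iphone"]
    }
  }
}
```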
The objects above are indexed with the following mapping:
{
  "suggest": {
    "dynamic_templates": [
      {
        "autocomplete": {
          "path_match": "autoComplete.*",
          "match_mapping_type": "*",
          "mapping": {
            "type": "completion",
            "analyzer": "lowercase_keyword_analyzer"
          }
        }
      }
    ],
    "properties": {}
  }
}
This allows us to read the autocompleteOutput field from the _source of every suggestion hit.
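A suggest request can then restrict _source to that single field, so every option carries its output regardless of the structure of the original object. A sketch of such a request body, sent to the suggest index's _search endpoint (the suggestion name and completion field name are examples):

```json
{
  "_source": "autocompleteOutput",
  "suggest": {
    "autocomplete-suggest": {
      "prefix": "colours available all",
      "completion": {
        "field": "autoComplete.name"
      }
    }
  }
}
```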
Original answer: After some research I ended up creating a new Elasticsearch 5.1.1 plugin.
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

import java.io.IOException;
import java.util.Collections;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

/**
 * Created by glenn on 13.01.17.
 */
public class PermutationTokenFilter extends TokenFilter {

    private final CharTermAttribute charTermAtt;
    private final PositionIncrementAttribute posIncrAtt;
    private final OffsetAttribute offsetAtt;
    private Iterator<String> permutations;
    private int origOffset;

    /**
     * Construct a token stream filtering the given input.
     *
     * @param input the token stream whose tokens will be permuted
     */
    protected PermutationTokenFilter(TokenStream input) {
        super(input);
        this.charTermAtt = addAttribute(CharTermAttribute.class);
        this.posIncrAtt = addAttribute(PositionIncrementAttribute.class);
        this.offsetAtt = addAttribute(OffsetAttribute.class);
    }

    @Override
    public final boolean incrementToken() throws IOException {
        while (true) {
            // see if permutations have been created already
            if (permutations == null) {
                // see if more tokens are available
                if (!input.incrementToken()) {
                    return false;
                } else {
                    // get the current token's value
                    String value = String.valueOf(charTermAtt);
                    // permute over the value and create an iterator
                    permutations = permutation(value).iterator();
                    origOffset = posIncrAtt.getPositionIncrement();
                }
            }
            // see if there are remaining permutations
            if (permutations.hasNext()) {
                // reset the attributes to their starting point
                clearAttributes();
                // use the next permutation
                String permutation = permutations.next();
                // write the permutation into the attributes
                charTermAtt.setEmpty().append(permutation);
                posIncrAtt.setPositionIncrement(origOffset);
                offsetAtt.setOffset(0, permutation.length());
                // remove the emitted permutation from the iterator
                permutations.remove();
                origOffset = 0;
                return true;
            }
            permutations = null;
        }
    }

    /**
     * Changes the order of a multi-word keyword so the completion suggester still
     * knows the original value, without tokenizing it, when the user asks for the
     * words in a different order.
     *
     * @param value unpermuted value, e.g. "Yellow Crazy Banana"
     * @return permuted values, e.g. "Yellow Crazy Banana", "Yellow Banana Crazy",
     *         "Crazy Yellow Banana", "Crazy Banana Yellow", "Banana Crazy Yellow",
     *         "Banana Yellow Crazy"
     */
    private Set<String> permutation(String value) {
        value = value.trim().replaceAll(" +", " ");
        // A set eliminates semantic duplicates: if a word occurs twice in one value,
        // swapping the two occurrences yields the same string ("a a b" is still "a a b").
        Set<String> set = new HashSet<String>();
        String[] words = value.split(" ");
        // termination condition: only one permutation for a single word
        if (words.length == 1) {
            set.add(value);
        } else if (words.length <= 6) {
            // give each word a chance to be the first in the permuted value
            for (int i = 0; i < words.length; i++) {
                // remove the word at index i from the array
                String pre = "";
                for (int j = 0; j < i; j++) {
                    pre += words[j] + " ";
                }
                String post = " ";
                for (int j = i + 1; j < words.length; j++) {
                    post += words[j] + " ";
                }
                String remaining = (pre + post).trim();
                // recurse to find all the permutations of the remaining words
                for (String permutation : permutation(remaining)) {
                    // concatenate the first word with the permutations of the rest
                    set.add(words[i] + " " + permutation);
                }
            }
        } else {
            // too many words to permute them all; index the single words plus the full value
            Collections.addAll(set, words);
            set.add(value);
        }
        return set;
    }
}
This filter takes the original input token "All available colours" and permutes it into all possible word orders (see the original question).
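As a quick sanity check, the permutation step can be reproduced in a standalone sketch (this mirrors the private permutation() method above; the class name is just for the demo):

```java
import java.util.HashSet;
import java.util.Set;

public class PermutationDemo {

    // Recursively builds every ordering of the words in a value, mirroring the
    // filter's permutation() method: pick each word as the head in turn and
    // permute the remaining words. A set deduplicates repeated words.
    static Set<String> permutation(String value) {
        value = value.trim().replaceAll(" +", " ");
        Set<String> set = new HashSet<>();
        String[] words = value.split(" ");
        if (words.length == 1) {
            set.add(value);
            return set;
        }
        for (int i = 0; i < words.length; i++) {
            StringBuilder rest = new StringBuilder();
            for (int j = 0; j < words.length; j++) {
                if (j != i) {
                    rest.append(words[j]).append(' ');
                }
            }
            for (String tail : permutation(rest.toString())) {
                set.add(words[i] + " " + tail);
            }
        }
        return set;
    }

    public static void main(String[] args) {
        Set<String> result = permutation("all available colours");
        System.out.println(result.size());                            // 3! = 6 distinct orderings
        System.out.println(result.contains("colours available all")); // true
    }
}
```

Any of the six orderings typed by a user will now prefix-match one of the indexed permutations, while _source still holds the single original value.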
import org.apache.lucene.analysis.TokenStream;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.env.Environment;
import org.elasticsearch.index.IndexSettings;
import org.elasticsearch.index.analysis.AbstractTokenFilterFactory;

/**
 * Created by glenn on 16.01.17.
 */
public class PermutationTokenFilterFactory extends AbstractTokenFilterFactory {

    public PermutationTokenFilterFactory(IndexSettings indexSettings, Environment environment, String name, Settings settings) {
        super(indexSettings, name, settings);
    }

    @Override
    public PermutationTokenFilter create(TokenStream input) {
        return new PermutationTokenFilter(input);
    }
}
This class is needed to provide the filter to the Elasticsearch plugin.
Follow this guide to set up the configuration the Elasticsearch plugin needs.
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>be.smartspoken</groupId>
    <artifactId>permutation-plugin</artifactId>
    <version>5.1.1-SNAPSHOT</version>
    <packaging>jar</packaging>

    <name>Plugin: Permutation</name>
    <description>Permutation plugin for elasticsearch</description>

    <properties>
        <lucene.version>6.3.0</lucene.version>
        <elasticsearch.version>5.1.1</elasticsearch.version>
        <java.version>1.8</java.version>
        <log4j2.version>2.7</log4j2.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-api</artifactId>
            <version>${log4j2.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-core</artifactId>
            <version>${log4j2.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-test-framework</artifactId>
            <version>${lucene.version}</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-core</artifactId>
            <version>${lucene.version}</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-analyzers-common</artifactId>
            <version>${lucene.version}</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.elasticsearch</groupId>
            <artifactId>elasticsearch</artifactId>
            <version>${elasticsearch.version}</version>
            <scope>provided</scope>
        </dependency>
    </dependencies>

    <build>
        <resources>
            <resource>
                <directory>src/main/resources</directory>
                <filtering>false</filtering>
                <excludes>
                    <exclude>*.properties</exclude>
                </excludes>
            </resource>
        </resources>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>2.6</version>
                <configuration>
                    <appendAssemblyId>false</appendAssemblyId>
                    <outputDirectory>${project.build.directory}/releases/</outputDirectory>
                    <descriptors>
                        <descriptor>${basedir}/src/main/assemblies/plugin.xml</descriptor>
                    </descriptors>
                </configuration>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.3</version>
                <configuration>
                    <source>${java.version}</source>
                    <target>${java.version}</target>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>
Make sure you use the correct Elasticsearch, Lucene and Log4j 2 versions in your pom.xml file and provide the correct configuration files.
import be.smartspoken.plugin.permutation.filter.PermutationTokenFilterFactory;
import org.elasticsearch.index.analysis.TokenFilterFactory;
import org.elasticsearch.indices.analysis.AnalysisModule;
import org.elasticsearch.plugins.AnalysisPlugin;
import org.elasticsearch.plugins.Plugin;

import java.util.HashMap;
import java.util.Map;

/**
 * Created by glenn on 13.01.17.
 */
public class PermutationPlugin extends Plugin implements AnalysisPlugin {

    @Override
    public Map<String, AnalysisModule.AnalysisProvider<TokenFilterFactory>> getTokenFilters() {
        Map<String, AnalysisModule.AnalysisProvider<TokenFilterFactory>> extra = new HashMap<>();
        extra.put("permutation", PermutationTokenFilterFactory::new);
        return extra;
    }
}
This registers the factory with the plugin under the filter name "permutation".
After installing the new plugin you need to restart Elasticsearch.
Then add new custom analyzers that mimic the 2.x functionality:
Settings.builder()
    .put("number_of_shards", 2)
    .loadFromSource(jsonBuilder()
        .startObject()
            .startObject("analysis")
                .startObject("analyzer")
                    .startObject("permutation_analyzer")
                        .field("tokenizer", "keyword")
                        .field("filter", new String[]{"permutation", "lowercase"})
                    .endObject()
                .endObject()
            .endObject()
        .endObject().string())
    .loadFromSource(jsonBuilder()
        .startObject()
            .startObject("analysis")
                .startObject("analyzer")
                    .startObject("lowercase_keyword_analyzer")
                        .field("tokenizer", "keyword")
                        .field("filter", new String[]{"lowercase"})
                    .endObject()
                .endObject()
            .endObject()
        .endObject().string())
    .build();
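For readers not using the Java client, the same two analyzers can presumably be expressed as plain index-settings JSON (assuming the plugin registers its filter under the name "permutation", as in getTokenFilters() above):

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "permutation_analyzer": {
          "tokenizer": "keyword",
          "filter": ["permutation", "lowercase"]
        },
        "lowercase_keyword_analyzer": {
          "tokenizer": "keyword",
          "filter": ["lowercase"]
        }
      }
    }
  }
}
```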
Now the only thing you have to do is add the custom analyzers to your object mapping:
{
  "my_object": {
    "dynamic_templates": [{
      "autocomplete": {
        "path_match": "my.autocomplete.object.path",
        "match_mapping_type": "*",
        "mapping": {
          "type": "completion",
          "analyzer": "permutation_analyzer", /* custom analyzer */
          "search_analyzer": "lowercase_keyword_analyzer" /* custom analyzer */
        }
      }
    }],
    "properties": {
      /* your other properties */
    }
  }
}
This also improves indexing performance, because you no longer have to build the permutations in your application before sending the documents to Elasticsearch.