Is there an Elasticsearch plugin similar to the Solr analysis tool?

Solr has the built-in "Analysis Screen", which helps to debug the interplay between tokenizers and filters for specific field types:

[Screenshot: Solr Analysis Screen]

Is there a plugin for Elasticsearch that does something similar? Specifically, I want to see the input and output of each filter, not only the end result of the analysis chain. I searched extensively but didn't find anything.

https://www.found.no/play/#analysis contains exactly the feature I want (scroll down to "myAnalyzer"), but unfortunately it's not something I can run on my index. But it shows that such a feature is possible.

Edit: I know there are many plugins that show me the output for a complete chain of filters, for example kopf as suggested by user @Bass:

[Screenshot: kopf analysis output]

This is not what I want! I want to see the output of each filter, not only the end result.

Martin Loetzsch asked Dec 03 '14 08:12




2 Answers

There is one standalone tool called elyzer made by the nice folks at OpenSource Connections. That tool will show you the state of your tokens at any step (char filter, tokenizer, token filter) of the analysis process and it is very simple to use.

Install it with pip install elyzer, then use it as a command-line tool, e.g.

$ elyzer --es "http://localhost:9200" --index tmdb --analyzer english_bigrams --text "Mary had a little lamb"
TOKENIZER: standard
{1:Mary}    {2:had} {3:a}   {4:little}  {5:lamb}    
TOKEN_FILTER: standard
{1:Mary}    {2:had} {3:a}   {4:little}  {5:lamb}    
TOKEN_FILTER: lowercase
{1:mary}    {2:had} {3:a}   {4:little}  {5:lamb}    
TOKEN_FILTER: porter_stem
{1:mari}    {2:had} {3:a}   {4:littl}   {5:lamb}    
TOKEN_FILTER: bigram_filter
{1:mari had}    {2:had a}   {3:a littl} {4:littl lamb}  

Val answered Sep 28 '22 01:09


I've used Inquisitor in the past to test out tokenizers and filters. It sits on top of the Elasticsearch analyze API and can be used from a web front end.

You should also try another plugin called elasticsearch-extended-analyze which returns the same token-level information as the Solr analysis page (though without the web front end).
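The analyze API mentioned above can also be called directly. The following is a minimal sketch, not a definitive implementation: it assumes an Elasticsearch release whose `_analyze` endpoint accepts `"explain": true` (newer than the versions current when the question was asked), and it assumes the response's `detail` object has the `charfilters` / `tokenizer` / `tokenfilters` layout described in the analyze API documentation. The helper names `build_explain_request` and `format_stages` are hypothetical, and the index and analyzer names are simply reused from the elyzer example.

```python
import json
from urllib import request


def build_explain_request(es_url, index, analyzer, text):
    """Build a POST request to {es_url}/{index}/_analyze with explain enabled.

    Assumption: the target cluster supports the "explain" flag on _analyze.
    """
    body = json.dumps(
        {"analyzer": analyzer, "text": text, "explain": True}
    ).encode("utf-8")
    return request.Request(
        f"{es_url.rstrip('/')}/{index}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def format_stages(detail):
    """Render the per-stage token lists of an explain "detail" object.

    Assumed layout: optional "charfilters", then either "analyzer"
    (built-in analyzers) or "tokenizer" plus "tokenfilters" (custom ones).
    Positions are printed exactly as the response reports them.
    """
    lines = []
    for char_filter in detail.get("charfilters", []):
        lines.append(f"CHAR_FILTER: {char_filter['name']}")
        lines.append(" ".join(char_filter.get("filtered_text", [])))

    steps = []
    if "analyzer" in detail:
        steps.append(("ANALYZER", detail["analyzer"]))
    if "tokenizer" in detail:
        steps.append(("TOKENIZER", detail["tokenizer"]))
    steps += [("TOKEN_FILTER", tf) for tf in detail.get("tokenfilters", [])]

    for label, step in steps:
        lines.append(f"{label}: {step['name']}")
        lines.append(
            "\t".join(f"{{{t['position']}:{t['token']}}}" for t in step["tokens"])
        )
    return "\n".join(lines)


# Usage against a running cluster (not executed here):
#   req = build_explain_request("http://localhost:9200", "tmdb",
#                               "english_bigrams", "Mary had a little lamb")
#   with request.urlopen(req) as resp:
#       print(format_stages(json.load(resp)["detail"]))
```

This prints one block per char filter, tokenizer, and token filter, in the same spirit as the elyzer output, without installing a plugin.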

Peter Dixon-Moses answered Sep 28 '22 03:09