Solr has the built-in "Analysis Screen", which helps to debug the interplay between tokenizers and filters for specific field types.
Is there a plugin for Elasticsearch that does something similar? Specifically, I want to see the input/output of each filter, not only the end result of the analysis chain. I searched Google quite intensively for this, but didn't find anything.
https://www.found.no/play/#analysis contains exactly the feature I want (scroll down to "myAnalyzer"), but unfortunately it's not something I can run on my index. But it shows that such a feature is possible.
Edit: I know there are many plugins that show me the output for a complete chain of filters, for example kopf, as suggested by user @Bass.
This is not what I want! I want to see the output of each filter, not only the end result.
Solr has more advantages when it comes to static data, because of its caches and its ability to use an uninverted reader for faceting and sorting; e-commerce is a typical example. On the other hand, Elasticsearch is better suited to, and much more frequently used for, time-series use cases such as log analysis.
Performance-wise, they are roughly the same. Operationally, Elasticsearch is a bit simpler to work with because it has just a single process. Solr, in its Elasticsearch-like fully distributed deployment mode known as SolrCloud, depends on Apache ZooKeeper.
1 Ingest and Query services. The Elasticsearch query process is structured very similarly to the Solr service. The main difference lies in the microservice architecture of the system, and the exits to the Elasticsearch and the ZooKeeper administrative functions, rather than to Solr and the monolithic search server.
In February 2021, Solr was established as a separate Apache project (TLP), independent from Lucene. In May 2022, Solr 9.0 was released, as the first release independent from Lucene, requiring Java 11, and with highlights such as KNN "Neural" search, better modularization, more security plugins and more.
There is one standalone tool called elyzer made by the nice folks at OpenSource Connections. That tool will show you the state of your tokens at any step (char filter, tokenizer, token filter) of the analysis process and it is very simple to use.
Installing it is very simple via pip:
$ pip install elyzer
Then you can use it as a command-line tool, e.g.
$ elyzer --es "http://localhost:9200" --index tmdb --analyzer english_bigrams --text "Mary had a little lamb"
TOKENIZER: standard
{1:Mary} {2:had} {3:a} {4:little} {5:lamb}
TOKEN_FILTER: standard
{1:Mary} {2:had} {3:a} {4:little} {5:lamb}
TOKEN_FILTER: lowercase
{1:mary} {2:had} {3:a} {4:little} {5:lamb}
TOKEN_FILTER: porter_stem
{1:mari} {2:had} {3:a} {4:littl} {5:lamb}
TOKEN_FILTER: bigram_filter
{1:mari had} {2:had a} {3:a littl} {4:littl lamb}
I've used Inquisitor in the past to test out tokenizers and filters. It sits on top of the Elasticsearch analyze API and can be used from a web front end.
You should also try another plugin called elasticsearch-extended-analyze which returns the same token-level information as the Solr analysis page (though without the web front end).
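As far as I know, more recent Elasticsearch versions expose this kind of per-stage detail directly in the _analyze API through an "explain" flag in the request body, so a plugin may not be needed at all. Below is a minimal Python sketch under that assumption. The helper names (build_explain_body, fetch_explain, format_explain) are made up for illustration, and the response layout it expects ("detail" containing "charfilters", "tokenizer", "tokenfilters", or a single "analyzer" entry for built-in analyzers) is my reading of that API, so check it against the version you run.

```python
import json
import urllib.request


def build_explain_body(text, analyzer):
    """Request body for the _analyze API with per-stage detail turned on."""
    return {"analyzer": analyzer, "text": text, "explain": True}


def fetch_explain(es_url, index, analyzer, text):
    """POST to <es_url>/<index>/_analyze and return the parsed JSON response.

    Assumes an unsecured node; add authentication as your cluster requires.
    """
    req = urllib.request.Request(
        f"{es_url}/{index}/_analyze",
        data=json.dumps(build_explain_body(text, analyzer)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def format_explain(response):
    """Render each analysis stage as a header line followed by its tokens."""
    detail = response.get("detail", {})
    lines = []

    # Char filters return transformed text rather than tokens.
    for cf in detail.get("charfilters", []):
        lines.append(f"CHAR_FILTER: {cf['name']}")
        lines.append(" ".join(cf.get("filtered_text", [])))

    # Collect the token-producing stages in chain order.
    stages = []
    if "tokenizer" in detail:
        stages.append(("TOKENIZER", detail["tokenizer"]))
    stages += [("TOKEN_FILTER", tf) for tf in detail.get("tokenfilters", [])]
    if "analyzer" in detail:  # assumed shape for non-custom analyzers
        stages.append(("ANALYZER", detail["analyzer"]))

    for label, stage in stages:
        lines.append(f"{label}: {stage['name']}")
        lines.append(
            " ".join(f"{{{t['position']}:{t['token']}}}" for t in stage.get("tokens", []))
        )
    return "\n".join(lines)
```

Calling print(format_explain(fetch_explain("http://localhost:9200", "tmdb", "english_bigrams", "Mary had a little lamb"))) against a running node should give a listing in the spirit of the elyzer output above. Note that the positions printed here are whatever Elasticsearch reports, so the numbering may differ from elyzer's.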