Difference between fieldname and fieldname.raw in ELK?

I have been experimenting with the ELK stack for a while now, following a few resources on the web, but I haven't found any resource that clearly explains the difference between fieldname and fieldname.raw for a field named, say, fieldname.

There is not much to try in this context, but I did search for this without luck. My only real understanding comes from a Kibana window (which, sadly, I don't know how to reproduce) that said fieldname is an analyzed field. There was no such information about fieldname.raw.

One other thing I noticed: when I use fieldname.raw: "value" in Kibana 4 Discover, it shows slightly more results than fieldname: "value" does. I could not tell which ones were missing, since these queries returned 559 and 554 results, respectively.

I am guessing the suffix .raw means what it says: it might be the field exactly as it appears in the logs, without any intervention by Logstash. But I want to make sure that is what it means. If so, how (and, more importantly, why) did I get fewer results on the analyzed field? Is Logstash doing something wrong, or is this some kind of misconfiguration? Any pointers are appreciated.

asked Jan 07 '23 23:01 by mathakoot


1 Answer

Each field in Elasticsearch has a mapping that describes its type and how it is analyzed for indexing.

By default, string fields are analyzed (punctuation removed, text split into tokens, lowercased, etc.). For example, a field named "path" with the value:

/var/log/messages

would become

["var", "log", "messages"]
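To see roughly where those tokens come from, here is a minimal Python sketch that mimics the effect of analysis on this path. It is an assumption-laden approximation, not Elasticsearch's actual standard analyzer (which follows Unicode text-segmentation rules and handles many more cases); the function name approximate_standard_analyzer is hypothetical.

```python
import re


def approximate_standard_analyzer(text):
    """Rough stand-in for Elasticsearch's standard analyzer (assumption).

    Splits on runs of non-alphanumeric characters and lowercases the
    pieces, which is close enough for simple log paths like this one.
    """
    # Split on anything that is not a letter or digit; drop empty pieces
    # (e.g. the one produced by the leading "/").
    tokens = re.split(r"[^0-9A-Za-z]+", text)
    return [t.lower() for t in tokens if t]


print(approximate_standard_analyzer("/var/log/messages"))
```

The slashes act purely as separators here, which is exactly why the original string no longer exists as a single indexed term.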

which means you can no longer search for the original string as a single value, and any meaning carried by the punctuation is lost.

This is a side effect of using a full-text search engine for log data.

Since almost every Logstash user runs into this immediately, the Logstash team created a template that configures a mapping for any index named "logstash-*".

This template adds a multi-field (a sub-field) called "raw" to each string field, and that sub-field is set to "not_analyzed". So you end up with two entries in your index:

path: ["var", "log", "messages"]
path.raw: "/var/log/messages"
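A trimmed-down sketch of such a template, written as a Python dict that serializes to a JSON request body. This assumes the Elasticsearch 1.x/2.x "string" mapping syntax in use when this answer was written; the dynamic-template name string_fields and the omission of all other settings are simplifications, not a copy of the file Logstash actually ships.

```python
import json

# Simplified Logstash-style index template (Elasticsearch 1.x/2.x syntax).
# Only the parts that produce the ".raw" sub-field are kept.
template = {
    "template": "logstash-*",  # applies to any index whose name matches
    "mappings": {
        "_default_": {
            "dynamic_templates": [
                {
                    "string_fields": {  # hypothetical rule name
                        "match": "*",  # every field name...
                        "match_mapping_type": "string",  # ...detected as a string
                        "mapping": {
                            "type": "string",
                            "index": "analyzed",  # "path": tokenized
                            "fields": {
                                "raw": {  # "path.raw": indexed verbatim
                                    "type": "string",
                                    "index": "not_analyzed",
                                }
                            },
                        },
                    }
                }
            ]
        }
    },
}

print(json.dumps(template, indent=2))
```

The key design point is the "fields" entry: one source value is indexed twice, once through the analyzer and once as a single untouched term.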

Very useful, especially for those first-time users. You can use "path.raw" in Kibana or in other queries.
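To make the difference concrete, here is a small Python model (not Elasticsearch itself) of what gets indexed for one string field and how an exact, unanalyzed term lookup behaves against each view. The analyze() helper is a rough, assumed approximation of the standard analyzer, and index_document() and term_matches() are hypothetical names used only for illustration.

```python
import re


def analyze(text):
    # Rough approximation of the standard analyzer (assumption):
    # split on non-alphanumerics and lowercase.
    return [t.lower() for t in re.split(r"[^0-9A-Za-z]+", text) if t]


def index_document(field, value):
    # Model the indexed terms for one string field under a Logstash-style
    # template: tokens for the analyzed field, the whole value for ".raw".
    return {field: set(analyze(value)), field + ".raw": {value}}


def term_matches(indexed, field, term):
    # A term-level query compares the term to the indexed terms exactly,
    # without analyzing the query string first.
    return term in indexed.get(field, set())


doc = index_document("path", "/var/log/messages")
print(term_matches(doc, "path", "/var/log/messages"))      # False
print(term_matches(doc, "path.raw", "/var/log/messages"))  # True
print(term_matches(doc, "path", "log"))                    # True
```

Looking up the whole path on the analyzed field finds nothing because only its tokens were indexed, while the .raw sub-field still holds the original string.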

EDIT: A quick note about Kibana: if you visualize an analyzed field, it will create an item for each token, so you'd end up with a pie chart with slices for "var", "log", and "messages".
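When the pie is split with a terms aggregation, each slice is a bucket: one per indexed term, counting the documents that contain it. The following Python sketch imitates that counting on a hypothetical sample of three "path" values, using the same assumed analyzer approximation as before (redefined here so the block stands alone); it models the behavior and does not call Elasticsearch or Kibana.

```python
import re
from collections import Counter


def analyze(text):
    # Rough approximation of the standard analyzer (assumption).
    return [t.lower() for t in re.split(r"[^0-9A-Za-z]+", text) if t]


# Hypothetical "path" values from three log events.
paths = ["/var/log/messages", "/var/log/secure", "/var/log/messages"]

# Buckets on the analyzed field: one per token, counted once per document
# that contains it (hence set() per document).
analyzed_buckets = Counter(tok for p in paths for tok in set(analyze(p)))

# Buckets on the not_analyzed ".raw" sub-field: one per distinct value.
raw_buckets = Counter(paths)

print(analyzed_buckets)
print(raw_buckets)
```

The analyzed field yields slices like "var" and "log" that every event shares, while the .raw sub-field yields one slice per actual path, which is usually what you want in a chart.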

Once you become more familiar with mappings and templates, you might consider making your basic fields not_analyzed, which removes the need for ".raw" altogether. This would also let you use doc_values, which is another fun topic.
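As a sketch of that approach, here is a hypothetical explicit mapping for a "path" field in the same Elasticsearch 1.x/2.x "string" syntax, expressed as a Python dict. The field name is illustrative only, and you would still need to apply it through your own template or index-creation request.

```python
import json

# Hypothetical explicit mapping (Elasticsearch 1.x/2.x syntax): "path" is
# indexed only as a single untokenized term, so no ".raw" copy is needed.
path_mapping = {
    "properties": {
        "path": {
            "type": "string",
            "index": "not_analyzed",  # whole value indexed as one term
            "doc_values": True,  # on-disk columnar data for sorting/aggregations
        }
    }
}

print(json.dumps(path_mapping, indent=2))
```

In Elasticsearch 5.0 and later the "string" type was replaced by "text" and "keyword", and the equivalent of a not_analyzed string is {"type": "keyword"}.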

Good luck!

answered Jan 17 '23 06:01 by Alain Collins