I am learning Elasticsearch and have written a couple of simple programs to insert, update, and delete data. I have read that Elasticsearch always stores data in JSON format.
I looked at the "data" folder in my Elasticsearch installation and could not find any JSON files, even though I did a few insert operations. I could see some files with the .st extension.
So where does Elasticsearch actually store the data in JSON format?
By default, Elasticsearch indexes all data in every field and each indexed field has a dedicated, optimized data structure. For example, text fields are stored in inverted indices, and numeric and geo fields are stored in BKD trees.
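To make the "inverted index" idea more concrete, here is a minimal toy sketch in Python. It is only an illustration of the concept and not how Lucene actually implements it; the function names (`tokenize`, `build_inverted_index`, `search`) and the sample documents are made up for this example, and the tokenizer is a crude stand-in for a real analyzer.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lowercase and split on non-alphanumeric characters
    # (a crude stand-in for a real analyzer).
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_inverted_index(docs):
    # Map each term to the sorted list of document IDs that contain it.
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search(index, query):
    # Return IDs of documents that contain every query term (AND semantics).
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Hypothetical sample documents, keyed by document ID.
docs = {
    1: "Elasticsearch stores JSON documents",
    2: "Lucene stores segments on disk",
    3: "JSON documents are indexed by Lucene",
}
index = build_inverted_index(docs)
```

Calling `search(index, "json documents")` looks up each term's posting list and intersects them, so the original text never has to be scanned. This is why the on-disk structures are organized around terms rather than around the JSON you sent.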
Elasticsearch does not store all data on the heap. Instead, data is read from disk when required, and the heap is basically used as working memory. This is why the heap should be at most 50% of available RAM (ideally as small as the use case allows), so that the rest is left to the operating system's file system cache.
Elasticsearch uses Lucene's StandardAnalyzer by default to analyze text fields at index time. When you use Elasticsearch, you store data as JSON documents and then query them for retrieval. It is schema-less in the sense that it guesses field types and indexes the data with default settings unless you provide a mapping that fits your needs.
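As a rough sketch of what this default type guessing looks like, the Python snippet below imitates the basic dynamic mapping rules for JSON values. It is a simplification under stated assumptions: `guess_field_type` is a hypothetical helper, not an Elasticsearch API, and it skips date and numeric detection for strings as well as the `keyword` sub-field that Elasticsearch adds to dynamically mapped text.

```python
import json


def guess_field_type(value):
    # Rough imitation of default dynamic mapping; returns a field type name
    # or None when no field would be created.
    if value is None:
        return None  # null values do not add a field to the mapping
    if isinstance(value, bool):
        # Must be checked before int: bool is a subclass of int in Python.
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "text"  # simplification: date/numeric detection is skipped
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # Arrays take the type of their first non-null element.
        for item in value:
            if item is not None:
                return guess_field_type(item)
        return None
    raise TypeError(f"unsupported JSON value: {value!r}")


# Hypothetical sample document.
doc = json.loads(
    '{"name": "Ada", "age": 36, "score": 9.5, "active": true,'
    ' "tags": ["a", "b"], "address": {"city": "London"}, "nickname": null}'
)
guessed = {field: guess_field_type(value) for field, value in doc.items()}
```

If the guessed types are not what you want (for example, a string field that should be an exact-match keyword), that is the point where you would supply an explicit mapping instead of relying on the defaults.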
There are two types of data you might want to store in Elasticsearch:
1. Your JSON documents, containing numbers, lists, text, geo coordinates, and all the other formats Elasticsearch supports.
2. Binary data.
Elasticsearch uses Lucene (https://lucene.apache.org/core/) under the hood.
Lucene is a text search engine. It stores text in a custom binary format optimized for retrieval purposes. The format is highly optimized and complicated.
Lucene uses the concept of "indices containing documents". Internally, every index consists of several segments. Each segment is saved as several files in the file system. Documents are split up across several lookup structures that reside in those files.
When you browse the Elasticsearch data folder, you see this Lucene index and segment structure. No JSON files are stored at the file system level. Instead, the files contain optimized binary data (the original JSON of each document is kept in the _source field, but compressed inside those segment files), and you need to go through the Elasticsearch API to get a JSON representation of a document.
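For example, getting the JSON back through the API could look roughly like the sketch below, which uses only the Python standard library. The assumptions are loud ones: a local node reachable at http://localhost:9200 without authentication or TLS, and a hypothetical index name and document ID. The helper names `doc_url` and `fetch_source` are made up for this example; the `/<index>/_doc/<id>` endpoint and the `_source` key in its response are part of the standard document GET API.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def doc_url(base_url, index, doc_id):
    # Build the URL of the document GET endpoint: /<index>/_doc/<id>
    return (
        f"{base_url.rstrip('/')}/"
        f"{quote(index, safe='')}/_doc/{quote(str(doc_id), safe='')}"
    )


def fetch_source(base_url, index, doc_id):
    # Ask Elasticsearch for the document and return its original JSON body,
    # which the response carries under the "_source" key.
    # Assumes an unsecured node; a secured cluster needs credentials and HTTPS.
    with urlopen(doc_url(base_url, index, doc_id)) as resp:
        body = json.load(resp)
    return body["_source"]
```

With a node running, something like `fetch_source("http://localhost:9200", "my-index", 1)` would return the document as a Python dict, even though no file containing that JSON text exists in the data folder.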