
How does Elasticsearch store data?

I am learning Elasticsearch and have written a couple of simple programs to insert, update, and delete data.

I have read that Elasticsearch always stores data in JSON format.

I looked at the "data" folder in my Elasticsearch installation and could not find any JSON files, even though I had done a few insert operations. I could only see some files with a .st extension.

So where does Elasticsearch actually store the data in JSON format?

user496934 Avatar asked Aug 02 '19 14:08


1 Answer

Elasticsearch uses Lucene (https://lucene.apache.org/core/) under the hood.

Lucene is a text search engine. It stores text in a custom binary format optimized for retrieval purposes. The format is highly optimized and complicated.

Lucene uses the concept of "indices containing documents". Internally, every index consists of several segments, and each segment is saved as several files in the file system. A document is not kept as one unit: its contents are split across several lookup structures that live in those files.
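To make "split up in several lookup structures" a bit more concrete, here is a toy sketch in Python. It is not Lucene's actual on-disk format; `build_toy_segment` and both dictionaries are made up for illustration. It only shows the idea that the searchable terms and the original document bytes end up in separate structures.

```python
import json
from collections import defaultdict


def build_toy_segment(docs):
    """Split JSON-like documents into two lookup structures (toy model only)."""
    inverted = defaultdict(set)  # term -> ids of the documents containing it
    stored = {}                  # doc id -> original document, serialized to bytes
    for doc_id, doc in enumerate(docs):
        # Keep the original document as opaque bytes, separate from the index.
        stored[doc_id] = json.dumps(doc).encode("utf-8")
        # Index every whitespace-separated, lowercased term of each string field.
        for value in doc.values():
            if isinstance(value, str):
                for term in value.lower().split():
                    inverted[term].add(doc_id)
    return inverted, stored


docs = [{"title": "Quick brown fox"}, {"title": "Lazy brown dog"}]
inverted, stored = build_toy_segment(docs)

print(sorted(inverted["brown"]))  # [0, 1]
print(json.loads(stored[1]))      # the original document, rebuilt from bytes
```

A search for "brown" only touches the term structure; the stored bytes are read afterwards, when the matching documents have to be returned.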

When you browse the Elasticsearch data folder, you are looking at this Lucene index and segment structure. No JSON files are stored at the file-system level. The files contain optimized binary data (the original JSON you indexed is kept inside them as the compressed _source stored field), so you have to go through the Elasticsearch API to get a JSON representation of a document.
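As a rough sketch of "going through the API", the snippet below reads a document back with the Get API (GET /<index>/_doc/<id>) using only the Python standard library. The node address http://localhost:9200, the index name my-index, and the helper names are assumptions for illustration, and security (TLS, authentication) is left out.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def doc_url(base_url, index, doc_id):
    """Build the Get API URL: <base>/<index>/_doc/<id>."""
    return f"{base_url.rstrip('/')}/{quote(index, safe='')}/_doc/{quote(str(doc_id), safe='')}"


def extract_source(response_body):
    """Return the original JSON document from a Get API response, or None."""
    payload = json.loads(response_body)
    return payload.get("_source") if payload.get("found") else None


def fetch_document(base_url, index, doc_id):
    """Fetch one document; urlopen raises HTTPError if the node answers 404."""
    with urlopen(doc_url(base_url, index, doc_id)) as resp:
        return extract_source(resp.read())


# Assumed values: a local unsecured node and a hypothetical index name.
# doc = fetch_document("http://localhost:9200", "my-index", 1)
```

The JSON you get back in _source is rebuilt by Elasticsearch from the binary segment files, which is why you never see it as a .json file on disk.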

c_froehlich Avatar answered Oct 13 '22 14:10