Elastic search, multiple indexes vs one index and types for different data sets?

People also ask

Is it a good idea to have multiple indexes?

Yes you can have too many indexes as they do take extra time to insert and update and delete records, but no more than one is not dangerous, it is a requirement to have a system that performs well.

Can index have multiple types Elasticsearch?

You can't. One type per index only.

What is the difference between index and type in Elasticsearch?

In RDBMS terms, index is a database and type can be a table which contains many rows( document in elasticsearch).

How do I query multiple indices in Elasticsearch?

To search multiple data streams and indices, add them as comma-separated values in the search API's request path. The following request searches the my-index-000001 and my-index-000002 indices. You can also search multiple data streams and indices using an index pattern.

There are different implications to both approaches.

Assuming you are using Elasticsearch's default settings, having 1 index for each model will significantly increase the number of your shards as 1 index will use 5 shards, 5 data models will use 25 shards; while having 5 object types in 1 index is still going to use 5 shards.

Implications for having each data model as index:

Efficient and fast to search within index, as amount of data should be smaller in each shard since it is distributed to different indices.
Searching a combination of data models from 2 or more indices is going to generate overhead, because the query will have to be sent to more shards across indices, compiled and sent back to the user.
Not recommended if your data set is small since you will incur more storage with each additional shard being created and the performance gain is marginal.
Recommended if your data set is big and your queries are taking a long time to process, since dedicated shards are storing your specific data and it will be easier for Elasticsearch to process.

Implications for having each data model as an object type within an index:

More data will be stored within the 5 shards of an index, which means there is lesser overhead issues when you query across different data models but your shard size will be significantly bigger.
More data within the shards is going to take a longer time for Elasticsearch to search through since there are more documents to filter.
Not recommended if you know you are going through 1 terabytes of data and you are not distributing your data across different indices or multiple shards in your Elasticsearch mapping.
Recommended for small data sets, because you will not waste storage space for marginal performance gain since each shard take up space in your hardware.

If you are asking what is too much data vs small data? Typically it depends on the processor speed and the RAM of your hardware, the amount of data you store within each variable in your mapping for Elasticsearch and your query requirements; using many facets in your queries is going to slow down your response time significantly. There is no straightforward answer to this and you will have to benchmark according to your needs.

Although Jonathan's answer was correct at the time, the world has moved on and it now seems that the people behind ElasticSearch have a long term plan to drop support for multiple types:

Where we want to get to: We want to remove the concept of types from Elasticsearch, while still supporting parent/child.

So for new projects, using only a single type per index will make the eventual upgrade to ElasticSearch 6.x be easier.

Jonathan's answer is great. I would just add few other points to consider:

number of shards can be customized per solution you select. You may have one index with 15 primary shards, or split it to 3 indexes for 5 shards - performance perspective won't change (assuming data are distributed equally)
think about data usage. Ie. if you use kibana to visualize, it's easier to include/exclude particular index(es), but types has to be filtered in dashboard
data retention: for application log/metric data, use different indexes if you require different retention period

Both the above answers are great!

I am adding an example of several types in an index. Suppose you are developing an app to search for books in a library. There are few questions to ask to the Library owner,

Questions:

How many books are you planning to store?
What kind of books are you going to store in the library?
How are you going to search for books?

Answers:

I am planning to store 50 k – to 70 k books (approximately)
I will have 15 k -20 k technology related books (computer science, mechanical engineering, chemical engineering and so on), 15 k of historical books, 10 k of medical science books. 10 k of language related books (English, Spanish and so on)
Search by authors first name, author last name, year of publish, name of the publisher. (This gives you the idea about what information you should store in the index)

From the above answers we can say the schema in our index should look somewhat like this.

//This is not the exact mapping, just for the example

            "yearOfPublish":{
                "type": "integer"
            },
            "author":{
                "type": "object",
                "properties": {
                    "firstName":{
                        "type": "string"
                    },
                    "lastName":{
                        "type": "string"
                    }
                }
            },
            "publisherName":{
                "type": "string"
            }
        }

In order to achieve the above we can create one index called Books and can have various types.

Index: Book

Types: Science, Arts

(Or you can create many types such as Technology, Medical Science, History, Language, if you have lot more books)

Important thing to note here is the schema is similar but the data is not identical. And the other important thing is the total data you are storing.

Hope the above helps when to go for different types in an Index, if you have different schema you should consider different index. Small index for less data . big index for big data :-)

Related questions
                            
                                Reset the database (purge all), then seed a database
                            
                                What is the difference between a database and a data warehouse?
                            
                                What's the recommended way to connect to MySQL from Go?
                            
                                What is database pooling?
                            
                                Difference between 3NF and BCNF in simple terms (must be able to explain to an 8-year old)
                            
                                What is NoSQL, how does it work, and what benefits does it provide? [closed]
                            
                                How do ACID and database transactions work?
                            
                                What is the difference between an ORM and an ODM?
                            
                                Implementing Comments and Likes in database
                            
                                Is there any NoSQL data store that is ACID compliant?
                            
                                Postgres: clear entire database before re-creating / re-populating from bash script
                            
                                How to do a batch insert in MySQL
                            
                                Bulk insert with SQLAlchemy ORM
                            
                                Foreign Key naming scheme
                            
                                Pros and Cons of SQLite and Shared Preferences [closed]
                            
                                Right query to get the current number of connections in a PostgreSQL DB
                            
                                Find duplicate records in MongoDB
                            
                                How to empty a redis database?
                            
                                IN vs OR in the SQL WHERE Clause
                            
                                SQL, Postgres OIDs, What are they and why are they useful?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Elastic search, multiple indexes vs one index and types for different data sets?

Tags:

database

search

elasticsearch

People also ask

Recent Activity

Donate For Us