
How to store data in elasticsearch _source but not index it?

I search by only a couple of fields, but I want to store the whole document in ES so that I don't need additional DB (MySQL) queries.

I tried adding index: no, store: no to whole objects/properties in the mapping, but I'm still not sure whether the fields are being indexed and adding unnecessary overhead.

Let's say I've got books and each has an author. I want to search only by book title, but I want to be able to retrieve the whole document.

Is this okay:

mappings:
properties:
    title:
        type: string
        index: analyzed
    author:
        type: object
        index: no
        store: no
        properties:
            first_name:
                type: string
            last_name:
                type: string

Or should I rather do:

mappings:
properties:
    title:
        type: string
        index: analyzed
    author:
        type: object
        properties:
            first_name:
                index: no
                store: no
                type: string
            last_name:
                index: no
                store: no
                type: string

Or maybe I am doing it completely wrong? And what about nested properties that should not be indexed?

pinkeen asked Apr 10 '15 12:04




1 Answer

By default the _source of the document is stored regardless of the fields that you choose to index. The _source is used to return the document in the search results, whereas the fields that are indexed are used for searching.
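
To make the distinction concrete, here is a toy model in Python. It is only a sketch of the idea and not how Elasticsearch is implemented: the ToyIndex class, its flatten helper, and the whitespace tokenizer are all made up for illustration. Every document is kept whole (standing in for _source), while only the listed fields go into the searchable structure.

```python
from collections import defaultdict


def flatten(doc, prefix=""):
    """Yield (dotted.path, value) pairs for every leaf field of a dict."""
    for key, value in doc.items():
        path = prefix + key
        if isinstance(value, dict):
            yield from flatten(value, path + ".")
        else:
            yield path, value


class ToyIndex:
    """Hypothetical toy model: stores whole documents, indexes chosen fields."""

    def __init__(self, indexed_fields):
        self.indexed_fields = set(indexed_fields)
        self.sources = {}                  # doc id -> full original document
        self.inverted = defaultdict(set)   # (field, term) -> set of doc ids

    def index(self, doc_id, doc):
        self.sources[doc_id] = doc         # the whole document is always kept
        for field, value in flatten(doc):
            if field in self.indexed_fields:
                # naive tokenizer: lowercase and split on whitespace
                for term in str(value).lower().split():
                    self.inverted[(field, term)].add(doc_id)

    def search(self, field, term):
        ids = self.inverted.get((field, term.lower()), set())
        return [self.sources[i] for i in sorted(ids)]


idx = ToyIndex(indexed_fields=["title"])
book = {"title": "book one", "author": {"first_name": "jon", "last_name": "doe"}}
idx.index(1, book)

hits_title = idx.search("title", "book")              # finds the book
hits_author = idx.search("author.first_name", "jon")  # field was never indexed
```

Searching on title returns the full document, author included, while searching on author.first_name finds nothing because that field never reached the searchable structure.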

You can't set index: no on an object to stop all of its fields from being indexed, but you can get the same effect with dynamic templates: use the path_match property to apply the index: no setting to every field within the object. Here is a simple example.

Create an index with your mapping that includes the dynamic templates for the author object and the nested categories object:

POST /shop
{
    "mappings": {
        "book": {
            "dynamic_templates": [
                {
                    "author_object_template": {
                        "path_match": "author.*",
                        "mapping": {
                            "index": "no"
                        }
                    }
                },
                {
                    "categories_object_template": {
                        "path_match": "categories.*",
                        "mapping": {
                            "index": "no"
                        }
                    }
                }
            ],
            "properties": {
                "categories": {
                    "type": "nested"
                }
            }
        }
    }
}
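
If you have several objects to exclude, the templates can be generated rather than written by hand. The Python below is a rough sketch under a few assumptions: no_index_template, simple_match and is_unindexed are hypothetical helpers, and simple_match only approximates my understanding of path_match (the pattern is tested against the full dotted field path, with * matching any run of characters). The "index": "no" value follows the mapping above; newer Elasticsearch releases, as far as I know, expect a boolean false instead.

```python
import re


def simple_match(pattern, path):
    """Approximate wildcard matching: '*' matches any run of characters."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, path) is not None


def no_index_template(object_name):
    """Build one dynamic template shaped like the ones in the mapping above."""
    return {
        object_name + "_object_template": {
            "path_match": object_name + ".*",
            "mapping": {"index": "no"},
        }
    }


# Same two templates as in the index creation request
templates = [no_index_template(name) for name in ("author", "categories")]


def is_unindexed(path):
    """True if any template's path_match pattern covers the field path."""
    return any(
        simple_match(body["path_match"], path)
        for template in templates
        for body in template.values()
    )
```

With these templates, is_unindexed("author.first_name") and is_unindexed("categories.cat_name") are true, while is_unindexed("title") is false. Keep in mind that dynamic templates only apply to fields that are added dynamically, not to fields you map explicitly under properties.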

Index a document:

POST /shop/book/1
{
    "title": "book one",
    "author": {
        "first_name": "jon",
        "last_name": "doe"
    },
    "categories": [
        {
            "cat_id": 1,
            "cat_name": "category one"
        },
        {
            "cat_id": 2,
            "cat_name": "category two"
        }
    ]
}

If you search on the title field with the search term book, the document is returned. If you search on author.first_name or author.last_name, there won't be a match because these fields were not indexed:

POST /shop/book/_search
{
    "query": {
        "match": {
            "author.first_name": "jon"
        }
    }
}

The same would be the case for a nested query on the category fields:

POST /shop/book/_search
{
    "query": {
        "nested": {
            "path": "categories",
            "query": {
                "match": {
                    "categories.cat_name": "category"
                }
            }
        }
    }
}

You can also use the Luke tool to inspect the Lucene index and see which fields have been indexed.

Dan Tuffery answered Sep 18 '22 15:09
