Is it safe to expose the Elasticsearch Search API directly through your application's API?

Tags:

elasticsearch

I am developing an AngularJS app with a Java/Spring Boot API. It uses Spring Data Elasticsearch to access Elasticsearch's Search API. Here is an example:

Page<Address> page = addressSearchRepository.search(simpleQueryStringQuery(query), pageable);

The variable query holds the user's search string, and pageable is an object that specifies the page number, page size, and sort order. I can use QueryBuilders to build other Elasticsearch queries and expose each one as a separate API endpoint.

Another option is to use QueryBuilders.wrapperQuery and send Elasticsearch queries directly from JavaScript. Here is an example where jsonQuery is a string containing a full Elasticsearch query:

Page<Address> page = addressSearchRepository.search(wrapperQuery(jsonQuery), pageable);

This would be a secure endpoint that only authenticated users can access. This seems to be equivalent to exposing an Elasticsearch index's Search API directly. Assuming that any data in the index is safe to show the user, would this be a security risk?

In my research so far I've found that it may be possible to crash Elasticsearch with a query, although this is less of a problem in newer versions: https://www.elastic.co/blog/found-crash-elasticsearch#arbitrary-large-size-parameter

Maybe limiting the page size or using the scan and scroll API when the page size is very large would mitigate this.
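One way to act on the page-size idea is to clamp whatever paging parameters the client sends before they are turned into a Pageable. This is a minimal sketch in plain Java with no Spring or Elasticsearch dependencies; the class name PageLimits and both limits are hypothetical, illustrative values. MAX_RESULT_WINDOW is assumed to mirror the 10,000-hit default of the index.max_result_window setting in newer Elasticsearch versions; pages beyond it would need the scroll API instead.

```java
/**
 * Hypothetical helper that clamps client-supplied paging parameters before
 * they are turned into a Pageable. Both limits are illustrative values.
 */
final class PageLimits {
    // Assumption: the largest page the UI ever needs.
    static final int MAX_PAGE_SIZE = 100;
    // Assumption: mirrors the 10,000-hit default of index.max_result_window.
    static final int MAX_RESULT_WINDOW = 10_000;

    private PageLimits() {}

    /** Clamps the requested page size into the range [1, MAX_PAGE_SIZE]. */
    static int clampSize(int requestedSize) {
        return Math.max(1, Math.min(requestedSize, MAX_PAGE_SIZE));
    }

    /**
     * Returns true if the zero-based page ends within MAX_RESULT_WINDOW hits,
     * i.e. (page + 1) * size <= MAX_RESULT_WINDOW. Deeper pages should use scroll.
     */
    static boolean withinResultWindow(int page, int size) {
        if (page < 0 || size < 1) {
            return false;
        }
        long lastHit = ((long) page + 1) * size; // long arithmetic avoids int overflow
        return lastHit <= MAX_RESULT_WINDOW;
    }
}
```

A controller could, for example, pass pageable.getPageSize() through clampSize and reject the request when withinResultWindow returns false.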

I know that script fields should be avoided at all costs, but they are disabled by default (as of v1.4.3).


asked Aug 23 '16 14:08

geraldhumphries

1 Answer

You can still crash Elasticsearch if you know how to do it. For example, if you build aggregations nested ten levels deep, you might as well go and take a break. The request will either take a long time or be very expensive: it uses a lot of memory and makes the JVM do a lot of garbage collection (which basically freezes all other threads running in the JVM) while reclaiming only small amounts of memory. In this way it can make the cluster unresponsive.

I'm not saying that any ten-level nested aggregation will cripple the cluster. But a cluster sized for a certain SLA and a certain amount of data will be put under a very heavy computational load by some expensive aggregations (for example terms on analyzed string fields).

The nodes may not run out of memory, but they will barely be responsive.
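If an endpoint does accept raw JSON, one guard against deeply nested aggregations is to measure the nesting depth before the query ever reaches Elasticsearch. This is a minimal sketch, assuming the client's JSON has already been parsed into nested Map and List objects (for example with a JSON library such as Jackson); the class name AggDepth and the depth limit are hypothetical and chosen for illustration.

```java
import java.util.List;
import java.util.Map;

/**
 * Hypothetical guard that measures how deeply aggregations are nested in a
 * search request that has already been parsed into nested Maps and Lists.
 */
final class AggDepth {
    private AggDepth() {}

    /** Returns the largest number of nested "aggs"/"aggregations" levels under node. */
    static int depth(Object node) {
        int max = 0;
        if (node instanceof Map) {
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) node).entrySet()) {
                int child = depth(entry.getValue());
                Object key = entry.getKey();
                boolean isAggKey = "aggs".equals(key) || "aggregations".equals(key);
                // Each aggs/aggregations key opens one more level of nesting.
                max = Math.max(max, isAggKey ? child + 1 : child);
            }
        } else if (node instanceof List) {
            for (Object item : (List<?>) node) {
                max = Math.max(max, depth(item));
            }
        }
        return max;
    }

    /** Throws IllegalArgumentException if the request nests aggregations deeper than maxDepth. */
    static void check(Map<?, ?> request, int maxDepth) {
        int d = depth(request);
        if (d > maxDepth) {
            throw new IllegalArgumentException(
                "aggregation depth " + d + " exceeds limit " + maxDepth);
        }
    }
}
```

Because every key named aggs or aggregations is counted, an aggregation that happens to be named "aggs" is over-counted, which errs on the side of rejecting the request. A depth check alone does not stop a single expensive aggregation, such as terms on an analyzed field.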

Elastic's team is working on more circuit breakers and on default limits for certain types of queries and aggregations (a huge task). But if your goal is to give your users full access to all queries while ensuring they cannot crash ES, I think there are still ways to crash it. Personally, I wouldn't expose ES and let my users run whatever queries they create.

Depending on how your wrapper is configured, I'd allow my users only certain types of queries and aggregations, and I'd impose limits on those that accept them.
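A minimal sketch of that allow-list approach for the aggregations section of a request, again assuming the JSON has been parsed into nested Maps beforehand. The class name AggAllowList, the allowed types, and MAX_TERMS_SIZE are hypothetical choices, not Elasticsearch defaults; anything not on the list (including script-based aggregations such as scripted_metric) is rejected. A terms size of 0 is rejected as well, because Elasticsearch 1.x/2.x treated size 0 as "return all terms". Set.of requires Java 9 or later.

```java
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical allow-list validator for the "aggs" section of a client-supplied
 * search request, assumed to be parsed into nested Maps beforehand.
 */
final class AggAllowList {
    // Illustrative choices, not Elasticsearch defaults.
    static final Set<String> ALLOWED_AGGS = Set.of("terms", "date_histogram", "avg", "min", "max");
    static final int MAX_TERMS_SIZE = 50;

    private AggAllowList() {}

    /** Throws IllegalArgumentException if any aggregation type is not allowed or breaks a limit. */
    static void validate(Map<?, ?> aggs) {
        for (Map.Entry<?, ?> named : aggs.entrySet()) {
            Map<?, ?> body = asMap(named.getValue(), "aggregation " + named.getKey());
            for (Map.Entry<?, ?> part : body.entrySet()) {
                Object key = part.getKey();
                if ("aggs".equals(key) || "aggregations".equals(key)) {
                    // Sub-aggregations are validated with the same rules.
                    validate(asMap(part.getValue(), "sub-aggregations"));
                } else if (!(key instanceof String) || !ALLOWED_AGGS.contains(key)) {
                    throw new IllegalArgumentException("aggregation type not allowed: " + key);
                } else if ("terms".equals(key)) {
                    checkTermsSize(asMap(part.getValue(), "terms aggregation"));
                }
            }
        }
    }

    private static void checkTermsSize(Map<?, ?> params) {
        Object size = params.get("size");
        if (size == null) {
            return; // no explicit size: Elasticsearch applies its own default
        }
        // Older versions (1.x/2.x) treated size 0 as "return all terms", so reject it too.
        if (!(size instanceof Number)
                || ((Number) size).longValue() < 1
                || ((Number) size).longValue() > MAX_TERMS_SIZE) {
            throw new IllegalArgumentException("terms size must be between 1 and " + MAX_TERMS_SIZE);
        }
    }

    private static Map<?, ?> asMap(Object value, String what) {
        if (!(value instanceof Map)) {
            throw new IllegalArgumentException("malformed " + what);
        }
        return (Map<?, ?>) value;
    }
}
```

An allow-list for query clauses would follow the same pattern (for example, accepting only simple_query_string or match), although bool queries would need recursion into their must, should, filter, and must_not clauses.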


answered Nov 04 '22 23:11

Andrei Stefan
