Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is it safe to expose the Elasticsearch Search API directly through your application's API?

I am developing an AngularJS app with a Java/Spring Boot API. It uses Spring Data Elasticsearch to provide access to Elasticsearch's Search API for searching. Here is an example:

Page<Address> page = addressSearchRepository.search(simpleQueryStringQuery(query), pageable);

The variable query is a user's search string. pageable is an object that specifies page number, page size, and sorting. I can use QueryBuilders to build other Elasticsearch queries and expose them as different API endpoints.

Another option is to use QueryBuilders.wrapperQuery and send Elasticsearch queries directly from JavaScript. Here is an example where jsonQuery is a string containing a full Elasticsearch query:

Page<Address> page = addressSearchRepository.search(wrapperQuery(jsonQuery), pageable);

This would be a secure endpoint that only authenticated users can access. This seems to be equivalent to exposing an Elasticsearch index's Search API directly. Assuming that any data in the index is safe to show the user, would this be a security risk?

In my research so far I've found that it may be possible to crash Elasticsearch using a query, but it isn't that big of a problem in newer versions: https://www.elastic.co/blog/found-crash-elasticsearch#arbitrary-large-size-parameter

Maybe limiting the page size or using the scan and scroll API when the page size is very large would mitigate this.

I know that script fields should be avoided at all costs, but they are disabled by default (as of v1.4.3).

like image 494
geraldhumphries Avatar asked Aug 23 '16 14:08

geraldhumphries


People also ask

What is search API in Elasticsearch?

Elasticsearch provides search API, which is used for searching the data across indexes and all types. It helps to search the data in Elasticsearch by executing the search query and get back the search result matched with the query. This API enables you to search the data within Elasticsearch.

Does Elasticsearch have an API?

One of the great things about Elasticsearch is its extensive REST API which allows you to integrate, manage and query the indexed data in countless different ways. Examples of using this API to integrate with Elasticsearch are abundant, spanning different companies and use cases.


1 Answers

You can still crash Elasticsearch if you know how to do it. For example, if you start building a 10 deep nested aggregations, you might very well go and take a break. It will either take a lot of time, or be very expensive, use a lot of memory, make the JVM do a lot of garbage collection (which basically freezes all other threads running in the JVM), reclaim back small amounts of memory. It can make the cluster unresponsive in this way.

I'm not saying that whatever aggregations you take and create a 10 deep nested aggregations you'll cripple the cluster, but under normal circumstances a cluster built for a certain SLA and deal with a certain amount of data, given some heavy aggregations (for example terms on analyzed string fields), will be very highly computational for the nodes.

Maybe the nodes will not run out of memory, but the nodes will barely be responsive.

Elastic's team is trying to implement other circuit breakers and to add default limits to certain types of queries and aggregations (a huge task). But if your aim is for your users not to crash ES, while they have full access to all queries, I think there are ways to crash it. I, personally, wouldn't expose ES and let my users do whatever they want with whatever queries they create.

Depending on how your wrapper is configured, I'd only allow my users certain types of queries/aggregations and for those I'd impose some limits (applicable for those queries/aggs that accept limits).

like image 193
Andrei Stefan Avatar answered Nov 04 '22 23:11

Andrei Stefan