I have 6 instances of type m3.large.elasticsearch and storage type instance.
I don't really get what does Average, Minimum, Maximum ..mean here?
I am not getting any logs into my cluster right now although it shows FreeStorageSpace as 14.95GB here:
But my FreeStorageSpace graph for "Minimum" has reached zero!
What is happening here?
The Amazon Elasticsearch Service was renamed to Amazon OpenSearch Service on September 8th 2021 according to the official AWS open-source blog.
Running the cloudWatch metricset requires settings in AWS account, AWS credentials, and a running Elastic Stack. Elastic Stack includes Elasticsearch for storing and indexing the data, and Kibana for data exploration.
Amazon OpenSearch Service makes it easy for you to perform interactive log analytics, real-time application monitoring, website search, and more. OpenSearch is an open source, distributed search and analytics suite derived from Elasticsearch.
Elasticsearch is a distributed search and analytics engine built on Apache Lucene. Since its release in 2010, Elasticsearch has quickly become the most popular search engine and is commonly used for log analytics, full-text search, security intelligence, business analytics, and operational intelligence use cases.
I was also confused by this. Minimum means size on single data node - one which has least free space. And Sum means size of entire cluster (summation of free space on all data nodes). Got this info from following link
http://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/es-managedomains.html
We ran into the same confusion. Avg, Min, Max spreads the calculation across all nodes and Sum combines the Free/Used space for the whole cluster.
We had assumed that Average FreeStorageSpace means average free storage space of the whole cluster and set an alarm keeping the following calculation in mind:
Hence we had an average utilization of 10 TB at any point of time. Assuming, we will go 2x - i.e. 20 TB our actual storage need as per https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/sizing-domains.html#aes-bp-storage was with replication factor of 2 is:
(20 * 2 * 1.1 / 0.95 / 0.8) = 57.89 =~ 60 TB
So we provisioned 18 X 3.8 TB instances =~ 68 TB to accomodated 2x = 60 TB
So we had set an alarm that if we go below 8 TB free storage - it means we have hit our 2x limit and should scale up. Hence we set the alarm
FreeStorageSpace <= 8388608.00 for 4 datapoints within 5 minutes + Statistic=Average + Duration=1minute
FreeStorageSpace is in MB hence - 8 TB = 8388608 MB.
But we immediately got alerted because our average utilization per node was below 8 TB.
After realizing that to get accurate storage you need to do FreeStorageSpace sum for 1 min - we set the alarm as
FreeStorageSpace <= 8388608.00 for 4 datapoints within 5 minutes + Statistic=Sum + Duration=1minute
The above calculation checked out and we were able to set the right alarms.
The same applies for ClusterUsedSpace calculation.
You should also track the actual free space percent using Cloudwatch Math:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With