Good setup on AWS for ELK

We are looking into setting up an ELK stack on Amazon, but we don't really know what kind of machines we need to run it smoothly. I know it will become obvious if it doesn't run smoothly, but we hoped to get an idea of what we would need for our situation.

We have 4 servers that generate log files in a custom format: about ~45 million lines of logs each day, in about 4 files of 600 MB (gzipped), so around 24 GB of logs each day.

We are looking into the ELK stack and would like the Kibana dashboards to display realtime data, so I was thinking of shipping the logs via syslog to Logstash.

4 Servers -> Rsyslog (on those 4 servers) -> Logstash (AWS) -> Elasticsearch (AWS) -> Kibana (AWS)

So now we need to figure out what kind of hardware we would need in AWS to handle this.

I read somewhere that you need 3 masters for Elasticsearch and at least 2 data nodes. That would total 5 servers, plus 1 server for Kibana and 1 for Logstash. So I would need 7 servers to get started, which seems like overkill? I would like to keep my data for 1 month, so 31 days at most, which means I would have around ~1.4TB of raw log data in Elasticsearch (~45GB x 31).
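
As a rough check on that storage figure, here is a minimal Python sketch of the arithmetic. It uses the ~45GB per day and 31 days stated above; the `replicas` parameter and the function name are assumptions added for illustration, and the size of the index on disk can differ from the raw log size, so this is only a first estimate.

```python
def retention_storage_gb(daily_gb, days, replicas=0):
    """Raw log volume kept for `days`, counting one primary copy plus `replicas` copies."""
    return daily_gb * days * (1 + replicas)


# Figures from the question: ~45 GB per day, kept for 31 days.
primary_only = retention_storage_gb(45, 31)
# Hypothetical: one replica copy of every shard doubles the footprint.
with_one_replica = retention_storage_gb(45, 31, replicas=1)

print(f"primary only: {primary_only} GB, with one replica: {with_one_replica} GB")
```

Adding replicas multiplies the total, which matters when deciding how many data nodes are needed.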

But since I don't really have a clue about what the best setup would be, any hints/tips/info would be welcome.

Also a system or tool that would handle this for me (node failure, etc) could be useful.

Thanks in advance,

darkownage

asked Jul 01 '16 06:07

darkownage

2 Answers

Here's how I've architected my cloud clusters:

3 Master nodes - these nodes coordinate the cluster, and keeping three of them helps tolerate failure. Ideally they are spread across availability zones. They can be fairly small and ideally do not receive any requests; their only job is to maintain the cluster. In this case set discovery.zen.minimum_master_nodes = 2 to maintain quorum. These IPs, and only these IPs, are what you should provide to all cluster nodes in discovery.zen.ping.unicast.hosts.
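
To show where the value 2 comes from, here is a small Python sketch that computes the majority for a given number of master-eligible nodes and prints the two settings named above. The IP addresses and function names are hypothetical placeholders, not part of any Elasticsearch API.

```python
def quorum(master_eligible):
    """Smallest strict majority of the master-eligible nodes."""
    return master_eligible // 2 + 1


def discovery_settings(master_ips):
    """Build the two discovery settings from the list of master node IPs."""
    return {
        "discovery.zen.minimum_master_nodes": quorum(len(master_ips)),
        "discovery.zen.ping.unicast.hosts": list(master_ips),
    }


# Hypothetical private IPs, one master per availability zone.
masters = ["10.0.1.10", "10.0.2.10", "10.0.3.10"]

for key, value in discovery_settings(masters).items():
    print(f"{key}: {value}")
```

With three masters the majority is two, so the cluster keeps a master if one node or one availability zone is lost.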

Indexes: you should probably take advantage of daily indexes; see https://www.elastic.co/guide/en/elasticsearch/guide/current/time-based.html for details. This will make more sense below, but it is also beneficial if you begin to scale up, because you can increase the shard count for new indexes over time without re-indexing.
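
A minimal Python sketch of the daily-index idea, assuming the `logstash-YYYY.MM.DD` naming that Logstash uses by default and the 31-day retention from the question. The helper names are made up for illustration; actually deleting an expired index is left out.

```python
from datetime import date, datetime


def daily_index(day, prefix="logstash"):
    """Name of the index that holds the logs for one calendar day."""
    return f"{prefix}-{day:%Y.%m.%d}"


def index_day(name, prefix="logstash"):
    """Parse the calendar day back out of a daily index name."""
    return datetime.strptime(name[len(prefix) + 1:], "%Y.%m.%d").date()


def is_expired(name, today, retention_days=31, prefix="logstash"):
    """True once the index is older than the retention window."""
    return (today - index_day(name, prefix)).days >= retention_days


today = date(2016, 7, 1)
print(daily_index(today))
print(is_expired("logstash-2016.05.31", today))
```

Because each day lives in its own index, retention becomes a matter of dropping whole indexes rather than deleting individual documents.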

Data Nodes: Depending on your scale or performance requirements there are a few options - i2.xlarge or d2.xlarge will work well, but r3.2xlarge is also a good option. Make sure to keep the JVM heap <30GB. Keep the data paths on ephemeral drives local to the instances - EBS is not ideal for this use case but, depending on your requirements, might be sufficient. Be sure you have multiple data nodes so the replica shards can be split across availability zones. As your data requirements increase, just scale these up.
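
One way to turn the heap advice into a number is sketched below in Python. It assumes the common rule of thumb of giving the JVM about half of the machine's RAM (that rule is not stated in this answer) and stays a margin below the 30GB limit; the 1GB margin and the RAM figures in the example are assumptions, so check the current instance specifications before relying on them.

```python
def recommended_heap_gb(ram_gb, limit_gb=30.0, margin_gb=1.0):
    """Half of RAM, but never closer than `margin_gb` to the heap limit."""
    return min(ram_gb / 2, limit_gb - margin_gb)


# Assumed RAM sizes, for illustration only.
for instance, ram_gb in [("r3.2xlarge", 61.0), ("i2.xlarge", 30.5), ("d2.xlarge", 30.5)]:
    print(f"{instance}: heap of about {recommended_heap_gb(ram_gb)} GB")
```

The remaining memory is not wasted: the operating system uses it to cache index files.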

Hot/Warm: Depending on the use case, it is sometimes beneficial to split your data nodes into Hot/Warm (fast SSD/slow HDD). This is mainly because all writes happen in realtime, and the majority of reads hit the past few hours. If you can move yesterday's data onto cheaper, slower drives, it helps out quite a bit. This is a little more involved, but you can read more at https://www.elastic.co/blog/hot-warm-architecture. It requires adding some tags and running curator on a nightly basis, but it is generally worth it due to the cost savings of moving largely unsearched data off the more expensive SSDs.
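
The nightly move can be pictured with the Python sketch below. It only decides which tier a daily index belongs to and builds the index settings body that would retag it; it does not call Elasticsearch. The attribute name `box_type` follows the convention used in the linked blog post and is an assumption here, as are the function names and the one-day hot window.

```python
from datetime import date


def tier_for(index_day, today, hot_days=1):
    """Indexes newer than `hot_days` stay on hot nodes; the rest go to warm."""
    return "hot" if (today - index_day).days < hot_days else "warm"


def allocation_settings(tier, attribute="box_type"):
    """Index settings body that pins an index to nodes tagged with `tier`."""
    return {f"index.routing.allocation.require.{attribute}": tier}


today = date(2016, 7, 1)
yesterday = date(2016, 6, 30)

print(tier_for(today, today), allocation_settings(tier_for(today, today)))
print(tier_for(yesterday, today), allocation_settings(tier_for(yesterday, today)))
```

A tool such as curator would apply the warm settings to yesterday's index, after which the cluster relocates its shards to the warm nodes.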

In production, I run ~20 r3.2xlarge for the hot tier and 4-5 d2.xlarge for the warm tier with a replication factor of 2 - this allows ~TB per day of ingest and a decent amount of retention. We scale Hot for volume and Warm for retention.

Overall - good luck! It's a fun stack to build and operate once everything is running smoothly.

PS - Depending on the time/resources you have available, you can run the managed Elasticsearch service on AWS, but the last time I looked it was ~60% more expensive than running it on your own instances, and YMMV.

answered Nov 04 '22 10:11

Matt Helgen

Seems like you need something to get started with the ELK Stack on AWS.

Have you tried these two CloudFormation scripts? They would ease your installation process and help you set up your environment in one go.

ELK-Cookbook - CloudFormation Script

ELK-Stack with Google OAuth in Private VPC

Comment below if this doesn't solve your problem.

answered Nov 04 '22 09:11

Murtaza Kanchwala

Tags:

amazon-web-services

amazon-ec2

hardware

elastic-stack
