My app is hosted on an Amazon EC2 cluster. Each instance writes events to log files. I need to collect (and data-mine) these logs at the end of each day. What's a recommended way to collect these logs in a central location? I have thought of several options, but I'm not sure which way to go:
You can use Amazon CloudWatch Logs to monitor, store, and access your log files from Amazon Elastic Compute Cloud (Amazon EC2) instances, AWS CloudTrail, Route 53, and other sources.
We use Logstash on each host (deployed via Puppet) to gather and ship log events to a message queue (RabbitMQ, but it could be Redis) on a central host. Another Logstash instance retrieves the events, processes them and stuffs the result into Elasticsearch. A Kibana web interface is used to search through this database.
It's very capable, scales easily and is very flexible. Logstash has tons of filters to process events from various inputs, and can output to lots of services, Elasticsearch being one of them. We currently ship about 1.2 million log events per day from our EC2 instances, on light hardware. The latency from the moment an event is logged until it is searchable is about 1 second in our setup.
Here's some documentation on this kind of setup: https://www.elastic.co/guide/en/logstash/current/getting-started-with-logstash.html, and a demo of the Kibana search interface with some live data.
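To make the shipper / queue / indexer split concrete, here is a minimal in-process sketch in Python. It is not Logstash and does not use RabbitMQ or Elasticsearch: a standard-library queue stands in for the broker and a dict stands in for the search index. The log line format, the host names and the function names (ship, index_all, search) are all assumptions made up for illustration.

```python
import json
import queue
import re
from collections import defaultdict

# Assumed (hypothetical) log line format: "<timestamp> <LEVEL> <message>"
LINE_RE = re.compile(r"^(?P<ts>\S+)\s+(?P<level>[A-Z]+)\s+(?P<message>.*)$")


def ship(host, lines, broker):
    """Shipper role: wrap each raw line in a JSON envelope and enqueue it."""
    for line in lines:
        broker.put(json.dumps({"host": host, "raw": line.rstrip("\n")}))


def index_all(broker, index):
    """Indexer role: drain the queue, parse each event, store it by level."""
    while True:
        try:
            payload = broker.get_nowait()
        except queue.Empty:
            break
        event = json.loads(payload)
        match = LINE_RE.match(event["raw"])
        if match:
            event.update(match.groupdict())
        else:
            # Keep unparseable lines instead of dropping them silently.
            event["level"] = "UNPARSED"
        index[event["level"]].append(event)


def search(index, level, text=""):
    """Search role: return events of one level whose raw line contains text."""
    return [e for e in index.get(level, []) if text in e["raw"]]


# Stand-ins for the message queue and the search database.
broker = queue.Queue()
index = defaultdict(list)

ship("web-1", ["2014-12-01T10:00:00Z INFO user login",
               "2014-12-01T10:00:05Z ERROR payment failed"], broker)
ship("web-2", ["malformed line"], broker)
index_all(broker, index)
errors = search(index, "ERROR")
```

The point of the split is that the shippers only serialize and enqueue, so the hosts stay cheap, while all parsing happens in one place on the central side.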
This question is old now (December 2014) but still ranks highly during a Google search on this topic.
Amazon now provides a way to do some of this through CloudWatch. It can pattern-match log messages and trigger alarms based on things happening in the application. Depending on the nature of the data mining that needs to be done, it may be possible to use their API to fetch the desired aggregate events. See http://aws.amazon.com/blogs/aws/cloudwatch-log-service/
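As a rough sketch of the "use their API to fetch events" idea, the Python below pages through matching events and counts them per log stream. It assumes a client object that exposes filter_log_events the way the boto3 CloudWatch Logs client does (for example boto3.client("logs")). The log group name "my-app-logs", the "ERROR" pattern and the FakeLogsClient class are hypothetical; the fake client is only there so the example runs without AWS credentials.

```python
from collections import Counter


def fetch_events(logs_client, log_group, pattern, start_ms, end_ms):
    """Yield every matching event, following nextToken pagination."""
    kwargs = {
        "logGroupName": log_group,
        "filterPattern": pattern,
        "startTime": start_ms,  # milliseconds since the epoch
        "endTime": end_ms,
    }
    while True:
        page = logs_client.filter_log_events(**kwargs)
        yield from page.get("events", [])
        token = page.get("nextToken")
        if not token:
            break
        kwargs["nextToken"] = token


def count_by_stream(events):
    """Aggregate: number of matching events per log stream."""
    return Counter(e["logStreamName"] for e in events)


class FakeLogsClient:
    """Offline stand-in (hypothetical) that serves two canned pages."""

    def __init__(self):
        self.pages = {
            None: {
                "events": [{"logStreamName": "i-aaa", "message": "ERROR a"}],
                "nextToken": "t1",
            },
            "t1": {
                "events": [
                    {"logStreamName": "i-aaa", "message": "ERROR b"},
                    {"logStreamName": "i-bbb", "message": "ERROR c"},
                ],
            },
        }

    def filter_log_events(self, **kwargs):
        return self.pages[kwargs.get("nextToken")]


# With real AWS access, pass boto3.client("logs") instead of the fake.
counts = count_by_stream(
    fetch_events(FakeLogsClient(), "my-app-logs", "ERROR", 0, 86_400_000)
)
```

For an end-of-day job, start_ms and end_ms would be the bounds of the day being mined; whether this is enough depends on how heavy the aggregation needs to be.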