I'm parsing access logs generated by Apache, Nginx, and Darwin (a video streaming server), and aggregating statistics for each delivered file by date, referrer, and user agent.
Tons of logs are generated every hour, and that number is likely to increase dramatically in the near future, so processing this kind of data in a distributed manner via Amazon Elastic MapReduce sounds reasonable.
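As a concrete sketch of the parsing side, a Hadoop streaming mapper for this kind of aggregation could look like the following. It assumes the Apache/Nginx "combined" log format; the regex and the choice of key fields are my assumptions, not taken from the question:

```python
#!/usr/bin/env python3
"""Hadoop streaming mapper: emit one count per (date, referrer, agent, file).

Assumes the "combined" log format, e.g.:
1.2.3.4 - - [10/Oct/2000:13:55:36 -0700] "GET /file.mp4 HTTP/1.1" 200 2326 "http://ref" "UA"
"""
import re
import sys

LINE_RE = re.compile(
    r'\S+ \S+ \S+ \[(?P<date>[^:]+)[^\]]*\] '          # client, identd, user, [date:time]
    r'"(?:GET|POST|HEAD) (?P<path>\S+)[^"]*" \d+ \S+ ' # request line, status, bytes
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'        # referrer and user agent
)

def map_line(line):
    """Return a tab-separated key for one log line, or None if it doesn't parse."""
    m = LINE_RE.match(line)
    if not m:
        return None  # skip malformed lines rather than failing the whole job
    return "\t".join([m.group("date"), m.group("referrer"),
                      m.group("agent"), m.group("path")])

if __name__ == "__main__":
    for line in sys.stdin:
        key = map_line(line)
        if key:
            print(f"{key}\t1")
```

Hadoop streaming will sort these lines by key before handing them to the reducer, so the reducer only needs to sum counts over consecutive identical keys.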
Right now my mappers and reducers are ready, and I have tested the whole process with the following flow:
I've done all of that manually, following the many tutorials about Amazon EMR that are available online.
What should I do next? What is the best approach to automating this process?
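One common way to automate a recurring EMR job is to launch a transient cluster from a script. A minimal sketch with boto3 is below; `run_job_flow` and `command-runner.jar` are the real EMR API and mechanism, but the bucket paths, release label, instance types, and step names are hypothetical placeholders:

```python
"""Launch a transient EMR cluster that runs one Hadoop-streaming step.

All S3 URIs, names, and sizing here are illustrative assumptions.
"""

def build_streaming_step(input_uri, output_uri, mapper_uri, reducer_uri):
    """Build one Hadoop-streaming step definition for run_job_flow()."""
    return {
        "Name": "log-stats",
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "hadoop-streaming",
                "-files", f"{mapper_uri},{reducer_uri}",  # ship scripts to the nodes
                "-mapper", "mapper.py",
                "-reducer", "reducer.py",
                "-input", input_uri,
                "-output", output_uri,
            ],
        },
    }

def run(step):
    # boto3 imported lazily so the step-building helper works without it installed
    import boto3
    emr = boto3.client("emr", region_name="us-east-1")
    return emr.run_job_flow(
        Name="access-log-stats",
        ReleaseLabel="emr-6.15.0",       # assumed release; pick a current one
        Instances={
            "MasterInstanceType": "m5.xlarge",
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": 3,
            "KeepJobFlowAliveWhenNoSteps": False,  # terminate when the step finishes
        },
        Steps=[step],
        JobFlowRole="EMR_EC2_DefaultRole",
        ServiceRole="EMR_DefaultRole",
    )
```

Triggering this from cron, or from an S3 event when a new batch of logs lands, turns the manual console workflow into a repeatable pipeline; the cluster terminates itself after the step, so you only pay for the processing time.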
I think this topic could be useful for many people who want to process access logs with Amazon Elastic MapReduce but haven't been able to find good materials and/or best practices.
UPD: Just to clarify, here is the single final question:
What are the best practices for log processing powered by Amazon Elastic MapReduce?
Related posts:
Getting data in and out of Elastic MapReduce HDFS
That's a very open-ended question, but here are some thoughts you could consider:
Hope that gives you some clues.