 

Which is better for log analysis?

I have to analyze gzip-compressed log files that are stored on a production server, using Hadoop-related tools.

I can't decide how to do that or what to use. Here are some of the methods I thought about using (feel free to recommend something else):

  • Flume
  • Kafka
  • MapReduce

Before I can do anything, I need to get the compressed files from the production server, process them, and then push them into Apache HBase.

Asked by Yaswanth, Dec 21 '25


1 Answer

Depending on the size of your logs (assuming the computation won't fit on a single machine, i.e. it requires a "big data" product), I think it would be most appropriate to go with Apache Spark. Given that you don't know much about the ecosystem, it might be best to go with Databricks Cloud, which gives you a straightforward way of reading your logs from HDFS and analyzing them with Spark transformations in a visual way (in a notebook).
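To give a sense of what that looks like, here is a minimal PySpark sketch. Spark reads `.gz` files transparently through `textFile`, so the compressed logs don't need to be decompressed by hand. The HDFS path and the "ERROR" filter are placeholder assumptions, not something from your setup:

```python
# Minimal sketch, assuming the logs have already been copied to HDFS.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("log-analysis").getOrCreate()

# Placeholder path; Spark decompresses .gz files transparently when reading.
lines = spark.sparkContext.textFile("hdfs:///logs/*.gz")

# Example transformation: count the lines that contain "ERROR".
error_count = lines.filter(lambda line: "ERROR" in line).count()
print(f"ERROR lines: {error_count}")

spark.stop()
```

In a Databricks notebook the same transformations run cell by cell, so you can iterate on the parsing logic interactively before wiring up the rest of the pipeline.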

There's an introductory video on the Databricks site, and a free trial, so you can see how that would go and then decide.

P.S. I'm in no way affiliated with Databricks. I just think they have a great product, that's all :)

Answered by Marko Bonaci, Dec 24 '25


