We are dealing with large log files from several servers that we add to HDFS. We currently have a good batch solution (mainly moving and writing the files each day) and want to implement a real-time solution with Kafka.
Basically, we need to put the logs from Nginx into Kafka, then write a consumer to write them to HDFS (this could be done with the HDFS consumer: https://github.com/kafka-dev/kafka/tree/master/contrib/hadoop-consumer).
Which approach would you recommend for moving the logs into Kafka? Any other ideas?
By default, NGINX writes two types of logs, the access log and the error log, typically located at /var/log/nginx/access.log and /var/log/nginx/error.log. The exact paths depend on the operating system and installation; on most popular Linux distributions such as Ubuntu, CentOS, or Debian, both files can be found under /var/log/nginx, assuming access and error logging are enabled in the core NGINX configuration file.
I know this is an old question, but I recently needed to do the same thing.
The problem with a tail -f producer is log rotation: when tail dies, you don't really know which lines have already been sent to Kafka.
As of nginx 1.7.1, the access_log directive can log to syslog (see http://nginx.org/en/docs/syslog.html). We leverage that to log to rsyslog, and from rsyslog to Kafka via omkafka: http://www.rsyslog.com/doc/master/configuration/modules/omkafka.html
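On the nginx side the change is a single access_log line. A minimal sketch, assuming the local rsyslog instance listens on UDP port 514 and that the tag and facility values are yours to choose:

```
# nginx.conf, inside the http or server block (nginx >= 1.7.1)
# Assumed values: local rsyslog on 127.0.0.1:514, facility local7, tag "nginx".
access_log syslog:server=127.0.0.1:514,facility=local7,tag=nginx,severity=info combined;
```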
It's a slightly roundabout way of doing it, but this way there is less chance of logs going missing. Also, if you're using CentOS, rsyslog comes with it as standard anyway.
So in short, here's the setup I feel is the best option for getting nginx logs into Kafka:
nginx -> rsyslog -> kafka
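For the rsyslog -> kafka hop, a minimal omkafka sketch along these lines could work; the broker address, topic name, UDP input port, and file name are assumptions and would need to match your environment:

```
# /etc/rsyslog.d/nginx-kafka.conf (assumed file name)
module(load="imudp")            # receive the syslog messages nginx sends over UDP
input(type="imudp" port="514")

module(load="omkafka")          # Kafka output module

# Forward everything logged by nginx to a Kafka topic.
if $programname == 'nginx' then {
    action(type="omkafka"
           broker=["localhost:9092"]   # assumed Kafka broker address
           topic="nginx_access"        # assumed topic name
           partitions.auto="on")
    stop
}
```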