
Best option to put Nginx logs into Kafka?

We are dealing with large log files from several servers that we store on HDFS. We currently have a good batch solution (mainly moving and writing the files each day), and want to implement a realtime solution with Kafka.

Basically, we need to put the logs from Nginx into Kafka, then write a consumer that writes to HDFS (this could be done with the HDFS consumer https://github.com/kafka-dev/kafka/tree/master/contrib/hadoop-consumer).

Which approach would you recommend for moving logs into Kafka?

  • We could write an nginx module, but it isn't that simple. This https://github.com/DemandCube/Sparkngin could give some clues.
  • Reading the log files (tail ...) looks like a bad idea, as it involves a useless write operation. Logstash would also require write/read operations before pushing to Kafka, which seems unnecessary.

Any other ideas?

Pixou asked Aug 22 '14 17:08



1 Answer

I know this is an old question, but I recently needed to do the same thing.

The problem with a tail -f producer is log rotation, and when tail dies, you don't really know which lines have been sent to Kafka.
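
To make that bookkeeping problem concrete, here is a minimal Python sketch of what a tail-style producer would have to track to survive restarts and rotation. It is not part of the setup described in this answer; the function name `read_new_lines` and the checkpoint layout are made up for illustration, and sending to Kafka is left out.

```python
import os


def read_new_lines(path, checkpoint):
    """Return (lines, new_checkpoint) for complete lines added since checkpoint.

    checkpoint is None or a dict {"inode": int, "offset": int} (hypothetical
    layout). A different inode, or an offset past the end of the file, is
    treated as a rotation and reading restarts at offset 0.
    """
    with open(path, "rb") as f:
        st = os.fstat(f.fileno())
        offset = 0
        if (
            checkpoint is not None
            and checkpoint["inode"] == st.st_ino
            and checkpoint["offset"] <= st.st_size
        ):
            offset = checkpoint["offset"]
        f.seek(offset)
        data = f.read()

    # Consume only up to the last newline; a half-written line is left for
    # the next call.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").split("\n")[:-1]
    return lines, {"inode": st.st_ino, "offset": offset + end}
```

Even with this, anything appended to the old file between the last read and the rotation is never seen, and the checkpoint itself has to be persisted somewhere after each successful send, which is the kind of fragility the syslog route avoids.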

As of nginx 1.7.1, the access_log directive can log to syslog; see http://nginx.org/en/docs/syslog.html. We leverage that to log to rsyslog, and from rsyslog to Kafka: http://www.rsyslog.com/doc/master/configuration/modules/omkafka.html

It's a somewhat roundabout way of doing it, but this way there's less chance of logs going missing. Also, if you're using CentOS, rsyslog comes with it as standard anyway.

So in short, here's the setup I feel is the best option for getting nginx logs into Kafka:

nginx -> rsyslog -> kafka
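
As a rough illustration of what travels over the first hop, the Python sketch below parses the BSD-syslog style header that nginx puts in front of each access log line. It assumes nginx was configured with something like `access_log syslog:server=127.0.0.1:514,tag=nginx combined;` and does not use the `nohostname` option; the function name `parse_nginx_syslog`, the host name and the sample line are invented for the example. In the real setup rsyslog does this parsing and omkafka does the publishing, so this is only a way to see the message shape.

```python
import re

# <PRI>Mmm dd hh:mm:ss hostname tag: message
SYSLOG_RE = re.compile(
    r"^<(\d{1,3})>(\w{3} [ \d]\d \d\d:\d\d:\d\d) (\S+) (\S+?): (.*)$",
    re.S,
)


def parse_nginx_syslog(datagram: bytes) -> dict:
    """Split one syslog datagram into header fields and the log line."""
    m = SYSLOG_RE.match(datagram.decode("utf-8", errors="replace"))
    if m is None:
        raise ValueError("not a BSD-syslog style message")
    pri = int(m.group(1))
    return {
        "facility": pri // 8,   # PRI = facility * 8 + severity
        "severity": pri % 8,
        "timestamp": m.group(2),
        "host": m.group(3),
        "tag": m.group(4),
        "message": m.group(5),  # the access log line itself
    }


# Hypothetical datagram; 190 = local7 (23) * 8 + info (6).
sample = (
    b'<190>Sep 28 06:09:00 web1 nginx: 203.0.113.7 - - '
    b'[28/Sep/2022:06:09:00 +0000] "GET / HTTP/1.1" 200 612'
)
record = parse_nginx_syslog(sample)
```

The `message` field is what would end up as the Kafka record value if the rsyslog template forwards only the message part.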

maresa answered Sep 28 '22 06:09