Duplicate log entries with Google Cloud Stackdriver logging of Python code on Kubernetes Engine

I have a simple Python app running in a container on Google Kubernetes Engine. I am trying to connect the standard Python logging to Google Stackdriver logging using this guide. I have almost succeeded, but I am getting duplicate log entries with one always at the 'error' level...

Screenshot of Stackdriver logs showing duplicate entries

This is my Python code that sets up the logging according to the above guide:

import webapp2
from paste import httpserver
import rpc

# Imports the Google Cloud client library
import google.cloud.logging
# Instantiates a client
client = google.cloud.logging.Client()
# Connects the logger to the root logging handler; by default this captures
# all logs at INFO level and higher
client.setup_logging()

app = webapp2.WSGIApplication([('/rpc/([A-Za-z]+)', rpc.RpcHandler),], debug=True)
httpserver.serve(app, host='0.0.0.0', port='80')

Here's the code that triggers the logs from the screenshot:

import logging

logging.info("INFO Entering PostEchoPost...")
logging.warning("WARNING Entering PostEchoPost...")
logging.error("ERROR Entering PostEchoPost...")
logging.critical("CRITICAL Entering PostEchoPost...")

Here is the full Stackdriver log, expanded from the screenshot, with an incorrectly interpreted ERROR level:

{
 insertId:  "1mk4fkaga4m63w1"  
 labels: {
  compute.googleapis.com/resource_name:  "gke-alg-microservice-default-pool-xxxxxxxxxx-ttnz"   
  container.googleapis.com/namespace_name:  "default"   
  container.googleapis.com/pod_name:  "esp-alg-xxxxxxxxxx-xj2p2"   
  container.googleapis.com/stream:  "stderr"   
 }
 logName:  "projects/projectname/logs/algorithm"  
 receiveTimestamp:  "2018-01-03T12:18:22.479058645Z"  
 resource: {
  labels: {
   cluster_name:  "alg-microservice"    
   container_name:  "alg"    
   instance_id:  "703849119xxxxxxxxxx"   
   namespace_id:  "default"    
   pod_id:  "esp-alg-xxxxxxxxxx-xj2p2"    
   project_id:  "projectname"    
   zone:  "europe-west1-b"    
  }
  type:  "container"   
 }
 severity:  "ERROR"  
 textPayload:  "INFO Entering PostEchoPost...\n"  
 timestamp:  "2018-01-03T12:18:20Z"  
}

Here is the full Stackdriver log, expanded from the screenshot, with a correctly interpreted INFO level:

{
 insertId:  "1mk4fkaga4m63w0"  
 jsonPayload: {
  message:  "INFO Entering PostEchoPost..."   
  thread:  140348659595008   
 }
 labels: {
  compute.googleapis.com/resource_name:  "gke-alg-microservi-default-pool-xxxxxxxxxx-ttnz"   
  container.googleapis.com/namespace_name:  "default"   
  container.googleapis.com/pod_name:  "esp-alg-xxxxxxxxxx-xj2p2"   
  container.googleapis.com/stream:  "stderr"   
 }
 logName:  "projects/projectname/logs/algorithm"  
 receiveTimestamp:  "2018-01-03T12:18:22.479058645Z"  
 resource: {
  labels: {
   cluster_name:  "alg-microservice"    
   container_name:  "alg"    
   instance_id:  "703849119xxxxxxxxxx"    
   namespace_id:  "default"    
   pod_id:  "esp-alg-xxxxxxxxxx-xj2p2"    
   project_id:  "projectname"    
   zone:  "europe-west1-b"    
  }
  type:  "container"   
 }
 severity:  "INFO"  
 timestamp:  "2018-01-03T12:18:20.260099887Z"  
}

So, this entry might be the key:

container.googleapis.com/stream:  "stderr"

It looks like, in addition to my logging set-up working, all logs from the container are also being sent to stderr in the container. I believe that by default, at least on Kubernetes Engine, everything written to stdout/stderr is picked up by Google Stackdriver via Fluentd... Having said that, I'm out of my depth at this point.
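
A quick way to check the stderr hypothesis would be to list the handlers attached to the root logger after client.setup_logging() has run. This is only a sketch using the standard library; describe_handlers is a hypothetical helper name, not part of any Google library.

```python
import logging


def describe_handlers(logger):
    """Return (handler class name, stream name or None) for each handler on `logger`."""
    described = []
    for handler in logger.handlers:
        # Only stream-based handlers carry a `stream` attribute.
        stream = getattr(handler, "stream", None)
        # File-like objects such as sys.stderr usually expose a `name`.
        stream_name = getattr(stream, "name", None)
        described.append((type(handler).__name__, stream_name))
    return described


if __name__ == "__main__":
    # Run this after the logging set-up to see what is attached to the root logger.
    for class_name, stream_name in describe_handlers(logging.getLogger()):
        print(class_name, stream_name)
```

If a plain StreamHandler bound to stderr appears in the list next to a cloud-specific handler, that would be consistent with each record being written twice.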

Any ideas why I am getting these duplicate entries?

asked Jan 03 '18 13:01

bplisted

1 Answer

I solved this problem by overwriting the handlers property on my root logger immediately after calling the setup_logging method:

import logging
from google.cloud import logging as gcp_logging
from google.cloud.logging.handlers import CloudLoggingHandler, ContainerEngineHandler, AppEngineHandler

logging_client = gcp_logging.Client()
logging_client.setup_logging(log_level=logging.INFO)
root_logger = logging.getLogger()
# use the GCP handler ONLY in order to prevent logs from getting written to STDERR
root_logger.handlers = [handler
                        for handler in root_logger.handlers
                        if isinstance(handler, (CloudLoggingHandler, ContainerEngineHandler, AppEngineHandler))]

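To elaborate on this a bit: the client.setup_logging method attaches two handlers to the root logger, a normal logging.StreamHandler and a GCP-specific handler. So every record goes to both stderr and Cloud Logging. On Kubernetes Engine the container's stderr is collected as well, which appears to be why the second copy arrives as a plain-text entry with severity ERROR. You need to remove the stream handler from the handlers list to prevent the duplication.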
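
The mechanism can be reproduced without any Google library. The following is a minimal sketch, assuming a hypothetical FakeCloudHandler that stands in for the GCP-specific handler and an in-memory buffer that stands in for the container's stderr; keep_only and run_demo are illustrative names as well.

```python
import io
import logging


class FakeCloudHandler(logging.Handler):
    """Hypothetical stand-in for the GCP handler; it only collects messages in memory."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def keep_only(logger, handler_types):
    """Drop every handler on `logger` that is not an instance of `handler_types`."""
    logger.handlers = [h for h in logger.handlers if isinstance(h, handler_types)]


def run_demo():
    logger = logging.getLogger("duplicate_demo")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the demo isolated from the real root logger

    stderr_stand_in = io.StringIO()  # stands in for the container's stderr
    cloud = FakeCloudHandler()
    logger.handlers = [logging.StreamHandler(stderr_stand_in), cloud]

    logger.info("first")  # reaches both handlers: this is the duplication
    keep_only(logger, (FakeCloudHandler,))
    logger.info("second")  # reaches only the cloud stand-in

    return stderr_stand_in.getvalue().splitlines(), cloud.messages


if __name__ == "__main__":
    stream_lines, cloud_messages = run_demo()
    print("stream:", stream_lines)
    print("cloud:", cloud_messages)
```

Filtering by a whitelist of handler types, as in the answer's code, is a deliberate choice: a handler class may itself derive from logging.StreamHandler, so removing everything that matches isinstance(handler, logging.StreamHandler) could also remove the handler you want to keep.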

EDIT: I have filed an issue with Google to add an argument to make this less hacky.

answered Sep 29 '22 05:09

Andy Carlson

Tags:

python

logging

google-kubernetes-engine

google-cloud-stackdriver
