 

Getting logs twice in an AWS Lambda function

I'm attempting to create a centralized module to set up my log formatter, to be shared across a number of Python modules within my Lambda function. This function will ultimately run on AWS Greengrass on a local, on-premises device.

For some reason, when I add my own handler to format the messages, the logs are output twice: once at the correct log level, and a second time at an incorrect level.

If I use the standard Python logger without setting up any handlers, it works fine. For example:

main.py:

import logging

logging.debug("test1")

cloudwatch logs:

12:28:42 [DEBUG]-main.py:38,test1

My objective is to have one formatter on my code which will format these log messages into JSON. They will then get ingested into a centralized logging database. However, when I do this I get the log messages twice.
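
For concreteness, the kind of JSON formatter I have in mind looks roughly like this (the field names are just an example):

import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each record as a one-line JSON object (example fields)."""
    def format(self, record):
        return json.dumps({
            "name": record.name,
            "level": record.levelname,
            "time": self.formatTime(record),
            "message": record.getMessage(),
        })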

loghelper.py:

import logging
import sys


def setup_logging(name):

    formatter = logging.Formatter("%(name)s, %(asctime)s, %(message)s")

    new_handler = logging.StreamHandler(sys.stdout)
    new_handler.setFormatter(formatter)

    logger = logging.getLogger(name)

    # Remove any pre-existing handlers. Iterate over a copy so that
    # removing entries doesn't skip items, and use a separate variable
    # so the new handler created above isn't shadowed.
    for old_handler in list(logger.handlers):
        logger.removeHandler(old_handler)

    logger.setLevel(logging.DEBUG)
    logger.addHandler(new_handler)
    return logger

main.py:

import logging
import loghelper

logger = loghelper.setup_logging('main.test_function')

def test_function():
    logger.debug("test function log statement")

test_function()

When the Lambda function is now run, I get the debug message twice in the CloudWatch logs:

cloudwatch logs:

12:22:53 [DEBUG]-main.py:5, test function log statement
12:22:53 [INFO]-__init__.py:880,main.test_function,2018-06-18 12:22:53,099, test function log statement

Notice that:

  • The first entry is at the correct level but in the wrong format.
  • The second entry reports the wrong level and the wrong module, but is in the correct format.

I cannot explain this behavior and would appreciate any thoughts on what could be causing it. I also don't know what sits at line 880 of the logging module's __init__.py; that may shed some light on what is happening.

References:

  1. Setting up a global formatter: How to define a logger in python once for the whole program?

  2. Clearing the default lambda log handlers: Using python Logging with AWS Lambda

  3. Creating a global logger: Python: logging module - globally

asked Jun 18 '18 by Remotec


2 Answers

AWS Lambda also sets up a handler on the root logger, and anything written to stdout is captured and logged at level INFO. Your log message is thus captured twice:

  • By the AWS Lambda handler on the root logger (log messages propagate from nested child loggers up to the root); this handler has its own format configured.
  • By the AWS Lambda stdout-to-INFO logger.

This is why the messages all start with the (asctime) [(levelname)]-(module):(lineno), information; the root logger is configured to output messages in that format, and the information written to stdout is just another %(message)s part in that output.

Either don't set a handler of your own when you are in the AWS environment, or disable propagation of the output to the root handler and live with all your messages being recorded as INFO messages by AWS; in the latter case your own formatter could include the levelname information in the output.

You can disable log propagation with logger.propagate = False, at which point your message is only going to be passed to your handler, not to the root handler as well.
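
A minimal sketch of that option, reusing the setup_logging() helper from the question (the propagate line is the only addition):

import logging
import sys

def setup_logging(name):
    formatter = logging.Formatter("%(name)s, %(asctime)s, %(message)s")
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(formatter)

    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.addHandler(handler)
    # Stop records from propagating to the AWS-configured root handler.
    # Lambda still captures the stdout stream (as INFO), but each message
    # is now emitted only once, in your own format.
    logger.propagate = False
    return logger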

Another option is to just rely on the AWS root logger configuration. According to this excellent reverse-engineering blog post, the root logger is configured with:

# LambdaLoggerHandler and LambdaLoggerFilter are AWS-internal classes,
# reproduced here as the blog post reports them; they are not importable.
logging.Formatter.converter = time.gmtime
logger = logging.getLogger()
logger_handler = LambdaLoggerHandler()
logger_handler.setFormatter(logging.Formatter(
    '[%(levelname)s]\t%(asctime)s.%(msecs)dZ\t%(aws_request_id)s\t%(message)s\n',
    '%Y-%m-%dT%H:%M:%S'
))
logger_handler.addFilter(LambdaLoggerFilter())
logger.addHandler(logger_handler)

This replaces the default time.localtime converter on logging.Formatter with time.gmtime (so timestamps use UTC rather than local time), sets a custom handler that makes sure messages go to the Lambda infrastructure, configures a format, and adds a filter object that only adds the aws_request_id attribute to records (so the formatter above can include it) but doesn't actually filter anything.
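
If you rely on that AWS-provided configuration, a minimal sketch might be (assuming the usual lambda_handler entry point; the return value is just an example):

import logging

# Use the Lambda-provided root handler; just raise the level so that
# DEBUG records are not filtered out. No handlers of your own are added.
logger = logging.getLogger()
logger.setLevel(logging.DEBUG)

def lambda_handler(event, context):
    logger.debug("formatted once, by the AWS-provided root handler")
    return {"statusCode": 200}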

You could alter the formatter on that handler by updating the attributes on the handler.formatter object:

for handler in logging.getLogger().handlers:
    formatter = handler.formatter
    if formatter is not None and 'aws_request_id' in formatter._fmt:
        # this is the AWS Lambda formatter
        # formatter.datefmt => '%Y-%m-%dT%H:%M:%S'
        # formatter._style._fmt => 
        #    '[%(levelname)s]\t%(asctime)s.%(msecs)dZ'
        #    '\t%(aws_request_id)s\t%(message)s\n'

and then just drop your own logger handler entirely. You do want to be careful with this; the AWS Lambda infrastructure could well be counting on a specific format being used. The output you show in your question doesn't include the date component (the %Y-%m-%dT part of the formatter.datefmt string), which probably means that the format has been parsed out and is being presented to you in a web-application view of the data.
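
For illustration only, and bearing that caveat in mind, a hypothetical in-place swap to a JSON-shaped format could look like this (the field names are made up, and the message text is not JSON-escaped):

import logging

for handler in logging.getLogger().handlers:
    formatter = handler.formatter
    if formatter is not None and 'aws_request_id' in formatter._fmt:
        # Swap in a JSON-shaped format string; aws_request_id is still
        # supplied by the AWS filter, so it can be referenced here.
        handler.setFormatter(logging.Formatter(
            '{"level": "%(levelname)s", "time": "%(asctime)s", '
            '"request_id": "%(aws_request_id)s", "message": "%(message)s"}',
            '%Y-%m-%dT%H:%M:%S'
        ))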

answered Sep 23 '22 by Martijn Pieters


I'm not sure whether this is the cause of your problem, but by default, Python's loggers propagate their messages up the logging hierarchy. As you probably know, Python loggers are organized in a tree, with the root logger at the top and other loggers below it. In logger names, a dot (.) introduces a new hierarchy level. So if you do

logger = logging.getLogger('some_module.some_function')

then you actually have 3 loggers:

The root logger (`logging.getLogger()`)
    A logger at module level (`logging.getLogger('some_module')`)
        A logger at function level (`logging.getLogger('some_module.some_function')`)
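
A quick way to see that parent chain, using the names above:

import logging

func_logger = logging.getLogger('some_module.some_function')
mod_logger = logging.getLogger('some_module')
root_logger = logging.getLogger()

# Once all three loggers exist, the logging module links them into a tree.
assert func_logger.parent is mod_logger
assert mod_logger.parent is root_logger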

If you emit a log message on a logger and it is not discarded based on the logger's minimum level, then the message is passed on to the logger's handlers and to its parent logger. See this flowchart for more information.

If that parent logger (or any logger higher up in the hierarchy) also has handlers, then they are called, too.

I suspect that in your case, either the root logger or the main logger somehow ends up with additional handlers attached, which leads to the duplicate messages. To avoid that, you can set propagate to False on your logger, or only attach your handlers to the root logger; both alternatives are sketched below.
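
A minimal sketch of both alternatives (use one or the other; the names follow the example above):

import logging
import sys

formatter = logging.Formatter("%(name)s %(levelname)s %(message)s")

# Alternative 1: a handler on the child logger, with propagation disabled,
# so handlers further up the tree never see its records.
child = logging.getLogger('some_module.some_function')
child_handler = logging.StreamHandler(sys.stdout)
child_handler.setFormatter(formatter)
child.addHandler(child_handler)
child.propagate = False

# Alternative 2 (instead of the block above): one handler on the root
# logger, nothing on the children; records propagate up and are
# formatted exactly once.
root = logging.getLogger()
root_handler = logging.StreamHandler(sys.stdout)
root_handler.setFormatter(formatter)
root.addHandler(root_handler)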

answered Sep 22 '22 by Florian Brucker