I am trying to set up a logger for my AWS Glue job using Python's logging
module. I have a Glue job with the type as "Python Shell" using Python version 3.
Logging works fine if I instantiate the logger without any name
, but if I give my logger a name
, it no longer works, and I get an error which says: Log stream not found
.
I have the following code in an example Glue job:
import sys
import logging
# Version 1 - this works fine
logger = logging.getLogger()
log_format = "[%(asctime)s %(levelname)-8s %(message)s"
# Version 2 - this fails
logger = logging.getLogger(name = "foobar")
log_format = "[%(name)s] %(asctime)s %(levelname)-8s %(message)s"
date_format = "%a, %d %b %Y %H:%M:%S %Z"
log_stream = sys.stdout
if logger.handlers:
for handler in logger.handlers:
logger.removeHandler(handler)
logging.basicConfig(level = logging.INFO, format = log_format, stream =
log_stream, datefmt = date_format)
logger.info("This is a test.")
Note that I'm removing the handlers based on this post.
If I instantiate the logger using Version 1 of the code, it runs successfully and I am able to view the logs, as well as query them in CloudWatch
.
If I run Version 2, giving the logger a name, the Glue job still succeeds. However, if I try to view the logs, I get the following error message:
Log stream not found
The log stream jr_f137743545d3d242618ac95d859b9146fd15d15a0aadce64d8f3ba991ffed012 could not be found. Check if it was correctly created and retry.
And I am also not able to query these logs in CloudWatch
.
I have tried running this code locally using python version 3.6.0
, and both versions work. Additionally, both versions of this logging code work inside of a Lambda funcvtion. They only fail in Glue.
getLogger(name) is typically executed. The getLogger() function accepts a single argument - the logger's name. It returns a reference to a logger instance with the specified name if provided, or root if not. Multiple calls to getLogger() with the same name will return a reference to the same logger object.
The inbuilt logging module in python requires some handful of lines of code to configure log4j-like features viz - file appender, file rotation based on both time & size. For one-liner implementation of the features in your code, you can use the package autopylogger .
This code worked for me:
import sys
root = logging.getLogger()
root.setLevel(logging.DEBUG)
handler = logging.StreamHandler(sys.stdout)
handler.setLevel(logging.DEBUG)
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
handler.setFormatter(formatter)
root.addHandler(handler)
root.info("check")
I had a similar issue but fixed it with a combination of correct roles and looking in the right place in Cloudwatch. Make sure you're using the GlueServiceRole. Both Steve and your logging code is fine but the place that you are taken in Cloudwatch when you click on the "logs" button in Glue isn't the correct logging folder.
Go back into log groups then go into /aws-glue/python-jobs/error and that is the where the logger write to, while the stdout writes to the folder /aws-glue/python-jobs/output. It's not a very intuitive setup writing the logs to the error logs folder but hey ho, I'm sure there is a way of configuring it to get it to write where expected.
You should be able to name the log stream by using the following (replace "logger-name-here" with your desired log stream name):
import logging
MSG_FORMAT = '%(asctime)s %(levelname)s %(name)s: %(message)s'
DATETIME_FORMAT = '%Y-%m-%d %H:%M:%S'
logging.basicConfig(format=MSG_FORMAT, datefmt=DATETIME_FORMAT)
logger = logging.getLogger(<logger-name-here>)
logger.setLevel(logging.INFO)
logger.info("Test log message")
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With