 

How do I write messages to the output log on AWS Glue?

AWS Glue jobs log output and errors to two different CloudWatch logs, /aws-glue/jobs/error and /aws-glue/jobs/output by default. When I include print() statements in my scripts for debugging, they get written to the error log (/aws-glue/jobs/error).

I have tried using:

log4jLogger = sparkContext._jvm.org.apache.log4j
log = log4jLogger.LogManager.getLogger(__name__)
log.warn("Hello World!")

but "Hello World!" doesn't show up in either of the logs for the test job I ran.

Does anyone know how to go about writing debug log statements to the output log (/aws-glue/jobs/output)?

TIA!

EDIT:

It turns out the above actually does work. I was running the job in the AWS Glue script editor window, which captures the Command-F key combination and searches only within the current script. So when I tried to search the page for the logging output, it looked as if nothing had been logged.
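Searching the CloudWatch log group directly avoids depending on the browser's in-page search. A minimal sketch, assuming boto3 is available and the job writes to the default /aws-glue/jobs/output group; the helper name find_log_messages is hypothetical, and the client is passed in as an argument so the function can be tried without AWS credentials:

```python
def find_log_messages(logs_client, log_group, phrase):
    """Return the messages in `log_group` that contain `phrase`.

    `logs_client` is expected to behave like boto3.client("logs").
    """
    # Double quotes make CloudWatch treat the phrase as one exact term.
    kwargs = {"logGroupName": log_group, "filterPattern": f'"{phrase}"'}
    messages = []
    while True:
        response = logs_client.filter_log_events(**kwargs)
        messages.extend(event["message"] for event in response.get("events", []))
        token = response.get("nextToken")
        if not token:
            return messages
        # Results are paginated; keep requesting until no token is returned.
        kwargs["nextToken"] = token
```

With real credentials it could be called as find_log_messages(boto3.client("logs"), "/aws-glue/jobs/output", "Hello World!").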

NOTE: I did discover through testing the first responder's suggestion that AWS Glue scripts don't seem to output any log message with a level less than WARN!
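This is the behavior a logger threshold of WARN would produce: records below the threshold are dropped before they reach any handler. The sketch below illustrates that with Python's standard logging module as a stand-in. It is an analogy only and makes no claim about how Glue configures log4j; the logger name is arbitrary.

```python
import io
import logging

buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

demo = logging.getLogger("threshold_demo")  # arbitrary name
demo.addHandler(handler)
demo.propagate = False          # keep the demo isolated from the root logger
demo.setLevel(logging.WARNING)  # threshold: only WARNING and above pass

demo.debug("dropped")
demo.info("dropped as well")
demo.warning("kept")
demo.error("kept too")

# Only the two records at or above the threshold were written.
captured = buffer.getvalue().splitlines()
```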

Jesse Clark asked Feb 21 '18 19:02



2 Answers

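Try the built-in Python logger from the logging module. Note that basicConfig() attaches a handler that writes to standard error by default, so pass stream=sys.stdout if the messages should go to the standard output stream.

import logging
import sys

MSG_FORMAT = '%(asctime)s %(levelname)s %(name)s: %(message)s'
DATETIME_FORMAT = '%Y-%m-%d %H:%M:%S'
# stream defaults to sys.stderr; set it explicitly to log to stdout
logging.basicConfig(format=MSG_FORMAT, datefmt=DATETIME_FORMAT, stream=sys.stdout)
logger = logging.getLogger(<logger-name-here>)  # replace the placeholder with a string

logger.setLevel(logging.INFO)

...

logger.info("Test log message")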
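If the goal is to keep routine messages out of the error stream, the logging module can also split records by level. A minimal sketch, assuming Glue forwards a script's standard output to /aws-glue/jobs/output and its standard error to /aws-glue/jobs/error (worth confirming on a test job); the function name build_split_logger and its stream parameters are hypothetical:

```python
import logging
import sys


def build_split_logger(name, out_stream=None, err_stream=None):
    """Send records below WARNING to one stream and the rest to another."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines via the root logger
    logger.handlers.clear()   # make repeated calls idempotent

    formatter = logging.Formatter(
        "%(asctime)s %(levelname)s %(name)s: %(message)s",
        "%Y-%m-%d %H:%M:%S",
    )

    # INFO (and anything else below WARNING) goes to the output stream.
    out_handler = logging.StreamHandler(sys.stdout if out_stream is None else out_stream)
    out_handler.addFilter(lambda record: record.levelno < logging.WARNING)
    out_handler.setFormatter(formatter)

    # WARNING and above go to the error stream.
    err_handler = logging.StreamHandler(sys.stderr if err_stream is None else err_stream)
    err_handler.setLevel(logging.WARNING)
    err_handler.setFormatter(formatter)

    logger.addHandler(out_handler)
    logger.addHandler(err_handler)
    return logger
```

Calling build_split_logger("my_job").info("Test log message") would then write to standard output only, while .warning(...) and .error(...) would go to standard error.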
Alexey Bakulin answered Sep 18 '22 16:09


I know the article is not new, but maybe this will help someone. For me, logging in Glue works with the following lines of code:

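from awsglue.context import GlueContext
from pyspark.context import SparkContext

sc = SparkContext.getOrCreate()
# create the Glue context
glueContext = GlueContext(sc)
# get the Glue logger
logger = glueContext.get_logger()
...
# write to the log (your_value must be a string):
logger.info("s3_key:" + your_value)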
Lars answered Sep 19 '22 16:09