Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Google Cloud Run Jobs logging splits Exception MESSAGES with multiple lines into multiple log entries

Several days ago, a Python RuntimeError was raised in one of my company's Cloud Run jobs which stopped it.

Cloud Run's logging, unfortunately, handled that RuntimeError in a bad way.

It correctly put all the lines from the RuntimeError's stack trace in the first log entry.

But the RuntimeError's message, which had multiple lines of text (each one carrying important diagnostic information) was mutilated. The first line of that message was in the first log entry that contained the stack trace. But each of the remaining lines (except for blank ones) was put into its own subsequent log entry.

Below is a screenshot from the Google Cloud Run LOGS tab for the job that shows this. In the first log entry (the top one), you can see the full stack trace plus the 1st line of the RuntimeError's message ("Caught some Exception (see cause); was processing...") But after that come many log entries, each one of them being a single subsequenbt line from the RuntimeError's message. The screenshot only includes the first 4 of those subsequent lines, the first one being the text "{'broker_order_id': '196056769652',".

enter image description here

That RuntimeError message handling is obviously a disaster: you have to know that the subsequent lines come later (I first thought they did not print at all), it is hard to read them, their log level is no longer ERROR but is absent, etc.

Does anyone know if

  1. there is a cure
  2. we are not doing Cloud Run logging correctly
  3. this is a known bug, or a bug that I need to report to Google

?


Before submitting this question, I did web searches.

I found many people reporting that Exception stack traces were printing on multiple lines up thru 2022: Python, Java, Java.

But the stack trace multi line/multi log entry issue reported in those links seems to have been solved by now. The problem that I am reporting is if your Exception's text message, not its stack trace, has multiple lines.


My company set up Cloud Run Jobs logging > 1 year ago, back when Cloud Run Jobs was in beta, and not fully supported by the Cloud logging facility.

In abbreviated form, our Cloud Run Jobs logging configuration is like the Python code shown below.

Is it possible that this logging config is out of date and causing this problem?

LOG_FORMAT: str = (
    "%(asctime)s.%(msecs)03dZ "
    "%(levelname)s "
    "%(name)s.%(funcName)s "
    "#%(lineno)d "
    "- "
    "%(message)s"
)

DATE_FORMAT: str = "%Y-%m-%d %H:%M:%S"

_is_logging_configured: bool = False


def get_logger(name: str) -> Logger:
    config_logging()
    
    return getLogger(name)


def config_logging() -> None:
    if _is_logging_configured:
        return
    
    config_gcp_cloud_run_job_logging()
    
    _is_logging_configured = True


def config_gcp_cloud_run_job_logging() -> None:
    root_logger = getLogger()
    root_logger.setLevel(os.environ.get("LOG_LEVEL", "WARNING"))
    formatter = get_logging_formatter()
    
    # get metadata about the execution environment
    region = retrieve_metadata_server(_REGION_ID)
    project = retrieve_metadata_server(_PROJECT_NAME)
    
    # build a manual resource object
    cr_job_resource = Resource(
        type = "cloud_run_job",
        labels = {
            "job_name": os.environ.get("CLOUD_RUN_JOB", "unknownJobId"),
            "location": region.split("/")[-1] if region else "",
            "project_id": project,
        },
    )
    
    # configure library using CloudLoggingHandler with custom resource
    client = Client()
    
    # use labels to assign logs to execution
    labels = {"run.googleapis.com/execution_name": os.environ.get("CLOUD_RUN_EXECUTION", "unknownExecName")}
    
    handler = CloudLoggingHandler(client, resource = cr_job_resource, labels = labels)
    handler.setFormatter(formatter)
    setup_logging(handler)


def get_logging_formatter() -> Formatter:
    formatter = Formatter(fmt = LOG_FORMAT, datefmt = DATE_FORMAT)
    Formatter.converter = time.gmtime
    return formatter
like image 739
HaroldFinch Avatar asked Oct 27 '25 06:10

HaroldFinch


1 Answers

  • This is a known bug and it is reported to Google Cloud in Public issue tracker, Currently the Product engineering team is working on this and there is no ETA. This is probably an issue with the appserver log exception grouper not grouping this stack trace into a single log line. If you want you can add your inputs or more details as a comment in that PIT link. Still if you feel a separate request is needed then you can raise it by using this PIT link.

  • To overcome this, you can use Fluent Bit provides the “Multiline” filter which helps to concatenate messages that originally belong to one context but were split across multiple records or log lines. Common examples are stack traces or applications that print logs in multiple lines.

  • As per this SO by Maxim Ponomarev, to avoid duplicate messages about the same exception in the log. It's better to log a message about RuntimeException (or any other Exception) so that it will be easier to identify the exception and can avoid the duplicate messages.

like image 149
Hemanth Kumar Avatar answered Oct 28 '25 21:10

Hemanth Kumar



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!