I am using Python 3 to query Stackdriver for GCP logs. Unfortunately, for the log entries that have important data, the payload comes back as "NoneType" instead of as a "dict" or a "str": "entry.payload" is "None", while "entry.payload_pb" has the data I want, but it is garbled.
Is there a way to get Stackdriver to return this data in a clean format, or is there a way I can parse it? If not, is there a way I should query this data that is better than what I am doing and yields clean data?
My code looks something like this:
#!/usr/bin/python3
from google.cloud.logging import Client, DESCENDING

projectName = 'my_project'
myFilter = 'logName="projects/' + projectName + '/logs/compute.googleapis.com%2Factivity_log"'
client = Client(project=projectName)
entries = client.list_entries(order_by=DESCENDING, page_size=500, filter_=myFilter)

for entry in entries:
    if isinstance(entry.payload, dict):
        print(entry.payload)
    elif isinstance(entry.payload, str):
        print(entry.payload)
    elif entry.payload is None:  # isinstance(x, None) raises TypeError; test with "is None"
        print(entry.payload_pb)
The "entry.payload_pb" data always starts like this:
type_url: "type.googleapis.com/google.cloud.audit.AuditLog"
value: "\032;\[email protected]"I\n\r129.105.16.28\0228
It looks like something is broken in the Python library's protobuf parsing for logging. I found two old issues that seem to have been resolved some time ago, but I believe the problem was reintroduced. I have a ticket open with Google support on this issue and they are looking into it.
As a workaround, you have two options:
You can use the gcloud command, specifically
gcloud logging read
It is very powerful (it supports filters and timestamps), but its output format is YAML. You can install the PyYAML library and use it to convert the logs to dictionaries.
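A minimal sketch of this workaround follows. The function names here are my own, and note that gcloud also accepts --format=json, which the standard-library json module can parse without a PyYAML dependency:

```python
import json
import subprocess

def read_log_entries(log_filter, project, limit=500):
    """Run `gcloud logging read` and return entries as a list of dicts.

    Using --format=json lets us parse the output with the stdlib.
    """
    raw = subprocess.run(
        ["gcloud", "logging", "read", log_filter,
         "--project", project, "--limit", str(limit), "--format=json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_entries(raw)

def parse_entries(raw_json):
    # gcloud --format=json emits a JSON array of log-entry objects
    return json.loads(raw_json)

# Parsing captured output (the entry shape here is illustrative):
sample = ('[{"logName": "projects/my_project/logs/activity", '
          '"protoPayload": {"@type": "type.googleapis.com/google.cloud.audit.AuditLog"}}]')
entries = parse_entries(sample)
print(entries[0]["protoPayload"]["@type"])  # type.googleapis.com/google.cloud.audit.AuditLog
```

Each returned entry is then a plain dict, so the protoPayload can be inspected directly without touching payload_pb.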
The LogEntry.proto_payload is an Any message, which wraps some other protobuf message. The type of the wrapped message is indicated by type_url, and the body of the message is serialized into the value field. After identifying the type, you can deserialize it with something like:
from google.cloud.audit import AuditLog
...
audit_log = AuditLog()
audit_log.ParseFromString(entry.payload_pb.value)
The AuditLog message is available at https://github.com/googleapis/googleapis/blob/master/google/cloud/audit/audit_log.proto and the corresponding Python definitions can be built using the protoc compiler.
Note that some fields of the AuditLog message can contain other Any messages too. There are more details at https://cloud.google.com/logging/docs/audit/api/
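To make the dispatch-on-type_url idea concrete, here is a sketch of the pattern. This is not real google-cloud-logging API: parse_audit_log_stub is a placeholder standing in for a generated protobuf class's ParseFromString, so only the dispatch logic is shown:

```python
def parse_audit_log_stub(raw_bytes):
    # Placeholder for: msg = AuditLog(); msg.ParseFromString(raw_bytes)
    return {"type": "AuditLog", "size": len(raw_bytes)}

# Map each Any type_url to the deserializer for that message type.
PARSERS = {
    "type.googleapis.com/google.cloud.audit.AuditLog": parse_audit_log_stub,
}

def unpack_any(type_url, value):
    """Pick a parser based on the Any message's type_url field."""
    parser = PARSERS.get(type_url)
    if parser is None:
        raise ValueError(f"no parser registered for {type_url}")
    return parser(value)

decoded = unpack_any("type.googleapis.com/google.cloud.audit.AuditLog", b"\x1a;...")
print(decoded["type"])  # AuditLog
```

In real code you would call unpack_any(entry.payload_pb.type_url, entry.payload_pb.value) and register one generated message class per type_url you expect.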
In case anyone has the same issue that I had, here's how I solved it:
1) Download and install protobuf. I did this on a Mac with brew (brew install protobuf).
2) Download and install grpcio. I used pip install grpcio.
3) Download the "Google APIs" repository to a known directory. I used /tmp and this command: git clone https://github.com/googleapis/googleapis
4) Change directories to the root directory of the repository you just cloned in Step 3.
5) Use protoc to build the Python module. This command worked for me: protoc -I=/tmp/googleapis/ --python_out=/tmp/ /tmp/googleapis/google/cloud/audit/audit_log.proto
6) Your audit_log_pb2.py file should now exist at /tmp/audit_log_pb2.py.
7) Place this file in the proper path OR in the same directory as your script.
8) Add this line to the imports in your script: import audit_log_pb2
9) After I did this, the entry.payload portion of the protobuf entry was consistently populated with dicts.
PLEASE NOTE: You should verify which version of protoc you are using with protoc --version. You really want protoc 3.x, because the file we are building from uses version 3 of the spec. The Ubuntu package I installed on a Linux box was version 2, and that was kind of frustrating. Also, although this file was built for Python 2.x, it seems to work fine with Python 3.x.