I have a web application that uses Google Cloud Datastore and has been running out of memory after enough requests.
I have narrowed this down to a Datastore query. A minimal PoC is provided below; a slightly longer version that includes memory measurement is on GitHub.
from google.cloud import datastore
from google.oauth2 import service_account

def test_datastore(entity_type: str) -> list:
    creds = service_account.Credentials.from_service_account_file("/path/to/creds")
    client = datastore.Client(credentials=creds, project="my-project")
    query = client.query(kind=entity_type, namespace="my-namespace")
    query.keys_only()
    for result in query.fetch(1):
        print(f"[+] Got a result: {result}")

for n in range(0, 100):
    test_datastore("my-entity-type")
Profiling the process RSS shows approximately 1 MiB of growth per iteration. This happens even if no results are returned. The following is the output from my GitHub gist:
[+] Iteration 0, memory usage 38.9 MiB bytes
[+] Iteration 1, memory usage 45.9 MiB bytes
[+] Iteration 2, memory usage 46.8 MiB bytes
[+] Iteration 3, memory usage 47.6 MiB bytes
..
[+] Iteration 98, memory usage 136.3 MiB bytes
[+] Iteration 99, memory usage 137.1 MiB bytes
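The per-iteration measurement can be done along these lines (a sketch using psutil; the rss_mib helper is illustrative, and my gist may differ in the details):

import os
import psutil

def rss_mib() -> float:
    # Resident set size of the current process, in MiB
    return psutil.Process(os.getpid()).memory_info().rss / (1024 * 1024)

for n in range(0, 100):
    test_datastore("my-entity-type")
    print(f"[+] Iteration {n}, memory usage {rss_mib():.1f} MiB")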
But at the same time, Python's mprof shows a flat graph (run as mprof run python datastore_test.py):
The question
Am I doing something wrong with how I call Datastore, or is this likely an underlying problem with a library?
Environment is Python 3.7.4 on Windows 10 (also tested on 3.8 on Debian in Docker) with google-cloud-datastore==1.11.0 and grpcio==1.28.1.
Edit 1
To clarify: this isn't typical Python allocator behaviour, where the interpreter requests memory from the OS but doesn't immediately return it from its internal arenas/pools. Below is a graph from Kubernetes, where my affected application runs:
This shows:
Edit 2
To be absolutely sure about Python's memory usage, I checked the status of the garbage collector using the gc module. Before exit, the program reports:
gc: done, 15966 unreachable, 0 uncollectable, 0.0156s elapsed
I also forced garbage collection manually with gc.collect() during each iteration of the loop, which made no difference.
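The check looked roughly like this (a sketch; I am assuming gc.set_debug(gc.DEBUG_STATS) for the statistics output above, with a manual gc.collect() in each iteration):

import gc

gc.set_debug(gc.DEBUG_STATS)  # prints the "gc: done, ... unreachable, ... uncollectable" lines

for n in range(0, 100):
    test_datastore("my-entity-type")
    collected = gc.collect()  # force a full collection each iteration; this made no difference
    print(f"[+] gc.collect() found {collected} unreachable objects, "
          f"{len(gc.garbage)} uncollectable")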
As there are no uncollectable objects, it seems unlikely the memory leak is coming from objects allocated using Python's internal memory management. Therefore it is more likely that an external C library is leaking memory.
Potentially related
There is an open grpc issue that I can't be sure is related, but it has a number of similarities to my problem.
I have narrowed down the memory leak to the creation of the datastore.Client object.
For the following proof-of-concept code, memory usage does not grow:
from google.cloud import datastore
from google.oauth2 import service_account

def test_datastore(client, entity_type: str) -> list:
    query = client.query(kind=entity_type, namespace="my-namespace")
    query.keys_only()
    for result in query.fetch(1):
        print(f"[+] Got a result: {result}")

creds = service_account.Credentials.from_service_account_file("/path/to/creds")
client = datastore.Client(credentials=creds, project="my-project")

for n in range(0, 100):
    test_datastore(client, "my-entity-type")
This makes sense for a small script, where the client object can be created once and shared between requests safely.
In many other applications it's harder (or impossible) to safely pass around the client object. I'd expect the library to free memory when the client goes out of scope; otherwise, this problem could arise in any long-running program.
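One workaround in a larger application is to create the client once per process and reuse it everywhere, for example via a cached accessor (a sketch; the get_client helper is my own naming, not part of the library):

from functools import lru_cache

from google.cloud import datastore
from google.oauth2 import service_account

@lru_cache(maxsize=None)
def get_client() -> datastore.Client:
    # Built once per process and reused by every caller, so the
    # per-Client leak only costs memory a single time.
    creds = service_account.Credentials.from_service_account_file("/path/to/creds")
    return datastore.Client(credentials=creds, project="my-project")

def test_datastore(entity_type: str) -> None:
    query = get_client().query(kind=entity_type, namespace="my-namespace")
    query.keys_only()
    for result in query.fetch(1):
        print(f"[+] Got a result: {result}")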
Edit 1
I have narrowed this down to grpc. The environment variable GOOGLE_CLOUD_DISABLE_GRPC can be set (to any value) to disable grpc; a sketch of setting it in-process is at the end of this edit. Once this has been set, my application in Kubernetes looks like:
Further investigation with valgrind shows it likely relates to OpenSSL usage in grpc, which I documented in this ticket on the bug tracker.
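For reference, disabling grpc in-process looks roughly like this (a sketch; I set the variable in the deployment environment instead, and I believe the library reads it at import time, so it needs to be set before the import):

import os

# Any value disables grpc; as far as I can tell this must be in the
# environment before google.cloud.datastore is imported, since the library
# checks it when deciding which transport to use.
os.environ["GOOGLE_CLOUD_DISABLE_GRPC"] = "true"

from google.cloud import datastore  # falls back to the HTTP/JSON transport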