I'm having trouble with the following code. What I find hardest to understand is that the exception always occurs when many query operations are issued within a short time interval.
The exception is as follows:
2017-03-05 15:03:59,053 data_sync_worker.py[line:83] ERROR An error occurred (ValidationException) when calling the Query operation: KeyConditionExpressions must only contain one condition per key
ClientError: An error occurred (ValidationException) when calling the Query operation: KeyConditionExpressions must only contain one condition per key
And here is my code:
response = self.record_tb.query(
    KeyConditionExpression=Key(self.partition_key).eq(user_id) &
    Key(self.sort_key).between(
        begin_time + Decimal(CACHE_TIMESTAMP_MIN_STEP),
        endtime))
And here is the table key schema:
"KeySchema": [
    {
        "KeyType": "HASH",
        "AttributeName": "user_id"
    },
    {
        "KeyType": "RANGE",
        "AttributeName": "timestamp"
    }
]
Has anyone else run into this?
The error message above can come from a malformed query in your code, and for many people that's a reasonable explanation. However, I've confirmed that you can also get this mysterious and very misleading error message if you run a DynamoDB query using a shared Table resource with multiple threads or tasks under a lot of load. That's what I think is happening in the OP's case. I've seen this happen both with boto3 1.9.82 together with the pathos library and with asyncio in Python 3.6.
To wit, this is something many of us have suspected for a long time - boto3 isn't completely thread-safe even if it does often work in practice.
In this particular case, I suspect there's some state that gets corrupted during the query-building process such that the query that actually gets submitted to the service endpoint is invalid. I've not been able to reproduce this on demand; re-running the same code a second time always seems to work. It would be possible to use the botocore logger to capture the actual payloads sent to AWS - that would prove my theory. However it's really expensive on my end to capture such a large volume of logs, so I just stopped using shared Table resources and I stopped seeing the error.
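For anyone who does want to try the logging route, here is a minimal sketch using only the standard logging module. The helper name enable_botocore_debug_log and the log format are my own choices, not anything from boto3. It assumes that botocore emits its request details on loggers under the "botocore" name at DEBUG level, which matches what I have seen but is worth checking against your installed version. Be aware that these logs are very verbose and can contain request data you may not want on disk.

```python
import logging


def enable_botocore_debug_log(path):
    """Send DEBUG-level records from the "botocore" loggers to a file.

    Hypothetical helper for illustration; returns the handler so the
    caller can flush or remove it later.
    """
    logger = logging.getLogger("botocore")
    logger.setLevel(logging.DEBUG)

    handler = logging.FileHandler(path)
    # Include the thread name so records from concurrent workers can be
    # told apart when hunting for a corrupted request.
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(threadName)s %(name)s %(message)s")
    )
    logger.addHandler(handler)
    return handler
```

Call it once at startup, before the worker threads are created, and remove the handler with logging.getLogger("botocore").removeHandler(handler) once you have captured a failing request.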
@killthrush's answer turned out to describe the cause of this error for us. Basically, as he points out, it appears boto3 is not thread-safe, and we were reading from S3 from multiple concurrent threads.
If you are looking for a quick fix, the comment by @pedros007 in the link he supplied worked for me. Basically, when you set up the boto3 S3 client, set max_pool_connections to the number of workers you are running. Since doing that, we have stopped getting the ValidationException error.
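# code from pedros007
from concurrent.futures import ThreadPoolExecutor

import boto3
import botocore.config

num_threads = 16
# Size the connection pool to match the number of worker threads.
cfg = botocore.config.Config(max_pool_connections=num_threads)
client = boto3.client("s3", config=cfg)
futures = {}
with ThreadPoolExecutor(max_workers=num_threads) as executor:
    for key in keys:
        # keys and my_head_object_function are defined elsewhere
        f = executor.submit(my_head_object_function, key, client)
        futures[f] = key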
Others have suggested starting a new session in each thread. I've tried that and it does work, but there is a large performance hit. With the above method the performance is the same.
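For reference, the session-per-thread approach can be sketched roughly as follows. This is my own illustration, not code from any of the linked comments: PerThread and make_table are hypothetical names, "records" is a placeholder table name, and it assumes credentials and region are already configured. Caching the object in thread-local storage means each thread builds its session once rather than on every call, which may soften the performance hit, though I have not measured that.

```python
import threading


class PerThread:
    """Lazily build one object per thread using the given factory."""

    def __init__(self, factory):
        self._factory = factory
        self._local = threading.local()

    def get(self):
        # Each thread sees its own attributes on a threading.local object,
        # so the factory runs at most once per thread.
        try:
            return self._local.value
        except AttributeError:
            self._local.value = self._factory()
            return self._local.value


def make_table():
    """Create a DynamoDB Table resource from a fresh session.

    "records" is a placeholder table name (an assumption).
    """
    # Imported here so PerThread itself stays usable without boto3.
    import boto3

    session = boto3.session.Session()
    return session.resource("dynamodb").Table("records")


# One shared holder; every worker thread gets its own Table from it.
table_for_thread = PerThread(make_table)
```

Inside a worker you would then call table_for_thread.get().query(...) instead of using a Table object that was created once and shared across all threads.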