I have created a data lake with AWS Lake Formation and an AWS Glue crawler to create a catalog from a DynamoDB table (size: 130 GB, ItemCount: 739,013,546). It's been 12 hours since I started the crawler run, but its status still shows "Starting".
Is it normal for it to take this much time?
PS: The role assigned to the crawler has permission to scan the DynamoDB table I want.
EDIT:
The only log event in CloudWatch is:
{
  "events": [
    {
      "timestamp": 1582560218096,
      "message": "[6a56a417-0617-4253-a6be-091cc367328b] BENCHMARK : Running Start Crawl for Crawler dynamodb-crawler",
      "ingestionTime": 1582560344705
    }
  ]
}
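CloudWatch Logs stores `timestamp` and `ingestionTime` as milliseconds since the Unix epoch, so the gap between them shows how long the line took to reach CloudWatch. Here is a small standard-library Python sketch that decodes the event above; the helper names `ms_to_utc` and `describe` are made up for illustration.

```python
import json
from datetime import datetime, timedelta, timezone

# The CloudWatch Logs event quoted in the question.
RAW = """{"events": [{"timestamp": 1582560218096,
  "message": "[6a56a417-0617-4253-a6be-091cc367328b] BENCHMARK : Running Start Crawl for Crawler dynamodb-crawler",
  "ingestionTime": 1582560344705}]}"""

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ms_to_utc(ms: int) -> datetime:
    """Convert epoch milliseconds to an aware UTC datetime using integer math."""
    return EPOCH + timedelta(milliseconds=ms)


def describe(events):
    """Return (logged_at, ingested_at, lag_ms) for each CloudWatch event."""
    rows = []
    for e in events:
        rows.append((
            ms_to_utc(e["timestamp"]).isoformat(),
            ms_to_utc(e["ingestionTime"]).isoformat(),
            e["ingestionTime"] - e["timestamp"],  # delay before CloudWatch stored it
        ))
    return rows


for logged, ingested, lag_ms in describe(json.loads(RAW)["events"]):
    print(f"logged {logged}, ingested {ingested}, lag {lag_ms} ms")
```

For this event the lag is 126,609 ms, so CloudWatch stored the line a little over two minutes after the crawler wrote it. `timedelta` is used instead of `datetime.fromtimestamp(ms / 1000)` to avoid float rounding in the milliseconds.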
Your issue might be different, but the crawler may simply be taking a long time to scan a table this large.
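To judge whether 12 hours is plausible, here is a back-of-the-envelope estimate in Python. It is a sketch under stated assumptions: the table uses provisioned capacity, the crawler reads every item (rather than sampling) with eventually consistent Scan requests costing half an RCU per 4 KB, and it is throttled to a fixed fraction of the table's read capacity. The 0.5 default fraction and the RCU figures below are hypothetical, since the question does not give them.

```python
import math


def estimate_scan_hours(table_bytes: int, provisioned_rcu: float,
                        scan_fraction: float = 0.5) -> float:
    """Rough time, in hours, for one full eventually consistent Scan of a table.

    Assumptions (illustrative only):
    - one RCU allows two eventually consistent 4 KB reads per second;
    - the reader is held to `scan_fraction` of the provisioned RCU;
    - per-request rounding, throttling retries and sampling are ignored.
    """
    if provisioned_rcu <= 0 or scan_fraction <= 0:
        raise ValueError("capacity and scan_fraction must be positive")
    read_units = math.ceil(table_bytes / 4096)  # number of 4 KB chunks to read
    rcu_seconds = read_units / 2                # eventually consistent: half an RCU each
    seconds = rcu_seconds / (provisioned_rcu * scan_fraction)
    return seconds / 3600


for rcu in (100, 1000):
    hours = estimate_scan_hours(130 * 10**9, rcu)
    print(f"{rcu} RCU -> ~{hours:.1f} h")
```

With a hypothetical 1,000 provisioned RCUs this comes to roughly 8.8 hours, and at 100 RCUs roughly 88 hours, so a single crawl of a 130 GB table running well past 12 hours is not by itself a sign that something is broken.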
I had the same problem trying to crawl an on-premises Oracle database. I stopped it after an hour with no logs other than the starting log:
BENCHMARK : Running Start Crawl for Crawler
Then all the logs came through, with timestamps ranging from when the crawl started to when I stopped it. I am not sure why they weren't showing up before, or why the crawler was still in the "Starting" status, but in my case it actually was running.
It is strange that it took so much time. Are the crawler logs in CloudWatch showing anything?
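One way to check programmatically whether the crawler is doing anything is to read its state and log stream. The sketch below assumes boto3-style clients (`get_crawler` on the Glue client, `filter_log_events` on the CloudWatch Logs client), the log group `/aws-glue/crawlers`, and a log stream named after the crawler; the last two are assumptions about the default setup, and the helper names are hypothetical. The clients are passed in as parameters so the functions can be exercised with stand-ins.

```python
from typing import Any, Dict, List

# Assumed default log group for Glue crawler logs; adjust if yours differs.
LOG_GROUP = "/aws-glue/crawlers"


def crawler_state(glue: Any, name: str) -> str:
    """Return the State field (e.g. READY, RUNNING, STOPPING) from GetCrawler."""
    return glue.get_crawler(Name=name)["Crawler"]["State"]


def crawler_messages(logs: Any, name: str, start_ms: int) -> List[str]:
    """Collect every message in the crawler's log stream since start_ms."""
    params: Dict[str, Any] = {
        "logGroupName": LOG_GROUP,
        "logStreamNames": [name],  # assumed: stream is named after the crawler
        "startTime": start_ms,
    }
    messages: List[str] = []
    while True:
        page = logs.filter_log_events(**params)
        messages.extend(event["message"] for event in page.get("events", []))
        token = page.get("nextToken")
        if not token:
            return messages
        params["nextToken"] = token  # follow pagination until exhausted
```

In practice `glue` and `logs` would be `boto3.client("glue")` and `boto3.client("logs")`, and `start_ms` could be the crawl start time in epoch milliseconds, such as 1582560218096 from the event in the question.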