
AWS Glue Crawler to crawl DynamoDB stuck at starting

I have created a data lake with AWS Lake Formation and an AWS Glue Crawler to build a catalog from a DynamoDB table (size: 130 GB, ItemCount: 739,013,546). It has been 12 hours since I started the crawler run, but its Status still shows Starting.

Is it normal for it to take this much time?

PS: The role assigned to the crawler has permission to scan the DynamoDB table I want.

EDIT:

The only log event in CloudWatch is:

{
    "events": [
        {
            "timestamp": 1582560218096,
            "message": "[6a56a417-0617-4253-a6be-091cc367328b] BENCHMARK : Running Start Crawl for Crawler dynamodb-crawler",
            "ingestionTime": 1582560344705
        }
    ]
}
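
To make the raw event above easier to read, here is a minimal sketch that converts the two epoch-millisecond fields into a UTC time and an ingestion delay. It only uses the Python standard library and the payload exactly as posted; the helper name `summarize_events` is made up for illustration and is not part of any AWS tooling.

```python
import json
from datetime import datetime, timezone

# Log payload copied verbatim from the question.
RAW = """
{
    "events": [
        {
            "timestamp": 1582560218096,
            "message": "[6a56a417-0617-4253-a6be-091cc367328b] BENCHMARK : Running Start Crawl for Crawler dynamodb-crawler",
            "ingestionTime": 1582560344705
        }
    ]
}
"""


def summarize_events(payload):
    """Return (time logged in UTC, seconds until ingestion, message) per event."""
    rows = []
    for event in json.loads(payload)["events"]:
        # Both fields are milliseconds since the Unix epoch.
        logged_at = datetime.fromtimestamp(event["timestamp"] / 1000, tz=timezone.utc)
        delay_s = (event["ingestionTime"] - event["timestamp"]) / 1000
        rows.append((logged_at, delay_s, event["message"]))
    return rows


for logged_at, delay_s, message in summarize_events(RAW):
    print(f"{logged_at.isoformat()} (+{delay_s:.1f}s to ingest) {message}")
```

For this event the two fields differ by roughly two minutes, so the single line reached CloudWatch shortly after it was written. That says nothing about later lines, which may still be buffered.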

Tanmay asked Sep 06 '25 02:09



2 Answers

This might be a different issue, but it may just be taking a long time to scan if your table is very large.

I had the same problem trying to crawl an on-premises Oracle database. I stopped it after an hour with no logs other than the starting log:

BENCHMARK : Running Start Crawl for Crawler

Then all the logs came through with timestamps ranging from when the crawl started to when I stopped it. I am not sure why they weren't showing up before, or why the crawler was still in the Starting status, but in my instance it actually was running.
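
Since the console status and the logs can lag, it may help to ask the Glue API for the crawler's state directly. The following is a rough sketch, assuming boto3 is installed and credentials are configured. `summarize_crawler` is a hypothetical helper name, and the client is passed in as an argument so the function can be tried without touching AWS. It relies on the `get_crawler` response fields `State`, `CrawlElapsedTime`, and `LastCrawl.Status`. Note that this is not necessarily how the console derives its Starting label.

```python
def summarize_crawler(glue_client, crawler_name):
    """Return the state, elapsed time, and last crawl status of a Glue crawler."""
    crawler = glue_client.get_crawler(Name=crawler_name)["Crawler"]
    return {
        # The API reports READY, RUNNING, or STOPPING.
        "state": crawler.get("State"),
        # Milliseconds elapsed for the current crawl, when one is running.
        "elapsed_ms": crawler.get("CrawlElapsedTime"),
        # Outcome of the previous crawl; absent if the crawler has never finished one.
        "last_crawl_status": crawler.get("LastCrawl", {}).get("Status"),
    }


# Usage against a real account (needs boto3 and AWS credentials):
#   import boto3
#   print(summarize_crawler(boto3.client("glue"), "dynamodb-crawler"))
```

If `state` comes back as RUNNING with a growing `elapsed_ms`, the crawl is probably progressing even though no new log lines have appeared yet.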

thetimbecker answered Sep 07 '25 21:09



It is strange that it took so much time. Are the crawler logs in CloudWatch showing anything?

Emerson answered Sep 07 '25 22:09
