
AWS Glue Crawler Not Creating Table

I have a crawler I created in AWS Glue that does not create a table in the Data Catalog after it successfully completes.

The crawler takes roughly 20 seconds to run, and the logs show that it completed successfully. The CloudWatch log shows:

  • Benchmark: Running Start Crawl for Crawler
  • Benchmark: Classification Complete, writing results to DB
  • Benchmark: Finished writing to Catalog
  • Benchmark: Crawler has finished running and is in ready state

I am at a loss as to why the tables are not being created in the Data Catalog. The AWS docs are not much help for debugging this.

asked Nov 01 '17 17:11 by Vince




2 Answers

Check the IAM role associated with the crawler. Most likely it does not have the correct permissions.

When you create the crawler and choose to create an IAM role (the default setting), the generated policy covers only the S3 path you specified. If you later edit the crawler and change only the S3 path, the role associated with the crawler won't have permission to the new S3 path.
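One rough way to confirm this is to compare the role's policy document against the new S3 path. The sketch below is only an illustration: the function names, bucket names, and sample policy are hypothetical, and it looks only at `Allow` statements with `Action` and `Resource` wildcards. It ignores `Deny` statements, conditions, `NotResource`, and every other part of real IAM evaluation.

```python
import re


def s3_uri_to_arn(s3_uri: str) -> str:
    """Turn 's3://bucket/key' into the object ARN 'arn:aws:s3:::bucket/key'."""
    if not s3_uri.startswith("s3://"):
        raise ValueError(f"not an S3 URI: {s3_uri!r}")
    return "arn:aws:s3:::" + s3_uri[len("s3://"):]


def _wildcard_match(pattern: str, value: str) -> bool:
    """Match an IAM-style pattern where '*' is any run of characters and '?' is one."""
    regex = re.escape(pattern).replace(r"\*", ".*").replace(r"\?", ".")
    return re.fullmatch(regex, value) is not None


def _as_list(item):
    """IAM allows a single string or a list in Action/Resource/Statement."""
    return item if isinstance(item, list) else [item]


def policy_allows(policy: dict, s3_uri: str, action: str = "s3:GetObject") -> bool:
    """Simplified check: does any Allow statement cover this action and object?"""
    arn = s3_uri_to_arn(s3_uri)
    for statement in _as_list(policy.get("Statement", [])):
        if statement.get("Effect") != "Allow":
            continue
        actions = _as_list(statement.get("Action", []))
        resources = _as_list(statement.get("Resource", []))
        # Action names are compared case-insensitively; resource ARNs are not.
        action_ok = any(_wildcard_match(a.lower(), action.lower()) for a in actions)
        resource_ok = any(_wildcard_match(r, arn) for r in resources)
        if action_ok and resource_ok:
            return True
    return False


# Hypothetical policy of the kind created for the crawler's original path.
sample_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": ["arn:aws:s3:::old-bucket/data/*"],
        }
    ],
}

if __name__ == "__main__":
    print(policy_allows(sample_policy, "s3://old-bucket/data/file.csv"))
    print(policy_allows(sample_policy, "s3://new-bucket/data/file.csv"))
```

If the check fails for the path the crawler now points at, the role's policy still refers to the old location and needs to be updated or replaced.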

answered Sep 23 '22 12:09 by Ray


I had the same issue. As advised by others, I tried revising the existing IAM role to include the new S3 bucket as a resource, but for some reason it did not work. I then created a completely new role from scratch, and this time it worked. One big question I have for AWS: why does this access-denied error, caused by a wrongly attached IAM policy, not show up in the CloudWatch log? That makes it difficult to debug.
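For anyone rebuilding the role from scratch, here is a minimal sketch of the two JSON documents such a role typically needs: a trust policy that lets the Glue service assume the role, and an S3 policy scoped to the crawled path. The helper names, bucket, and prefix are hypothetical, and the S3 actions shown are an assumption about what a crawler needs for its data path. The AWS managed policy `AWSGlueServiceRole` is usually attached alongside these.

```python
import json

# ARN of the AWS managed policy commonly attached to Glue service roles.
GLUE_MANAGED_POLICY_ARN = "arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole"


def glue_trust_policy() -> dict:
    """Trust policy allowing the AWS Glue service to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "glue.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def s3_access_policy(bucket: str, prefix: str = "") -> dict:
    """Policy granting object access under one bucket/prefix (assumed actions)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}*"],
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket and prefix; replace with the path the crawler targets.
    print(json.dumps(glue_trust_policy(), indent=2))
    print(json.dumps(s3_access_policy("new-bucket", "data/"), indent=2))
```

The printed documents can be pasted into the IAM console or passed to whatever tooling you use to create the role. Nothing here calls AWS.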

answered Sep 22 '22 12:09 by Mohammad Sadoughi