I am trying to run a crawler across an S3 data store in my account that contains two CSV files. However, when I run the crawler, no tables are created, and I see the following error in CloudWatch for each of the files:
This is especially odd because the IAM role has the AdministratorAccess policy attached, so there should not be any access-denied issues.
Any help would be appreciated.
No, you don't need to create a crawler to run a Glue job. A crawler can read multiple data sources and keep the Glue Data Catalog up to date.
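If you just want to run an existing job without a crawler, you can start it directly. A minimal sketch, assuming a job named my-glue-job already exists:

    aws glue start-job-run --job-name my-glue-job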
Error: Could not find S3 endpoint or NAT gateway for subnetId in VPC
Check the subnet ID and VPC ID in the message to help you diagnose the issue. Check that you have an Amazon S3 VPC endpoint set up, which is required with AWS Glue. In addition, check your NAT gateway if that's part of your configuration.
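If the endpoint is missing, you can create an S3 gateway endpoint in the VPC that your Glue connection uses. A sketch with the AWS CLI; the VPC ID, region, and route table ID below are placeholders for your own values:

    aws ec2 create-vpc-endpoint \
        --vpc-id vpc-0abc1234def567890 \
        --service-name com.amazonaws.us-east-1.s3 \
        --vpc-endpoint-type Gateway \
        --route-table-ids rtb-0abc1234def567890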
Check to see if the files you are crawling are encrypted. If they are, then your Glue role probably doesn't have a policy that allows it to decrypt.
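One way to check is to inspect an object's metadata; the bucket and key below are placeholders:

    aws s3api head-object --bucket my-bucket --key data/file1.csv

If the response includes "ServerSideEncryption": "aws:kms" and an SSEKMSKeyId, the role needs kms:Decrypt on that key.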
If so, it might need something like this:
{
    "Version": "2012-10-17",
    "Statement": {
        "Effect": "Allow",
        "Action": [
            "kms:Decrypt"
        ],
        "Resource": [
            "arn:aws:kms:us-west-2:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab",
            "arn:aws:kms:us-west-2:111122223333:key/0987dcba-09fe-87dc-65ba-ab0987654321"
        ]
    }
}
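Assuming the policy is saved locally as kms-decrypt.json, you could attach it as an inline policy on the crawler's role like this (the role and policy names here are placeholders):

    aws iam put-role-policy \
        --role-name MyGlueCrawlerRole \
        --policy-name AllowKmsDecrypt \
        --policy-document file://kms-decrypt.json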