
Read Headers from Data Source in an AWS Glue Job

I have an AWS Glue job that reads from a data source like so:

datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "dev-data", table_name = "contacts", transformation_ctx = "datasource0")

But when I call .toDF() on the dynamic frame, the headers are 'col0', 'col1', 'col2' etc. and my actual headers are in the first row of the dataframe.

Note: I can't set them manually, as the columns in the data source are variable, and iterating over the columns in a loop to set them results in an error because you'd have to set the same dataframe variable multiple times, which Glue can't handle.

How might I capture the headers while reading from the data source?

asked May 30 '18 by Tibberzz


People also ask

How do you read data from a glue table?

You can use Athena to query AWS Glue Data Catalog metadata such as databases, tables, partitions, and columns. To obtain this metadata, you query the information_schema database on the Athena backend.
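For illustration, here is a minimal boto3 sketch of that kind of metadata query; the results bucket and the dev-data/contacts names are placeholders taken from the question, not from the quoted answer.

import time
import boto3

athena = boto3.client('athena')

# List the columns Glue has catalogued for the 'contacts' table
# (database, table and output location are placeholders).
query = """
    SELECT column_name, data_type
    FROM information_schema.columns
    WHERE table_schema = 'dev-data' AND table_name = 'contacts'
"""

run = athena.start_query_execution(
    QueryString=query,
    ResultConfiguration={'OutputLocation': 's3://my-athena-results/'})

# Poll until the query finishes, then print the result rows (the first row is the header).
query_id = run['QueryExecutionId']
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)['QueryExecution']['Status']['State']
    if state in ('SUCCEEDED', 'FAILED', 'CANCELLED'):
        break
    time.sleep(1)

for row in athena.get_query_results(QueryExecutionId=query_id)['ResultSet']['Rows'][1:]:
    print([col.get('VarCharValue') for col in row['Data']])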

Can I use an Athena view as a source for an AWS Glue job?

You can, by using the Athena JDBC driver. This approach circumvents the catalog, as only Athena (and not Glue, as of 25-Jan-2019) can directly access views. Download the driver and store the jar in an S3 bucket, then specify the S3 path to the driver as a dependent jar in your job definition.
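As a rough sketch of what the read might look like inside the Glue job once the jar is attached: the driver class name, JDBC URL, staging location and credentials provider below are assumptions based on the Simba Athena JDBC driver, so check them against the driver version you download.

# Assumes the Athena JDBC jar is attached to the job as a dependent jar.
spark = glueContext.spark_session

athena_df = spark.read.format("jdbc") \
    .option("driver", "com.simba.athena.jdbc.Driver") \
    .option("url", "jdbc:awsathena://athena.us-east-1.amazonaws.com:443;"
                   "S3OutputLocation=s3://my-athena-results/;"
                   "AwsCredentialsProviderClass=com.simba.athena.amazonaws.auth.InstanceProfileCredentialsProvider") \
    .option("dbtable", "my_database.my_view") \
    .load()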


2 Answers

You can try the withHeader format option, e.g.:

# Read the CSV directly from S3, treating the first row as the header.
dyF = glueContext.create_dynamic_frame.from_options(
    connection_type='s3',
    connection_options={'paths': ['s3://awsglue-datasets/examples/medicare/Medicare_Hospital_Provider.csv']},
    format='csv',
    format_options={'withHeader': True})

The documentation for these format options can be found in the AWS Glue developer guide.
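With the header read that way, converting the DynamicFrame to a Spark DataFrame should show the real column names instead of col0, col1, etc. A quick check, using the dyF from the snippet above:

df = dyF.toDF()
df.printSchema()   # column names now come from the first row of the CSV
df.show(5)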

answered Sep 23 '22 by Dheeraj


I know this post is old, but I just ran into a similar issue and spent way too long figuring out what the problem was. Wanted to share my solution in case it's helpful to others!

I was using the GUI on AWS and forgot to add the correct classifier to the crawler before running it. This resulted in AWS Glue incorrectly detecting the datatypes (they mostly came out as strings) and failing to detect the column names (they came out as col1, col2, etc.). You can create the classifier in "classifiers" under "crawlers". Then, when setting up the crawler, add your classifier to the "selected classifiers" section at the bottom.

Documentation: https://docs.aws.amazon.com/glue/latest/dg/add-classifier.html
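For reference, the same setup can be scripted with boto3 instead of the console; the classifier name, crawler name, role ARN and S3 path below are placeholders.

import boto3

glue = boto3.client('glue')

# A CSV classifier telling the crawler that the first row is a header,
# so columns are not catalogued as col1, col2, ...
glue.create_classifier(
    CsvClassifier={
        'Name': 'contacts-csv-classifier',
        'Delimiter': ',',
        'QuoteSymbol': '"',
        'ContainsHeader': 'PRESENT'})

# Attach the classifier to the crawler so it is tried before the built-in ones.
glue.create_crawler(
    Name='contacts-crawler',
    Role='arn:aws:iam::123456789012:role/my-glue-role',
    DatabaseName='dev-data',
    Targets={'S3Targets': [{'Path': 's3://my-bucket/contacts/'}]},
    Classifiers=['contacts-csv-classifier'])

glue.start_crawler(Name='contacts-crawler')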

answered Sep 22 '22 by TheGreenSpleen25