S3 Select retrieve headers in the CSV

Tags: python, csv, amazon-s3, export-to-csv, boto3

I am trying to fetch a subset of records from a CSV file stored in an S3 bucket using the following code:

import boto3
import pandas as pd
from io import StringIO

s3 = boto3.client('s3')
# `bucket` (bucket name) and `file` (object key) are defined elsewhere
file_name = file

sql_stmt = """SELECT S.* FROM s3object S LIMIT 10"""

req = s3.select_object_content(
    Bucket=bucket,
    Key=file_name,
    ExpressionType='SQL',
    Expression=sql_stmt,
    InputSerialization={'CSV': {'FileHeaderInfo': 'USE'}},
    OutputSerialization={'CSV': {}},
)

# Collect the raw byte chunks from the event stream
records = []
for event in req['Payload']:
    if 'Records' in event:
        records.append(event['Records']['Payload'])
    elif 'Stats' in event:
        stats = event['Stats']['Details']

# Join the bytes before decoding, so a multi-byte character
# split across two chunks does not raise a decode error
file_str = b''.join(records).decode('utf-8')

df = pd.read_csv(StringIO(file_str))
print(df)

This returns the records, but the header row is missing.

I read in "S3 Select CSV Headers" that S3 Select does not return headers at all. Is there any other way to retrieve the headers of a CSV file in S3?

asked Apr 25 '19 22:04

Sumedha Nagpal

1 Answer

Change

InputSerialization={'CSV': {'FileHeaderInfo': 'USE'}},

to

InputSerialization={'CSV': {'FileHeaderInfo': 'NONE'}},

The output will then contain the full content, including the header row.
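As a rough sketch of how that change fits into the whole request, something like the following should work. The function name select_csv_with_header and its parameters are made up for illustration, and client is assumed to be a boto3 S3 client such as boto3.client('s3'). The sketch also assumes that with NONE the header line counts toward LIMIT like any other record, so it asks for one extra row.

```python
import csv
import io


def select_csv_with_header(client, bucket, key, limit=10):
    """Run S3 Select with FileHeaderInfo='NONE' and return (header, rows).

    Hypothetical helper: `client` is assumed to be a boto3 S3 client.
    """
    response = client.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType='SQL',
        # Assumption: the header line is an ordinary record under NONE,
        # so request limit + 1 lines to end up with `limit` data rows.
        Expression=f'SELECT * FROM s3object s LIMIT {limit + 1}',
        InputSerialization={'CSV': {'FileHeaderInfo': 'NONE'}},
        OutputSerialization={'CSV': {}},
    )
    # Join the raw bytes first, then decode once.
    chunks = [event['Records']['Payload']
              for event in response['Payload'] if 'Records' in event]
    text = b''.join(chunks).decode('utf-8')
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return [], []
    # First returned line is the header, the rest are data rows.
    return rows[0], rows[1:]
```

The returned header and rows can be passed to pandas as pd.DataFrame(rows, columns=header) if a DataFrame is needed.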

Explanation:

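FileHeaderInfo accepts one of "NONE", "USE", or "IGNORE".

Use NONE rather than USE. With NONE, S3 Select treats the first line as an ordinary record, so the header row is returned along with the rest of the data. With USE, the first line is consumed as column names for the SQL expression and is not part of the output. Note that with NONE, columns can only be referenced by position (_1, _2, ...) in the expression, not by name.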
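If the main query needs USE (for example, to filter on column names), one possible workaround is to fetch the header line with a separate one-row query under NONE. This is only a sketch: fetch_csv_header is a made-up name, and client is assumed to be a boto3 S3 client.

```python
import csv
import io


def fetch_csv_header(client, bucket, key):
    """Return the first line of a CSV object as a list of column names.

    Hypothetical helper: `client` is assumed to be a boto3 S3 client.
    """
    response = client.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType='SQL',
        # Under NONE the first line is a normal record, so LIMIT 1 returns it.
        Expression='SELECT * FROM s3object s LIMIT 1',
        InputSerialization={'CSV': {'FileHeaderInfo': 'NONE'}},
        OutputSerialization={'CSV': {}},
    )
    payload = b''.join(event['Records']['Payload']
                       for event in response['Payload'] if 'Records' in event)
    rows = list(csv.reader(io.StringIO(payload.decode('utf-8'))))
    return rows[0] if rows else []
```

The resulting list can then be supplied to pandas when parsing the headerless output of the USE query, e.g. pd.read_csv(StringIO(file_str), names=header).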

Here is the reference: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.select_object_content

I hope it helps.

answered Sep 25 '22 10:09

Red Boy
