
How to use the S3 hook in Airflow

I have an S3 folder whose contents I am moving to GCS, and I am using Airflow to orchestrate the transfer.

In this environment, the S3 location is an "ever-growing" folder, meaning we do not delete files after we retrieve them.

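# Import paths as in Airflow 1.10.x (assumed from the use of provide_context)
from airflow.hooks.S3_hook import S3Hook
from airflow.operators.python_operator import BranchPythonOperator

def GetFiles(**kwargs):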
    foundfiles = False

    s3 = S3Hook(aws_conn_id='S3_BDEX')
    s3.get_conn()
    bucket = s3.get_bucket(
        bucket_name='/file.share.external.bdex.com/Offrs'
    )
    files = s3.list_prefixes(bucket_name='/file.share.external.bdex.com/Offrs')
    print("BUCKET:  {}".format(files))


check_for_file = BranchPythonOperator(
    task_id='Check_FTP_and_Download',
    provide_context=True,
    python_callable=GetFiles,
    dag=dag
)

What I need here is the list of files and their creation date/time. This way I can compare existing files to determine if they are new or not.

I know I can connect, because the get_bucket function worked. However, in this case, I get the following error:

Invalid bucket name "/file.share.external.bdex.com/Offrs": Bucket name must match the regex "^[a-zA-Z0-9.\-_]{1,255}$"

Thank you

arcee123 asked Feb 13 '20 06:02




1 Answer

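  1. The bucket name is wrong. If the URL is s3://something/path/to/file, then the bucket name is "something" and "path/to/file" is the key. A bucket name cannot contain a slash, which is why the regex check fails. In your case the bucket would presumably be "file.share.external.bdex.com" (no leading slash), with "Offrs" passed as the prefix rather than as part of the bucket name.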
R Penumaka answered Nov 04 '22 06:11
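
The advice above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes a hook object that behaves like Airflow's S3Hook, using its `list_keys` and `get_key` methods, and the helper names (`split_s3_location`, `list_files_with_mtime`, `new_since`) are hypothetical and not part of Airflow. S3 does not record a creation time, so the object's `last_modified` timestamp is used as the closest substitute.

```python
from urllib.parse import urlparse


def split_s3_location(location):
    """Split 's3://bucket/prefix' or '/bucket/prefix' into (bucket, prefix)."""
    parsed = urlparse(location)
    if parsed.scheme == "s3":
        return parsed.netloc, parsed.path.lstrip("/")
    # No scheme: treat the first path component as the bucket name.
    bucket, _, prefix = location.strip("/").partition("/")
    return bucket, prefix


def list_files_with_mtime(hook, bucket, prefix):
    """Map each key under `prefix` to its last-modified timestamp.

    `hook` is assumed to behave like Airflow's S3Hook.
    """
    # list_keys may return None when nothing matches, hence the fallback.
    keys = hook.list_keys(bucket_name=bucket, prefix=prefix) or []
    return {key: hook.get_key(key, bucket_name=bucket).last_modified
            for key in keys}


def new_since(files, cutoff):
    """Return the keys modified strictly after `cutoff`, sorted by name."""
    return sorted(key for key, mtime in files.items() if mtime > cutoff)


# Inside the task callable (connection id and path taken from the question):
#   hook = S3Hook(aws_conn_id='S3_BDEX')
#   bucket, prefix = split_s3_location('/file.share.external.bdex.com/Offrs')
#   files = list_files_with_mtime(hook, bucket, prefix)
```

Fetching each key individually costs one request per file, so for a very large folder it may be worth listing objects in bulk through the underlying boto3 client instead. The cutoff passed to `new_since` could be the timestamp of the previous successful run.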
