I have an S3 folder location that I am moving to GCS, and I am using Airflow to orchestrate the transfer.
In this environment, the S3 location is an "ever growing" folder, meaning we do not delete files after we receive them.
def GetFiles(**kwargs):
    foundfiles = False

    s3 = S3Hook(aws_conn_id='S3_BDEX')
    s3.get_conn()

    bucket = s3.get_bucket(
        bucket_name='/file.share.external.bdex.com/Offrs'
    )
    files = s3.list_prefixes(bucket_name='/file.share.external.bdex.com/Offrs')

    print("BUCKET: {}".format(files))


check_for_file = BranchPythonOperator(
    task_id='Check_FTP_and_Download',
    provide_context=True,
    python_callable=GetFiles,
    dag=dag
)
What I need here is the list of files and their creation date/time, so I can compare them against existing files to determine whether they are new.
I know I can connect, because the get_bucket call worked.
However, in this case, I get the following errors:
Invalid bucket name "/file.share.external.bdex.com/Offrs": Bucket name must match the regex "^[a-zA-Z0-9.\-_]{1,255}$"
Thank you
For example, the S3Hook, one of the most widely used hooks, relies on the boto3 library to manage its connection to S3. The S3Hook contains over 20 methods for interacting with S3 buckets, including check_for_bucket, which checks whether a bucket with a specific name exists.
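As an illustration, here is a minimal sketch (not tested against your connection) of how the same hook could list each key and its last-modified timestamp. It assumes the bucket is actually named file.share.external.bdex.com and that Offrs is a key prefix inside it; bucket names may not contain slashes, which is exactly what the regex in your error message enforces. The function name get_files_with_dates is hypothetical.

from airflow.hooks.S3_hook import S3Hook  # Airflow 1.10 path; Airflow 2.x uses airflow.providers.amazon.aws.hooks.s3


def get_files_with_dates(**kwargs):
    # Assumption: bucket name has no scheme and no slashes; the "folder" is a prefix
    s3 = S3Hook(aws_conn_id='S3_BDEX')

    bucket_name = 'file.share.external.bdex.com'
    prefix = 'Offrs'

    if not s3.check_for_bucket(bucket_name):
        raise ValueError('Bucket {} does not exist'.format(bucket_name))

    # list_keys returns the object keys under the prefix (None if there are no matches)
    keys = s3.list_keys(bucket_name=bucket_name, prefix=prefix) or []

    # get_key returns a boto3 Object; its last_modified attribute is the upload
    # timestamp, which is what S3 exposes instead of a true creation date
    files = {
        key: s3.get_key(key, bucket_name=bucket_name).last_modified
        for key in keys
    }

    for key, last_modified in files.items():
        print('{} - {}'.format(key, last_modified))

    return files

Note that S3 does not store a separate creation time for objects; last_modified is the closest equivalent, and for objects that are written once it is effectively the time they arrived in the bucket, which should be enough to decide whether a file is new.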