I have a few CSV files in Azure Blob Storage, and we are using the COPY INTO command to load them into a Snowflake table. The problem: the storage layout is container >> folder (e.g. account) >> a number of files such as 2011-09.csv, 2011-10.csv, and so on. The account folder also has a sub-folder, Snapshot, which contains files with similar data but different names, such as 2019-11_1654478715.csv. So when we run COPY INTO, the target table in Snowflake gets populated with duplicate rows.
I am using this:
copy into BINGO_DWH_DEV.LANDING.CRM_ACCOUNT_TEMP
from 'azure://abc.blob.core.windows.net/abc-abc/account'
credentials = (azure_sas_token = 'abc')
ON_ERROR = 'CONTINUE'
FILE_FORMAT = (type = csv field_delimiter = ',' FIELD_OPTIONALLY_ENCLOSED_BY = '"');
Any ideas on how I can use the COPY INTO command with a regular expression so that it picks up only files like '2011-09.csv' and not the files from the Snapshot folder?
Appreciate your help
You can use the PATTERN keyword with a regular expression to load only the files whose names match the pattern.
Please refer to the Snowflake documentation.
Example:
copy into emp_basic
from @%emp_basic
file_format = (type = csv field_optionally_enclosed_by='"')
pattern = '.*2011-09.*[.]csv[.]gz'
on_error = 'continue';
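Adapted to the Azure location in your question, it could look like the sketch below. This assumes the pattern is matched against the file path under the account folder and that the Snapshot files always carry the _<timestamp> suffix, so only the plain YYYY-MM.csv names are picked up:
copy into BINGO_DWH_DEV.LANDING.CRM_ACCOUNT_TEMP
from 'azure://abc.blob.core.windows.net/abc-abc/account'
credentials = (azure_sas_token = 'abc')
file_format = (type = csv field_delimiter = ',' field_optionally_enclosed_by = '"')
-- [.] is a literal dot; the path must end in YYYY-MM.csv, which excludes
-- names like 2019-11_1654478715.csv in the Snapshot sub-folder
pattern = '.*[0-9]{4}-[0-9]{2}[.]csv'
on_error = 'continue';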
It depends on how you have set the stage location (Azure Blob, S3, or GCS). Let's say your files land in the "folder" s3://yourbucket/folder1/[filename].gz, and that you've set your stage to point to s3://yourbucket and used the pattern:
pattern = '.*2011-09.*[.]csv[.]gz'
Then it will scan all files under s3://yourbucket.
If, however, your stage has been set up to point to the folder s3://yourbucket/folder1/ and the pattern used is:
pattern = '.*2011-09.*[.]csv[.]gz'
Then it will look only in s3://yourbucket/folder1/.
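For example (a sketch only; the stage names, the table my_table, and the S3 credentials are made up to illustrate the scoping, not taken from your setup):
-- stage pointing at the bucket root: the pattern is tested against every file path under s3://yourbucket
create or replace stage bucket_stage
url = 's3://yourbucket/'
credentials = (aws_key_id = '...' aws_secret_key = '...');

copy into my_table
from @bucket_stage
pattern = '.*2011-09.*[.]csv[.]gz'
on_error = 'continue';

-- stage pointing at the folder: the same pattern is only tested against files under s3://yourbucket/folder1/
create or replace stage folder_stage
url = 's3://yourbucket/folder1/'
credentials = (aws_key_id = '...' aws_secret_key = '...');

copy into my_table
from @folder_stage
pattern = '.*2011-09.*[.]csv[.]gz'
on_error = 'continue';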