I'm trying to load a Parquet file from S3 into an AWS RDS for PostgreSQL instance. Importing with aws_s3.table_import_from_s3 works fine for a CSV file, but when I try the same approach with a Parquet file, I get the error below:
ERROR: invalid byte sequence for encoding "UTF8": 0x00
In the AWS docs, I see options for a custom delimiter or a compressed file. Is it possible to import Parquet data?
You can import data from Amazon S3 into a table belonging to an RDS for PostgreSQL DB instance. To do this, you use the aws_s3 PostgreSQL extension that Amazon RDS provides. Your database must be running PostgreSQL version 10.7 or higher to import from Amazon S3 into RDS for PostgreSQL.
Under Access management, choose Policies. Choose Create Policy. On the Visual editor tab, choose Choose a service, and then choose S3. For Actions, choose Expand all, and then choose the bucket permissions and object permissions required to transfer files from an Amazon S3 bucket to Amazon RDS.
Parquet access can be made transparent to PostgreSQL via the parquet_fdw extension. Parquet storage can provide substantial space savings. It is a bit slower than native storage, but it lets static data be managed outside the backup and reliability operations that the rest of the data needs.
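Data import into RDS for PostgreSQL supports only what COPY supports: text, CSV, and PostgreSQL's own binary format. Parquet is none of these, and the PostgreSQL binary format is not compatible with it. The "invalid byte sequence" error appears because the binary Parquet bytes are being parsed as text. To import the data, you have to convert it to a text file or to the PostgreSQL binary format first.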
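As a rough illustration of the "convert first" route, here is a minimal Python sketch. It assumes pyarrow is installed for the conversion step and that the resulting CSV is uploaded to S3 separately (not shown). The function names, the psycopg-style `%s` placeholders, and the table, bucket, key, and region values are all hypothetical; adjust them to your setup.

```python
def parquet_to_csv(parquet_path, csv_path):
    """Convert a local Parquet file to a CSV file with a header row."""
    # Imported lazily so the query helper below works without pyarrow.
    import pyarrow.csv as pv
    import pyarrow.parquet as pq

    table = pq.read_table(parquet_path)  # loads the whole file into memory
    pv.write_csv(table, csv_path)        # includes a header row by default


def build_import_query(table, bucket, key, region):
    """Return (sql, params) that import a CSV object via the aws_s3 extension."""
    sql = (
        "SELECT aws_s3.table_import_from_s3("
        "%s, %s, %s, aws_commons.create_s3_uri(%s, %s, %s))"
    )
    # An empty column list means all columns; the options string is
    # passed through to COPY, so it must describe the converted CSV.
    params = (table, "", "(format csv, header true)", bucket, key, region)
    return sql, params


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    sql, params = build_import_query(
        "my_table", "my-bucket", "exports/data.csv", "us-east-1"
    )
    print(sql)
    print(params)
```

The query and parameters can then be passed to whatever PostgreSQL driver you use. Because read_table loads the whole file into memory, a very large Parquet file may need to be converted in batches instead.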