
How to import Parquet files from S3 into PostgreSQL RDS

I'm trying to load a Parquet file from S3 into an AWS PostgreSQL RDS instance. Importing with aws_s3.table_import_from_s3 works fine for a CSV file, but when I tried the same approach with a Parquet file, I got the error below:

ERROR: invalid byte sequence for encoding "UTF8": 0x00

In the AWS docs, I see options for a custom delimiter or a zip file. Is it possible to import Parquet data?

asked Jul 06 '20 06:07 by swetha




1 Answer

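Data import into AWS RDS for PostgreSQL supports the formats that COPY supports: text, CSV, and the PostgreSQL binary file format. None of these is Parquet. When a Parquet file is read as text, its binary content includes zero bytes, which is why the import fails with the invalid byte sequence error. To import the data, you first have to convert it to a text or CSV file, or to the PostgreSQL binary file format.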
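As a rough illustration of the conversion route, here is a minimal Python sketch. It assumes the third-party pyarrow package is installed and that the Parquet file has been downloaded locally; the function names, table name, bucket, key, and region are all hypothetical. The `%s` placeholders assume a psycopg2-style driver. The converted CSV still has to be uploaded back to S3 (for example with the AWS CLI) before the import statement is run.

```python
from typing import Tuple


def parquet_to_csv(parquet_path: str, csv_path: str) -> None:
    """Rewrite a local Parquet file as a CSV file with a header row."""
    # pyarrow is a third-party package (pip install pyarrow). It is imported
    # inside the function so the rest of this module loads without it.
    import pyarrow.csv as pacsv
    import pyarrow.parquet as pq

    table = pq.read_table(parquet_path)  # loads the whole file into memory
    pacsv.write_csv(table, csv_path)     # writes a header row by default


def build_import_statement(
    table: str, bucket: str, key: str, region: str
) -> Tuple[str, tuple]:
    """Return a parameterized aws_s3 import statement for a CSV with a header.

    The empty second argument means "all columns of the target table".
    """
    sql = (
        "SELECT aws_s3.table_import_from_s3("
        "%s, '', '(format csv, header true)', "
        "aws_commons.create_s3_uri(%s, %s, %s))"
    )
    return sql, (table, bucket, key, region)


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    sql, params = build_import_statement(
        "my_table", "my-bucket", "data/my_file.csv", "us-east-1"
    )
    print(sql)
    print(params)
```

The statement and parameters can then be passed to a cursor's execute method. Parquet columns with nested types (lists, structs) may need to be flattened before they can be written as CSV.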

answered Nov 02 '22 23:11 by voroninman