I'm trying to restore some historic backup files that were saved in Parquet format, and I want to read them once and write the data into a PostgreSQL database.
I know the backup files were saved using Spark, but I have a strict restriction: I can't install Spark on the DB machine, nor read the Parquet files with Spark on a remote machine and write them to the database using spark_df.write.jdbc.
Everything needs to happen on the DB machine, in the absence of Spark and Hadoop, using only Postgres and Bash scripting.
My file structure is something like:
foo/
foo/part-00000-2a4e207f-4c09-48a6-96c7-de0071f966ab.c000.snappy.parquet
foo/part-00001-2a4e207f-4c09-48a6-96c7-de0071f966ab.c000.snappy.parquet
foo/part-00002-2a4e207f-4c09-48a6-96c7-de0071f966ab.c000.snappy.parquet
..
..
I expect to read the data and schema from each Parquet folder like foo, create a table using that schema, and write the data into that table, using only Bash and the Postgres CLI.
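For reference, the part files inside a folder like foo/ can be enumerated in a stable order before any conversion or loading step; a minimal stdlib sketch (the folder name, glob pattern, and helper name are illustrative, not part of any particular tool):

```python
from pathlib import Path

def list_part_files(folder):
    # Spark writes one part-NNNNN-<uuid>.c000.snappy.parquet file per task;
    # sorting by name preserves the original partition order.
    return sorted(str(p) for p in Path(folder).glob("part-*.parquet"))
```

Each returned path can then be fed, one by one, to whatever converter or loader you end up using.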
You can use Spark (on a machine where it is allowed) to convert the Parquet files to CSV format, then move the CSV files to the DB machine and import them with any tool:
spark.read.parquet("...").write.csv("...")
import pandas as pd
from sqlalchemy import create_engine

df = pd.read_csv('mypath.csv')
df.columns = [c.lower() for c in df.columns]  # Postgres doesn't like capitals or spaces

engine = create_engine('postgresql://username:password@localhost:5432/dbname')
df.to_sql("my_table_name", engine, index=False)  # index=False skips the DataFrame index column
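If SQLAlchemy is not available on the DB machine, the same lower-casing idea can be used to generate a CREATE TABLE statement for psql instead. A minimal sketch, assuming all columns are loaded as text first (the helper name create_table_sql is mine, not from any library):

```python
import csv
import io
import re

def create_table_sql(table, csv_text):
    # Take only the header row of the CSV.
    header = next(csv.reader(io.StringIO(csv_text)))
    # Lower-case and sanitize column names, since Postgres folds
    # unquoted identifiers to lower case anyway.
    cols = [re.sub(r"\W+", "_", c.strip().lower()) for c in header]
    # Load everything as TEXT first; types can be tightened later with ALTER TABLE.
    col_defs = ", ".join(f"{c} text" for c in cols)
    return f"CREATE TABLE {table} ({col_defs});"

print(create_table_sql("my_table_name", "Id,First Name,AGE\n1,Ann,30\n"))
# CREATE TABLE my_table_name (id text, first_name text, age text);
```

The generated statement can be run with psql -c, and each CSV part then loaded with psql's \copy my_table_name FROM 'part.csv' WITH (FORMAT csv, HEADER true).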