Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficient way to import a lot of csv files into PostgreSQL db

I see plenty of examples of importing a CSV into a PostgreSQL db, but what I need is an efficient way to import 500,000 CSV's into a single PostgreSQL db. Each CSV is a bit over 500KB (so grand total of approx 272GB of data).

The CSV's are identically formatted and there are no duplicate records (the data was generated programatically from a raw data source). I have been searching and will continue to search online for options, but I would appreciate any direction on getting this done in the most efficient manner possible. I do have some experience with Python, but will dig into any other solution that seems appropriate.

Thanks!

like image 219
FredG Avatar asked Sep 28 '12 19:09

FredG


People also ask

Can you import CSV into PostgreSQL?

PgAdmin is a Graphical User Interface (GUI) that allows businesses to import data into PostgreSQL databases. With this service, you can convert CSV files into acceptable PostgreSQL database formats, and import the CSV into your PostgreSQL format.

Which of the following commands is the most efficient way to bulk load data into a Postgres table from a CSV file?

Goto solution for bulk loading into PostgreSQL is the native copy command.

How much load can PostgreSQL handle?

PostgreSQL does not impose a limit on the total size of a database. Databases of 4 terabytes (TB) are reported to exist. A database of this size is more than sufficient for all but the most demanding applications.


1 Answers

If you start by reading the PostgreSQL guide "Populating a Database" you'll see several pieces of advice:

  1. Load the data in a single transaction.
  2. Use COPY if at all possible.
  3. Remove indexes, foreign key constraints etc before loading the data and restore them afterwards.

PostgreSQL's COPY statement already supports the CSV format:

COPY table (column1, column2, ...) FROM '/path/to/data.csv' WITH (FORMAT CSV)

so it looks as if you are best off not using Python at all, or using Python only to generate the required sequence of COPY statements.

like image 176
Gareth Rees Avatar answered Oct 06 '22 03:10

Gareth Rees