I see plenty of examples of importing a CSV into a PostgreSQL db, but what I need is an efficient way to import 500,000 CSVs into a single PostgreSQL db. Each CSV is a bit over 500KB (so a grand total of approx. 272GB of data).
The CSVs are identically formatted and there are no duplicate records (the data was generated programmatically from a raw data source). I have been searching and will continue to search online for options, but I would appreciate any direction on getting this done in the most efficient manner possible. I do have some experience with Python, but will dig into any other solution that seems appropriate.
Thanks!
pgAdmin is a graphical user interface (GUI) for PostgreSQL that can import CSV files into existing database tables, but loading files one at a time through a GUI is not practical at this scale.
The go-to solution for bulk loading into PostgreSQL is the native COPY command.
PostgreSQL does not impose a limit on the total size of a database, and databases of 4 terabytes (TB) are reported to exist, so the roughly 272GB of data here is well within what it can handle.
If you start by reading the PostgreSQL guide "Populating a Database", you'll see several pieces of advice, one of which is to use COPY if at all possible. PostgreSQL's COPY statement already supports the CSV format:

COPY table (column1, column2, ...) FROM '/path/to/data.csv' WITH (FORMAT CSV)

so it looks as if you are best off not using Python at all, or using Python only to generate the required sequence of COPY statements.
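If you do go the Python route, one option is to keep a single database connection open and stream each file to the server with COPY ... FROM STDIN, rather than generating 500,000 separate server-side COPY statements. Below is a minimal sketch using psycopg2; the connection string, the measurements table, its column list, and the /path/to/csvs directory are all placeholders, and it assumes the files have no header row (add HEADER to the COPY options if they do).

import glob
import psycopg2

# Everything named here (database, user, table, columns, directory) is a placeholder.
conn = psycopg2.connect("dbname=mydb user=myuser")
copy_sql = "COPY measurements (column1, column2, column3) FROM STDIN WITH (FORMAT CSV)"

try:
    with conn.cursor() as cur:
        for path in sorted(glob.glob("/path/to/csvs/*.csv")):
            with open(path, "r") as f:
                # Stream this file's rows to the server over the open connection.
                cur.copy_expert(copy_sql, f)
    conn.commit()  # one transaction for the whole load; commit in batches if preferred
finally:
    conn.close()

Because the rows are streamed through the client connection, this also works when the CSV files live on a different machine from the database server, which a server-side COPY ... FROM '/path/to/data.csv' does not.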