A CSV of under 4 GB became a 7.7 GB table in my AWS Postgres instance, and a 14 GB CSV wouldn't load into 22 GB of space, I'm guessing because it too was going to roughly double in size. Is this factor of two normal? If so, why, and is it reliable?
There are many possible reasons:
Indexes take up space. If you have lots of indexes, especially multi-column indexes or GiST / GIN indexes, they can be big space hogs.
Some data types are represented more compactly in text form than in a table. For example, 1 consumes 1 byte in CSV (or 2 if you count the comma delimiter), but if you store it in a bigint column it requires 8 bytes (see the sketch after this list).
If there's a FILLFACTOR set, PostgreSQL will intentionally waste space to make later UPDATEs and INSERTs faster. If you don't know what FILLFACTOR is, then there isn't one set.
PostgreSQL has a much larger per-row overhead than CSV. In CSV, the per-row overhead is 2 bytes for a carriage return and newline. Rows in a PostgreSQL table require 24 to 28 bytes, plus data values, mainly because of the metadata required for multiversion concurrency control. So a CSV with very many narrow rows will produce a significantly bigger table than a CSV of the same size in bytes with fewer, wider rows.
PostgreSQL can do out-of-line storage and compression of values using TOAST. This can make big text strings significantly smaller in the database than in CSV.
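As a quick illustration of the data-type point above, pg_column_size reports the number of bytes a value occupies, so you can compare integer widths directly. A minimal sketch (the column aliases are just labels I've made up):

SELECT pg_column_size(1::smallint) AS smallint_bytes,  -- 2 bytes
       pg_column_size(1::integer)  AS integer_bytes,   -- 4 bytes
       pg_column_size(1::bigint)   AS bigint_bytes;    -- 8 bytes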
You can use octet_length and pg_column_size to get PostgreSQL to tell you how big rows are. Because of TOAST out-of-line compressed storage, pg_column_size might report a different size for a tuple produced by a VALUES expression than for one SELECTed from a table.
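You can see that difference for yourself by comparing pg_column_size for the same value in memory and after it has been stored. A minimal sketch, assuming a throwaway table name toast_demo:

-- Size of the raw, in-memory datum: roughly 100 kB plus a small header.
SELECT pg_column_size(repeat('x', 100000));

-- The same value read back from a table has been TOAST-compressed,
-- so pg_column_size reports far fewer bytes.
CREATE TEMP TABLE toast_demo AS SELECT repeat('x', 100000) AS big_text;
SELECT pg_column_size(big_text) FROM toast_demo;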
You can also use pg_total_relation_size to find out how big the table produced by a given sample input is.
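So to answer the original question empirically, you could load a sample of the CSV and measure the result, along these lines. A sketch only: sample_load and sample.csv are hypothetical names, the column list must match your CSV, and \copy is psql's client-side variant of COPY:

CREATE TABLE sample_load (id bigint, payload text);  -- match your CSV's columns
\copy sample_load FROM 'sample.csv' WITH (FORMAT csv)
SELECT pg_size_pretty(pg_relation_size('sample_load'))       AS heap_size,
       pg_size_pretty(pg_indexes_size('sample_load'))        AS index_size,
       pg_size_pretty(pg_total_relation_size('sample_load')) AS total_size;

Comparing heap_size against index_size will also tell you whether the bloat you're seeing comes from the rows themselves or from indexes.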