I have a data structure that looks like this:
Model Place
primary key "id"
foreign key "parent" -> Place
foreign key "neighbor" -> Place (symmetric)
foreign key "belongtos" -> Place (asymmetric)
a bunch of scalar fields ...
I have over 5 million rows in the model table, and I need to insert ~50 million rows into each of the two foreign key tables. I have SQL files that look like this:
INSERT INTO place_belongtos (from_place_id, to_place_id) VALUES (123, 456);
and they are about 7 GB each. The problem is that when I do psql < belongtos.sql, it takes about 12 hours to import ~4 million rows on my AMD Turion64x2 CPU. The OS is Gentoo ~amd64 and PostgreSQL is version 8.4, compiled locally. The data directory is a bind mount located on my second extended partition (ext4), which I believe is not the bottleneck.
I suspect the import takes so long because psql checks the foreign key constraints for each row, which adds unnecessary overhead since I know for sure that the data is valid. Is there a way to speed up the import, i.e. to temporarily disable the constraint checks?
Aggregations vs. filtering: if you're simply filtering the data and the data fits in memory, Postgres can parse roughly 5-10 million rows per second (assuming a reasonable row size of, say, 100 bytes). If you're aggregating, you're at about 1-2 million rows per second.
There is no PostgreSQL-imposed limit on the number of indexes you can create on a table. Of course, performance may degrade if you choose to create more and more indexes on a table with more and more columns. PostgreSQL has a limit of 1GB for the size of any one field in a table.
The go-to solution for bulk loading into PostgreSQL is the native COPY command.
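For example, a sketch assuming the data can be exported as a tab-separated file of id pairs instead of one INSERT statement per row (the file path is hypothetical, and the column names follow the question):

-- Server-side COPY reads the file from the database server's filesystem
-- (requires superuser in 8.4); the default text format expects tab-separated values.
COPY place_belongtos (from_place_id, to_place_id)
    FROM '/path/to/belongtos.tsv';

-- Client-side equivalent, run from inside psql, streams the file from the client:
-- \copy place_belongtos (from_place_id, to_place_id) from 'belongtos.tsv'

A large part of the 12-hour runtime is likely per-statement commit overhead: each INSERT in the .sql file commits separately unless the file is wrapped in BEGIN/COMMIT, whereas COPY loads all rows as a single statement in one transaction.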