I have a requirement to store records into a database at a rate of 10,000 records/sec (with indexes on a few fields). Each record has 25 columns. I am doing a batch insert of 100,000 records in one transaction block. To improve the insertion rate, I changed the tablespace from disk to RAM. Even with that, I am only able to achieve about 5,000 inserts per second.
I have also done the following tuning in the postgres config:
Other information:
I am wondering why a single insert query takes around 0.2 msec on average when the database is not writing anything to disk (since I am using a RAM-based tablespace). Is there something I am doing wrong?
Help appreciated.
Prashant
\COPY schema.temp_table FROM /tmp/data.csv WITH CSV
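Once the staging table is loaded, the rows can be moved into the real, indexed table in one set-based statement. A minimal sketch, where schema.real_table stands in for your actual destination table:

INSERT INTO schema.real_table
SELECT * FROM schema.temp_table;
TRUNCATE schema.temp_table;

Loading into an unindexed staging table first keeps the COPY itself as cheap as possible; the destination's indexes only have to be maintained during that one bulk statement.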
For large volumes of data:

1. Insert the data in roughly the same order that the SELECT statements will use. In other words, try to align the physical model with the logical model.
2. CLUSTER the table on its most important index (most important column on the left). For example:

CREATE UNIQUE INDEX measurement_001_stc_index
  ON climate.measurement_001
  USING btree (station_id, taken, category_id);
ALTER TABLE climate.measurement_001 CLUSTER ON measurement_001_stc_index;
On a machine with 4GB of RAM, I did the following...
Tell the Kernel that it's okay for programs to use gobs of shared memory:
sysctl -w kernel.shmmax=536870912
sysctl -p /etc/sysctl.conf
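Note that sysctl -w only changes the value for the running kernel; to make the setting survive a reboot, add the same line to /etc/sysctl.conf (the file the sysctl -p call above reloads):

kernel.shmmax = 536870912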
Edit /etc/postgresql/8.4/main/postgresql.conf and set:

shared_buffers = 1GB
temp_buffers = 32MB
work_mem = 32MB
maintenance_work_mem = 64MB
seq_page_cost = 1.0
random_page_cost = 2.0
cpu_index_tuple_cost = 0.001
effective_cache_size = 512MB
checkpoint_segments = 10
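Keep in mind that shared_buffers only takes effect after a full server restart (the other values can be picked up with a reload). One way to restart, assuming the default 8.4 data directory on a Debian-style install:

pg_ctl restart -D /var/lib/postgresql/8.4/main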
For example, let's say you have data based on weather, divided into different categories. Rather than having a single monstrous table, divide it into several tables (one per category).
CREATE TABLE climate.measurement
(
id bigserial NOT NULL,
taken date NOT NULL,
station_id integer NOT NULL,
amount numeric(8,2) NOT NULL,
flag character varying(1) NOT NULL,
category_id smallint NOT NULL,
CONSTRAINT measurement_pkey PRIMARY KEY (id)
)
WITH (
OIDS=FALSE
);
CREATE TABLE climate.measurement_001
(
  -- Inherited from table climate.measurement: id bigint NOT NULL DEFAULT nextval('climate.measurement_id_seq'::regclass),
  -- Inherited from table climate.measurement: taken date NOT NULL,
  -- Inherited from table climate.measurement: station_id integer NOT NULL,
  -- Inherited from table climate.measurement: amount numeric(8,2) NOT NULL,
  -- Inherited from table climate.measurement: flag character varying(1) NOT NULL,
  -- Inherited from table climate.measurement: category_id smallint NOT NULL,
CONSTRAINT measurement_001_pkey PRIMARY KEY (id),
CONSTRAINT measurement_001_category_id_ck CHECK (category_id = 1)
)
INHERITS (climate.measurement)
WITH (
OIDS=FALSE
);
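With inheritance-based partitioning the inserts must end up in the correct child table: either have the loader insert (or COPY) directly into climate.measurement_001 and its siblings, or attach a routing trigger to the parent. A minimal sketch of such a trigger, assuming hypothetical sibling tables (measurement_002, ...) with one child per category_id:

CREATE OR REPLACE FUNCTION climate.measurement_insert_trigger()
RETURNS trigger AS $$
BEGIN
    -- Route each row to the child table matching its category_id.
    IF NEW.category_id = 1 THEN
        INSERT INTO climate.measurement_001 VALUES (NEW.*);
    ELSIF NEW.category_id = 2 THEN
        INSERT INTO climate.measurement_002 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'No child table for category_id %', NEW.category_id;
    END IF;
    RETURN NULL;  -- the row is stored in the child, not in the parent
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER measurement_insert
    BEFORE INSERT ON climate.measurement
    FOR EACH ROW EXECUTE PROCEDURE climate.measurement_insert_trigger();

For bulk loads at 10,000 rows/sec, writing straight into the child tables avoids the per-row trigger overhead.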
Bump up the table stats for the important columns:
ALTER TABLE climate.measurement_001 ALTER COLUMN taken SET STATISTICS 1000;
ALTER TABLE climate.measurement_001 ALTER COLUMN station_id SET STATISTICS 1000;
Don't forget to VACUUM and ANALYZE afterwards.
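For the example tables above, that is something along the lines of:

VACUUM ANALYZE climate.measurement;
VACUUM ANALYZE climate.measurement_001;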
Are you doing your inserts as a series of single-row statements:
INSERT INTO tablename (...) VALUES (...);
INSERT INTO tablename (...) VALUES (...);
...
or as one multiple-row insert:
INSERT INTO tablename (...) VALUES (...),(...),(...);
The second one will be significantly faster on 100k rows.
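In practice you would typically keep the whole batch in one transaction and, if a single statement gets unwieldy, split it into a few large chunks; a rough sketch, where tablename and the two columns are placeholders:

BEGIN;
INSERT INTO tablename (col_a, col_b) VALUES
(1, 'first row'),
(2, 'second row'),
(3, 'third row');
-- ... further multi-row INSERTs for the remaining chunks ...
COMMIT;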
source: http://kaiv.wordpress.com/2007/07/19/faster-insert-for-multiple-rows/