I want to improve my database performance. In one project, all tables went from int to bigint, which I think is a bad choice, not only regarding storage, since int requires 4 bytes and bigint requires 8 bytes, but also regarding performance.

So I created a small table with 10 million entries, using a Python script:
import uuid

rows = 10000000
output = 'insert_description_bigint.sql'

f = open(output, 'w')
set_schema = "SET search_path = norma;\n"
f.write(set_schema)
# range(1, rows + 1) so that exactly `rows` inserts are generated
for i in range(1, rows + 1):
    random_string = uuid.uuid4()
    query = "insert into description_bigint (description_id, description) values (%d, '%s');\n"
    f.write(query % (i, random_string))
f.close()
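The generated file can then be loaded from psql, for example with \i (a minimal sketch, assuming the table below has already been created); running ANALYZE afterwards keeps the planner statistics current:

-- inside psql, connected to the target database
\i insert_description_bigint.sql
ANALYZE description_bigint;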
And this is how I created my two tables:
-- BIGINT
DROP TABLE IF EXISTS description_bigint;
CREATE TABLE description_bigint
(
    description_id BIGINT PRIMARY KEY NOT NULL,
    description VARCHAR(200),
    CONSTRAINT description_id_positive CHECK (description_id >= 0)
);

select count(1) from description_bigint;
select * from description_bigint;
select * from description_bigint where description_id = 9999999;

-- INT
DROP TABLE IF EXISTS description_int;
CREATE TABLE description_int
(
    description_id INT PRIMARY KEY NOT NULL,
    description VARCHAR(200),
    CONSTRAINT description_id_positive CHECK (description_id >= 0)
);
After inserting all this data, I ran a query against both tables to measure the difference between them, and to my surprise they both have the same performance:
select * from description_bigint; -- 11m55s
select * from description_int; -- 11m55s
Am I doing something wrong with my benchmark? Shouldn't int be faster than bigint? In particular, since the primary key is by definition an index, creating an index for bigint should be slower than creating an index for int with the same amount of data, right?
I know this is just a small thing and will not make a huge impact on my database's performance by itself, but I want to ensure that we are using best practices and stay focused on performance here.
On a 64-bit system the two tables are stored nearly identically. The column description_id in description_int occupies 8 bytes (4 for the integer value plus 4 bytes of alignment padding). Try this test:
select
pg_relation_size('description_int')/10000000 as table_int,
pg_relation_size('description_bigint')/10000000 as table_bigint,
pg_relation_size('description_int_pkey')/10000000 as index_int,
pg_relation_size('description_bigint_pkey')/10000000 as index_bigint;
The average row size of both tables is virtually the same. This is because the integer column occupies 8 bytes (4 bytes for the value plus 4 bytes of alignment padding), exactly like bigint (8 bytes for the value, no padding). The same applies to index entries. This is a special case, however. If we add one more integer column to the first table:
CREATE TABLE two_integers
(
    description_id INT PRIMARY KEY NOT NULL,
    one_more_int INT,
    description VARCHAR(200),
    CONSTRAINT description_id_positive CHECK (description_id >= 0)
);
the average row size should remain the same, because the first 8 bytes will now hold the two integers (with no padding).
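One way to check that claim (a sketch only; here two_integers is filled with generate_series and md5 strings rather than the Python script, and reltuples is only accurate after VACUUM/ANALYZE) is to populate the new table and compare the average bytes per row:

-- md5() gives a 32-char string, close in size to the 36-char uuid strings used above
insert into two_integers (description_id, one_more_int, description)
select g, g, md5(g::text) from generate_series(1, 1000000) g;
vacuum analyze two_integers;

select relname,
       pg_relation_size(oid) / reltuples as bytes_per_row
from pg_class
where relname in ('description_int', 'description_bigint', 'two_integers')
  and reltuples > 0;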
Find more details in Calculating and saving space in PostgreSQL.
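As a small illustration of the effect that article describes (table and column names here are made up), placing the 8-byte column first avoids the padding that would otherwise follow a leading int:

-- illustrative only: same columns, different order
create table padded (a int, b bigint, c int);   -- 4 bytes of padding after a, before b
create table packed (b bigint, a int, c int);   -- a and c pack into one 8-byte slot

insert into padded select g, g, g from generate_series(1, 100000) g;
insert into packed select g, g, g from generate_series(1, 100000) g;

select pg_relation_size('padded') as padded_bytes,
       pg_relation_size('packed') as packed_bytes;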