I want to decide between using a citext column with an index or using a text column with an index on lower().
I performed some benchmarks. To my surprise the search with the index on lower() causes an index scan, but in the citext case I get an index only scan. I was expecting the index on lower() to cause an index only scan too.
Also, the total cost with the citext index is 4.44, but with the index on lower(), the total cost is 8.44.
So the first thing coming to my mind is that the citext column index is better than the expression index on a text column.
CREATE TABLE test_citext(a citext NOT NULL);
INSERT INTO test_citext
SELECT cast(x as text)
FROM generate_series(1, 1000000) AS x;
VACUUM (FREEZE, ANALYZE) test_citext;
create index citextind on test_citext(a);
Select * from test_citext where a = 'test';
--Index Only Scan.Total cost 4.44
CREATE TABLE test_textlowerindex(a text NOT NULL);
INSERT INTO test_textlowerindex
SELECT cast(x as text)
FROM generate_series(1, 1000000) AS x;
VACUUM (FREEZE, ANALYZE) test_textlowerindex;
create index lowertextind on test_textlowerindex(lower(a));
Select * from test_textlowerindex where lower(a) = 'test';
--Index Scan.Total cost 8.44
Am I right?
Mister Laurenz Albe thanks for your answer.I changed my above script as you said. Result :
CREATE TABLE test_citext(a citext NOT NULL);
INSERT INTO test_citext
SELECT cast(x as text)
FROM generate_series(1, 1000000) AS x;
create index citextind on test_citext(a);
VACUUM (FREEZE, ANALYZE) test_citext;
Select count(*) from test_citext where a = 'test';
--Index Only Scan 4.44 + 4.46
CREATE TABLE test_textlowerindex(a text NOT NULL);
INSERT INTO test_textlowerindex
SELECT cast(x as text)
FROM generate_series(1, 1000000) AS x;
create index lowertextind on test_textlowerindex(lower(a));
VACUUM (FREEZE, ANALYZE) test_textlowerindex;
Select count(*) from test_textlowerindex where lower(a) = 'test';
--Index Scan 8.44 + 8.46
But anything did not changed Even If I run analyze after creating index and using count(*) in select.Index Scan still continues with an index on lower().
Your test is misleading. There are two problems here:
You didn't run ANALYZE after creating index lowertextind.
Without that, PostgreSQL doesn't know how lower(a) is distributed and will probably produce a wrong cost estimate.
By using SELECT * you inadvertently allowed an index only scan to be used for the first query, but not for the second. This is because the first index contains all table columns, but the second doesn't.
Since the second index doesn't contain a, the value has to be fetched from the table, causing additional work.
You could use SELECT count(*) FROM ... for a fairer benchmark.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With