Say I have a table ResidentInfo, and in this table I have a unique constraint on HomeAddress, which is of type VARCHAR. For future queries, I'm going to add an index on this column. The queries will only use the = operator, and I'll use a B-Tree index since Hash indexes are currently not recommended.
Question: From an efficiency point of view, using B-Tree, should I add a new column with numbers 1, 2, 3, ..., N corresponding to the different home addresses, and put the index on that number column instead of on HomeAddress?
I ask this question because I don't know how indexes work.
A GiST index is lossy, meaning that the index might produce false matches, and it is necessary to check the actual table row to eliminate such false matches. (PostgreSQL does this automatically when needed.) GiST indexes are lossy because each document is represented in the index by a fixed-length signature.
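To make "lossy because of a fixed-length signature" concrete, here is a toy Python sketch. It is not how PostgreSQL actually lays out GiST pages. Each document's words are hashed into a fixed-width bit mask, the index test compares only bits (so it can report false matches), and a recheck against the real row removes them. The 16-bit width and the `signature` and `search` helpers are hypothetical choices made for illustration.

```python
import zlib

SIG_BITS = 16  # hypothetical signature width; deliberately tiny so collisions can happen


def signature(words):
    """Fold each word into one bit of a fixed-length bit mask."""
    sig = 0
    for w in words:
        sig |= 1 << (zlib.crc32(w.encode()) % SIG_BITS)
    return sig


def search(docs, query_word):
    """Return (candidates from the signature test, rows that survive the recheck)."""
    q = signature([query_word])
    # Index step: a cheap bit test that may include false matches (lossy).
    candidates = [i for i, d in enumerate(docs) if (signature(d.split()) & q) == q]
    # Recheck step: consult the actual row to drop the false matches.
    confirmed = [i for i in candidates if query_word in docs[i].split()]
    return candidates, confirmed
```

Every true match always passes the bit test, so `confirmed` is exact. `candidates` may contain extra rows, depending on how the hashes collide.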
GIN stands for Generalized Inverted Index. GIN is designed for handling cases where the items to be indexed are composite values, and the queries to be handled by the index need to search for element values that appear within the composite items.
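The "inverted" part can be sketched in a few lines of Python. This is a simplified model, not GIN's on-disk format. Each element maps to the set of rows whose composite value contains it, so a containment query (similar in spirit to PostgreSQL's array `@>` operator) becomes an intersection of those sets. The function names are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(rows):
    """Map each element to the set of row ids whose composite value contains it."""
    index = defaultdict(set)
    for row_id, elements in rows.items():
        for e in elements:
            index[e].add(row_id)
    return index


def rows_containing_all(index, wanted):
    """Row ids whose composite value contains every element in `wanted`."""
    if not wanted:
        raise ValueError("wanted must contain at least one element")
    # .get avoids inserting empty entries into the defaultdict.
    return set.intersection(*(index.get(e, set()) for e in wanted))
```

For example, with rows `{1: ["red", "blue"], 2: ["blue"], 3: ["green", "red"]}`, asking for `["red"]` consults only the "red" entry instead of scanning every row.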
For simple equality checks (=), a B-Tree index on a varchar or text column is simple and the best choice. It certainly helps performance a lot.
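As a rough mental model of how such an index answers =, consider keys kept in sorted order, each pointing to a row. A lookup is then a binary search rather than a full scan. In this toy Python sketch, one flat sorted list stands in for a real B-Tree, which spreads the same sorted keys over many pages. The `SortedIndex` class and its integer row ids are hypothetical.

```python
import bisect


class SortedIndex:
    """Toy stand-in for a B-Tree: keys kept sorted, each paired with a row id."""

    def __init__(self, pairs):
        # pairs: iterable of (key, row_id); sorting by key mimics B-Tree ordering.
        self.entries = sorted(pairs)
        self.keys = [k for k, _ in self.entries]

    def lookup(self, key):
        """Return row ids where indexed value == key, found by binary search."""
        lo = bisect.bisect_left(self.keys, key)
        hi = bisect.bisect_right(self.keys, key)
        return [row_id for _, row_id in self.entries[lo:hi]]
```

The same code works unchanged with integer keys. The difference between the two choices lies in how cheap each comparison is and how many keys fit on a page, not in how the search proceeds.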
Of course, a B-Tree index on a simple integer column performs better. For starters, comparing simple integer values is a bit faster. But more importantly, performance is also a function of the size of the index. A bigger column means fewer rows per data page, which means more pages have to be read ...
Since the HomeAddress is hardly unique anyway, it's not a good natural primary key. I would strongly suggest using a surrogate primary key instead. A serial column is the obvious choice for that. Its only purpose is to provide a simple, fast primary key to work with.
If you have other tables referencing said table, this becomes even more efficient. Instead of duplicating a lengthy string for the foreign key column, you only need 4 bytes for an integer column. And you don't need to cascade updates nearly as often, since an address is bound to change, while a surrogate PK can stay the same (but doesn't have to, of course).
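The surrogate-key argument can be tried out with Python's built-in sqlite3 module. SQLite stands in for PostgreSQL here: it uses INTEGER PRIMARY KEY where Postgres would use serial. The `mail` table that references residents is hypothetical. Changing a resident's address touches one row, and the referencing rows keep pointing at the same integer id.

```python
import sqlite3


def demo():
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    con.execute("""CREATE TABLE resident (
        resident_id INTEGER PRIMARY KEY,  -- surrogate key (SQLite analog of serial)
        address     TEXT NOT NULL)""")
    con.execute("""CREATE TABLE mail (    -- hypothetical referencing table
        mail_id     INTEGER PRIMARY KEY,
        resident_id INTEGER NOT NULL REFERENCES resident(resident_id))""")
    con.execute("INSERT INTO resident (address) VALUES ('12 Oak St')")
    con.execute("INSERT INTO mail (resident_id) VALUES (1), (1)")
    # The resident moves: only the resident row changes; nothing cascades to mail.
    con.execute("UPDATE resident SET address = '7 Elm Ave' WHERE resident_id = 1")
    rows = con.execute("""SELECT m.mail_id, r.address
                          FROM mail m JOIN resident r USING (resident_id)
                          ORDER BY m.mail_id""").fetchall()
    con.close()
    return rows


print(demo())
```

Had the address itself been the key, both `mail` rows would have needed updating (or an ON UPDATE CASCADE) when the resident moved.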
Your table could look like this:
CREATE TABLE resident (
   resident_id serial PRIMARY KEY
 , address     text NOT NULL
   -- more columns
);
CREATE INDEX resident_adr_idx ON resident(address);
This results in two B-Tree indexes: a unique index on resident_id and a plain index on address.
More about indexes in the manual.
Postgres offers a lot of options, but you don't need any of them for this simple case.
In Postgres, a unique constraint is enforced by maintaining a unique index on the field, so you're covered already.
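SQLite behaves the same way, and this can be checked from Python's sqlite3 module. It is used here only as a stand-in for PostgreSQL. Declaring HomeAddress UNIQUE makes SQLite create an automatic index, and the query planner uses that index for an = lookup.

```python
import sqlite3


def unique_constraint_index():
    con = sqlite3.connect(":memory:")
    con.execute("""CREATE TABLE ResidentInfo (
        id          INTEGER PRIMARY KEY,
        HomeAddress TEXT UNIQUE)""")
    # PRAGMA index_list rows: (seq, name, unique, origin, partial); origin 'u' = UNIQUE constraint.
    indexes = [(r[1], r[2], r[3]) for r in con.execute("PRAGMA index_list('ResidentInfo')")]
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
    plan = con.execute(
        "EXPLAIN QUERY PLAN SELECT id FROM ResidentInfo WHERE HomeAddress = ?",
        ("12 Oak St",),
    ).fetchall()
    con.close()
    return indexes, [r[-1] for r in plan]


print(unique_constraint_index())
```

The plan detail names the automatic index (sqlite_autoindex_ResidentInfo_1), so no separate CREATE INDEX is needed for equality lookups.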
In the event you decide the unique constraint on the address is bad (which, honestly, it is: what about a spouse creating a separate account? Or flatshares?), you can drop it and create a plain, non-unique index like so:
create index on ResidentInfo (HomeAddress);