
PostgreSQL index on string column

Say I have a table ResidentInfo with a unique constraint on the column HomeAddress, which is of type VARCHAR. For future queries, I am going to add an index on this column. The queries will only use the = operator, and I'll use a B-Tree index since hash indexes are currently not recommended.

Question: From an efficiency point of view, using B-Tree, should I add a new column with numbers 1, 2, 3, ..., N corresponding to the different home addresses, and index that number column instead of HomeAddress?

I ask because I don't know how indexes work.

Hao asked Jun 04 '13 17:06


2 Answers

For simple equality checks (=), a B-Tree index on a varchar or text column is simple and the best choice. It certainly helps performance a lot.

Of course, a B-Tree index on a simple integer performs better. For starters, comparing simple integer values is a bit faster. But more importantly, performance is also a function of the size of the index. A bigger column means fewer rows fit on each data page, which in turn means more pages have to be read ...
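To get a feel for the size difference, one could build both kinds of index on throwaway data and compare them. The following is only a sketch: the temp table idx_size_demo, its columns and the generated address strings are made up for illustration, and the actual numbers depend on the data and the Postgres version.

```sql
-- Hypothetical demo table: an integer key and a fairly long string per row.
CREATE TEMP TABLE idx_size_demo AS
SELECT g AS id_num,
       g::text || ' Example Street, Apartment ' || g::text || ', Sample City' AS address
FROM   generate_series(1, 100000) AS g;

-- One B-Tree index per column.
CREATE INDEX idx_size_demo_num ON idx_size_demo (id_num);
CREATE INDEX idx_size_demo_adr ON idx_size_demo (address);

-- Compare the on-disk size of the two indexes.
SELECT pg_size_pretty(pg_relation_size('idx_size_demo_num')) AS int_index_size,
       pg_size_pretty(pg_relation_size('idx_size_demo_adr')) AS text_index_size;
```

With strings of this length, the index on address should come out noticeably larger than the one on id_num.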

Since HomeAddress is hardly unique anyway, it's not a good natural primary key. I would strongly suggest using a surrogate primary key instead. A serial column is the obvious choice for that. Its only purpose is to have a simple, fast primary key to work with.

If you have other tables referencing said table, this becomes even more efficient. Instead of duplicating a lengthy string for the foreign key column, you only need the 4 bytes for an integer column. And you don't need to cascade updates so much, since an address is bound to change, while a surrogate pk can stay the same (but doesn't have to, of course).

Your table could look like this:

CREATE TABLE resident (
   resident_id serial PRIMARY KEY
  ,address text NOT NULL
   -- more columns
);

CREATE INDEX resident_adr_idx ON resident(address);

This results in two B-Tree indexes: a unique index on resident_id and a plain index on address.
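A quick way to check this, assuming the resident table and the resident_adr_idx index from the statements above exist, is to list the indexes and look at the plan for an equality lookup. The address literal is just a placeholder.

```sql
-- List the indexes on the table; resident_pkey is the name Postgres
-- normally gives the index that backs the primary key.
SELECT indexname, indexdef
FROM   pg_indexes
WHERE  tablename = 'resident';

-- Show how an equality lookup on address would be executed.
EXPLAIN
SELECT resident_id
FROM   resident
WHERE  address = '12 Example Road';
```

On a very small table the planner may still prefer a sequential scan, so the index typically only shows up in the plan once the table holds a realistic number of rows and has been analyzed.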

More about indexes in the manual.
Postgres offers a lot of options - but you don't need any more for this simple case.
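The point about referencing tables could be sketched as follows. The vehicle table and its columns are hypothetical and only serve as an example of a table pointing at resident; the resident table from above is assumed to exist.

```sql
-- Hypothetical referencing table: it stores the 4-byte surrogate key,
-- not a copy of the address string.
CREATE TABLE vehicle (
    vehicle_id  serial  PRIMARY KEY,
    resident_id integer NOT NULL REFERENCES resident (resident_id),
    plate       text    NOT NULL
);

-- Postgres does not index the referencing column automatically.
CREATE INDEX vehicle_resident_idx ON vehicle (resident_id);

-- An address change touches only the resident row; nothing has to
-- cascade to vehicle because resident_id stays the same.
UPDATE resident
SET    address = '34 New Street'
WHERE  resident_id = 1;
```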

Erwin Brandstetter answered Sep 30 '22 15:09



In Postgres, a unique constraint is enforced by maintaining a unique index on the field, so you're covered already.
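This can be seen on a scratch table. The sketch below uses a made-up temp table, resident_info_demo, not the real ResidentInfo.

```sql
-- Hypothetical scratch table with a unique constraint on the address column.
CREATE TEMP TABLE resident_info_demo (
    home_address varchar(200) UNIQUE
);

-- The constraint is backed by a unique index that Postgres created on its own.
SELECT indexname, indexdef
FROM   pg_indexes
WHERE  tablename = 'resident_info_demo';
```

The query should return one row whose indexdef starts with CREATE UNIQUE INDEX, even though no CREATE INDEX statement was issued.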

In the event you decide the unique constraint on the address is bad (which, honestly, it is: what about a spouse creating a separate account? What about flatshares? etc.), you can drop the constraint and create a plain index like so:

create index on ResidentInfo (HomeAddress); 
Denis de Bernardy answered Sep 30 '22 14:09
