PostgreSQL index on string column

Tags:

database

indexing

postgresql

Say I have a table ResidentInfo with a unique constraint on HomeAddress, which is of type VARCHAR. For future queries, I am going to add an index on this column. The queries will only use the = operator, and I'll use a B-Tree index since the Hash index type is not recommended currently.

Question: From an efficiency point of view, using a B-Tree, should I add a new column with numbers 1, 2, 3, ..., N corresponding to the different home addresses, and index that number column instead of HomeAddress?

I ask because I don't know how indexes work.


asked Jun 04 '13 17:06

Hao

2 Answers

For simple equality checks (=), a B-Tree index on a varchar or text column is simple and the best choice. It certainly helps performance a lot.

Of course, a B-Tree index on a simple integer performs better. For starters, comparing simple integer values is a bit faster. But more importantly, performance is also a function of the size of the index. A bigger column means fewer index entries fit on each page, which means more pages have to be read.

Since HomeAddress is hardly unique anyway, it's not a good natural primary key. I would strongly suggest using a surrogate primary key instead. A serial column is the obvious choice for that. Its only purpose is to have a simple, fast primary key to work with.

If you have other tables referencing said table, this becomes even more efficient. Instead of duplicating a lengthy string for the foreign key column, you only need the 4 bytes for an integer column. And you don't need to cascade updates as often, since an address is bound to change, while a surrogate primary key can stay the same (but doesn't have to, of course).

Your table could look like this:

    CREATE TABLE resident (
       resident_id serial PRIMARY KEY
      ,address text NOT NULL
       -- more columns
    );

    CREATE INDEX resident_adr_idx ON resident(address);

This results in two B-Tree indexes: a unique index on resident_id and a plain index on address.
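To check this on your own database, a query along the following lines lists the indexes on the table together with their on-disk size. It is only a sketch: it assumes the resident table from above exists in the current search path, and the sizes you see depend entirely on your data.

```sql
-- List every index on "resident", whether it is unique, and its size on disk.
-- pg_index, pg_relation_size() and pg_size_pretty() are standard Postgres
-- catalog objects / functions; the table name is taken from the example above.
SELECT i.indexrelid::regclass                          AS index_name,
       i.indisunique                                   AS is_unique,
       pg_size_pretty(pg_relation_size(i.indexrelid))  AS index_size
FROM   pg_index AS i
WHERE  i.indrelid = 'resident'::regclass;
```

With a realistic number of rows, comparing the two index_size values gives a rough feel for how much larger the index on the text column is than the one on the integer key.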

More about indexes in the manual.
Postgres offers a lot of options - but you don't need any more for this simple case.
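As a sketch of the foreign key point made earlier, a referencing table might look like the following. The parking_permit table, its columns, and the sample values are hypothetical and only serve as an illustration; the resident table is the one defined above.

```sql
-- Hypothetical referencing table; names and columns are illustrative only.
CREATE TABLE parking_permit (
    permit_id   serial PRIMARY KEY,
    -- 4-byte integer foreign key instead of a copy of the address string
    resident_id integer NOT NULL REFERENCES resident (resident_id),
    issued_on   date NOT NULL DEFAULT CURRENT_DATE
);

-- Postgres does not create an index on the referencing column automatically.
CREATE INDEX parking_permit_resident_idx ON parking_permit (resident_id);

-- Changing an address touches only one row in resident; nothing cascades,
-- because parking_permit stores the surrogate key, not the address.
UPDATE resident
SET    address = '34 New Street'
WHERE  resident_id = 1;
```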


answered Sep 30 '22 15:09

Erwin Brandstetter

In Postgres, a unique constraint is enforced by maintaining a unique index on the field, so you're covered already.
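You can verify this yourself. The following is a minimal sketch that assumes a cut-down version of the table with only the one column (the real table will have more), and a made-up address value:

```sql
-- Hypothetical minimal version of the asker's table.
CREATE TABLE ResidentInfo (
    HomeAddress varchar NOT NULL UNIQUE
);

-- Unquoted identifiers are folded to lower case, hence 'residentinfo'.
-- The result should include the index created for the UNIQUE constraint.
SELECT indexname, indexdef
FROM   pg_indexes
WHERE  tablename = 'residentinfo';

-- Ask the planner how it would run an equality lookup.
EXPLAIN
SELECT * FROM ResidentInfo WHERE HomeAddress = '12 Example Street';
```

Which plan EXPLAIN reports depends on the table size and its statistics; on a very small table the planner may prefer a sequential scan even though the index exists.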

In the event you decide the unique constraint on the address is a bad idea (which, honestly, it is: what about a spouse creating a separate account? What about flatshares? etc.), you can create a plain index like so:

create index on ResidentInfo (HomeAddress);

answered Sep 30 '22 14:09

Denis de Bernardy
