Performance Tuning: Create index for boolean column

I have written a daemon that fetches rows from one database and inserts them into another to keep the two synchronized. It selects rows based on a boolean flag, sync_done.

My table has hundreds of thousands of rows. When I select all rows where sync_done is false, will that cause database performance issues? Should I index the sync_done column to improve performance, given that only rows with sync_done = false are ever fetched?

Say I have 10,000 rows. Of those, 9,500 have already been synchronized (sync_done is true) and will not be selected.

Please suggest how I might proceed.

Pavunkumar asked Aug 19 '12 07:08


1 Answer

For a query like this, a partial index covering only unsynced rows would serve best.

CREATE INDEX ON tbl (id) WHERE sync_done = FALSE; 
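
To make the idea concrete, here is a minimal sketch of how the daemon's queries might pair with that index. Only sync_done comes from the question; the table layout, the payload column, the index name tbl_unsynced_idx, the batch size and the ids in the UPDATE are all assumptions for illustration.

```sql
-- Hypothetical table shape; only sync_done is taken from the question.
CREATE TABLE tbl (
    id        bigserial PRIMARY KEY,
    payload   text,
    sync_done boolean NOT NULL DEFAULT FALSE
);

-- Partial index: only rows still waiting to be synchronized are indexed.
-- The index name is an assumption; Postgres picks one if it is omitted.
CREATE INDEX tbl_unsynced_idx ON tbl (id) WHERE sync_done = FALSE;

-- The fetch repeats the index predicate so the planner can match the index.
SELECT id, payload
FROM   tbl
WHERE  sync_done = FALSE
ORDER  BY id
LIMIT  1000;  -- batch size is an arbitrary choice

-- After a batch has been copied, flag it. The updated rows no longer
-- satisfy the predicate, so they drop out of the partial index.
UPDATE tbl
SET    sync_done = TRUE
WHERE  id = ANY ('{1,2,3}'::bigint[]);  -- placeholder ids

-- Inspect the plan to see whether the partial index is actually chosen.
EXPLAIN
SELECT id FROM tbl WHERE sync_done = FALSE ORDER BY id LIMIT 1000;
```

Because the index holds only the unsynced rows, it stays small even as the table grows, as long as the daemon keeps up. Whether the planner picks it depends on table statistics, so checking with EXPLAIN on real data is worthwhile.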

However, for a use case like this, other synchronization methods may be preferable to begin with:

  • Have a look at LISTEN / NOTIFY.
  • Or use a trigger in combination with dblink or, preferably, a foreign data wrapper like postgres_fdw.
  • Or one of the many available replication methods.
    Streaming Replication was added with Postgres 9.0 and has become increasingly popular.
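
As an illustration of the first option, a trigger could announce each new row so the daemon does not have to poll. This is only a sketch under assumptions: the table tbl with an id column, the channel name tbl_sync, and the function and trigger names are hypothetical.

```sql
-- Trigger function: announce each new row on a channel.
-- Channel, function and trigger names are assumptions for illustration.
CREATE FUNCTION tbl_notify_sync() RETURNS trigger
LANGUAGE plpgsql AS
$$
BEGIN
    -- Send the new row's id as the notification payload.
    PERFORM pg_notify('tbl_sync', NEW.id::text);
    RETURN NULL;  -- the return value is ignored for AFTER row triggers
END
$$;

CREATE TRIGGER tbl_notify_sync_trg
AFTER INSERT ON tbl
FOR EACH ROW EXECUTE PROCEDURE tbl_notify_sync();

-- In the daemon's session: subscribe to the channel.
LISTEN tbl_sync;
```

Notifications are delivered when the inserting transaction commits and are not kept for sessions that were not listening at the time, so a daemon built this way would probably still need a catch-up query over rows with sync_done = FALSE after a restart.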
Erwin Brandstetter answered Oct 13 '22 21:10