Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Slow performance of postgres even on creating indexes

Tags:

sql

postgresql

I created indexes in postgres for the below mentioned table using md5. The indexes and the table are given below:

 create table my_table(col1 character varying, col2 character varying, col3 character varying);

my_table looks like (I have just given an example. My actual table is of 1Tera Byte):

   col1  col2   col3
   <a12> <j178> <k109>

create index index1 on my_table (md5(col1), md5(col2), md5(col2));

I tried to create index without using md5, however I ended up getting the error:

ERROR:  index row size 2760 exceeds maximum 2712 for index "index1"
HINT:  Values larger than 1/3 of a buffer page cannot be indexed.
Consider a function index of an MD5 hash of the value, or use full text indexing.

However, I notice that my query processing time remains the same whether I had created index or not. I am confused as to what could be the reason. Can someone please help me with this?

The sql query which I have fired is of the form:

select col3 from my_table where col1='<a12>' and col2='<j178>';
like image 235
Alice Everett Avatar asked Feb 14 '23 23:02

Alice Everett


1 Answers

Since you are getting an error when you try to create a standard btree index I am guessing that the data in one or more of these columns is quite large.

The index you created is perhaps best described as "a b-tree index of the md5 hashes of the three columns", rather than an "md5 index of three columns".

In order for PostgreSQL to use the index, your query must be for md5 hashes. Try:

SELECT col3
FROM my_table 
WHERE
      md5(col1) = md5('<a12>')
  and md5(col2) = md5('<j178>')

The planner will say "oh, I have an index of md5(col1) etc, I'll use that". Note that this will only work for full equality queries (=) and not for LIKE or range queries. Also it won't get the value for col3 from the index because only the md5 of col3 is stored there, so it will still need to go to the table to get the col3 value.

For a small table this would probably result in the planner deciding to skip the index and just do a full scan on the table, but it sounds like your table is big enough that the index will be worthwhile - postgres will scan the index, find the row entries that match, then retreive those rows from the table.

Now if col3 is the one that has large hunks of data in it and cols 1 and 2 are small, you could just create a normal index of col1, col2. You really only need to index the columns in your where clause, not those in the select part.

The postgres indexes documentation is pretty good: http://www.postgresql.org/docs/9.0/static/indexes.html but the CREATE INDEX page is probably the single most useful one: http://www.postgresql.org/docs/9.1/static/sql-createindex.html

The best way to find out if your indexes are being used is to use the "EXPLAIN" instruction: http://www.postgresql.org/docs/9.1/static/sql-explain.html - if you use pgadmin3 to play with your DB (I highly recommend it) then just press F7 in the query window and it will do the explain and present it in a nice GUI showing you the query plan. This has saved many hours of hair-pulling trying to find out why my indices weren't being used.

like image 186
agittins Avatar answered Feb 25 '23 19:02

agittins