Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Faster search for records where 1st character of field doesn't match [A-Za-z]?

I currently have the following:

User (id, fname, lname, deleted_at, guest)

I can query for a list of user's by their fname initial like so:

User Load (9.6ms)  SELECT "users".* FROM "users" WHERE (users.deleted_at IS NULL) AND (lower(left(fname, 1)) = 's') ORDER BY fname ASC LIMIT 25 OFFSET 0

This is fast thanks to the following index:

  CREATE INDEX users_multi_idx
  ON users (lower(left(fname, 1)), fname)
  WHERE deleted_at IS NULL;

What I want to do now is be able to query for all Users that do not start with the letter's A-Z. I got this to work like so:

SELECT "users".* FROM "users" WHERE (users.deleted_at IS NULL) AND (lower(left(fname, 1)) ~ E'^[^a-zA-Z].*') ORDER BY fname ASC LIMIT 25 OFFSET 0

But the problem is this query is very slow and does not appear to be using the index to speed up the first query. Any suggestions on how I can elegantly make the 2nd query (non a-z) faster?

I'm using Postgres 9.1 with rails 3.2

Thanks

like image 852
AnApprentice Avatar asked Oct 16 '12 01:10

AnApprentice


2 Answers

Updated answer
Preceding question here.

My first idea idea (index with text_pattern_ops) did not work with the regular expression in my tests. Better rewrite your query to:

SELECT *
FROM   users
WHERE  deleted_at IS NULL
WHERE lower(left(fname, 1)) < 'a' COLLATE "C"
OR    lower(left(fname, 1)) > 'z' COLLATE "C"
ORDER  BY fname
LIMIT  25 OFFSET 0;

Besides from these expressions being faster generally, your regular expression also had capital letters in it, which did not match the index with lower(). And the trailing characters were pointless while comparing to a single char.

And use this index:

CREATE INDEX users_multi_idx
ON users (lower(left(fname, 1)) COLLATE "C", fname)
WHERE deleted_at IS NULL;

The COLLATE "C" part is optional and only contributes a very minor gain in performance. It's purpose is to reset collation rules to default posix collation, which just uses byte order and is generally faster. Useful, where collation rules are not relevant anyway.

If you create the index with it, only queries that match the collation can use it. So you might just skip it to simplify things if performance is not your paramount requirement.

like image 79
Erwin Brandstetter Avatar answered Oct 11 '22 15:10

Erwin Brandstetter


As an alternative to @ErwinBrandstetter's general solution, PostgreSQL supports partial indexes. You can say:

CREATE INDEX users_nonalphanumeric_not_deleted_key
ON users (id)
WHERE (users.deleted_at IS NULL) AND (lower(left(fname, 1)) ~ E'^[^a-zA-Z].*');

This index won't help for any other lookups, but it will precompute the answer for this particular query. This technique is often useful for queries that return a small, predefined subset from a much larger table, since the resulting index will disregard the vast majority of the table and contain only the rows of interest.

like image 44
willglynn Avatar answered Oct 11 '22 16:10

willglynn