I have inherited a large legacy codebase which runs in django 1.5 and my current task is to speed up a section of the site which takes ~1min to load.
I did a profile of the app and got this:
The culprit in particular is the following query (stripped for brevity):
SELECT COUNT(*) FROM "entities_entity" WHERE (
"entities_entity"."date_filed" <= '2016-01-21' AND (
UPPER("entities_entity"."entity_city_state_zip"::text) LIKE UPPER('%Atherton%') OR
UPPER("entities_entity"."entity_city_state_zip"::text) LIKE UPPER('%Berkeley%') OR
-- 34 more of these
UPPER("entities_entity"."agent_city_state_zip"::text) LIKE UPPER('%Atherton%') OR
UPPER("entities_entity"."agent_city_state_zip"::text) LIKE UPPER('%Berkeley%') OR
-- 34 more of these
)
)
which basically consist on a big like query on two fields, entity_city_state_zip
and agent_city_state_zip
which are character varying(200) | not null
fields.
That query is performed twice (!), taking 18814.02ms each time, and one more time replacing the COUNT
for a SELECT
taking up an extra 20216.49
(I'm going to cache the result of the COUNT
)
The explain looks like this:
Aggregate (cost=175867.33..175867.34 rows=1 width=0) (actual time=17841.502..17841.502 rows=1 loops=1)
-> Seq Scan on entities_entity (cost=0.00..175858.95 rows=3351 width=0) (actual time=0.849..17818.551 rows=145075 loops=1)
Filter: ((date_filed <= '2016-01-21'::date) AND ((upper((entity_city_state_zip)::text) ~~ '%ATHERTON%'::text) OR (upper((entity_city_state_zip)::text) ~~ '%BERKELEY%'::text) (..skipped..) OR (upper((agent_city_state_zip)::text) ~~ '%ATHERTON%'::text) OR (upper((agent_city_state_zip)::text) ~~ '%BERKELEY%'::text) OR (upper((agent_city_state_zip)::text) ~~ '%BURLINGAME%'::text) ))
Rows Removed by Filter: 310249
Planning time: 2.110 ms
Execution time: 17841.944 ms
I've tried using an index on entity_city_state_zip
and agent_city_state_zip
using various combinations like:
CREATE INDEX ON entities_entity (upper(entity_city_state_zip));
CREATE INDEX ON entities_entity (upper(agent_city_state_zip));
or using varchar_pattern_ops
, with no luck.
The server is using something like this:
qs = queryset.filter(Q(entity_city_state_zip__icontains = all_city_list) |
Q(agent_city_state_zip__icontains = all_city_list))
to generate that query.
I don't know what else to try,
Thanks!
I think problem in "multiple LIKE" and in UPPER("entities_entity ...
You can use:
WHERE entities_entity.entity_city_state_zip SIMILAR TO '%Atherton%|%Berkeley%'
Or something like this:
WHERE entities_entity.entity_city_state_zip LIKE ANY(ARRAY['%Atherton%', '%Berkeley%'])
Edited
About Raw SQL query in Django:
Regards
I watched a course in Pluralsight that addressed a very similar issue. The course was "Postgres for .NET Developers" and this was in the section "Fun With Simple SQL", "Full Text Search."
To summarize their solution, using your example:
Create a new column in your table that will represent your entity_city_state_zip
as a tsvector:
create table entities_entity (
date_filed date,
entity_city_state_zip text,
csz_search tsvector not null -- add this column
);
Initially you might have to make it nullable, then populate the data and make it non-nullable.
update entities_entity
set csz_search = to_tsvector (entity_city_state_zip);
Next, create a trigger that will cause the new field to be populated any time a record is added or modified:
create trigger entities_insert_update
before insert or update on entities_entity
for each row execute procedure
tsvector_update_trigger(csz_search,'pg_catalog.english',entity_city_state_zip);
Your search queries can now query on the tsvector field rather than the city/state/zip field:
select * from entities_entity
where csz_search @@ to_tsquery('Atherton')
Some notes of interest on this:
upper
functions you have in your queryAs a final step, put a GIN
index on the tsquery field:
create index entities_entity_ix1 on entities_entity
using gin(csz_search);
If I understand the course right, this should make your query fly, and it will overcome the issue of a btree index's inability to work on a like '%
query.
Here is the explain plan on such a query:
Bitmap Heap Scan on entities_entity (cost=56.16..1204.78 rows=505 width=81)
Recheck Cond: (csz_search @@ to_tsquery('Atherton'::text))
-> Bitmap Index Scan on entities_entity_ix1 (cost=0.00..56.04 rows=505 width=0)
Index Cond: (csz_search @@ to_tsquery('Atherton'::text))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With