I've got a query:
EXPLAIN ANALYZE
SELECT CAST(DATE(associationtime) AS text) AS date ,
cast(SUM(extract(epoch
FROM disassociationtime) - extract(epoch
FROM associationtime)) AS bigint) AS sessionduration,
cast(SUM(tx) AS bigint)AS tx,
cast(SUM(rx) AS bigint) AS rx,
cast(SUM(dataRetries) AS bigint) AS DATA,
cast(SUM(rtsRetries) AS bigint) AS rts,
count(*)
FROM SESSION
WHERE ssid_id=42
AND ap_id=1731
AND DATE(associationtime)>=DATE('Tue Nov 04 00:00:00 MSK 2014')
AND DATE(associationtime)<=DATE('Thu Nov 20 00:00:00 MSK 2014')
GROUP BY(DATE(associationtime))
ORDER BY DATE(associationtime);
The output is:
GroupAggregate (cost=0.44..17710.66 rows=1 width=32) (actual time=4.501..78.880 rows=17 loops=1)
-> Index Scan using session_lim_values_idx on session (cost=0.44..17538.94 rows=6868 width=32) (actual time=0.074..73.266 rows=7869 loops=1)
Index Cond: ((date(associationtime) >= '2014-11-04'::date) AND (date(associationtime) <= '2014-11-20'::date))
Filter: ((ssid_id = 42) AND (ap_id = 1731))
Rows Removed by Filter: 297425
Total runtime: 78.932 ms
Look at this line:
Index Scan using session_lim_values_idx
As you can see, query uses three fields to scan: ssid_id, ap_id and associationtime. I've got an index fo this:
ssid_pkey | btree | {id}
ap_pkey | btree | {id}
testingshit_pkey | btree | {one,two,three}
session_date_ssid_idx | btree | {ssid_id,date(associationtime),"date_trunc('hour'::text, associationtime)"}
session_pkey | btree | {associationtime,disassociationtime,sessionduration,clientip,clientmac,devicename,tx,rx,protocol,snr,rssi,dataretries,rtsretries }
session_main_idx | btree | {ssid_id,ap_id,associationtime,disassociationtime,sessionduration,clientip,clientmac,devicename,tx,rx,protocol,snr,rssi,dataretres,rtsretries}
session_date_idx | btree | {date(associationtime),"date_trunc('hour'::text, associationtime)"}
session_date_apid_idx | btree | {ap_id,date(associationtime),"date_trunc('hour'::text, associationtime)"}
session_date_ssid_apid_idx | btree | {ssid_id,ap_id,date(associationtime),"date_trunc('hour'::text, associationtime)"}
ap_apname_idx | btree | {apname}
users_pkey | btree | {username}
user_roles_pkey | btree | {user_role_id}
session_lim_values_idx | btree | {date(associationtime)}
It's called session_date_ssid_apid_idx
. But why query uses wrong index?
session_date_ssid_apid_idx:
------------+-----------------------------+-------------------------------------------
ssid_id | integer | ssid_id
ap_id | integer | ap_id
date | date | date(associationtime)
date_trunc | timestamp without time zone | date_trunc('hour'::text, associationtime)
session_lim_values_idx:
date | date | date(associationtime)
What index would you create?
UPD: \d session
--------------------+-----------------------------+------------------------------------------------------
id | integer | NOT NULL DEFAULT nextval('session_id_seq'::regclass)
ssid_id | integer | NOT NULL
ap_id | integer | NOT NULL
associationtime | timestamp without time zone | NOT NULL
disassociationtime | timestamp without time zone | NOT NULL
sessionduration | character varying(100) | NOT NULL
clientip | character varying(100) | NOT NULL
clientmac | character varying(100) | NOT NULL
devicename | character varying(100) | NOT NULL
tx | integer | NOT NULL
rx | integer | NOT NULL
protocol | character varying(100) | NOT NULL
snr | integer | NOT NULL
rssi | integer | NOT NULL
dataretries | integer | NOT NULL
rtsretries | integer | NOT NULL
╚эфхъё√:
"session_pkey" PRIMARY KEY, btree (associationtime, disassociationtime, sessionduration, clientip, clientmac, devicename, tx, rx, protocol, snr, rssi, dataretries, rtsretries)
"session_date_ap_ssid_idx" btree (ssid_id, ap_id, associationtime)
"session_date_apid_idx" btree (ap_id, date(associationtime), date_trunc('hour'::text, associationtime))
"session_date_idx" btree (date(associationtime), date_trunc('hour'::text, associationtime))
"session_date_ssid_apid_idx" btree (ssid_id, ap_id, associationtime)
"session_date_ssid_idx" btree (ssid_id, date(associationtime), date_trunc('hour'::text, associationtime))
"session_lim_values_idx" btree (date(associationtime))
"session_main_idx" btree (ssid_id, ap_id, associationtime, disassociationtime, sessionduration, clientip, clientmac, devicename, tx, rx, protocol, snr, rssi, dataretries, rtsretries)
Very common values in the predicates for ssid_id
and ap_id
can make it cheaper for Postgres to pick the smaller index session_lim_values_idx
(only 1 date
column) over the seemingly better fit, but bigger index session_date_ssid_apid_idx
(4 columns) and filter the rest.
In your case around 4 % of the rows have ssid_id=42 AND ap_id=1731
. That shouldn't normally warrant the switch to the smaller index. But several other factors are in play that may tilt the scale, basically cost settings and statistics. Details:
Adjust your cost settings if you did not do so already as advised in linked the answer above.
Increase the statistics target for involved columns ssid_id
, ap_id
and run ANALYZE
:
One special factor here: Postgres collects separate statistics for expressions in indexes. Check with:
SELECT * FROM pg_statistic
WHERE starelid = 'session_date_ssid_apid_idx'::regclass;
You'll find a dedicated row for the expression date(associationtime)
. More details:
Make the index session_date_ssid_apid_idx
more attractive (smaller) by removing the 4th column "date_trunc('hour'::text, associationtime)
. Looking at your later added table definition, you already did that.
I would rather use the standard syntax for casts: cast(associationtime AS date)
instead of the function syntax date(associationtime)
. Not saying that matters at all, I just know the standard way to work properly. You can use the shorthand syntax associationtime::date
in your queries, that's compatible with the expression index, but use the verbose form in the index definition.
Also, test with EXPLAIN ANALYZE
which query plan is actually faster by removing / recreating only the index you want to test. Then you'll see whether Postgres picked the best plan after all.
You have quite a number of indexes, I would check whether all of them are actually used and get rid of the rest. Indexes have maintenance cost and it's generally beneficial to concentrate on fewer indexes if possible (fit in cache more easily and may be cached already when needed). Weigh cost vs. benefit.
I'd use:
SUM(extract(epoch FROM disassociationtime
- associationtime)::int) AS sessionduration
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With