PostgreSQL query causing CPU to spike to 100%. Given 90k records, is a cost of 7000 OK?

I'm working to understand how cost and actual time should be used to optimize queries. My app is Rails 3 with a PostgreSQL 9.1 database. The query is used by delayed_job:

EXPLAIN ANALYZE SELECT  "delayed_jobs".*
FROM "delayed_jobs"
WHERE ((run_at <= '2011-05-23 15:16:43.180810' AND (locked_at IS NULL OR locked_at < '2011-01-25 11:05:28.077144') OR locked_by = 'host:foo pid:2') AND failed_at IS NULL AND queue = 'authentication_emails')
ORDER BY priority ASC, run_at ASC LIMIT 5

Or:

EXPLAIN ANALYZE SELECT  "delayed_jobs".*
FROM "delayed_jobs"
WHERE ((run_at <= '2011-05-23 15:16:43.180810' AND (locked_at IS NULL OR locked_at < '2011-01-25 11:05:28.077144') OR locked_by = 'host:foo pid:2') AND failed_at IS NULL )
ORDER BY priority ASC, run_at ASC LIMIT 5

For the first query, the output equals:

Limit  (cost=7097.57..7097.57 rows=1 width=1008) (actual time=35.657..35.657 rows=0 loops=1)
  ->  Sort  (cost=7097.57..7097.57 rows=1 width=1008) (actual time=35.655..35.655 rows=0 loops=1)
        Sort Key: priority, run_at
        Sort Method: quicksort  Memory: 25kB
        ->  Seq Scan on delayed_jobs  (cost=0.00..7097.56 rows=1 width=1008) (actual time=35.648..35.648 rows=0 loops=1)
              Filter: ((failed_at IS NULL) AND ((queue)::text = 'authentication_emails'::text) AND (((run_at <= '2011-05-23 15:16:43.18081'::timestamp without time zone) AND ((locked_at IS NULL) OR (locked_at < '2011-01-25 11:05:28.077144'::timestamp without time zone))) OR (locked_by = 'host:foo pid:2'::text)))
Total runtime: 35.695 ms

The table currently has 90k records and can range from 0 to 200k. We're noticing this query causes the CPU to spike and creates bottlenecks. What can be learned from the EXPLAIN output above? Where should indexes be added, if any? Thanks.

DB schema (the table currently has no indexes):

  create_table "delayed_jobs", :force => true do |t|
    t.integer  "priority",   :default => 0
    t.integer  "attempts",   :default => 0
    t.text     "handler"
    t.text     "last_error"
    t.datetime "run_at"
    t.datetime "locked_at"
    t.datetime "failed_at"
    t.text     "locked_by"
    t.datetime "created_at",                :null => false
    t.datetime "updated_at",                :null => false
    t.string   "queue"
  end
asked by AnApprentice

1 Answer

Analysis

If you go through this section of the PostgreSQL documentation, you will learn how the planner uses statistics to estimate costs. This is very useful information!

Given that the table has around 90k records (and assuming the default cost settings), the cost of processing the rows will be:

90000 * (cpu_tuple_cost + cpu_operator_cost) = 90000 * 0.0125 = 1125
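
The 0.0125 above is just the sum of the two defaults (cpu_tuple_cost = 0.01, cpu_operator_cost = 0.0025). If you want to double-check what your server actually uses, a plain pg_settings lookup will do; nothing here is specific to your setup:

-- show the planner cost parameters used in the calculations above
SELECT name, setting
  FROM pg_settings
 WHERE name IN ('cpu_tuple_cost', 'cpu_operator_cost', 'seq_page_cost');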

We can now approximate how many pages your table occupies:

(7097.56-1125)/seq_page_cost = 5972.56

That makes it roughly 46 MB (with the default 8 kB page size), so I assume your table fits into shared_buffers, even at the default setting.
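
You can cross-check this back-of-the-envelope figure against the catalogue; this is standard system-catalogue usage and assumes nothing beyond the table name:

-- compare the planner's page estimate with the real table size
SELECT relpages,
       pg_size_pretty(pg_relation_size('delayed_jobs')) AS on_disk_size
  FROM pg_class
 WHERE relname = 'delayed_jobs';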

Looking at the average row width, I also assume that the table is mostly stored as MAIN (i.e. values are kept inline in the main heap rather than being TOASTed out of line).

Next, you're using fields of type text and string as predicates. I'm not sure how they map to the PostgreSQL internal types, but I would assume it is text. This type is compressible by default, so PostgreSQL has to decompress the values for each row to check the predicates. I'm not sure at which threshold compression kicks in; take a look at this message (and the whole thread).
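
If you want to see which columns even allow compression, the storage mode is recorded in the catalogue (a plain lookup, assuming only the table name; 'm' and 'x' mean compression is allowed, 'p' and 'e' mean it is not):

-- per-column storage mode for delayed_jobs
SELECT attname, attstorage
  FROM pg_attribute
 WHERE attrelid = 'delayed_jobs'::regclass
   AND attnum > 0
   AND NOT attisdropped;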

Conclusion

  1. You haven't shown us the real EXPLAIN (ANALYZE) output, and I don't think that a 35 ms query can cause bottlenecks by itself (see the EXPLAIN example below), except...
  2. You haven't mentioned how many sessions use your database at the moments the bottleneck appears, and it is also not clear how frequently this query is run. I assume it is a very popular one.
  3. Your table seems to fit into memory, therefore all operations will be CPU-bound in any case.
  4. Values used in the predicates are compressible and seem to be compressed.

Therefore I'd say the bottleneck comes from the peak number of these queries running in parallel on the data, which requires extra CPU cycles for decompression.
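
One way to get a more telling plan, and to confirm that the data really comes from shared_buffers, is to add the BUFFERS option (available since 9.0) to the query from the question:

-- same query as in the question, with buffer usage reported per node
EXPLAIN (ANALYZE, BUFFERS)
SELECT "delayed_jobs".*
FROM "delayed_jobs"
WHERE ((run_at <= '2011-05-23 15:16:43.180810' AND (locked_at IS NULL OR locked_at < '2011-01-25 11:05:28.077144') OR locked_by = 'host:foo pid:2') AND failed_at IS NULL AND queue = 'authentication_emails')
ORDER BY priority ASC, run_at ASC LIMIT 5;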

What to do?

  1. Normalise your table. It feels like the queue column has very low selectivity. Consider creating an external type (like an ENUM) for it, or organize a dictionary table with a proper foreign key. I'm also not sure about the locked_by column; can it be normalised?
  2. Create indexes on the run_at and locked_at columns (see the sketch after this list).
  3. An index on the (priority, run_at) columns will benefit your sorts, but I doubt it will help in this case. I assume the priority column has low selectivity, so the planner will prefer a BitmapAnd over index scans on run_at and locked_at.
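
A minimal sketch of points 1 and 2, assuming plain DDL is acceptable here. The index and type names are just examples, the ENUM conversion only works if the type lists every value that already exists in queue, and in a Rails app you would normally create the indexes with add_index in a migration instead:

-- point 2: simple single-column indexes
CREATE INDEX index_delayed_jobs_on_run_at    ON delayed_jobs (run_at);
CREATE INDEX index_delayed_jobs_on_locked_at ON delayed_jobs (locked_at);

-- point 1: one way to normalise the low-selectivity queue column
CREATE TYPE delayed_job_queue AS ENUM ('authentication_emails');  -- list all existing queue names here
ALTER TABLE delayed_jobs
  ALTER COLUMN queue TYPE delayed_job_queue
  USING queue::delayed_job_queue;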

I hope I am not terribly wrong here :) Comments/corrections are welcome!

P.S. Let me know how it goes for you.

answered by vyegorov