My application is very database-intensive, so I'm trying to reduce the load on the database. I am using PostgreSQL as the RDBMS, and Python is the programming language.
To reduce the load I am already caching in the application, using both a server-side cache and the browser cache.
Currently I'm tuning the PostgreSQL query cache to bring it in line with the characteristics of the queries being run on the server.
Questions:
Tuning PostgreSQL is far more than just tuning caches. In fact, the primary high-level settings are shared_buffers (think of this as the main data and index cache) and work_mem.
The shared buffers help with both reading and writing. You want to give them a decent size, but they serve the entire cluster, and you can't really tune them per table, let alone per query. Importantly, they don't store query results; they store table, index, and other data pages. In an ACID-compliant database, it's not very efficient or useful to cache query results.
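To get a rough feel for how well the shared buffers are serving your workload, you can compare buffer hits with reads that missed the cache. This is only a minimal sketch: it assumes a DB-API cursor connected to PostgreSQL (for example from psycopg2, which is my assumption since the question doesn't name a driver), and the function name is made up for illustration. The pg_stat_database view and its blks_hit and blks_read columns are standard.

```python
def buffer_cache_hit_ratio(cursor):
    """Return the share of block requests served from shared buffers (0.0 to 1.0).

    `cursor` is assumed to be a DB-API 2.0 cursor connected to PostgreSQL.
    Returns None if the current database has recorded no block requests yet.
    """
    cursor.execute(
        "SELECT blks_hit, blks_read "
        "FROM pg_stat_database "
        "WHERE datname = current_database()"
    )
    blks_hit, blks_read = cursor.fetchone()
    total = blks_hit + blks_read
    if total == 0:
        return None
    return blks_hit / total
```

Note that blks_read counts blocks PostgreSQL had to request from the operating system, which may still come from the OS page cache rather than from disk, so treat the ratio as a hint and not as a verdict.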
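work_mem is the memory a query may use for sorts (and hash tables) before it has to spill to temporary files on disk. Depending on your query, this area can be as important as the buffer cache, and it is easier to tune. Keep in mind that the limit applies to each sort or hash operation, so a very large global value can exhaust memory when many queries run at once. Instead, right before running a query that needs a larger sort, you can issue a SET command such as "SET work_mem = '256MB';"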
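From Python, one way to raise work_mem for a single heavy query is to set it only for the current transaction. A minimal sketch under some assumptions: `conn` is a DB-API connection with autocommit off, the driver uses `%s` placeholders (as psycopg2 does), and the function name is hypothetical. set_config() with its third argument true behaves like SET LOCAL, but accepts the value as a parameter.

```python
def run_with_work_mem(conn, sql, params=None, work_mem="256MB"):
    """Run one query with a work_mem value that lasts only for this transaction."""
    cur = conn.cursor()
    try:
        # is_local = true: the setting reverts when the transaction ends.
        cur.execute("SELECT set_config('work_mem', %s, true)", (work_mem,))
        cur.execute(sql, params)
        rows = cur.fetchall()
        conn.commit()  # ends the transaction, so work_mem goes back to its previous value
        return rows
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()
```

Because the setting is transaction-local, other queries on the same connection keep the server default afterwards.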
As others have suggested, you can figure out WHY a query is running slowly using EXPLAIN. I'd personally suggest learning the "access path" PostgreSQL takes to get to your data. That's far more involved, but honestly a better use of your time than simply thinking about "caching results".
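If you want to collect plans from the application side, a small helper can wrap a query in EXPLAIN. This is a sketch, assuming a DB-API cursor and a driver that substitutes parameters on the client (psycopg2 does; other drivers may behave differently). The function name is hypothetical.

```python
def explain_analyze(cursor, sql, params=None):
    """Return the EXPLAIN (ANALYZE, BUFFERS) plan for `sql` as a single string.

    Caution: ANALYZE really executes the statement, so use this on SELECTs,
    or inside a transaction that you roll back afterwards.
    """
    cursor.execute("EXPLAIN (ANALYZE, BUFFERS) " + sql, params)
    # EXPLAIN returns one text column with one row per plan line.
    return "\n".join(row[0] for row in cursor.fetchall())
```

The BUFFERS option adds shared-buffer hit and read counts to each plan node, which ties the plan back to the buffer cache discussion above.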
You can also improve things a lot through data design, using features such as partitioning, functional (expression) indexes, and other techniques.
Another thing is that you can get better performance by writing better queries. Things like WITH clauses can prevent PostgreSQL's optimizer from optimizing a query fully (before version 12 every WITH query was an optimization fence; newer versions can inline simple ones). The optimizer itself also has parameters that can be adjusted so that the database spends more (or less) time optimizing a query prior to executing it, which can make a difference.
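As an illustration of the optimizer parameters mentioned above, the sketch below reads three settings that bound how much join-order search the planner does. Picking these three is my choice, not something the answer specified; the function name is hypothetical and a DB-API cursor is assumed. pg_settings and the parameter names themselves are standard PostgreSQL.

```python
def planner_effort_settings(cursor):
    """Return the current values of a few planner search-effort settings as a dict."""
    cursor.execute(
        "SELECT name, setting FROM pg_settings "
        "WHERE name IN ('from_collapse_limit', 'join_collapse_limit', 'geqo_threshold') "
        "ORDER BY name"
    )
    # Each row is a (name, setting) pair, so dict() maps name -> setting.
    return dict(cursor.fetchall())
```

Raising the collapse limits lets the planner consider more join orders at the cost of longer planning time, so it is worth testing per query with SET rather than changing them globally.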
You can also write queries in ways that help the optimizer. One such technique is to use bind variables (colon variables in Oracle; $1, $2, ... in PostgreSQL prepared statements). The optimizer then gets the same query over and over with different data passed in, so the structure doesn't have to be evaluated each time and query plans can be cached.
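To make the bind-variable idea concrete, here is a sketch using an explicit server-side prepared statement. Assumptions: a DB-API cursor with `%s` placeholders (psycopg2 style), and a relation my_view with an integer column key, used purely as placeholders. The statement name rows_by_key and the function name are made up.

```python
def fetch_by_key_range(cursor, ranges):
    """Run one prepared statement for several (low, high) key ranges."""
    # PREPARE parses and analyzes the statement once for this session;
    # $1 and $2 are PostgreSQL's positional bind parameters.
    cursor.execute(
        "PREPARE rows_by_key (int, int) AS "
        "SELECT * FROM my_view WHERE key BETWEEN $1 AND $2 ORDER BY key"
    )
    results = []
    for low, high in ranges:
        # Only the values change between executions, never the SQL text.
        cursor.execute("EXECUTE rows_by_key (%s, %s)", (low, high))
        results.append(cursor.fetchall())
    # Prepared statements live until the session ends unless deallocated.
    cursor.execute("DEALLOCATE rows_by_key")
    return results
```

Passing values as parameters instead of concatenating them into the SQL string also protects against SQL injection, which is a good reason to do it even when plan caching makes no measurable difference.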
Without seeing some of your queries, your table and index designs, and an explain plan, it's hard to make specific recommendations.
In general, you need to find the queries that aren't as performant as you feel they should be and figure out where the contention is. Likely it's disk access; however, the cause is ultimately the most important part. Is the query having to go to disk to hold a sort? Is it internally choosing a bad path to get to the data, such that it reads data that could easily have been eliminated earlier in the query process?
I've been an Oracle certified DBA for over 20 years, and PostgreSQL is definitely different; however, many of the same techniques apply when it comes to diagnosing a query's performance issues. Although you can't really provide hints, you can still rewrite queries or tune certain parameters to get better performance. In general, I've found PostgreSQL to be easier to tune in the long run.
If you can provide some specifics, perhaps a query and its EXPLAIN output, I'd be happy to give you specific recommendations. Sadly, though, "cache tuning" is unlikely to provide you the speed you're wanting all on its own.
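For the "is it going to disk to hold the sort?" question, EXPLAIN ANALYZE output answers it directly: a sort that spilled reports a line such as "Sort Method: external merge  Disk: 102400kB", while an in-memory sort reports "Sort Method: quicksort  Memory: 25kB". A small sketch that scans plan text for spilled sorts (the function name is hypothetical):

```python
import re

# Matches lines such as "Sort Method: external merge  Disk: 102400kB".
_DISK_SORT = re.compile(r"Sort Method: external \w+\s+Disk: (\d+)kB")

def disk_sorts_kb(plan_text):
    """Return the disk usage in kB of every sort that spilled to disk in the plan."""
    return [int(kb) for kb in _DISK_SORT.findall(plan_text)]
```

If this returns anything for a query you care about, that query is a candidate for a larger work_mem, a supporting index, or a rewrite that sorts fewer rows.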
I developed a system for caching query results to speed up a web-based solution. The essence of what it did is reproduced below:
The following are the generic caching handling tables and functions.
CREATE TABLE cached_results_headers (
    cache_id serial NOT NULL PRIMARY KEY,
    date timestamptz NOT NULL DEFAULT CURRENT_TIMESTAMP,
    last_access timestamptz NOT NULL DEFAULT CURRENT_TIMESTAMP,
    relid regclass NOT NULL,
    query text NOT NULL,
    rows int NOT NULL DEFAULT 0
);
CREATE INDEX ON cached_results_headers (relid, md5(query));
CREATE TABLE cached_results (
    cache_id int NOT NULL,
    row_no int NOT NULL
);
CREATE OR REPLACE FUNCTION f_get_cached_results_header (
    p_cache_table text, p_source_relation regclass, p_query text,
    p_max_lifetime interval, p_clear_old_data interval
) RETURNS cached_results_headers AS $BODY$
DECLARE
    _cache_id int;
    _rows int;
BEGIN
    -- Optionally purge old headers; their cached rows go with them via ON DELETE CASCADE.
    IF p_clear_old_data IS NOT NULL THEN
        DELETE FROM cached_results_headers WHERE date < CURRENT_TIMESTAMP - p_clear_old_data;
    END IF;
    -- Look for a still-valid entry for this exact query (the md5 comparison lets the index be used).
    SELECT cache_id INTO _cache_id
    FROM cached_results_headers
    WHERE relid = p_source_relation AND md5(query) = md5(p_query) AND query = p_query
      AND date > CURRENT_TIMESTAMP - p_max_lifetime
    ORDER BY date DESC LIMIT 1;
    IF _cache_id IS NULL THEN
        -- Cache miss: register the query, then run it and store its numbered rows.
        INSERT INTO cached_results_headers (relid, query) VALUES (p_source_relation, p_query) RETURNING cache_id INTO _cache_id;
        -- The regclass cast validates and quotes the table name; p_query is executed as-is, so only pass trusted SQL.
        EXECUTE format('INSERT INTO %s SELECT $1, row_number() OVER (), r.r FROM (%s) r', p_cache_table::regclass, p_query) USING _cache_id;
        GET DIAGNOSTICS _rows = ROW_COUNT;
        UPDATE cached_results_headers SET rows = _rows WHERE cache_id = _cache_id;
    ELSE
        -- Cache hit: record the access on this entry only.
        UPDATE cached_results_headers SET last_access = CURRENT_TIMESTAMP WHERE cache_id = _cache_id;
    END IF;
    RETURN (SELECT h FROM cached_results_headers h WHERE cache_id = _cache_id);
END;
$BODY$ LANGUAGE PLPGSQL SECURITY DEFINER;
The following is an example of how to use the tables and functions above for a view named my_view with a field key to be selected within a range of integer values. Replace my_view with your own table, view, or function, and adjust the filtering parameters as required.
CREATE VIEW my_view AS SELECT ...; -- create a query with your data, with one of the integer columns in the result as "key" to filter by
CREATE TABLE cached_results_my_view (
    row my_view NOT NULL,
    PRIMARY KEY (cache_id, row_no),
    FOREIGN KEY (cache_id) REFERENCES cached_results_headers ON DELETE CASCADE
) INHERITS (cached_results);
CREATE OR REPLACE FUNCTION f_get_my_view_cached_rows (p_filter1 int, p_filter2 int, p_row_from int, p_row_to int) RETURNS SETOF my_view AS $BODY$
DECLARE
    _cache_id int;
BEGIN
    _cache_id := cache_id
        FROM f_get_cached_results_header('cached_results_my_view', 'my_view'::regclass,
            'SELECT r FROM my_view r WHERE key BETWEEN '||p_filter1::text||' AND '||p_filter2::text||' ORDER BY key',
            '15 minutes'::interval, '1 day'::interval); -- cache for at most 15 minutes after creation; delete all cached data older than 1 day
    RETURN QUERY
        SELECT (row).*
        FROM cached_results_my_view
        WHERE cache_id = _cache_id AND row_no BETWEEN p_row_from AND p_row_to
        ORDER BY row_no;
END;
$BODY$ LANGUAGE PLPGSQL;
Example: Retrieve rows 1 to 2000 from the cached my_view results filtered by key BETWEEN 30044 AND 10610679. The first time you run it, the results of the query are cached in the table cached_results_my_view and the first 2000 records are returned. Run it again shortly afterwards and the results are retrieved directly from cached_results_my_view without executing the query.
SELECT * FROM f_get_my_view_cached_rows(30044, 10610679, 1, 2000);
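Since the application in question is written in Python, here is a rough sketch of how a web handler might page through the cached results. It assumes a DB-API cursor with `%s` placeholders (psycopg2 style); the helper name and the 1-based page convention are my own, and only f_get_my_view_cached_rows comes from the code above.

```python
def fetch_cached_page(cursor, key_from, key_to, page, page_size=2000):
    """Fetch one page (1-based) of cached my_view rows for a key range."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    # Translate the page number into the inclusive row_no range the function expects.
    row_from = (page - 1) * page_size + 1
    row_to = page * page_size
    cursor.execute(
        "SELECT * FROM f_get_my_view_cached_rows(%s, %s, %s, %s)",
        (key_from, key_to, row_from, row_to),
    )
    return cursor.fetchall()
```

The first call for a given key range inserts rows into the cache tables, so the surrounding transaction has to be committed for later requests to benefit from the cache.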