I have a table avl_pool, and a function that finds the map link nearest to each row's (x, y) position.
This SELECT scales linearly with the number of rows: the function takes ~8 ms per call, so computing it for 1,000 rows takes about 8 seconds, and, as this sample shows, 20,000 rows take about 162 seconds.
SELECT avl_id, x, y, azimuth, map.get_near_link(X, Y, AZIMUTH)
FROM avl_db.avl_pool
WHERE avl_id between 1 AND 20000
"Index Scan using avl_pool_pkey on avl_pool (cost=0.43..11524.76 rows=19143 width=28) (actual time=8.793..162805.384 rows=20000 loops=1)"
" Index Cond: ((avl_id >= 1) AND (avl_id <= 20000))"
" Buffers: shared hit=19879838"
"Planning time: 0.328 ms"
"Execution time: 162812.113 ms"
Using pgAdmin, I found that if I run each half of the range in a separate window at the same time, the execution time is cut in half. So it looks like the server can handle multiple concurrent requests against the same table and function without a problem.
-- window 1
SELECT avl_id, x, y, azimuth, map.get_near_link(X, Y, AZIMUTH)
FROM avl_db.avl_pool
WHERE avl_id between 1 AND 10000
Total query runtime: 83792 ms.
-- window 2
SELECT avl_id, x, y, azimuth, map.get_near_link(X, Y, AZIMUTH)
FROM avl_db.avl_pool
WHERE avl_id between 10001 AND 20000
Total query runtime: 84047 ms.
So how should I approach this scenario to improve performance?
From the C# side, I guess I could create multiple threads, have each one send a portion of the range, and then join all the data in the client. So instead of one query over 20k rows taking 162 seconds, I could send 10 queries of 2,000 rows each and finish in ~16 seconds. Of course there may be some overhead in joining the results, but it shouldn't be big compared with the 160 seconds.
Or is there a different approach I should consider, ideally a pure SQL solution?
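The client-side split described above can be sketched in SQL by generating the chunk bounds; each result row would drive one copy of the original query. This is a sketch only: it assumes avl_id values are roughly contiguous (gaps just make chunks uneven), each chunk query must run on its own connection because a PostgreSQL session executes one query at a time, and the ~16 s estimate assumes the server has about 10 idle cores.

```sql
-- Split avl_id 1..20000 into 10 contiguous ranges of 2,000 ids each.
-- Each output row gives the bounds for one chunk query, e.g.:
--   SELECT avl_id, x, y, azimuth, map.get_near_link(x, y, azimuth)
--   FROM avl_db.avl_pool
--   WHERE avl_id BETWEEN <lo> AND <hi>;
SELECT lo,
       LEAST(lo + 1999, 20000) AS hi   -- clamp the last chunk to the upper bound
FROM generate_series(1, 20000, 2000) AS lo;
-- lo: 1, 2001, ..., 18001   hi: 2000, 4000, ..., 20000
```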
@PeterRing I don't think the function code is relevant, but here it is anyway:
CREATE OR REPLACE FUNCTION map.get_near_link(
x NUMERIC,
y NUMERIC,
azim NUMERIC)
RETURNS map.get_near_link AS
$BODY$
DECLARE
strPoint TEXT;
sRow map.get_near_link;
BEGIN
strPoint = 'POINT('|| X || ' ' || Y || ')';
RAISE DEBUG 'GetLink strPoint % -- Azim %', strPoint, Azim;
WITH index_query AS (
SELECT --Seg_ID,
Link_ID,
azimuth,
TRUNC(ST_Distance(ST_GeomFromText(strPoint,4326), geom )*100000)::INTEGER AS distance,
sentido,
--ST_AsText(geom),
geom
FROM map.vzla_seg S
WHERE
ABS(Azim - S.azimuth) < 30 OR
ABS(Azim - S.azimuth) > 330
ORDER BY
geom <-> ST_GeomFromText(strPoint, 4326)
LIMIT 101
)
SELECT i.Link_ID, i.Distance, i.Sentido, v.geom INTO sRow
FROM
index_query i INNER JOIN
map.vzla_rto v ON i.link_id = v.link_id
ORDER BY
distance LIMIT 1;
RAISE DEBUG 'GetLink distance % ', sRow.distance;
IF sRow.distance > 50 THEN
sRow.link_id = -1;
END IF;
RETURN sRow;
END;
$BODY$
LANGUAGE plpgsql STABLE -- reads tables, so STABLE rather than IMMUTABLE
COST 100;
ALTER FUNCTION map.get_near_link(NUMERIC, NUMERIC, NUMERIC)
OWNER TO postgres;
Consider marking your map.get_near_link function as PARALLEL SAFE. This tells the planner that it is allowed to try to generate a parallel plan for queries that call the function. From the PostgreSQL CREATE FUNCTION documentation:
PARALLEL UNSAFE indicates that the function can't be executed in parallel mode and the presence of such a function in an SQL statement forces a serial execution plan. This is the default. PARALLEL RESTRICTED indicates that the function can be executed in parallel mode, but the execution is restricted to parallel group leader. PARALLEL SAFE indicates that the function is safe to run in parallel mode without restriction.
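For the existing PL/pgSQL function, the label can be applied without recreating it. A minimal sketch, assuming PostgreSQL 9.6 or later (the first release with PARALLEL labels) and that the function really is safe, i.e. it only reads map.vzla_seg and map.vzla_rto; labelling an unsafe function as safe can lead to errors or wrong results.

```sql
-- Relabel the existing function in place.
-- STABLE: result depends on table contents, so IMMUTABLE is not correct.
-- PARALLEL SAFE: allows the planner to consider parallel plans for queries calling it.
ALTER FUNCTION map.get_near_link(NUMERIC, NUMERIC, NUMERIC)
    STABLE
    PARALLEL SAFE;
```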
There are several settings that can prevent the query planner from generating a parallel plan at all. See these sections of the PostgreSQL documentation:
15.4. Parallel Safety
15.2. When Can Parallel Query Be Used?
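One way to check whether one of those settings is what blocks parallelism is to relax the parallel cost settings in a test session and inspect the plan. A sketch for experimentation only, assuming PostgreSQL 10 or later (9.6 has min_parallel_relation_size instead of the two scan-size settings and no parallel index scans); the actual worker count is still capped by max_worker_processes and max_parallel_workers.

```sql
-- Test-session settings that make a parallel plan as cheap as possible.
SET max_parallel_workers_per_gather = 4;  -- 0 disables parallel query entirely
SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;
SET min_parallel_table_scan_size = 0;
SET min_parallel_index_scan_size = 0;

-- Plain EXPLAIN only plans the query; EXPLAIN ANALYZE would run the full ~160 s query.
-- A parallel plan shows a Gather (or Gather Merge) node above the scan.
EXPLAIN
SELECT avl_id, x, y, azimuth, map.get_near_link(x, y, azimuth)
FROM avl_db.avl_pool
WHERE avl_id BETWEEN 1 AND 20000;
```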
On my reading, you may be able to achieve a parallel plan if you refactor your function like this:
CREATE OR REPLACE FUNCTION map.get_near_link(
x NUMERIC,
y NUMERIC,
azim NUMERIC)
RETURNS TABLE
(Link_ID INTEGER, Distance INTEGER, Sentido TEXT, Geom GEOMETRY) -- column types assumed; match them to map.vzla_seg / map.vzla_rto
AS
$$
SELECT
S.Link_ID,
TRUNC(ST_Distance(ST_GeomFromText('POINT('|| X || ' ' || Y || ')',4326), S.geom) * 100000)::INTEGER AS distance,
S.sentido,
v.geom
FROM (
SELECT *
FROM map.vzla_seg
WHERE ABS(azim - azimuth) NOT BETWEEN 30 AND 330 -- same as < 30 OR > 330; alias S is not visible inside this subquery
) S
INNER JOIN map.vzla_rto v
ON S.link_id = v.link_id
WHERE
ST_Distance(ST_GeomFromText('POINT('|| X || ' ' || Y || ')',4326), S.geom) * 100000 < 50
ORDER BY
S.geom <-> ST_GeomFromText('POINT('|| X || ' ' || Y || ')', 4326)
LIMIT 1
$$
LANGUAGE SQL
STABLE -- reads tables; SQL functions default to VOLATILE
PARALLEL SAFE -- Include this parameter
;
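With a set-returning function, the per-row call reads naturally as a lateral join. A sketch, assuming the refactored function above and the avl_pool columns from the question; whether the planner actually parallelises this form still has to be confirmed with EXPLAIN.

```sql
-- Call the set-returning version once per avl_pool row.
-- LEFT JOIN LATERAL ... ON true keeps rows that have no link within the
-- distance cut-off (their l.* columns come back NULL), the closest analogue
-- to the original function returning link_id = -1.
SELECT p.avl_id, p.x, p.y, p.azimuth, l.*
FROM avl_db.avl_pool AS p
LEFT JOIN LATERAL map.get_near_link(p.x, p.y, p.azimuth) AS l ON true
WHERE p.avl_id BETWEEN 1 AND 20000;
```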
If the query optimiser does generate a parallel plan for queries that call this function, you won't need to implement your own parallelisation logic.