
Can I split a query into multiple queries or use parallelism to speed it up?

I have a table avl_pool, and a function that finds the map link nearest to each row's (x, y) position.

The runtime of this select scales linearly: the function takes ~8 ms per call, so 1,000 rows take about 8 seconds and, as this sample shows, 20,000 rows take about 162 seconds.

SELECT avl_id, x, y, azimuth, map.get_near_link(X, Y, AZIMUTH)
FROM avl_db.avl_pool         
WHERE avl_id between 1 AND 20000

"Index Scan using avl_pool_pkey on avl_pool  (cost=0.43..11524.76 rows=19143 width=28) (actual time=8.793..162805.384 rows=20000 loops=1)"
"  Index Cond: ((avl_id >= 1) AND (avl_id <= 20000))"
"  Buffers: shared hit=19879838"
"Planning time: 0.328 ms"
"Execution time: 162812.113 ms"

Using pgAdmin, I found that if I execute each half of the range in a separate window at the same time, the execution time is actually cut in half. So it looks like the server can handle multiple requests to the same table/function without a problem.

-- window 1
SELECT avl_id, x, y, azimuth, map.get_near_link(X, Y, AZIMUTH)
FROM avl_db.avl_pool         
WHERE avl_id between 1 AND 10000 

Total query runtime: 83792 ms.

-- window 2
SELECT avl_id, x, y, azimuth, map.get_near_link(X, Y, AZIMUTH)
FROM avl_db.avl_pool         
WHERE avl_id between 10001 AND 20000

Total query runtime: 84047 ms.

So how should I approach this scenario to improve performance?

From the C# side, I guess I can create multiple threads, have each one send a portion of the range, and then join all the data in the client. So instead of one query with 20k rows taking 162 seconds, I could send 10 queries of 2,000 rows each and finish in ~16 seconds. Of course there may be some overhead in the join, but it shouldn't be big compared with the 160 seconds.

Or is there a different approach I should consider, ideally a pure SQL solution?


@PeterRing I don't think the function code is relevant, but here it is anyway.

CREATE OR REPLACE FUNCTION map.get_near_link(
    x NUMERIC,
    y NUMERIC,
    azim NUMERIC)
  RETURNS map.get_near_link AS
$BODY$
DECLARE
    strPoint TEXT;
    sRow map.get_near_link;
  BEGIN
    strPoint = 'POINT('|| X || ' ' || Y || ')';
    RAISE DEBUG 'GetLink strPoint % -- Azim %', strPoint, Azim;

    WITH index_query AS (
        SELECT --Seg_ID,
               Link_ID,
               azimuth,
               TRUNC(ST_Distance(ST_GeomFromText(strPoint,4326), geom  )*100000)::INTEGER AS distance,
               sentido,
               --ST_AsText(geom),
               geom
        FROM map.vzla_seg S
        WHERE
            ABS(Azim - S.azimuth) < 30 OR
            ABS(Azim - S.azimuth) > 330
        ORDER BY
            geom <-> ST_GeomFromText(strPoint, 4326)
        LIMIT 101
    )
    SELECT i.Link_ID, i.Distance, i.Sentido, v.geom INTO sRow
    FROM
        index_query i INNER JOIN
        map.vzla_rto v ON i.link_id = v.link_id
    ORDER BY
        distance LIMIT 1;

    RAISE DEBUG 'GetLink distance % ', sRow.distance;
    IF sRow.distance > 50 THEN
        sRow.link_id = -1;
    END IF;

    RETURN sRow;
  END;
$BODY$
  LANGUAGE plpgsql IMMUTABLE
  COST 100;
ALTER FUNCTION map.get_near_link(NUMERIC, NUMERIC, NUMERIC)
  OWNER TO postgres;
Juan Carlos Oropeza asked Jan 26 '17 14:01


1 Answer

Consider marking your map.get_near_link function as PARALLEL SAFE (available from PostgreSQL 9.6). This tells the database engine that it is allowed to try to generate a parallel plan for queries that call the function. From the documentation:

PARALLEL UNSAFE indicates that the function can't be executed in parallel mode and the presence of such a function in an SQL statement forces a serial execution plan. This is the default. PARALLEL RESTRICTED indicates that the function can be executed in parallel mode, but the execution is restricted to parallel group leader. PARALLEL SAFE indicates that the function is safe to run in parallel mode without restriction.
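
If you would rather not recreate the function, the label can also be changed in place. A minimal sketch, assuming PostgreSQL 9.6 or later and that the function body really is safe to run in a worker process (it only reads data); the catalog query afterwards just confirms the setting:

```sql
-- Relabel the existing function; the signature is taken from the question.
ALTER FUNCTION map.get_near_link(NUMERIC, NUMERIC, NUMERIC) PARALLEL SAFE;

-- Check the label: proparallel is 's' (safe), 'r' (restricted) or 'u' (unsafe).
SELECT p.proname, p.proparallel
FROM pg_proc p
JOIN pg_namespace n ON n.oid = p.pronamespace
WHERE n.nspname = 'map'
  AND p.proname = 'get_near_link';
```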

Several settings can prevent the query planner from generating a parallel plan under any circumstances. See these sections of the PostgreSQL documentation:

  • 15.4. Parallel Safety

  • 15.2. When Can Parallel Query Be Used?
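
For example, in PostgreSQL 9.6 max_parallel_workers_per_gather defaults to 0, which disables parallel query entirely. A sketch of session-level settings to experiment with; the values below are illustrative assumptions, not tuned recommendations:

```sql
-- Inspect the current limit on workers per Gather node (0 disables parallel query).
SHOW max_parallel_workers_per_gather;

-- Allow up to 4 workers per Gather node, for this session only (illustrative value).
SET max_parallel_workers_per_gather = 4;

-- Optionally make parallel plans look cheaper to the planner while testing.
-- The defaults are 1000 and 0.1; these lower values are for experimentation only.
SET parallel_setup_cost = 100;
SET parallel_tuple_cost = 0.01;
```

The number of workers actually launched is also capped by the server-wide max_worker_processes setting.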

On my reading, you may be able to achieve a parallel plan if you refactor your function like this:

CREATE OR REPLACE FUNCTION map.get_near_link(
    x NUMERIC,
    y NUMERIC,
    azim NUMERIC)
RETURNS TABLE
(Link_ID INTEGER, Distance INTEGER, Sentido TEXT, Geom GEOGRAPHY)
AS
$$
        SELECT 
               S.Link_ID,
               TRUNC(ST_Distance(ST_GeomFromText('POINT('|| X || ' ' || Y || ')',4326), S.geom) * 100000)::INTEGER AS distance,
               S.sentido,
               v.geom
        FROM (
          SELECT *
          FROM map.vzla_seg seg
          -- Alias the table here: the outer alias S is not visible inside the subquery
          WHERE ABS(Azim - seg.azimuth) NOT BETWEEN 30 AND 330
        ) S
          INNER JOIN map.vzla_rto v
            ON S.link_id = v.link_id
        WHERE
            ST_Distance(ST_GeomFromText('POINT('|| X || ' ' || Y || ')',4326), S.geom) * 100000 < 50
        ORDER BY
            S.geom <-> ST_GeomFromText('POINT('|| X || ' ' || Y || ')', 4326)
        LIMIT 1
$$
LANGUAGE SQL
PARALLEL SAFE -- Include this parameter
;

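If the query optimiser generates a parallel plan for queries that call this function, you won't need to implement your own parallelisation logic. Note that, unlike your original, this version returns no row (rather than link_id = -1) when no segment lies within the distance threshold of 50.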
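
To see whether the planner actually chose a parallel plan, you can run the query from the question under EXPLAIN and look for a Gather node. A sketch, assuming the function has been marked PARALLEL SAFE and parallel workers are enabled for the session:

```sql
-- ANALYZE executes the query, so expect the full runtime.
EXPLAIN (ANALYZE, BUFFERS)
SELECT avl_id, x, y, azimuth, map.get_near_link(x, y, azimuth)
FROM avl_db.avl_pool
WHERE avl_id BETWEEN 1 AND 20000;
-- A parallel plan contains a "Gather" node with "Workers Planned" and
-- "Workers Launched" lines; a plain Index Scan at the top means it ran serially.
```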

Serge answered Oct 05 '22 01:10