Can I split a query into multiple queries or use parallelism to speed up a query?

I have a table avl_pool and a function that finds the map link nearest to a given (x, y) position.

The performance of this select is very linear: the function requires ~8 ms to execute, so running the select for 1,000 rows takes 8 seconds. Or, as this sample shows, 20,000 rows take 162 seconds.

SELECT avl_id, x, y, azimuth, map.get_near_link(X, Y, AZIMUTH)
FROM avl_db.avl_pool         
WHERE avl_id between 1 AND 20000

"Index Scan using avl_pool_pkey on avl_pool  (cost=0.43..11524.76 rows=19143 width=28) (actual time=8.793..162805.384 rows=20000 loops=1)"
"  Index Cond: ((avl_id >= 1) AND (avl_id <= 20000))"
"  Buffers: shared hit=19879838"
"Planning time: 0.328 ms"
"Execution time: 162812.113 ms"

Using pgAdmin, I found that if I execute half of the range in each of two separate windows at the same time, the execution time is actually cut in half. So it looks like the server can handle multiple requests to the same table/function without a problem.

-- window 1
SELECT avl_id, x, y, azimuth, map.get_near_link(X, Y, AZIMUTH)
FROM avl_db.avl_pool         
WHERE avl_id between 1 AND 10000 

Total query runtime: 83792 ms.

-- window 2
SELECT avl_id, x, y, azimuth, map.get_near_link(X, Y, AZIMUTH)
FROM avl_db.avl_pool         
WHERE avl_id between 10001 AND 20000

Total query runtime: 84047 ms.

So how should I approach this scenario to improve performance?

From the C# side, I guess I can create multiple threads, have each one send a portion of the range, and then combine all the data in the client. So instead of one query with 20k rows taking 162 seconds, I could send 10 queries of 2,000 rows each and finish in ~16 seconds. There may be some overhead in combining the results, but it shouldn't be big compared with the 160 seconds.
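The range split described above can be sketched in SQL. This is only an illustration, assuming the ids 1 to 20000 and the chunk size of 2000 from the example; the aliases lo, range_start and range_end are made up for the sketch.

```sql
-- Compute the id ranges a client could hand to separate connections.
-- Returns 10 rows, from (1, 2000) to (18001, 20000).
SELECT lo AS range_start,
       LEAST(lo + 2000 - 1, 20000) AS range_end
FROM generate_series(1, 20000, 2000) AS lo;
```

Each client thread would then run the original select on its own connection with WHERE avl_id BETWEEN range_start AND range_end.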

Or is there a different approach I should consider? Even better if it is a pure SQL solution.

@PeterRing I don't think the function code is relevant, but here it is anyway.

CREATE OR REPLACE FUNCTION map.get_near_link(
    x NUMERIC,
    y NUMERIC,
    azim NUMERIC)
  RETURNS map.get_near_link AS
$BODY$
DECLARE
    strPoint TEXT;
    sRow map.get_near_link;
  BEGIN
    strPoint = 'POINT('|| X || ' ' || Y || ')';
    RAISE DEBUG 'GetLink strPoint % -- Azim %', strPoint, Azim;

    WITH index_query AS (
        SELECT --Seg_ID,
               Link_ID,
               azimuth,
               TRUNC(ST_Distance(ST_GeomFromText(strPoint,4326), geom  )*100000)::INTEGER AS distance,
               sentido,
               --ST_AsText(geom),
               geom
        FROM map.vzla_seg S
        WHERE
            ABS(Azim - S.azimuth) < 30 OR
            ABS(Azim - S.azimuth) > 330
        ORDER BY
            geom <-> ST_GeomFromText(strPoint, 4326)
        LIMIT 101
    )
    SELECT i.Link_ID, i.Distance, i.Sentido, v.geom INTO sRow
    FROM
        index_query i INNER JOIN
        map.vzla_rto v ON i.link_id = v.link_id
    ORDER BY
        distance LIMIT 1;

    RAISE DEBUG 'GetLink distance % ', sRow.distance;
    IF sRow.distance > 50 THEN
        sRow.link_id = -1;
    END IF;

    RETURN sRow;
  END;
$BODY$
  LANGUAGE plpgsql IMMUTABLE
  COST 100;
ALTER FUNCTION map.get_near_link(NUMERIC, NUMERIC, NUMERIC)
  OWNER TO postgres;


asked Jan 26 '17 14:01

Juan Carlos Oropeza

1 Answer

Consider marking your map.get_near_link function as PARALLEL SAFE. This tells the database engine that it is allowed to try to generate a parallel plan when executing the function. From the documentation:

PARALLEL UNSAFE indicates that the function can't be executed in parallel mode and the presence of such a function in an SQL statement forces a serial execution plan. This is the default. PARALLEL RESTRICTED indicates that the function can be executed in parallel mode, but the execution is restricted to parallel group leader. PARALLEL SAFE indicates that the function is safe to run in parallel mode without restriction.
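If recreating the function is not desirable, the label can also be changed in place. A minimal sketch, assuming PostgreSQL 9.6 or later and assuming the function only reads data (labelling a function that writes to the database as safe would be incorrect):

```sql
-- Mark the existing function as parallel safe without recreating it.
ALTER FUNCTION map.get_near_link(NUMERIC, NUMERIC, NUMERIC) PARALLEL SAFE;

-- Confirm the label: proparallel is 's' (safe), 'r' (restricted) or 'u' (unsafe).
SELECT p.proname, p.proparallel
FROM pg_proc p
JOIN pg_namespace n ON n.oid = p.pronamespace
WHERE n.nspname = 'map'
  AND p.proname = 'get_near_link';
```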

There are several settings which can cause the query planner not to generate a parallel query plan under any circumstances. Consider this documentation:

15.4. Parallel Safety
15.2. When Can Parallel Query Be Used?
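As a rough way to check those settings, something like the following could be run first. The value 4 is an arbitrary choice for illustration, and dynamic_shared_memory_type may matter only on older releases; check the documentation for the version in use.

```sql
-- Inspect settings that influence whether a parallel plan can be chosen.
SELECT name, setting
FROM pg_settings
WHERE name IN ('max_parallel_workers_per_gather',
               'max_worker_processes',
               'dynamic_shared_memory_type');

-- Parallel query is disabled while this is 0; raise it for the current session only.
SET max_parallel_workers_per_gather = 4;
```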

On my reading, you may be able to achieve a parallel plan if you refactor your function like this:

-- Note: unlike the original, this version returns no row (instead of link_id = -1)
-- when no link is found within the distance threshold of 50.
-- Adjust the Geom type to match map.vzla_rto.geom (it may be GEOMETRY rather than GEOGRAPHY).
CREATE OR REPLACE FUNCTION map.get_near_link(
    x NUMERIC,
    y NUMERIC,
    azim NUMERIC)
RETURNS TABLE
(Link_ID INTEGER, Distance INTEGER, Sentido TEXT, Geom GEOGRAPHY)
AS
$$
        SELECT 
               S.Link_ID,
               TRUNC(ST_Distance(ST_GeomFromText('POINT('|| X || ' ' || Y || ')',4326), S.geom) * 100000)::INTEGER AS distance,
               S.sentido,
               v.geom
        FROM (
          SELECT *
          FROM map.vzla_seg seg
          -- Keep segments whose azimuth is within 30 degrees of the requested one
          WHERE ABS(Azim - seg.azimuth) NOT BETWEEN 30 AND 330
        ) S
          INNER JOIN map.vzla_rto v
            ON S.link_id = v.link_id
        WHERE
            ST_Distance(ST_GeomFromText('POINT('|| X || ' ' || Y || ')',4326), S.geom) * 100000 < 50
        ORDER BY
            S.geom <-> ST_GeomFromText('POINT('|| X || ' ' || Y || ')', 4326)
        LIMIT 1
$$
LANGUAGE SQL
PARALLEL SAFE -- Include this parameter
;

If the query optimiser generates a parallel plan when executing this function, you won't need to implement your own parallelisation logic.
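Whether the planner actually chooses a parallel plan can be checked with EXPLAIN. A sketch using the query from the question:

```sql
-- A parallel plan contains a Gather node with "Workers Planned" and
-- "Workers Launched" lines; if no Gather node appears, the plan is serial.
EXPLAIN (ANALYZE, BUFFERS)
SELECT avl_id, x, y, azimuth, map.get_near_link(x, y, azimuth)
FROM avl_db.avl_pool
WHERE avl_id BETWEEN 1 AND 20000;
```

If the plan stays serial, splitting the range across several client connections, as described in the question, remains an option.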


answered Oct 05 '22 01:10

Serge

Tags: performance, c#, sql, multithreading, postgresql
