
How to track query progress in PostgreSQL?

Tags:

postgresql

Is there a plugin or a script that can track the progress of a long query in PostgreSQL?

I need to set a progress bar value in Java that reflects the progress of an update query running in Postgres. I searched the internet, but I only found papers with no official implementation in any RDBMS.

Mohammad Fajar asked Oct 06 '14



3 Answers

I found a good answer here: Tracking progress of an update statement

The trick is to first create a sequence (name it as you like):

CREATE SEQUENCE query_progress START 1; 

Then append the following to your query's WHERE clause:

AND NEXTVAL('query_progress')!=0 

Now you can query the progress:

SELECT NEXTVAL('query_progress'); 

Finally, don't forget to get rid of the sequence:

DROP SEQUENCE query_progress; 

Note that this will most likely make your query run even slower, and every time you check the progress the check itself increments the value by one more. The linked answer suggested creating a temporary sequence, but PostgreSQL doesn't make temporary sequences visible across sessions.
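Putting the pieces together, a minimal end-to-end sketch might look like this (big_table, the status column, and the row count of 1000000 are placeholders I made up for illustration):

CREATE SEQUENCE query_progress START 1;

-- Estimate how many rows the update will touch; suppose this returns 1000000.
SELECT count(*) FROM big_table WHERE status = 'pending';

-- The long-running statement; the volatile nextval() call bumps the counter
-- roughly once per row the query processes.
UPDATE big_table
SET    status = 'done'
WHERE  status = 'pending'
  AND  nextval('query_progress') != 0;

-- From a second connection (e.g. the one driving the Java progress bar),
-- poll the counter and turn it into a fraction of the estimated total.
SELECT (nextval('query_progress') - 1)::float / 1000000 AS fraction_done;

-- When the update is finished:
DROP SEQUENCE query_progress;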

jjrv answered Sep 21 '22


I have figured out a way that might help, though further processing may be needed if you want to use it from your own code (Java, etc.).

The approach is to examine the table's page contents in order to track progress.

PostgreSQL has an extension called pageinspect that can examine the page-level contents of a particular table.

Details here: https://www.postgresql.org/docs/current/pageinspect.html

Also spend some time understanding PostgreSQL's page layout:

https://www.postgresql.org/docs/current/storage-page-layout.html

Look at xmin, xmax, and ctid in particular.

I am assuming that rows are inserted into the table in a certain order, such as the table's primary key order, and that a long update will likely append new pages.

I am also assuming that the primary key IDs are mostly continuous, with only small gaps. Since this is just an estimate, I think that condition is acceptable.

You cannot find the current number of pages with SELECT relname, relpages FROM pg_class, though, since relpages is not kept up to date (it is only refreshed by operations such as VACUUM and ANALYZE).

You will get an error if a page index does not exist in storage (but you will find pages that do exist, even if pg_class has not been updated), so do a little "binary search" on the page index to find the largest page you have. It doesn't need to be exact.
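As a rough illustration of that probe, something along these lines could work (find_last_page is my own name, not part of pageinspect, and it assumes the pageinspect extension is installed):

CREATE OR REPLACE FUNCTION find_last_page(rel text) RETURNS bigint AS $$
DECLARE
    lo bigint := 0;   -- last page known to exist
    hi bigint := 1;   -- first candidate that might not exist
    mid bigint;
BEGIN
    -- Grow the upper bound until reading the page fails.
    LOOP
        BEGIN
            PERFORM get_raw_page(rel, hi::int);
            lo := hi;
            hi := hi * 2;
        EXCEPTION WHEN OTHERS THEN
            EXIT;
        END;
    END LOOP;
    -- Binary search between the last good page and the first bad one.
    WHILE hi - lo > 1 LOOP
        mid := (lo + hi) / 2;
        BEGIN
            PERFORM get_raw_page(rel, mid::int);
            lo := mid;
        EXCEPTION WHEN OTHERS THEN
            hi := mid;
        END;
    END LOOP;
    RETURN lo;
END;
$$ LANGUAGE plpgsql;

-- Usage: SELECT find_last_page('foo');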

Use

SELECT backend_xid FROM pg_stat_activity WHERE pid = process-id 

to find your current transaction ID.

Use

SELECT lp,t_xmin,t_xmax,t_ctid,t_bits,t_data FROM heap_page_items(get_raw_page('relation_name', page_index)); 

In the sample I am working on, it may look like this:

SELECT lp,t_xmin,t_xmax,t_ctid,t_bits,t_data FROM heap_page_items(get_raw_page('foo', 3407000));

 lp | t_xmin | t_xmax |   t_ctid    |          t_bits          | t_data
----+--------+--------+-------------+--------------------------+------------------------------------------------------------
  1 | 592744 | 592744 | (3407000,1) | 110000000111000000000000 | \xd1100000000000000e4400000000000054010000611b0000631b0000
  2 | 592744 | 592744 | (3407000,2) | 110000000111000000000000 | \xd110000000000000104400000000000040010000611b0000631b0000
  3 | 592744 | 592744 | (3407000,3) | 110000000111000000000000 | \xd11000000000000011440000000000007c010000611b0000631b0000

t_data is the row data. lp is the tuple's index in the item list. t_xmin and t_xmax are transaction IDs. t_ctid is the tuple's location (page_index, tuple_id) as stored within the tuple itself. t_bits is the NULL bitmap, present if your tuple has NULL values.

First check whether t_xmin = t_xmax and whether t_ctid's (page_index, tuple_id) matches the page and lp you are reading. If so, check whether t_xmin is the same as your transaction ID. If so, examine the data.

Be aware of endianness and the NULL bitmap. In my case the data is little-endian (least significant byte first).

In my example, the first row is valid, and the first BIGINT (8 bytes, 16 hex digits) is the sorted ID I am looking for. On the first row the data starts with

\xd110000000000000

which, with the byte order reversed, is 0x10d1 --> 4305.

I know my largest ID is 18209 and my smallest ID is 2857, and I split the job into 8 parts, so

(18209 - 2857) / 8 = 1919

This is the first part I ran, so

2857 + 1919 = 4776

This means my sub-job starts at ID 2857 and is currently at 4305. When it hits 4776, this thread is done!

This is

(4305 - 2857) / 1919 = 75.5% done
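If you want to do the byte decoding in SQL rather than by hand, a small helper along these lines could work (first_bigint_le is my own name, not part of pageinspect, and it assumes the BIGINT of interest is the first attribute of the tuple with no NULL or dropped columns before it):

-- Decode the first 8 bytes of t_data as a little-endian (LSB-first) bigint.
CREATE OR REPLACE FUNCTION first_bigint_le(t_data bytea) RETURNS bigint AS $$
    SELECT sum(get_byte(t_data, i)::bigint << (8 * i))::bigint
    FROM generate_series(0, 7) AS i;
$$ LANGUAGE sql;

-- For the first row above this returns 4305 (from \xd110000000000000).
SELECT lp, t_xmin, t_xmax, first_bigint_le(t_data) AS current_id
FROM heap_page_items(get_raw_page('foo', 3407000));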


Limitations

This will not work for updates driven by hash values. In my case the IDs happen to be ordered sequentially as the primary key, and the planner triggered a sequential scan. It should also work if the planner does some kind of btree index scan for the update.

Look into CLUSTER if you are interested in ordering the physical rows in index order.
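For instance (foo and foo_pkey are placeholder names):

-- One-time rewrite of foo in primary-key order; takes an exclusive lock while it runs.
CLUSTER foo USING foo_pkey;
-- Refresh planner statistics afterwards.
ANALYZE foo;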

Again, this method is not exact and relies on the assumptions highlighted above. If used in a program, poll it sparingly to avoid extra disk I/O overhead.

Yunting Zhao answered Sep 24 '22


Not sure if this is an exact answer to what people are looking for, but I have made a simple function that reports back the current state of a table insert by measuring its page size over time. This isn't a direct window into what is happening, but it is a good approximation of whether, and how fast, anything is happening. It's also a solid measure of the bottom line (how fast a table is being "filled up").

The function returns a list of table names with the current size (in bytes and human-readable units) and rate of growth for both the table and all its associated indexes.

Bonus: it also includes temp file activity.

I use this especially to see the progress of loading a table as well as how fast it is being loaded, which is good for estimating how long it will take (though increasingly less linear for large loads).

Here is a portable function:

CREATE OR REPLACE FUNCTION table_build_monitor(
    IN table_or_schema_list TEXT[] DEFAULT NULL
,   IN sample_period INT DEFAULT 10
)
RETURNS TABLE (
    table_name TEXT
,   table_size TEXT
,   index_size TEXT
)
AS
$$
DECLARE
    table_list TEXT[];
    schema_list TEXT[];
BEGIN

DROP TABLE IF EXISTS table_sizes_loop;
CREATE TEMP TABLE table_sizes_loop (
    table_name_loop TEXT
,   table_size_bytes BIGINT
,   indexes_size_bytes BIGINT
)
;

select
    array_remove(array_agg(case when split_part(poo, '.',2) = '*' then split_part(poo, '.',1) else NULL end), NULL::TEXT)
,   array_remove(array_agg(case when split_part(poo, '.',2) = '*' then NULL else poo end), NULL::TEXT)
FROM unnest(array[table_or_schema_list]) poo
INTO schema_list, table_list
;

INSERT INTO table_sizes_loop

SELECT
    pg_tables.schemaname||'.'|| pg_tables.tablename as table_name
,   pg_relation_size(pg_tables.schemaname||'.'|| pg_tables.tablename) AS table_size_bytes
,   pg_indexes_size(pg_tables.schemaname||'.'|| pg_tables.tablename) AS indexes_size_bytes
FROM pg_tables
WHERE
    pg_tables.schemaname = ANY(schema_list)
OR  (pg_tables.schemaname||'.'|| pg_tables.tablename)::text = ANY(table_list)

UNION

SELECT
    'temp_files'
,   temp_bytes
,   NULL
FROM pg_stat_database
WHERE
    datname = current_database()
;

PERFORM pg_sleep(sample_period);

RETURN QUERY

with
    base AS
(
SELECT
    pg_tables.schemaname||'.'|| pg_tables.tablename as table_name_loop
,   pg_relation_size(pg_tables.schemaname||'.'|| pg_tables.tablename) AS table_size_bytes
,   pg_indexes_size(pg_tables.schemaname||'.'|| pg_tables.tablename) AS indexes_size_bytes

FROM pg_tables
WHERE
    pg_tables.schemaname::text = ANY(schema_list)
OR  (pg_tables.schemaname||'.'|| pg_tables.tablename)::text = ANY(table_list)

UNION

SELECT
    'temp_files'
,   temp_bytes
,   NULL
FROM pg_stat_database
WHERE
    datname = current_database()

)
SELECT
    table_name_loop
,   CASE WHEN table_name_loop = 'temp_files' THEN
        pg_size_pretty((base.table_size_bytes - tsl.table_size_bytes)/sample_period) || '/s'
    ELSE
            base.table_size_bytes
        || ' (' || pg_size_pretty((base.table_size_bytes))
        || ') - ' || pg_size_pretty((base.table_size_bytes - tsl.table_size_bytes)/sample_period) || '/s'
    END as table_size
,       base.indexes_size_bytes
    || ' (' || pg_size_pretty((base.indexes_size_bytes))
    || ') - ' || pg_size_pretty((base.indexes_size_bytes - tsl.indexes_size_bytes)/sample_period) || '/s'
    as index_size
FROM table_sizes_loop tsl
JOIN base USING (table_name_loop)
ORDER BY base.table_size_bytes DESC
;

END
$$
LANGUAGE plpgsql
;

To view it, use a select statement like the following, passing a list of schema-qualified tables or something like "schema.*" for the whole schema - and optionally the sample period (default is 10s).

select * from table_build_monitor('{public.*}', 3);

Alexi Theodore answered Sep 24 '22