
psycopg2: What page_size to use

I'm writing a large number of records to a postgres database, using psycopg2.extras.execute_values(cursor, query, data, page_size=100)

I get what the page_size parameter does, but don't really know what would be a sensible value to set it to. (Above uses the default value of 100.) What are the downsides of simply setting this to something ridiculously large?

acdr asked Sep 06 '18 12:09



1 Answer

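As I understand it, page_size is the maximum number of rows from your data that execute_values packs into a single SQL statement; if the data holds more rows than that, it issues several statements. A larger value means a longer SQL statement and therefore more memory used to build and run the query, but fewer round trips to the server. If you do not need the query to return any values, it is safe to stay with a small value such as the default of 100.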
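To make the paging concrete, here is a minimal sketch of how a list of rows is split into pages. It is not psycopg2's own code; `paginate` is a hypothetical helper written only for illustration, and it assumes each page becomes one SQL statement.

```python
from itertools import islice


def paginate(argslist, page_size):
    """Yield successive lists of at most page_size rows (illustrative helper)."""
    it = iter(argslist)
    while True:
        page = list(islice(it, page_size))
        if not page:
            return
        yield page


# 250 made-up rows, paged with the default page_size of 100.
rows = [(i, f"name-{i}") for i in range(250)]
page_lengths = [len(page) for page in paginate(rows, page_size=100)]
print(page_lengths)  # [100, 100, 50]
```

With page_size=100, 250 rows would go out as three statements; with page_size=len(rows) they would go out as one.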

However, if you insert into or update a table with a RETURNING clause, you may want to increase page_size to at least the length of your data, because the cursor only exposes the rows returned by the last statement executed. You can set it to len(data) (your data should be a list of lists or a list of tuples). The downside is that everything then goes out as one statement, so you have to cap the number of rows per call yourself. PostgreSQL accepts very long SQL statements, so if you have enough memory, millions of records should be acceptable.
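One way to combine both points is to cap the rows per call yourself and set page_size to the size of each chunk. The following is only a sketch under assumptions: the table `items`, its columns, `chunked`, `insert_returning_ids` and the `max_rows_per_call` default are all made up for illustration, and the `execute_values` argument exists only so the function can be tried without a database.

```python
from itertools import islice

# Hypothetical table and columns, used only for illustration.
INSERT_SQL = "INSERT INTO items (name, qty) VALUES %s RETURNING id"


def chunked(rows, size):
    """Yield lists of at most `size` rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def insert_returning_ids(cursor, rows, max_rows_per_call=10_000,
                         execute_values=None):
    """Insert rows chunk by chunk and collect the returned ids."""
    if execute_values is None:
        # Imported lazily so the sketch can be loaded without psycopg2.
        from psycopg2.extras import execute_values
    ids = []
    for chunk in chunked(rows, max_rows_per_call):
        # One statement per chunk, so fetchall() sees every returned row.
        execute_values(cursor, INSERT_SQL, chunk, page_size=len(chunk))
        ids.extend(row[0] for row in cursor.fetchall())
    return ids
```

As far as I know, psycopg2 2.8 and later also accept fetch=True in execute_values, which collects the returned rows from every page, so on those versions page_size no longer has to match the length of the data.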

Zhengzhi answered Oct 28 '22 15:10