I'm writing a large number of records to a Postgres database, using psycopg2.extras.execute_values(cursor, query, data, page_size=100). I get what the page_size parameter does, but I don't really know what would be a sensible value to set it to. (The above uses the default value of 100.) What are the downsides of simply setting this to something ridiculously large?
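For reference, a stripped-down version of what I'm doing looks roughly like this (the table and column names and the connection string are placeholders):

```python
import psycopg2
from psycopg2.extras import execute_values

conn = psycopg2.connect("dbname=test user=postgres")  # placeholder credentials
cur = conn.cursor()

data = [(i, f"value {i}") for i in range(10_000)]

# execute_values expands the single %s placeholder into a VALUES list,
# sending the data in chunks of page_size rows per statement.
execute_values(
    cur,
    "INSERT INTO items (id, val) VALUES %s",
    data,
    page_size=100,  # the default; the question is how large this can safely go
)
conn.commit()
```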
Psycopg is the most popular PostgreSQL database adapter for the Python programming language. Its main features are the complete implementation of the Python DB API 2.0 specification and thread safety (several threads can share the same connection).

On thread and process safety: the Psycopg module and the connection objects are thread-safe. Many threads can access the same database, either using separate sessions (one connection per thread) or using the same connection (a separate cursor per thread). In DB API 2.0 parlance, Psycopg is level 2 thread safe.
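As a minimal sketch of the first pattern, one connection per thread (the connection string is a placeholder):

```python
import threading
import psycopg2

def worker(n):
    # One connection per thread: the simplest thread-safe pattern.
    conn = psycopg2.connect("dbname=test user=postgres")
    with conn, conn.cursor() as cur:  # the connection block commits on success
        cur.execute("SELECT %s", (n,))
        print(cur.fetchone())
    conn.close()

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```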
Another approach offered by the psycopg2 library is execute_batch(). It reduces the number of server round trips, improving performance in contrast to the executemany() function. It achieves this by joining the statements together and executing them in pages of page_size statements each (100 by default).
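A rough sketch of execute_batch() usage (table and column names invented; note that, unlike execute_values, it repeats the whole statement for each row rather than expanding a VALUES list):

```python
import psycopg2
from psycopg2.extras import execute_batch

conn = psycopg2.connect("dbname=test user=postgres")  # placeholder credentials
cur = conn.cursor()

data = [(i, f"value {i}") for i in range(10_000)]

# execute_batch concatenates page_size statements (100 by default)
# and sends each page to the server in a single round trip.
execute_batch(
    cur,
    "INSERT INTO items (id, val) VALUES (%s, %s)",
    data,
    page_size=100,
)
conn.commit()
```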
I'm sure everybody who has worked with Python and a PostgreSQL database is familiar with, or has at least heard of, the psycopg2 library. In my work I come into contact with it every day and execute hundreds of automated statements.
The basic Psycopg usage is common to all the database adapters implementing the DB API 2.0 protocol. The main entry point is the function connect(), which creates a new database session and returns a new connection instance.
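A minimal session showing some of the basic commands could look like the following (the connection string and the test table are placeholders):

```python
import psycopg2

# Connect to an existing database (placeholder credentials).
conn = psycopg2.connect("dbname=test user=postgres")

# Open a cursor to perform database operations.
cur = conn.cursor()

# Execute a command: this creates a new table.
cur.execute("CREATE TABLE test (id serial PRIMARY KEY, num integer, data varchar);")

# Pass data to fill a query placeholder; psycopg2 handles the quoting.
cur.execute("INSERT INTO test (num, data) VALUES (%s, %s)", (100, "abc'def"))

# Query the database and obtain data as Python objects.
cur.execute("SELECT * FROM test;")
print(cur.fetchone())  # (1, 100, "abc'def")

# Make the changes persistent, then close communication.
conn.commit()
cur.close()
conn.close()
```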
You can also obtain a stand-alone package, not requiring a compiler or external libraries, by installing the psycopg2-binary package from PyPI (pip install psycopg2-binary). The binary package is a practical choice for development and testing, but in production it is advised to use the package built from sources. It is meant to let beginners start playing with Python and PostgreSQL without having to meet the build requirements.
Based on my understanding, page_size sets how many input rows are folded into each SQL statement. A larger number means a longer SQL statement, and hence more memory used for the query. If you do not need the query to return any values, a modest value such as the default of 100 is safe; raising it mostly trades memory for fewer server round trips.

However, if you would like to insert or update with a RETURNING clause, note that the cursor only holds the results of the last statement executed, i.e. the last page. In that case you may want to increase page_size to at least the length of your data. You can set it to len(data) (your data should be a list of lists or a list of tuples) so that everything goes in a single statement; the downside is that you then have to impose some limit on the number of rows per call yourself. PostgreSQL allows very long SQL statements, so if you have enough memory, millions of records should be acceptable.
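For example, a rough sketch of the RETURNING case (table and column names invented; execute_values also accepts a fetch=True flag, available since psycopg2 2.8, which collects the returned rows from every page):

```python
import psycopg2
from psycopg2.extras import execute_values

conn = psycopg2.connect("dbname=test user=postgres")  # placeholder credentials
cur = conn.cursor()

data = [("alice",), ("bob",), ("carol",)]

# Make the page at least as large as the data, so everything is sent
# as one statement and RETURNING covers every row.
ids = execute_values(
    cur,
    "INSERT INTO users (name) VALUES %s RETURNING id",
    data,
    page_size=len(data),
    fetch=True,  # return the RETURNING rows as a list of tuples
)
print(ids)  # e.g. [(1,), (2,), (3,)]
conn.commit()
```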