 

Large ResultSet from a PostgreSQL query

I'm running a query against a table in a PostgreSQL database on a remote machine. The table is split into around 30 child tables using PostgreSQL's partitioning capability.

The query will return a large result set, something around 1.8 million rows.

In my code I use Spring's JDBC support, specifically JdbcTemplate.query with a RowCallbackHandler, but the handler is never called.

My best guess is that the PostgreSQL JDBC driver (I use version 8.3-603.jdbc4) is accumulating the entire result in memory before calling my code. I thought the fetchSize configuration would control this, but I tried it and nothing changed. I set it as the PostgreSQL manual recommends.
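For reference, here is a stripped-down sketch of what my code does (the class, table, and column names are placeholders, not my real schema):

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.sql.DataSource;

import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.jdbc.core.RowCallbackHandler;

// Simplified version of the failing code: fetchSize is set,
// but the handler never fires on the 1.8M-row query.
public class BigQueryRunner {
    private final JdbcTemplate jdbcTemplate;

    public BigQueryRunner(DataSource dataSource) {
        this.jdbcTemplate = new JdbcTemplate(dataSource);
        this.jdbcTemplate.setFetchSize(1000); // hint: fetch 1000 rows per round trip
    }

    public void run() {
        jdbcTemplate.query("SELECT id, payload FROM big_partitioned_table",
                new RowCallbackHandler() {
                    public void processRow(ResultSet rs) throws SQLException {
                        // never reached; the driver appears to buffer everything first
                        System.out.println(rs.getLong("id"));
                    }
                });
    }
}
```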

This query worked fine when I used Oracle XE, but I'm migrating to PostgreSQL because of the partitioning feature, which is not available in Oracle XE.

My environment:

  • PostgreSQL 8.3
  • Windows Server 2008 Enterprise 64-bit
  • JRE 1.6 64-bit
  • Spring 2.5.6
  • PostgreSQL JDBC Driver 8.3-603
asked May 05 '09 by tuler



1 Answer

In order to use a cursor to retrieve data, you have to set the ResultSet type to ResultSet.TYPE_FORWARD_ONLY (the default) and turn autocommit off, in addition to setting a fetch size. That is covered in the doc you linked to, but you didn't explicitly mention that you did all of those steps.
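Here is a minimal sketch of all three conditions together, using plain JDBC for clarity (the connection URL, credentials, and the query itself are placeholders):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class StreamingQuery {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details.
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://remotehost:5432/mydb", "user", "secret");
        try {
            conn.setAutoCommit(false); // 1. cursors only exist inside a transaction

            PreparedStatement ps = conn.prepareStatement(
                    "SELECT id, payload FROM big_partitioned_table", // placeholder query
                    ResultSet.TYPE_FORWARD_ONLY,   // 2. forward-only (the default)
                    ResultSet.CONCUR_READ_ONLY);
            ps.setFetchSize(1000);                 // 3. non-zero fetch size

            ResultSet rs = ps.executeQuery();
            while (rs.next()) {
                // rows now arrive 1000 at a time instead of all 1.8M at once
                System.out.println(rs.getLong("id"));
            }
            rs.close();
            ps.close();
            conn.commit();
        } finally {
            conn.close();
        }
    }
}
```

With Spring, the equivalent is to call setFetchSize on the JdbcTemplate and run the query inside a transaction (e.g. through a TransactionTemplate or a transactional service method), since an active transaction is what turns autocommit off.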

Be careful with PostgreSQL's partitioning scheme. It can do very unpleasant things to the optimizer and cause massive performance problems where there should not be any (depending on the specifics of your data). In any case, is your table only 1.8M rows? There is no reason it would need to be partitioned based on size alone, provided it is appropriately indexed.
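As an aside, and purely as an assumption about your setup (the table, column, and date range below are hypothetical): on 8.3 the planner only skips partitions when constraint_exclusion is enabled, so it is worth checking the plan to confirm a query really touches only the matching child tables:

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Sketch: confirm partition pruning over JDBC. Assumes an open Connection
// to the 8.3 database; identifiers below are placeholders.
public class PruningCheck {
    public static void printPlan(Connection conn) throws SQLException {
        Statement st = conn.createStatement();
        // Off by default in 8.3; without it every query scans all ~30 children.
        st.execute("SET constraint_exclusion = on");

        ResultSet plan = st.executeQuery(
                "EXPLAIN SELECT id, payload FROM big_partitioned_table " +
                "WHERE created_at >= DATE '2009-01-01' " +
                "AND created_at < DATE '2009-02-01'");
        while (plan.next()) {
            // With pruning working, only the matching child tables appear here.
            System.out.println(plan.getString(1));
        }
        plan.close();
        st.close();
    }
}
```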

answered Sep 19 '22 by Trey