
What is the difference between Statement.setMaxRows and Statement.setFetchSize in Hive?

I am running queries against Hive. The same queries must also work with other JDBC drivers, that is, with other relational databases.

I can't use the method Statement.setFetchSize, because it is not supported by the Hive JDBC driver 0.13.0.

While looking for a workaround, I came across a similar method: Statement.setMaxRows.

In which cases should I use Statement.setMaxRows and in which Statement.setFetchSize?

Is it possible to use them interchangeably?

Thanks.

Filipe Miranda asked Sep 04 '15 13:09


People also ask

What is the use of setFetchSize () and setMaxRows () methods in statement?

The setFetchSize(int) method defines the number of rows that will be read from the database when the ResultSet needs more rows. setFetchSize(int) affects how the database returns the ResultSet data. In contrast, the setMaxRows(int) method of Statement specifies the maximum number of rows a ResultSet can contain in total.

What does JDBC setMaxRows method do?

According to the JDBC specification, Statement.setMaxRows(int maxRows) is supposed to set the limit for the maximum number of rows that any ResultSet object generated by this Statement object can contain to the given number. If the limit is exceeded, the excess rows are silently dropped.

What is ResultSet fetch size?

The result set fetch size, either set explicitly, or by default equal to the statement fetch size that was passed to it, determines the number of rows that are retrieved in any subsequent trips to the database for that result set.

What is Fetchsize?

fetch-size specifies the number of rows to be fetched from the database when additional rows are needed.


1 Answer

No, you can't use them interchangeably; they do different things. setMaxRows limits the total number of rows a ResultSet can return. setFetchSize suggests how many rows are fetched in each database round trip. From the Javadoc:

setFetchSize: Gives the JDBC driver a hint as to the number of rows that should be fetched from the database when more rows are needed for ResultSet objects generated by this Statement.

setMaxRows Sets the limit for the maximum number of rows that any ResultSet object generated by this Statement object can contain to the given number.
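To make the two definitions above concrete, here is a small arithmetic sketch. It models an idealized driver that honors the fetch-size hint exactly, which real drivers are not obliged to do. The class and method names (FetchModel, rowsVisible, roundTrips) are hypothetical and exist only for this illustration.

```java
// Idealized model, not a real driver: shows that maxRows caps the total,
// while fetchSize only decides how that total is split into round trips.
class FetchModel {

    /** Rows the application can read. In JDBC, maxRows == 0 means "no limit". */
    static long rowsVisible(long totalRows, int maxRows) {
        return maxRows == 0 ? totalRows : Math.min(totalRows, maxRows);
    }

    /**
     * Round trips if the driver fetched exactly fetchSize rows per trip.
     * Assumes fetchSize > 0 (in JDBC, 0 lets the driver choose its own default).
     */
    static long roundTrips(long rowsVisible, int fetchSize) {
        return (rowsVisible + fetchSize - 1) / fetchSize; // ceiling division
    }

    public static void main(String[] args) {
        // One million matching rows, capped at 10,000, fetched 500 at a time.
        long visible = rowsVisible(1_000_000L, 10_000);
        System.out.println(visible + " rows, " + roundTrips(visible, 500) + " trips");
        // prints "10000 rows, 20 trips"
    }
}
```

Changing the fetch size in this model changes only the number of trips; changing the row limit changes how many rows you ever see.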

In fact, since setFetchSize is only a hint, the driver is free to ignore it and do what it sees fit. So don't worry about Hive JDBC not supporting it.
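If the same code has to run against several drivers, one option is to treat the fetch size as optional. The sketch below assumes that a driver without support signals it by throwing SQLException (or its subclass SQLFeatureNotSupportedException), as the question implies for Hive JDBC 0.13.0; that behavior is not verified here. FetchHints, trySetFetchSize and configure are hypothetical names.

```java
import java.sql.SQLException;
import java.sql.Statement;

// Sketch: apply the fetch-size hint where the driver accepts it, skip it otherwise.
class FetchHints {

    /** Returns true if the driver accepted the hint, false if it rejected the call. */
    static boolean trySetFetchSize(Statement stmt, int rows) {
        try {
            stmt.setFetchSize(rows);
            return true;
        } catch (SQLException e) {
            // Also covers SQLFeatureNotSupportedException. The query still works,
            // the driver simply uses its own batching.
            return false;
        }
    }

    /** Applies the optional hint, then the hard cap on the number of rows. */
    static void configure(Statement stmt, int fetchSize, int maxRows) throws SQLException {
        trySetFetchSize(stmt, fetchSize); // performance hint only, safe to lose
        stmt.setMaxRows(maxRows);         // changes what the ResultSet can contain
    }
}
```

The asymmetry is deliberate: a failed fetch-size hint is swallowed because it never changes the result, while a failure in setMaxRows is propagated because it would.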

Note that all setMaxRows does is reduce the size of the ResultSet object. It won't affect the speed of the query. setMaxRows doesn't change the actual SQL (it adds no TOP, LIMIT, or ROWNUM clause, for example), so it doesn't change the work the DB does. If there are more results than your limit, the query still produces them, and they are then truncated to fit your ResultSet.
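If the goal is to reduce the work the database does, the limit has to be part of the SQL text itself. A minimal sketch, assuming a dialect with a trailing LIMIT clause (HiveQL, MySQL and PostgreSQL have one; other databases use TOP or ROWNUM instead, so this is not portable). LimitInSql and withLimit are hypothetical names.

```java
// Sketch: put the row limit into the query text so the database itself can stop early.
class LimitInSql {

    /** Appends a LIMIT clause. Assumes sql has no trailing semicolon or existing LIMIT. */
    static String withLimit(String sql, int maxRows) {
        if (maxRows < 0) {
            throw new IllegalArgumentException("maxRows must be >= 0");
        }
        return sql + " LIMIT " + maxRows;
    }

    public static void main(String[] args) {
        // "events" is a made-up table name for the example.
        System.out.println(withLimit("SELECT id, name FROM events", 100));
        // prints "SELECT id, name FROM events LIMIT 100"
    }
}
```

By contrast, stmt.setMaxRows(100) leaves the query string untouched and only caps what the ResultSet hands back.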

This answer does a good job of explaining how setFetchSize is important:

very important to performance and memory-management within the JVM as it controls the number of network calls from the JVM to the database and correspondingly the amount of RAM used for ResultSet processing.


By the way, setFetchSize can be set on java.sql.Statement as well as on java.sql.ResultSet. The default value for a result set comes from the Statement object that created it, and the fetch size may be changed at any time. Hive JDBC also has its own HiveQueryResultSet with a setFetchSize method.

Laurentiu L. answered Nov 01 '22 21:11