Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

"The requested fetchSize is more than the allowed value in Athena" with JDBC driver

I'm trying to pull data from an Athena DB into R using RJDBC as described in detail on AWS's own blog. Alas, the amount of data I'm trying to pull is substantial and so I'm getting the following error message:

Error in .jcall(rp, "I", "fetch", stride, block) : 
  java.sql.SQLException: The requested fetchSize is more than the allowed value in Athena. Please reduce the fetchSize and try again. Refer to the Athena documentation for valid fetchSize values.

The Athena documentation doesn't actually give any such fetchSize values but I gather from this github issue that the value should be lower than 1000. I gather from the same github issue that there is no way to pass this fetchSize to RJDBC. So are there other ways of querying Athena that respect this limit?

like image 587
RoyalTS Avatar asked Jan 03 '23 22:01

RoyalTS


1 Answers

The basic problem is that dbGetQuery doesn't allow one to specify the fetchSize. As per the RJDBC package author one workaround is to call the two functions that dbGetQuery wraps separately and pass the fetchSize to fetch():

q <- dbSendQuery(c, ...)
fetch(q, -1, block=999)

More generally:

setMethod("dbGetQuery", signature(conn="JDBCConnection", statement="character"),  def=function(conn, statement, ...) {
  r <- dbSendQuery(conn, statement, ...)
  on.exit(.jcall(r@stat, "V", "close"))
  if (conn@jc %instanceof% "com.amazonaws.athena.jdbc.AthenaConnection") fetch(r, -1, 999) # Athena can only pull 999 rows at a time
  else fetch(r, -1)
})
like image 115
RoyalTS Avatar answered Jan 14 '23 11:01

RoyalTS