I'm trying to pull data from an Amazon Athena database into R using RJDBC, as described in detail on AWS's own blog. Alas, the amount of data I'm trying to pull is substantial, so I'm getting the following error message:
Error in .jcall(rp, "I", "fetch", stride, block) :
java.sql.SQLException: The requested fetchSize is more than the allowed value in Athena. Please reduce the fetchSize and try again. Refer to the Athena documentation for valid fetchSize values.
The Athena documentation doesn't actually list any valid fetchSize values, but I gather from this GitHub issue that the value should be lower than 1000. I gather from the same GitHub issue that there is no way to pass this fetchSize to RJDBC. So are there other ways of querying Athena that respect this limit?
The basic problem is that dbGetQuery doesn't let you specify the fetchSize. As the RJDBC package author suggests, one workaround is to call the two functions that dbGetQuery wraps (dbSendQuery() and fetch()) separately and pass the fetch size to fetch() through its block argument:
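q <- dbSendQuery(c, ...)           # c is the Athena connection; ... stands for your SQL statement
d <- fetch(q, -1, block = 999)     # n = -1 fetches all rows, 999 rows per round trip
dbClearResult(q)                   # release the result set when done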
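More generally, you can override dbGetQuery for JDBC connections so that Athena connections always fetch 999 rows at a time while other JDBC connections behave as before: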
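library(RJDBC)  # also attaches DBI and rJava (needed for .jcall and %instanceof%)
# Override dbGetQuery so Athena results are fetched in blocks below the fetchSize limit
setMethod("dbGetQuery", signature(conn = "JDBCConnection", statement = "character"),
  def = function(conn, statement, ...) {
    r <- dbSendQuery(conn, statement, ...)
    on.exit(.jcall(r@stat, "V", "close"))  # close the underlying JDBC statement on exit
    if (conn@jc %instanceof% "com.amazonaws.athena.jdbc.AthenaConnection") {
      fetch(r, -1, 999)  # Athena can only pull 999 rows at a time
    } else {
      fetch(r, -1)
    }
  })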