I am trying to find out what additional arguments can be passed to dplyr::collect in the ellipsis (...). I want to do this because I believe that the behaviour of collect has changed between dplyr versions 0.4.3 and 0.5. It seems that in the new version collect() only downloads the first 100k rows, unless a new n = Inf argument is passed.
I have retrieved the methods associated with collect using:
> methods('collect')
[1] collect.data.frame* collect.tbl_sql*
see '?methods' for accessing help and source code
I have looked at the help file for S3 methods but cannot work out how to get help on collect.tbl_sql, as ?"dplyr::collect.tbl_sql" does not work.
As noted by Chrisss and Zheyuan Li:
The asterisks in the output of methods indicate that neither of these methods is exported from the dplyr namespace, so they have to be reached with the ::: operator:
?dplyr:::collect.tbl_sql
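For reference, here is a minimal sketch of how a registered but non-exported S3 method can be inspected without knowing which file it lives in. It relies only on utils::getS3method and formals; the base R method print.lm is used as a stand-in so the snippet runs without dplyr, and the final lookup assumes a dplyr version (such as 0.5) in which collect.tbl_sql is defined by dplyr itself.

```r
# print.lm is registered by stats but not exported, so it shows up
# with an asterisk in methods("print"), just like collect.tbl_sql.
m <- utils::getS3method("print", "lm")

# The formal arguments reveal what can be passed beyond the ellipsis
arg_names <- names(formals(m))
arg_names

# Same idea for the method in question. optional = TRUE returns NULL
# instead of raising an error if this dplyr version does not define it.
if (requireNamespace("dplyr", quietly = TRUE)) {
  cm <- utils::getS3method("collect", "tbl_sql",
                           optional = TRUE,
                           envir = asNamespace("dplyr"))
  if (!is.null(cm)) print(names(formals(cm)))
}
```

utils::getAnywhere("collect.tbl_sql") is another option when the owning package is not known in advance.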
In 0.4.3, examining the tbl-sql.r file in the source code shows:
collect.tbl_sql <- function(x, ...) {
  grouped_df(x$query$fetch(), groups(x))
}
and in 0.5:
> dplyr:::collect.tbl_sql
function (x, ..., n = 1e+05, warn_incomplete = TRUE)
{
    assert_that(length(n) == 1, n > 0L)
    if (n == Inf) {
        n <- -1
    }
    sql <- sql_render(x)
    res <- dbSendQuery(x$src$con, sql)
    on.exit(dbClearResult(res))
    out <- dbFetch(res, n)
    if (warn_incomplete) {
        res_warn_incomplete(res, "n = Inf")
    }
    grouped_df(out, groups(x))
}
Thus, we can conclude that the behaviour of collect has indeed changed in the manner originally described in my question.
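To see why n = Inf restores the old behaviour, the fetch logic of the 0.5 method can be reproduced directly against DBI. This is only a sketch under stated assumptions: it needs the DBI and RSQLite packages, the table name big and the helper fetch_n are made up for illustration, and it leaves out the grouping and the incomplete-result warning that the real method adds.

```r
library(DBI)

# In-memory SQLite database with a hypothetical 150,000-row table
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "big", data.frame(id = seq_len(150000)))

# Mirrors the steps in the 0.5 method: Inf is translated to -1,
# which DBI interprets as "fetch all remaining rows".
fetch_n <- function(con, sql, n = 1e5) {
  if (n == Inf) n <- -1
  res <- dbSendQuery(con, sql)
  on.exit(dbClearResult(res))
  dbFetch(res, n = n)
}

capped <- fetch_n(con, "SELECT * FROM big")           # default cap of 1e5 rows
full   <- fetch_n(con, "SELECT * FROM big", n = Inf)  # no cap

nrow(capped)
nrow(full)

dbDisconnect(con)
```

With the default, only the first 1e5 rows come back; passing n = Inf retrieves the whole table, which matches what collect() did unconditionally in 0.4.3.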