
What is the difference between dplyr::copy_to and sparklyr::sdf_copy_to?

Tags: r, dplyr, sparklyr

I am using the sparklyr library to interact with Spark. There are two functions for putting a data frame into a Spark context: 'dplyr::copy_to' and 'sparklyr::sdf_copy_to'. What is the difference, and when is it recommended to use one instead of the other?

Sergio Marrero Marrero asked Nov 16 '22 09:11


1 Answer

They're the same. I would use copy_to rather than the specialist sdf_copy_to because it is more consistent with other data sources, but that's stylistic.
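As a rough illustration of the two call styles, here is a minimal sketch. It assumes a local Spark installation is available (for example one set up with sparklyr::spark_install()); the table names cars_a and cars_b are made up for the example.

```r
library(sparklyr)
library(dplyr)

# Assumption: Spark is installed locally and reachable with master = "local".
sc <- spark_connect(master = "local")

# Generic dplyr entry point; dispatches to sparklyr's method for the connection.
cars_a <- copy_to(sc, mtcars, name = "cars_a", overwrite = TRUE)

# Direct call to the sparklyr function.
cars_b <- sdf_copy_to(sc, mtcars, name = "cars_b", overwrite = TRUE)

# Both calls return a remote table; compare their row counts as a sanity check.
same_rows <- sdf_nrow(cars_a) == sdf_nrow(cars_b)
print(same_rows)

spark_disconnect(sc)
```

Either call should leave a Spark table holding the mtcars data, so the choice mostly comes down to which style fits the rest of your code.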

The function copy_to is a generic from dplyr and works with any data source which implements a dplyr backend.

You can use it with a Spark connection because sparklyr implements the methods copy_to.src_spark and copy_to.spark_connection. These methods are not exported to the user, since you're supposed to call copy_to and let it dispatch to the correct method.

copy_to.src_spark just calls copy_to.spark_connection:

#> sparklyr:::copy_to.src_spark
function (dest, df, name, overwrite, ...) 
{
    copy_to(spark_connection(dest), df, name, ...)
}
<bytecode: 0x5646b227a9d0>
<environment: namespace:sparklyr>

copy_to.spark_connection just calls sdf_copy_to:

#> sparklyr:::copy_to.spark_connection
function (dest, df, name = spark_table_name(substitute(df)), 
    overwrite = FALSE, memory = TRUE, repartition = 0L, ...) 
{
    sdf_copy_to(dest, df, name, memory, repartition, overwrite, 
        ...)
}
<bytecode: 0x5646b21ef120>
<environment: namespace:sparklyr>
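The delegation chain in the two listings can be mimicked in base R with a toy S3 generic. This is only a sketch of the dispatch pattern, not sparklyr's actual code: every name starting with toy_ is hypothetical, and the "specialist" function just returns a small list instead of touching Spark.

```r
# Toy generic standing in for dplyr::copy_to (hypothetical name).
toy_copy_to <- function(dest, df, name, ...) {
  UseMethod("toy_copy_to")
}

# Specialist function standing in for sparklyr::sdf_copy_to (hypothetical).
toy_sdf_copy_to <- function(sc, x, name, ...) {
  list(connection = sc$id, table = name, rows = nrow(x))
}

# Method for the connection class: delegates to the specialist function.
toy_copy_to.toy_connection <- function(dest, df, name, ...) {
  toy_sdf_copy_to(dest, df, name, ...)
}

# Method for the src wrapper: unwraps the connection and re-dispatches.
toy_copy_to.toy_src <- function(dest, df, name, ...) {
  toy_copy_to(dest$con, df, name, ...)
}

conn <- structure(list(id = "local"), class = "toy_connection")
src  <- structure(list(con = conn), class = "toy_src")

# Going through the generic and calling the specialist directly agree.
via_generic    <- toy_copy_to(src, mtcars, "cars")
via_specialist <- toy_sdf_copy_to(conn, mtcars, "cars")
print(identical(via_generic, via_specialist))
```

Whichever entry point is used, the work ends up in the same specialist function, which is why the two real functions behave the same.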

sdf_copy_to follows sparklyr's package-wide convention of prefixing functions that operate on Spark DataFrames with "sdf_". copy_to, on the other hand, comes from dplyr, and sparklyr provides compatible methods for the convenience of dplyr users.

asachet answered May 16 '23 09:05