I have looked all over, but I'm still unable to get those three dplyr functions (first(), last() and nth()) to work within sparklyr. I have a reproducible example below. First, some session info:
R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server 7.4 (Maipo)
I am running dplyr 0.7.4, sparklyr 0.8.3 and Spark 2.2.0.
Here is the desired result, from running the dplyr code outside of sparklyr:
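library(dplyr)

set.seed(999)
# rep(1:4, times = 2) gives a b c d a b c d; the original `by = 2` is not a rep() argument
df <- data.frame(group = letters[rep(1:4, each = 2)],
                 class = letters[rep(1:4, times = 2)],
                 value = rnorm(8), stringsAsFactors = FALSE)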
> df
  group class      value
1     a     a -0.9677497
2     a     b -1.1210094
3     b     c  1.3254637
4     b     d  0.1339774
5     c     a  0.9387494
6     c     b  0.1725381
7     d     c  0.9576504
8     d     d -1.3626862
df %>%
  group_by(group) %>%
  summarize(value = sum(value),
            class = first(class))
# A tibble: 4 x 3
  group  value class
  <chr>  <dbl> <chr>
1 a     -1.59  a
2 b      1.07  c
3 c     -0.843 a
4 d     -3.15  c
However, when I copy that data.frame to Spark, the result is not what I expect: every row of class contains the quoted column name `class` instead of a value from that column:
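library(sparklyr)
sc <- spark_connect(master = "local")  # assumed; the original connection setup was not shown

df <- sdf_copy_to(sc, df, "df", memory = FALSE, overwrite = TRUE)
df %>%
  group_by(group) %>%
  summarize(value = sum(value),
            class = first(class))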
# Source:   lazy query [?? x 3]
# Database: spark_connection
  group  value class
  <chr>  <dbl> <chr>
1 d     -3.15  `class`
2 c     -0.843 `class`
3 b      1.07  `class`
4 a     -1.59  `class`
I also checked whether this was a namespace issue by calling dplyr::first() explicitly, but that did not solve the problem; it fails with a different error:
df %>%
  group_by(group) %>%
  summarize(value = sum(value),
            class = dplyr::first(class))
Error in x[[n]] : object of type 'builtin' is not subsettable
In my real (non-reproducible) code I sometimes also got the following error, depending on how I changed the code, but I haven't been able to reproduce it with this example:
Error in nth(x, -1L, order_by = order_by, default = default) :
object 'class' not found
Any help (including alternatives) would be greatly appreciated!
I had the same problem; this should work:
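df %>%
  group_by(group) %>%
  summarize(value = sum(value),
            # first_value() is not an R function here: dbplyr passes functions it
            # cannot translate through to SQL unchanged, so Spark SQL evaluates
            # its own first_value() aggregate
            class = first_value(class))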
It works with both character and numeric columns.
By the way, I'm using dplyr 0.8.0.1 and sparklyr 0.9.4.
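Since alternatives were also requested: Spark tables have no guaranteed row order, and Spark's documentation notes that first() (and its first_value() alias) is non-deterministic after a shuffle, so the class returned per group may differ between runs. If any deterministic representative of class per group is acceptable, a standard aggregate such as min() avoids the problem. This is a swapped-in technique, not a fix for first() itself. The sketch below runs locally with plain dplyr and checks that, for this example data, min() happens to pick the same class as first(); that dbplyr translates the same pipeline to SQL MIN() on the Spark table is an assumption that has not been run against Spark here.

```r
# Sketch of an alternative, assuming any deterministic class per group is
# acceptable: min() is used instead of first().
library(dplyr)

set.seed(999)
df <- data.frame(group = letters[rep(1:4, each = 2)],
                 class = letters[rep(1:4, times = 2)],
                 value = rnorm(8), stringsAsFactors = FALSE)

# Local result with first(): depends on the row order of the data.frame
res_first <- df %>%
  group_by(group) %>%
  summarize(value = sum(value),
            class = first(class))

# min() is a standard SQL aggregate; on a character column it returns the
# alphabetically smallest value in each group, independent of row order
res_min <- df %>%
  group_by(group) %>%
  summarize(value = sum(value),
            class = min(class))

res_min
```

On this data both pipelines give the classes a, c, a, c, because within every group the first row also holds the alphabetically smallest class; on other data the two can differ.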