
Filter two tables with crosstalk

I am creating a flexdashboard in R. I want the dashboard to contain both a table and a series of visualizations that are filtered through inputs.

As I need to deliver the dashboard locally (without a server running in the background), I cannot use Shiny, so I rely on crosstalk.

I know that the crosstalk package provides limited functionality in the front-end. For instance, the documentation says that you can't aggregate the SharedData object.

However, it is not clear to me whether I can use the same inputs to filter two different data frames.

For example, let's say I have:

  1. Dataframe One: Contains original data

    df1 <- structure(list(owner = structure(c(1L, 2L, 2L, 2L, 2L), .Label = c("John", 
    "Mark"), class = "factor"), hp = c(250, 120, 250, 100, 110), 
    car = structure(c(2L, 2L, 2L, 1L, 1L), .Label = c("benz", 
    "bmw"), class = "factor"), id = structure(1:5, .Label = c("car1", 
    "car2", "car3", "car4", "car5"), class = "factor")), .Names = c("owner", 
    "hp", "car", "id"), row.names = c(NA, -5L), class = "data.frame")
    
  2. Dataframe Two: Contains aggregated data

    df2 <- structure(list(car = structure(c(1L, 2L, 1L, 2L), .Label = c("benz",
    "bmw"), class = "factor"), owner = structure(c(1L, 1L, 2L, 2L
    ), .Label = c("John", "Mark"), class = "factor"), freq = c(0L,
    1L, 2L, 2L)), .Names = c("car", "owner", "freq"), row.names = c(NA,
    -4L), class = "data.frame")
    

These two data frames share two columns with identical values, car and owner, and each also has additional columns of its own.
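For reference, an aggregated table shaped like df2 can be computed from df1 in plain R before anything is handed to crosstalk. This is only a sketch: the compact df1 below restates the dput output above with character columns instead of factors, and the name df2_sketch is a placeholder.

```r
# Compact restatement of df1 (character columns instead of factors).
df1 <- data.frame(
  owner = c("John", "Mark", "Mark", "Mark", "Mark"),
  hp    = c(250, 120, 250, 100, 110),
  car   = c("bmw", "bmw", "bmw", "benz", "benz"),
  id    = paste0("car", 1:5),
  stringsAsFactors = FALSE
)

# Count rows per car/owner pair; pairs that never occur get a count of 0.
df2_sketch <- as.data.frame(table(df1[c("car", "owner")]),
                            responseName = "freq")
df2_sketch
```

Because the aggregation happens in R, the browser only ever sees the two finished tables.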

I could create two different objects:

    library(crosstalk)
    shared_df1 <- SharedData$new(df1)
    shared_df2 <- SharedData$new(df2)

and then:

    filter_select("owner", "Car owner:", shared_df1, ~ owner)
    filter_select("owner", "Car owner:", shared_df2, ~ owner)

However, that would mean the user has to fill in two essentially identical inputs. Also, if the table is large, this would double the memory needed to use the dashboard.

Is it possible to work around this problem in crosstalk?

Prometheus asked Feb 02 '18 11:02



2 Answers

I recently ran into this too. SharedData$new() takes another argument, SharedData$new(..., group = ), and the group argument seems to do the trick. I found out by accident when I had two data frames and set group = on both.

A SharedData object includes:

  • a data frame
  • a key to select rows by (preferably unique, but not necessarily)
  • a group name

What I think happens is that crosstalk filters by key across all SharedData objects in the same group. So as long as two data frames use the same key, you should be able to filter them together in one group.
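To make the key/group idea concrete, here is a small sketch that builds two SharedData objects and inspects them with the key() and groupName() methods. The toy data frames and the group name "owners" are placeholders of mine, not taken from the question.

```r
library(crosstalk)

# Toy stand-ins for a row-level table and an aggregated table.
cars_raw <- data.frame(
  owner = c("John", "Mark", "Mark"),
  hp    = c(250, 120, 250),
  stringsAsFactors = FALSE
)
cars_agg <- data.frame(
  owner = c("John", "Mark"),
  freq  = c(1L, 2L),
  stringsAsFactors = FALSE
)

# Same key column and same group name on both objects.
sd_raw <- SharedData$new(cars_raw, key = ~owner, group = "owners")
sd_agg <- SharedData$new(cars_agg, key = ~owner, group = "owners")

sd_raw$groupName()  # group shared by both objects
sd_raw$key()        # one key per row; repeated owners give repeated keys
sd_agg$key()
```

A filter attached to either object publishes a set of keys to the group, so every key in sd_agg should also exist in sd_raw for the two to stay in step.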

The following R Markdown document should work for your example.

---
title: "blabla"
output:
  flexdashboard::flex_dashboard:
    orientation: rows
    social: menu
    source_code: embed
    theme: cerulean
---

```{r}
library(plotly)
library(crosstalk)
library(tidyverse)
```

```{r Make dataset}
df1 <- structure(list(owner = structure(c(1L, 2L, 2L, 2L, 2L), .Label = c("John", "Mark"), class = "factor"), hp = c(250, 120, 250, 100, 110), car = structure(c(2L, 2L, 2L, 1L, 1L), .Label = c("benz", "bmw"), class = "factor"), id = structure(1:5, .Label = c("car1", "car2", "car3", "car4", "car5"), class = "factor")), .Names = c("owner", "hp", "car", "id"), row.names = c(NA, -5L), class = "data.frame")

df2 <- structure(list(car = structure(c(1L, 2L, 1L, 2L), .Label = c("benz", 
"bmw"), class = "factor"), owner = structure(c(1L, 1L, 2L, 2L
), .Label = c("John", "Mark"), class = "factor"), freq = c(0L, 
1L, 2L, 2L)), .Names = c("car", "owner", "freq"), row.names = c(NA, 
-4L), class = "data.frame")
```

#

##

### Filters

```{r}
library(crosstalk)
# Notice the 'group = ' argument - this does the trick!
shared_df1 <- SharedData$new(df1, ~owner, group = "Choose owner")
shared_df2 <- SharedData$new(df2, ~owner, group = "Choose owner")

filter_select("owner", "Car owner:", shared_df1, ~owner)
# You don't need this second filter now
# filter_select("owner", "Car owner:", shared_df2, ~ owner)
```

### Plot1 with plotly

```{r}
plot_ly(shared_df1, x = ~id, y = ~hp, color = ~owner) %>% add_markers() %>% highlight("plotly_click")
```

### Plots with plotly

```{r}
plot_ly(shared_df2, x = ~owner, y = ~freq, color = ~car) %>% group_by(owner) %>% add_bars()
```

##

### Dataframe 1

```{r}
DT::datatable(shared_df1)
```

### Dataframe 2

```{r}
DT::datatable(shared_df2)
```

I spent some time trying to extract the data from plot_ly() with plotly_data(), without luck, before I figured out the answer. That's why the plotly plots here are so simple.

Lodewic Van Twillert answered Nov 12 '22 20:11



Recently, I also wanted to use one filter for two visualizations.

Brief description of my situation
I wanted one filter to control both a boxplot and a table.
The source data was a data frame. I wanted to use some of its variables for the boxplot and also to calculate some statistics (mean, standard deviation, mode, number of records).
Functions I needed to display the results: plotly::plot_ly(), DT::datatable(), crosstalk::bscols().

I found three key points that solve this situation
Key 1) The shared data must be created correctly.
In my case, I had to call crosstalk::SharedData$new() twice.
The shared data only works as a source for the visualizations once keys 2 and 3 are fulfilled.
Key 2) When creating the shared data, use the same group argument, as "Lodewic Van Twillert" explained on 16 Mar 2018.
Key 3) Ensure that all SharedData instances refer conceptually to the same data points and share the same keys.
Start by making sure that the data frame has row names, even if they are just a character vector of numbers (like "1", "2", ...).
Literature used for key 3: https://rstudio.github.io/crosstalk/using.html (I suggest reading mainly the "Grouping" section).
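
Key 3 can be checked before any widget is built. The sketch below assumes the default key (the row names, used when no key is passed to SharedData$new()) and a made-up table called scores with hypothetical columns; it is not my real data.

```r
library(crosstalk)

# Hypothetical combined table; column names are placeholders.
scores <- data.frame(
  Account    = c("A", "A", "B", "C"),
  Score      = c(0.2, 0.4, 0.7, 0.1),
  NumRecords = c(2L, 2L, 1L, NA),
  stringsAsFactors = FALSE
)
rownames(scores)  # default row names "1", "2", ...

# Both instances are cut from the same table, so subsetting rows
# keeps the original row names and therefore the same keys.
keep    <- !is.na(scores$NumRecords) & scores$NumRecords > 0
sd_all  <- SharedData$new(scores[, c("Account", "Score")], group = "sd1")
sd_some <- SharedData$new(scores[keep, c("Account", "NumRecords")], group = "sd1")

# Every key of the smaller instance should exist in the larger one.
all(sd_some$key() %in% sd_all$key())
```

If this check comes back FALSE, the two instances do not describe the same data points and a shared filter will not line up.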

Summary of the steps I used to fulfill the key points above
Key 3) Fulfilling the conditions of key 3 can be tricky.
The approach I chose builds one table containing all the data, and this table (a data frame) is the source for both shared data instances.
I applied data manipulations to the original data frame (risk_scores_df), which added a new column.
I created a new data frame with the statistics.
I joined the two data frames using risk_scores_df <- dplyr::left_join... so that the original data frame contains all the prepared data.
I ran print(rownames(risk_scores_df)) to confirm that the updated data frame has row names.
At this point I had one data frame containing all the data needed for both visualizations, which fulfills the conditions of key 3.
Key 2) I simply added group = "sd1" to both crosstalk::SharedData$new() calls.
Key 1) This one can also be tricky if the wrong approach is chosen.
The way to create correct shared data instances is to start from that one table with all the data and select only the rows and columns needed for each instance.
Example: in my case I ran the code in Option 1 to create the two shared data instances, but Option 2 is also possible.

Option 1 (rows and columns are selected inside crosstalk::SharedData$new())

  rs_df_sd1 <- crosstalk::SharedData$new(
    risk_scores_df[, c(1, 2, 5)],
    group = "sd1"
  )
  # Keep only rows with a known, positive number of records
  rs_df_sd1a <- crosstalk::SharedData$new(
    risk_scores_df[!is.na(risk_scores_df$NumRecords) &
                   risk_scores_df$NumRecords > 0,
                   c(1, 6:11)],
    group = "sd1"
  )

Option 2 (rows and columns are selected into intermediate variables first)

  sd1 <- risk_scores_df[, c(1, 2, 5)]
  # Keep only rows with a known, positive number of records
  sd1a <- risk_scores_df[!is.na(risk_scores_df$NumRecords) &
                         risk_scores_df$NumRecords > 0,
                         c(1, 6:11)]

  rs_df_sd1 <- crosstalk::SharedData$new(sd1, group = "sd1")
  rs_df_sd1a <- crosstalk::SharedData$new(sd1a, group = "sd1")

Completing the solution
At this point I had created the shared data instances rs_df_sd1 and rs_df_sd1a, which can be used as the sources for visualizations that are filtered together and laid out with crosstalk::bscols().
Brief example:

  box_n_jitter_chart1 <- plotly::plot_ly(rs_df_sd1) %>% add_trace(...
  DT_table1 <- DT::datatable(rs_df_sd1a)
  crosstalk::bscols(
    widths = c(6, 12, NA),
    crosstalk::filter_select(
      id = "idAvgRisk",
      label = "Account",
      sharedData = rs_df_sd1,
      group = ~Account,
      multiple = FALSE
    ),
    box_n_jitter_chart1,
    DT_table1
  )

Note: the table can also be built from rs_df_sd1a$data() and cells = list(values = base::rbind(... (note that cells = ... is used; see more about cells e.g. at https://plotly.com/r/reference/table/). However, the data() method (see e.g. https://rdrr.io/cran/crosstalk/man/SharedData.html#method-data) returns a plain data frame rather than the SharedData object, so a table built that way will not respond to the filter inside crosstalk::bscols().
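
A quick way to see why data() breaks the link, as a sketch with a made-up two-row table (the name scores and its columns are placeholders):

```r
library(crosstalk)

# Hypothetical table; only the object types matter here.
scores <- data.frame(Account = c("A", "B"), Score = c(0.2, 0.7))
sd <- SharedData$new(scores, group = "sd1")

plain <- sd$data()

is.SharedData(sd)     # the wrapper carries the key and the group
is.SharedData(plain)  # data() hands back an ordinary data frame
is.data.frame(plain)
```

A widget that receives plain has no key or group to listen to, which is why the SharedData object itself should be passed to DT::datatable() or plotly::plot_ly().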

LearnUseZone answered Nov 12 '22 21:11
