I have been looking for a way to get a stacked bar plot in an UpSetR graph. I downloaded the movies data set (from here) and added a column with only two values, "M" and "C". Below is how I loaded the data and added the "x" column.
Edit:
m <- read.csv(system.file("extdata", "movies.csv", package = "UpSetR"),
header = T, sep = ";")
nrow(m)
[1] 3883
x<-c(rep("M", 3000), rep("C", 883))
m<-cbind(m, x)
unique(m$x)
[1] M C
This is the structure of the data frame:
str(m)
'data.frame': 3883 obs. of 22 variables:
$ Name : Factor w/ 3883 levels "$1,000,000 Duck (1971)",..: 3577 1858 1483 3718 1175 1559 3010 3548 3363 1420 ...
$ ReleaseDate: int 1995 1995 1995 1995 1995 1995 1995 1995 1995 1995 ...
$ Action : int 0 0 0 0 0 1 0 0 1 1 ...
$ Adventure : int 0 1 0 0 0 0 0 1 0 1 ...
$ Children : int 1 1 0 0 0 0 0 1 0 0 ...
$ Comedy : int 1 0 1 1 1 0 1 0 0 0 ...
$ Crime : int 0 0 0 0 0 1 0 0 0 0 ...
$ Documentary: int 0 0 0 0 0 0 0 0 0 0 ...
$ Drama : int 0 0 0 1 0 0 0 0 0 0 ...
$ Fantasy : int 0 1 0 0 0 0 0 0 0 0 ...
$ Noir : int 0 0 0 0 0 0 0 0 0 0 ...
$ Horror : int 0 0 0 0 0 0 0 0 0 0 ...
$ Musical : int 0 0 0 0 0 0 0 0 0 0 ...
$ Mystery : int 0 0 0 0 0 0 0 0 0 0 ...
$ Romance : int 0 0 1 0 0 0 1 0 0 0 ...
$ SciFi : int 0 0 0 0 0 0 0 0 0 0 ...
$ Thriller : int 0 0 0 0 0 1 0 0 0 1 ...
$ War : int 0 0 0 0 0 0 0 0 0 0 ...
$ Western : int 0 0 0 0 0 0 0 0 0 0 ...
$ AvgRating : num 4.15 3.2 3.02 2.73 3.01 3.88 3.41 3.01 2.66 3.54 ...
$ Watches : int 2077 701 478 170 296 940 458 68 102 888 ...
$ x : Factor w/ 2 levels "M","C": 1 1 1 1 1 1 1 1 1 1 ...
Now I tried to implement the stacked bar plot as follows:
upset(m,
queries = list(
list(query = elements,
params = list("x", "M"), color = "#e69f00", active = T),
list(query = elements,
params = list("x", "C"), color = "#cc79a7", active = T)))
The result looks like this:
As you can see, the proportions are wrong: each bar should contain only two colors, one for each level of the factor ("M" or "C"). This does not seem to be a trivial issue, as also pointed out here. Does anyone have an idea how to implement this in UpSetR? Thanks a lot.
Here is a way to create an upset plot with stacked barplot, but using my ComplexUpset rather than UpSetR:
library(ggplot2)  # provides aes(), used in the mapping below
library(ComplexUpset)
movies = as.data.frame(ggplot2movies::movies)
genres = colnames(movies)[18:24]
# for simplicity of examples, only use the complete data points
movies[movies$mpaa == '', 'mpaa'] = NA
movies = na.omit(movies)
upset(
movies,
genres,
base_annotations=list(
'Intersection size'=intersection_size(
counts=FALSE,
mapping=aes(fill=mpaa)
)
),
width_ratio=0.1
)
Please see more examples in the documentation. Installation instructions are available on GitHub: krassowski/complex-upset (there is also a comparison to UpSetR and other packages).
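For the data from the question, the same approach might look roughly like the sketch below. It assumes that UpSetR, ComplexUpset and ggplot2 are installed, that the genre columns are columns 3 to 19 (Action through Western, as in the str() output in the question), and it uses min_size only to keep the number of intersections manageable; I have not checked it against the asker's exact setup.

```r
library(ggplot2)
library(ComplexUpset)

# Load the movies data shipped with UpSetR and add the two-level column "x",
# as in the question.
m <- read.csv(system.file("extdata", "movies.csv", package = "UpSetR"),
              header = TRUE, sep = ";")
m$x <- c(rep("M", 3000), rep("C", 883))

# Assumption: columns 3..19 are the genre indicator columns (Action ... Western).
genres <- colnames(m)[3:19]

# Stack the intersection-size bars by the value of "x".
p <- upset(
  m,
  genres,
  base_annotations = list(
    'Intersection size' = intersection_size(
      counts = FALSE,
      mapping = aes(fill = x)
    )
  ),
  min_size = 20,     # assumption: hide small intersections for readability
  width_ratio = 0.1
)
p
```

Because the fill is mapped inside intersection_size(), each bar is split into "M" and "C" segments that add up to the intersection size, rather than being overlaid from y=0.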
I had a similar problem and found this workaround:
library("UpSetR")
m <- read.csv(system.file("extdata", "movies.csv", package = "UpSetR"),
header = T, sep = ";")
x<-c(rep("M", 2000), rep("Q", 1000), rep("C", 883))
m<-cbind(m, x)
upset(m,
queries = list(
list(query = elements,
params = list("x", c("M","Q", "C")), color = "#e69f00", active = T),
list(query = elements,
params = list("x", c("Q","C")), color = "#cc79a7", active = T),
list(query = elements,
params = list("x", "C"), color = grey(0.7), active = T)))
The problem in the original example is that every query is overlaid on the total bar separately and starts at y=0. Thus, the remaining black part of the bar always has exactly the same height as the purple part at the bottom. The workaround is to systematically add queries for nested combinations of the values the variable can take:
Start with a query that includes all possible values (c("M","Q","C") as the second parameter to params = list()).
Add queries that each leave out one more value than the query before (c("Q","C") in the first step here). The value left out will be represented by the color of the last query that still included it ("M" in this example).
Repeat until only one value is left in params = list().
It should be possible to do this programmatically for a larger number of possible values, given some color palette. But this remains a workaround, and a native implementation of stacking the queries would be nice to have. So if you would like to see this functionality, you might consider bumping up the respective issue over at the GitHub repo.
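A minimal sketch of how the nested queries could be generated programmatically is given below. Note that build_stacked_queries is a hypothetical helper written for this answer, not part of UpSetR; it only builds the list structure that the queries argument expects, and the query function (elements in UpSetR) is passed in by the caller.

```r
# Hypothetical helper (not part of UpSetR): builds the nested element queries
# for a categorical column, one query per value.
build_stacked_queries <- function(column, values, colors, query_fun) {
  stopifnot(length(values) == length(colors))
  n <- length(values)
  lapply(seq_len(n), function(i) {
    list(
      query  = query_fun,
      # Query i keeps values i..n, so each later query overlays a smaller
      # subset and values[i] remains visible in colors[i].
      params = list(column, values[i:n]),
      color  = colors[i],
      active = TRUE
    )
  })
}
```

With UpSetR loaded and m built as in the example above, calling upset(m, queries = build_stacked_queries("x", c("M", "Q", "C"), c("#e69f00", "#cc79a7", grey(0.7)), elements)) should produce the same three queries as the hand-written version.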