Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Extract intersection list from upset object

I'm making some comparisons with UpSetR, and I'd like to save the lists of elements that fall into each intersection. Is this possible? I can't find it anywhere...

It would be pretty tedious to do it manually (many lists), and since they're calculated anyway not being able to save them is frustrating

like image 677
Matteo Avatar asked Apr 25 '26 05:04

Matteo


2 Answers

Here is my solution. It is bio-related, but should be easy to translate to other fields. I start with a list of vectors. In my case a list of genes (list of characters) belonging to different signatures (different sets).

str(list_filter)
List of 9
 $ CellAge_Induces         : chr [1:153] "AAK1" "ABI3" "ADCK5" "AGT" ...
 $ CellAge_Inhibits        : chr [1:121] "ACLY" "AKR1B1" "ASPH" "ATF7IP" ...
 $ CLASSICAL_SASP          : chr [1:38] "BGN" "CCL2" "CCL20" "COL1A1" ...
 $ FRIDMAN_SENESCENCE_UP   : chr [1:77] "ALDH1A3" "CCND1" "CD44" "CDKN1A" ...
 $ ISM_SCORE               : chr [1:128] "HSH2D" "OTOF" "TRIM69" "PSME1" ...
 $ MOSERLE_IFNA_RESPONSE   : chr [1:31] "CD274" "CMPK2" "CXCL10" "DDX58" ...
 $ REACTOME_SENESCENCE_SASP: chr [1:110] "ANAPC1" "ANAPC10" "ANAPC11" "ANAPC15" ...
 $ SAEPHIA_CURATED_SASP    : chr [1:38] "IL1A" "IL1B" "CXCL10" "CXCL1" ...
 $ senmayo                 : chr [1:125] "ACVR1B" "ANG" "ANGPT1" "ANGPTL4" ...

From this list, I generate two tables: One with the unique gene names

df2 <- data.frame(gene=unique(unlist(list_filter)))

head(df2)
     gene
1    AAK1
2    ABI3
3   ADCK5
4     AGT
5    AKT1
6 ALOX15B

dim(df2)
[1] 671   1

One is simply a "dataframe" version of the list. With every gene in the signature and the name of every signature (set).

df1 <- lapply(list_filter,function(x){
  data.frame(gene = x)
}) %>% 
  bind_rows(.id = "path")

head(df1)
             path    gene
1 CellAge_Induces    AAK1
2 CellAge_Induces    ABI3
3 CellAge_Induces   ADCK5
4 CellAge_Induces     AGT
5 CellAge_Induces    AKT1
6 CellAge_Induces ALOX15B

dim(df1)
[1] 821   2

now I iterate the search of each unique gene name and save the identity of the signatures in a column.

df_int <- lapply(df2$gene,function(x){
  # pull the name of the intersections
  intersection <- df1 %>% 
    dplyr::filter(gene==x) %>% 
    arrange(path) %>% 
    pull("path") %>% 
    paste0(collapse = "|")
  
  # build the dataframe
  data.frame(gene = x,int = intersection)
}) %>% 
  bind_rows()

head(df_int,n=20)
      gene                                                            int
1     AAK1                                                CellAge_Induces
2     ABI3                                                CellAge_Induces
3    ADCK5                                                CellAge_Induces
4      AGT                                                CellAge_Induces
5     AKT1                                                CellAge_Induces
6  ALOX15B                                                CellAge_Induces
7       AR                                                CellAge_Induces
8   ARPC1B                                                CellAge_Induces
9    ASF1A                                                CellAge_Induces
10     AXL                                        CellAge_Induces|senmayo
11 BHLHE40                                                CellAge_Induces
12     BLK                                                CellAge_Induces
13    BRAF                                                CellAge_Induces
14    BRD7                                                CellAge_Induces
15    CAV1                                                CellAge_Induces
16   CCND1                          CellAge_Induces|FRIDMAN_SENESCENCE_UP
17   CDK18                                                CellAge_Induces
18  CDKN1A CellAge_Induces|FRIDMAN_SENESCENCE_UP|REACTOME_SENESCENCE_SASP
19  CDKN1C                          CellAge_Induces|FRIDMAN_SENESCENCE_UP
20  CDKN1B                       CellAge_Induces|REACTOME_SENESCENCE_SASP

dim(df_int)
[1] 671   2

the dataframe can be summarised and compared to the output provided by calling

df_int %>% 
  group_by(int) %>% 
  summarise(n=n()) %>% 
  arrange(desc(n))
# A tibble: 47 × 2
   int                                 n
   <chr>                           <int>
 1 CellAge_Induces                   126
 2 CellAge_Inhibits                  110
 3 REACTOME_SENESCENCE_SASP           95
 4 ISM_SCORE                          93
 5 senmayo                            77
 6 FRIDMAN_SENESCENCE_UP              44
 7 ISM_SCORE|MOSERLE_IFNA_RESPONSE    27
 8 CLASSICAL_SASP|senmayo             12
 9 CLASSICAL_SASP                      8
10 SAEPHIA_CURATED_SASP                8
# … with 37 more rows
# ℹ Use `print(n = ...)` to see more rows


upset(fromList(list_filter),nsets = 10) 

enter image description here

like image 130
edoardo pedrini Avatar answered Apr 26 '26 19:04

edoardo pedrini


Here is my take at extracting the different intersections together with the list of elements in them.

The main idea is to paste all the 0's and 1's from the binary table to create unique identifiers for each intersection and them use the dplyr::group_by function to extract the information

data <- data.frame(
  entry = paste0("Entry.", 1:10),
  "A" = c(0,0,0,0,1,0,1,1,0,0),
  "B" = c(1,0,0,0,1,1,1,1,1,0),
  "C" = c(1,1,1,1,0,0,1,0,1,1)
)

# NOT REQUIRED. Only to confirm that upset works with these data
upset(data)

You can then identify the intersections by pasting all the binary columns. I use the unite convenience function for this:

NB: you may have to change this depending on whether your data has row names or a column with names

data_with_intersection <- data %>%
  unite(col = "intersection", -c("entry"), sep = "")

From here, you can easily calculate the size of each intersection:

# Table of intersections and the number of entries
data_with_intersection %>%
  group_by(intersection) %>%
  summarise(n = n()) %>%
  arrange(desc(n))

Or even extract the list of entries/elements in each intersection:

# List of intersections and their entries
data_with_intersection %>%
  group_by(intersection) %>%
  summarise(list = list(entry)) %>%
  mutate(list = setNames(list, intersection)) %>%
  pull(list)
like image 44
Javier Herrero Avatar answered Apr 26 '26 19:04

Javier Herrero



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!