
R: creating an adjacency matrix from columns in a dataframe

I am interested in testing some network visualization techniques, but before trying those functions I want to build an adjacency matrix (from, to) from the following dataframe.

 Id   Gender   Col_Cold_1  Col_Cold_2  Col_Cold_3  Col_Hot_1  Col_Hot_2   Col_Hot_3  
 10   F         pain       sleep        NA         infection  medication  walking
 14   F         Bump       NA           muscle     NA         twitching   flutter
 17   M                    pain         hemaloma   Callus     infection
 18   F         muscle                  pain                  twitching   medication

My goal is to create an adjacency matrix as follows:

1) All values in columns with keyword Cold will contribute to the rows  
2) All values in columns with keyword Hot will contribute to the columns

For example, pain, sleep, Bump, muscle, and hemaloma are cell values in the columns with the keyword Cold, so they form the rows. Cell values such as infection, medication, Callus, walking, twitching, and flutter are in the columns with the keyword Hot, so they form the columns of the adjacency matrix.

The final desired output should appear like this:

           infection  medication  walking  twitching  flutter  Callus
     pain  2          2           1        1                   1
    sleep  1          1           1
     Bump                                  1          1
   muscle             1                    2          1
 hemaloma  1                                                   1
  • [pain, infection] = 2 because the association between pain and infection occurs twice in the original dataframe: once in row 1 and again in row 3.

  • [pain, medication] = 2 because the association between pain and medication occurs twice: once in row 1 and again in row 4.

Any suggestions or advice on producing such an adjacency matrix would be much appreciated. Thanks.

Reproducible Dataset

df = structure(list(id = c(10, 14, 17, 18), Gender = structure(c(1L, 1L, 2L, 1L), .Label = c("F", "M"), class = "factor"), Col_Cold_1 = structure(c(4L, 2L, 1L, 3L), .Label = c("", "Bump", "muscle", "pain"), class = "factor"), Col_Cold_2 = structure(c(4L, 2L, 3L, 1L), .Label = c("", "NA", "pain", "sleep"), class = "factor"), Col_Cold_3 = structure(c(1L, 3L, 2L, 4L), .Label = c("NA", "hemaloma", "muscle", "pain" ), class = "factor"), Col_Hot_1 = structure(c(4L, 3L, 2L, 1L), .Label = c("", "Callus", "NA", "infection"), class = "factor"), Col_Hot_2 = structure(c(2L, 3L, 1L, 3L), .Label = c("infection", "medication", "twitching"), class = "factor"), Col_Hot_3 = structure(c(4L, 2L, 1L, 3L), .Label = c("", "flutter", "medication", "walking" ), class = "factor")), .Names = c("id", "Gender", "Col_Cold_1", "Col_Cold_2", "Col_Cold_3", "Col_Hot_1", "Col_Hot_2", "Col_Hot_3" ), row.names = c(NA, -4L), class = "data.frame")
Lilla Bulten asked Dec 18 '16 23:12



1 Answer

One way is to make the dataset into a "tidy" form, then use xtabs. First, some cleaning up:

df[] <- lapply(df, as.character)  # Convert factors to characters
df[df == "NA" | df == "" | is.na(df)] <- NA  # Make all blanks NAs
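
To see what the two cleaning lines do, here is a minimal sketch on a made-up data frame (toy and its columns a and b are hypothetical, not part of the question's data). It assumes, as in the question's dput output, that missing values arrive as empty strings or as the literal string "NA" inside factor columns.

```r
# Hypothetical toy data: empty strings and the literal text "NA" stand in
# for missing values, stored as factors like the question's data frame.
toy <- data.frame(
  a = factor(c("pain", "NA", "")),
  b = factor(c("", "walking", "NA"))
)

toy[] <- lapply(toy, as.character)                # factors -> character, keeping the data.frame shape
toy[toy == "NA" | toy == "" | is.na(toy)] <- NA   # "NA" text, blanks, and real NAs -> real NA

is.na(toy)  # logical matrix marking the cells that are now truly missing
```

After this step, functions such as is.na and na.omit treat those cells as genuinely missing, which the tabulation later relies on.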

Now, tidy the dataset:

library(tidyr)
library(dplyr)
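# For each "cold" column, stack every "hot" column next to it, then bind the pieces.
out <- do.call(rbind, sapply(grep("^Col_Cold", names(df), value = TRUE), function(x){
  vars <- c(x, grep("^Col_Hot", names(df), value = TRUE))  # one cold column + all hot columns
  # Gather the hot columns into key/value form, then keep only the
  # cold value (column 1) and the gathered hot value (column 3).
  setNames(gather_(select(df, one_of(vars)),
    key_col = x,
    value_col = "value",
    gather_cols = vars[-1])[, c(1, 3)], c("cold", "hot"))
}, simplify = FALSE))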

The idea is to "pair" each of the "cold" columns with each of the "hot" columns to make a long dataset. out looks like this:

out
#        cold        hot
# 1      pain  infection
# 2      Bump       <NA>
# 3      <NA>     Callus
# 4    muscle       <NA>
# 5      pain medication
# ...
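
The same pairing can also be sketched in base R, which may be useful because gather_() is deprecated in current tidyr releases and one_of() has been superseded. This is only an illustrative alternative, not the code above: the names cold_cols, hot_cols, combos, pieces and pairs are my own, and the data frame is retyped here with real NAs (the state after the cleaning step) so that the block runs on its own.

```r
# The question's data, retyped with real NAs (assumed state after cleaning).
df <- data.frame(
  id         = c(10, 14, 17, 18),
  Gender     = c("F", "F", "M", "F"),
  Col_Cold_1 = c("pain", "Bump", NA, "muscle"),
  Col_Cold_2 = c("sleep", NA, "pain", NA),
  Col_Cold_3 = c(NA, "muscle", "hemaloma", "pain"),
  Col_Hot_1  = c("infection", NA, "Callus", NA),
  Col_Hot_2  = c("medication", "twitching", "infection", "twitching"),
  Col_Hot_3  = c("walking", "flutter", NA, "medication"),
  stringsAsFactors = FALSE
)

cold_cols <- grep("^Col_Cold", names(df), value = TRUE)
hot_cols  <- grep("^Col_Hot",  names(df), value = TRUE)

# Every (cold column, hot column) combination: 3 x 3 = 9 here.
combos <- expand.grid(cold = cold_cols, hot = hot_cols, stringsAsFactors = FALSE)

# For each combination, put the two columns side by side, row by row.
pieces <- lapply(seq_len(nrow(combos)), function(i) {
  data.frame(cold = df[[combos$cold[i]]],
             hot  = df[[combos$hot[i]]],
             stringsAsFactors = FALSE)
})

# Stack the pieces into one long data frame; NAs are kept at this stage.
pairs <- do.call(rbind, pieces)
head(pairs)
```

Here pairs has 36 rows (9 column combinations times 4 original rows), of which 18 have both a cold and a hot value. The row order differs from out, but it should contain the same pairs, so it can be passed to xtabs in the same way.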

Finally, use xtabs to make the desired output:

xtabs(~ cold + hot, na.omit(out))
#           hot
# cold       Callus flutter infection medication twitching walking
#   Bump          0       1         0          0         1       0
#   hemaloma      1       0         1          0         0       0
#   muscle        0       1         0          1         2       0
#   pain          1       0         2          2         1       1
#   sleep         0       0         1          1         0       1
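
xtabs orders rows and columns by sorted label (or by factor level, if the inputs are factors), whereas the desired output in the question uses a specific order. A possible follow-up, sketched under the assumption that you want exactly that order and a plain matrix to pass to network functions, is to convert cold and hot to factors with explicit levels before tabulating. The out below is rebuilt by hand with expand.grid purely so that the block is self-contained; row_order, col_order and adj are hypothetical names.

```r
# Stand-in for na.omit(out): each original row contributes every
# (cold value, hot value) combination it contains.
out <- rbind(
  expand.grid(cold = c("pain", "sleep"),
              hot  = c("infection", "medication", "walking"),
              stringsAsFactors = FALSE),
  expand.grid(cold = c("Bump", "muscle"),
              hot  = c("twitching", "flutter"),
              stringsAsFactors = FALSE),
  expand.grid(cold = c("pain", "hemaloma"),
              hot  = c("Callus", "infection"),
              stringsAsFactors = FALSE),
  expand.grid(cold = c("muscle", "pain"),
              hot  = c("twitching", "medication"),
              stringsAsFactors = FALSE)
)

# Row and column order taken from the desired output in the question.
row_order <- c("pain", "sleep", "Bump", "muscle", "hemaloma")
col_order <- c("infection", "medication", "walking", "twitching", "flutter", "Callus")

# Explicit factor levels control the order used by xtabs.
out$cold <- factor(out$cold, levels = row_order)
out$hot  <- factor(out$hot,  levels = col_order)
tab <- xtabs(~ cold + hot, out)

# Strip the xtabs/table classes to get an ordinary integer matrix.
adj <- matrix(as.vector(tab), nrow = nrow(tab), dimnames = dimnames(tab))
adj
```

Setting the levels up front also keeps any value with zero counts in the matrix, which helps if the row and column sets need to stay fixed across several data sets.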
Weihuang Wong answered Oct 10 '22 02:10
