
How to build a matrix from a dataframe based on the values of a specific column?

I have a dataframe named df as follows:

Genes         ID          Type 
CFH         MB-0002       Gain 
CFHR3       MB-0002       Gain 
DEFB131     MB-0003       Gain 
UNC93B5     MB-0003       Loss 
CCDC125     MB-0004       Loss 
CCNB1       MB-0002       Gain
CFH         MB-0004       Loss
CCNB1       MB-0003       Gain   

I want to build a matrix, say Mat, and write it into a csv file where I will have the Genes as rows and the IDs as columns. I want to put:

  • 1 if the corresponding type is Gain
  • -1 if the corresponding type is Loss
  • 0 in all other places.

An example of my matrix would be:

                MB-0002 MB-0003 MB-0004
   CFH              1       0      -1
   CFHR3            1       0       0
   DEFB131          0       1       0
   UNC93B5          0      -1       0
   CCDC125          0       0      -1
   CCNB1            1       1       0
Rasif Ajwad asked Mar 11 '23 16:03


1 Answer

Try:

xtabs(c(1L, -1L)[Type] ~ ., data=df)
#         ID
#Genes     MB-0002 MB-0003 MB-0004
#  CCDC125       0       0      -1
#  CCNB1         1       1       0
#  CFH           1       0      -1
#  CFHR3         1       0       0
#  DEFB131       0       1       0
#  UNC93B5       0      -1       0

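xtabs() is similar to table(), except that the left-hand side of the formula can supply a variable holding the counts for each combination of levels; those values are summed within each cell. You can convert the result back to a data frame with as.data.frame(), which returns it in long format (one row per Genes/ID combination).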

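The left-hand side of the formula gives the "counts" (in this case, the values the contingency table is to be populated with). It uses a known trick to convert a factor to a numeric vector by indexing (see ?factor): a factor used as an index is treated as its integer codes, so with the levels in the default alphabetical order, Gain picks 1L and Loss picks -1L. This requires Type to be a factor; data.frame() and read.csv() have kept strings as character by default since R 4.0.0, in which case convert the column with factor() first. The . on the right-hand side is a shortcut for "the rest of the variables in the data frame", which in this case is equivalent to Genes + ID.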
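For completeness, here is a minimal sketch of the whole workflow, from the example data to the CSV file. It is a variation on the one-liner above rather than the same code: it maps Type to 1/-1 with a named lookup vector, so it does not depend on Type being a factor, and it spells out Genes + ID instead of using the dot. The names score, value and out_file, and the output location, are assumptions made for illustration; Mat follows the question.

```r
# Rebuild the example data from the question (plain character columns).
df <- data.frame(
  Genes = c("CFH", "CFHR3", "DEFB131", "UNC93B5", "CCDC125", "CCNB1", "CFH", "CCNB1"),
  ID    = c("MB-0002", "MB-0002", "MB-0003", "MB-0003", "MB-0004", "MB-0002", "MB-0004", "MB-0003"),
  Type  = c("Gain", "Gain", "Gain", "Loss", "Loss", "Gain", "Loss", "Gain"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup vector: indexing by name works for character or factor input.
score <- c(Gain = 1L, Loss = -1L)
df$value <- unname(score[as.character(df$Type)])

# Sum the values per Genes/ID pair; pairs that never occur are filled with 0.
tab <- xtabs(value ~ Genes + ID, data = df)

# Copy the table into a plain matrix, keeping the row and column names.
Mat <- matrix(as.vector(tab), nrow = nrow(tab), dimnames = dimnames(tab))

# Write the matrix to a CSV file; the path here is only an example.
out_file <- file.path(tempdir(), "Mat.csv")
write.csv(Mat, out_file)
```

Note that the rows come out in alphabetical order of Genes rather than in order of first appearance, and that if the same Genes/ID pair appeared more than once its values would be added together.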

Ernest A answered Apr 06 '23 22:04