How can I create an incidence matrix in Julia?

Question

I would like to create an incidence matrix.
I have a file with 3 columns, like:

And I want a matrix like (without col and row names):

  2 4 21 22 26 58 347 360   
A 1 0  0  1  0  0   0   0   
B 0 1  1  0  0  0   0   0   
C 0 0  1  0  0  0   0   1   
D 1 0  0  0  1  0   0   0   
E 0 0  0  1  0  1   0   0   
F 1 0  0  0  0  0   1   0

I have started the code like:

haps = readdlm("File.txt",header=true)      
hap1_2 = map(Int64,haps[1][:,2:end])    
ID = (haps[1][:,1])                      
dic1 = Dict()

for (i in 1:21)
    dic1[ID[i]] = hap1_2[i,:]
end

X=[zeros(21,22)];       #the original file has 21 rows and 22 columns 
X1 = hcat(ID,X)

The problem is that I don't know how to fill the matrix with 1s in the specific columns, as in the example above.
I'm also not sure whether I'm on the right track.

Any suggestions would help.

Thanks!

Dan Getz · Accepted Answer

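NamedArrays is a neat package that allows naming both rows and columns, and it seems to fit the bill for this problem. Suppose the data is in data.csv; here is one way to go about it (install NamedArrays with Pkg.add("NamedArrays")). The code uses readcsv, which was removed in Julia 1.0; on current versions, replace it with readdlm("data.csv", ',', header=true) from the DelimitedFiles standard library: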

data,header = readcsv("data.csv",header=true);
# get the column names by looking at unique values in columns
cols = unique(vec([(header[j+1],data[i,j+1]) for i in 1:size(data,1),j=1:2]))
# row names from ID column
rows = data[:,1]

using NamedArrays
narr = NamedArray(zeros(Int,length(rows),length(cols)),(rows,cols),("id","attr"));
# now stamp in the 1s in the right places
for r=1:size(data,1), c=2:size(data,2)
    narr[data[r,1],(header[c],data[r,c])] = 1
end

Now we have (note I transposed narr for better printout):

julia> narr'
10x6 NamedArray{Int64,2}:
attr ╲ id │ A  B  C  D  E  F
──────────┼─────────────────
("x",22)  │ 1  0  0  0  1  0
("x",4)   │ 0  1  0  0  0  0
("x",21)  │ 0  0  1  0  0  0
("x",26)  │ 0  0  0  1  0  0
("x",2)   │ 0  0  0  0  0  1
("y",2)   │ 1  0  0  1  0  0
("y",21)  │ 0  1  0  0  0  0
("y",360) │ 0  0  1  0  0  0
("y",58)  │ 0  0  0  0  1  0
("y",347) │ 0  0  0  0  0  1

If DataFrames are required, similar tricks should apply.

---------- UPDATE ----------

If the source column of a value should be ignored, i.e. x=2 and y=2 should both set a 1 in the same column for value 2, then the code becomes:

using NamedArrays
data,header = readcsv("data.csv",header=true);
rows = data[:,1]
cols = map(string,sort(unique(vec(data[:,2:end]))))
narr = NamedArray(zeros(Int,length(rows),length(cols)),(rows,cols),("id","attr"));
for r=1:size(data,1), c=2:size(data,2)
    narr[data[r,1],string(data[r,c])] = 1
end

giving:

julia> narr
6x8 NamedArray{Int64,2}:
id ╲ attr │   2    4   21   22   26   58  347  360
──────────┼───────────────────────────────────────
A         │   1    0    0    1    0    0    0    0
B         │   0    1    1    0    0    0    0    0
C         │   0    0    1    0    0    0    0    1
D         │   1    0    0    0    1    0    0    0
E         │   0    0    0    1    0    1    0    0
F         │   1    0    0    0    0    0    1    0
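
Since the question asks for a matrix without row and column names, the same idea can also be sketched without NamedArrays, using a Dict to map each value to its column index. This is only a minimal sketch for Julia 1.x, not the method used above: the sample rows are reconstructed from the printed output, and the first header name ("id"), the temporary file, and the variable names (raw, vals, colindex, X) are assumptions made for illustration.

```julia
using DelimitedFiles  # standard library; provides readdlm and writedlm

# Hypothetical sample data reconstructed from the output shown above.
# The header name "id" is an assumption; "x" and "y" appear in the answer.
raw = ["id" "x" "y";
       "A"  22  2;
       "B"  4   21;
       "C"  21  360;
       "D"  26  2;
       "E"  22  58;
       "F"  2   347]

# Write the sample to a temporary comma-separated file and read it back,
# standing in for the real data.csv.
path = tempname()
writedlm(path, raw, ',')
data, header = readdlm(path, ',', header=true)

ids  = String.(data[:, 1])      # row labels, kept separately from the matrix
vals = Int.(data[:, 2:end])     # the value columns as integers

# Sorted unique values define the columns; the Dict maps value => column index.
cols = sort(unique(vals))
colindex = Dict(v => j for (j, v) in enumerate(cols))

# Start from zeros and stamp a 1 for every (row, value) pair.
X = zeros(Int, length(ids), length(cols))
for r in axes(vals, 1), c in axes(vals, 2)
    X[r, colindex[vals[r, c]]] = 1
end

display(X)
```

Here X is an ordinary Matrix{Int}; ids and cols hold the labels that the matrix itself does not carry, so row i of X belongs to ids[i] and column j to cols[j].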

Tags:

matrix

julia

godines
