I would like to create a matrix from a three-column file. I am sure it's something extremely easy, but I just do not understand how it needs to be done. Please be gentle, I am a beginner in Python. Thank you.
The format of my input file
A A 5
A B 4
A C 3
B B 2
B C 1
C C 0
Desired output - complete matrix
  A B C
A 5 4 3
B 4 2 1
C 3 1 0
Or - half matrix
  A B C
A 5 4 3
B   2 1
C     0
I tried this, but as I said, I am VERY new to python and programming.
import numpy as np
for line in open('test').readlines():
    name1, name2, value = line.strip().split('\t')
    a = np.matrix([[name1], [name2], [value]])
    print(a)
WORKING SCRIPT - One of my friends also helped me, so if anyone is interested in a simpler script, here it is. It's not the most efficient, but it works perfectly.
data = {}
names = set()
# Read the tab-separated file: each line is name1, name2, value.
with open('test') as infile:
    for line in infile:
        name1, name2, value = line.strip().split('\t')
        data[(name1, name2)] = value
        names.update([name1, name2])
names = sorted(names)
print(names)
print(data)
# Write the header row, then one row per name; pairs missing from the file stay empty.
with open('out.txt', 'w') as output:
    output.write("\t%s\n" % ("\t".join(names)))
    for nameA in names:
        output.write("%s" % nameA)
        for nameB in names:
            key = (nameA, nameB)
            if key in data:
                output.write("\t%s" % data[key])
            else:
                output.write("\t")
        output.write("\n")
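The script above writes the half matrix, because only the pairs present in the file are stored. For the complete matrix from the question, one possible tweak is to store each pair in both orders. This is only a sketch: the in-memory `sample` stands in for the real file, and `format_matrix` is a made-up helper name, not part of the original script.

```python
import io

# Stand-in for the tab-separated input file from the question.
sample = io.StringIO("A\tA\t5\nA\tB\t4\nA\tC\t3\nB\tB\t2\nB\tC\t1\nC\tC\t0\n")

data = {}
names = set()
for line in sample:
    name1, name2, value = line.split()
    data[(name1, name2)] = value
    data[(name2, name1)] = value  # mirror the pair so (B, A) resolves like (A, B)
    names.update([name1, name2])
names = sorted(names)


def format_matrix(names, data):
    """Return the matrix as tab-separated text (hypothetical helper)."""
    rows = ["\t" + "\t".join(names)]
    for name_a in names:
        # Pairs that never appeared in the input become empty cells.
        cells = [data.get((name_a, name_b), "") for name_b in names]
        rows.append(name_a + "\t" + "\t".join(cells))
    return "\n".join(rows)


full_matrix = format_matrix(names, data)
print(full_matrix)
```

Writing `full_matrix` to a file instead of printing it gives the same layout as `out.txt` above, with the lower triangle filled in.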
Try:
import pandas as pd
import numpy as np
raw = []
with open('test.txt', 'r') as f:
    for line in f:
        raw.append(line.split())
data = pd.DataFrame(raw, columns=['row', 'column', 'value'])
data_ind = data.set_index(['row', 'column']).unstack('column')
np.array(data_ind.values, dtype=float)
Output:
array([[ 5., 4., 3.],
[ nan, 2., 1.],
[ nan, nan, 0.]])
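The nan entries are the pairs that never appear in the file, so this is the half matrix. If the complete (symmetric) matrix is wanted, one way that should work is to fill the gaps from the transpose with `combine_first`. A sketch, assuming the data really is symmetric; the in-memory `sample` replaces `test.txt`, and the names `half` and `full` are mine:

```python
import io

import pandas as pd

# Stand-in for test.txt so the example is self-contained.
sample = io.StringIO("A A 5\nA B 4\nA C 3\nB B 2\nB C 1\nC C 0\n")
raw = [line.split() for line in sample]

data = pd.DataFrame(raw, columns=['row', 'column', 'value'])
data['value'] = data['value'].astype(float)

# Half matrix: rows and columns are the labels, missing pairs are NaN.
half = data.set_index(['row', 'column'])['value'].unstack('column')

# Fill every NaN at (r, c) with the value stored at (c, r).
full = half.combine_first(half.T)
print(full.values)
```

Keeping the result as a DataFrame (rather than converting straight to an array) also preserves the A, B, C labels on both axes.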
Although there's already an accepted answer, it uses pandas. Here is a relatively generic way of getting the same effect without an additional library (numpy is used because the OP specified numpy, but you can achieve the same thing with lists):
import string
import numpy as np
# List of 'A'..'Z'; a letter's position in the list is its matrix index.
uppercase = list(string.ascii_uppercase)
matrix = np.zeros((3, 3))
with open("a.txt") as file:
    for line in file:
        tmp = line.strip()
        tmp = tmp.split()  # split on any whitespace (spaces or tabs)
        idx = uppercase.index(tmp[0])
        idy = uppercase.index(tmp[1])
        matrix[idx, idy] = tmp[2]
The idea is that you gather all the letters of the alphabet; hopefully the OP will limit themselves to just the English alphabet without special chars (šđćžčę°e etc.).
We create a list from the alphabet so that we can use the index method to retrieve the index value, i.e. uppercase.index("A") is 0. We can use these indices to fill in our array.
Read in file line by line, strip extra characters, split by space to get:
['A', 'A', '5']
['A', 'B', '4']
This is now the actual working part:
idx = uppercase.index(tmp[0])
idy = uppercase.index(tmp[1])
matrix[idx, idy] = tmp[2]
I.e. for the first line, "A A 5", idx evaluates to 0 and so does idy. Then matrix[0,0] becomes the value tmp[2], which is 5. Following the same logic for the second line, "A B 4", we get matrix[0,1]=4. And so on.
A more general approach would be to declare matrix = np.zeros((26, 26)) instead of matrix = np.zeros((3, 3)), because there are 26 letters in the English alphabet and the OP doesn't have to use just "ABC", but could potentially use the entire range A-Z.
Example output for the program above would be:
>>> matrix
array([[ 5., 4., 3.],
[ 0., 2., 1.],
[ 0., 0., 0.]])
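If the names are not single letters, or you don't want a fixed 26x26 array, a variation on the same idea is to build the index from whatever labels appear in the file. This is just a sketch: `sample` stands in for a.txt, and `labels`, `position` and `full` are names I made up. The last line mirrors the upper triangle to get the complete matrix, which assumes every pair is listed only once.

```python
import io

import numpy as np

# Stand-in for a.txt so the example runs on its own.
sample = io.StringIO("A A 5\nA B 4\nA C 3\nB B 2\nB C 1\nC C 0\n")
rows = [line.split() for line in sample if line.strip()]

# Every distinct label from the first two columns, in sorted order.
labels = sorted({name for row in rows for name in row[:2]})
# Map each label to its row/column position, replacing uppercase.index().
position = {label: i for i, label in enumerate(labels)}

matrix = np.zeros((len(labels), len(labels)))
for name1, name2, value in rows:
    matrix[position[name1], position[name2]] = float(value)

# Mirror across the diagonal; subtract the diagonal once so it is not doubled.
full = matrix + matrix.T - np.diag(np.diag(matrix))
print(labels)
print(full)
```

The dictionary lookup also avoids scanning a list for every line, which matters only for large inputs.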