I have a voting dataset like this:
republican,n,y,n,y,y,y,n,n,n,y,?,y,y,y,n,y
republican,n,y,n,y,y,y,n,n,n,n,n,y,y,y,n,?
democrat,?,y,y,?,y,y,n,n,n,n,y,n,y,y,n,n
democrat,n,y,y,n,?,y,n,n,n,n,y,n,y,n,n,y
All the values are strings, so I want to convert them to an integer matrix and compute some statistics:

    import pandas as pd

    hou_dat = pd.read_csv("house.data", header=None)
    for i in range(0, hou_dat.shape[0]):
        for j in range(0, hou_dat.shape[1]):
            if hou_dat[i, j] == "republican":
                hou_dat[i, j] = 2
            if hou_dat[i, j] == "democrat":
                hou_dat[i, j] = 3
            if hou_dat[i, j] == "y":
                hou_dat[i, j] = 1
            if hou_dat[i, j] == "n":
                hou_dat[i, j] = 0
            if hou_dat[i, j] == "?":
                hou_dat[i, j] = -1
    hou_sta = hou_dat.apply(pd.value_counts)
    print(hou_sta)
However, it raises an error. How can I solve it?

    Exception has occurred: KeyError
    (0, 0)
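IIUC, you need map and stack. The KeyError comes from hou_dat[i, j], which looks for a column labelled (i, j) rather than the cell at row i, column j. With df being the DataFrame read from the file: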
    map_dict = {'republican': 2,
                'democrat': 3,
                'y': 1,
                'n': 0,
                '?': -1}
    df1 = df.stack().map(map_dict).unstack()
    print(df1)

        0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
    0   2   0   1   0   1   1   1   0   0   0   1  -1   1   1   1   0   1
    1   2   0   1   0   1   1   1   0   0   0   0   0   1   1   1   0  -1
    2   3  -1   1   1  -1   1   1   0   0   0   0   1   0   1   1   0   0
    3   3   0   1   1   0  -1   1   0   0   0   0   1   0   1   0   0   1
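
For the statistics part of the question, here is a minimal end-to-end sketch. It assumes the four sample rows from the question in place of the real "house.data" file (the inline `sample` string and the name `counts` are only for illustration), applies the same stack/map/unstack mapping, and then counts the codes per column with `Series.value_counts`:

```python
import io

import pandas as pd

# Assumption: the four sample rows from the question stand in for "house.data".
sample = (
    "republican,n,y,n,y,y,y,n,n,n,y,?,y,y,y,n,y\n"
    "republican,n,y,n,y,y,y,n,n,n,n,n,y,y,y,n,?\n"
    "democrat,?,y,y,?,y,y,n,n,n,n,y,n,y,y,n,n\n"
    "democrat,n,y,y,n,?,y,n,n,n,n,y,n,y,n,n,y\n"
)
df = pd.read_csv(io.StringIO(sample), header=None)

map_dict = {"republican": 2, "democrat": 3, "y": 1, "n": 0, "?": -1}

# Flatten to one long Series, map every value, then restore the 2-D shape.
df1 = df.stack().map(map_dict).unstack()

# Count each code per column; codes that never occur in a column become 0.
counts = df1.apply(lambda col: col.value_counts()).fillna(0).astype(int)
print(counts)
```

In `counts`, the row labels are the integer codes and the columns are the original column positions, so `counts.loc[1, 2]` would be the number of "y" votes in column 2. If the loop style from the question is preferred, positional access such as `hou_dat.iat[i, j]` avoids the KeyError, though the vectorised mapping is usually simpler.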