I am trying to convert a multi-index pandas DataFrame into a numpy.ndarray. The DataFrame is below:
               s1  s2   s3   s4
Action State
1      s1     0.0   0  0.8  0.2
       s2     0.1   0  0.9  0.0
2      s1     0.0   0  0.9  0.1
       s2     0.0   0  1.0  0.0
I would like the resulting numpy.ndarray to be the following, with shape (2, 2, 4):
[[[ 0.0 0.0 0.8 0.2 ]
[ 0.1 0.0 0.9 0.0 ]]
[[ 0.0 0.0 0.9 0.1 ]
[ 0.0 0.0 1.0 0.0]]]
I have tried df.as_matrix(), but this returns:
[[ 0. 0. 0.8 0.2]
[ 0.1 0. 0.9 0. ]
[ 0. 0. 0.9 0.1]
[ 0. 0. 1. 0. ]]
How do I return a list of lists for the first level, with each list representing the records of one Action?
You could use the following:
dim1 = len(df.index.get_level_values(0).unique())  # number of distinct Actions
result = df.values.reshape((dim1, -1, df.shape[1]))  # -1 lets NumPy infer the State axis
print(result)
[[[ 0. 0. 0.8 0.2]
[ 0.1 0. 0.9 0. ]]
[[ 0. 0. 0.9 0.1]
[ 0. 0. 1. 0. ]]]
The first line just finds the number of groups, i.e. the number of distinct values in the first index level (Action).
Why this (or groupby) is needed: as soon as you use .values, you lose the dimensionality of the MultiIndex from pandas. So you need to re-pass that dimensionality to NumPy in some way.
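As a rough sketch of "re-passing the dimensionality", the shape can be read off the MultiIndex itself rather than typed in by hand. The DataFrame is rebuilt from the question's data; the name level_sizes is just illustrative, and the approach assumes every Action has a row for every State and that sorting by the index gives the order you want.

```python
import numpy as np
import pandas as pd

# Rebuild the DataFrame from the question.
index = pd.MultiIndex.from_product(
    [[1, 2], ["s1", "s2"]], names=["Action", "State"]
)
df = pd.DataFrame(
    [[0.0, 0, 0.8, 0.2],
     [0.1, 0, 0.9, 0.0],
     [0.0, 0, 0.9, 0.1],
     [0.0, 0, 1.0, 0.0]],
    index=index,
    columns=["s1", "s2", "s3", "s4"],
)

# Number of distinct values in each index level (illustrative name).
level_sizes = [len(level) for level in df.index.levels]

# Assumption: the index is a full Action x State product, so the rows
# fill a regular block once they are sorted by the index.
result = df.sort_index().to_numpy().reshape(*level_sizes, df.shape[1])

print(result.shape)
print(result)
```

If some Action/State combinations are missing, the reshape raises a ValueError instead of silently misaligning rows, so it may be worth reindexing against the full product first.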
One way:
In [151]: df.groupby(level=0).apply(lambda x: x.values.tolist()).values
Out[151]:
array([[[0.0, 0.0, 0.8, 0.2],
[0.1, 0.0, 0.9, 0.0]],
[[0.0, 0.0, 0.9, 0.1],
[0.0, 0.0, 1.0, 0.0]]], dtype=object)
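Note the dtype=object in that output: the result is an array holding Python lists, not a numeric 3-D array. A possible variation, sketched here under the assumption that every Action group has the same number of rows, is to stack the per-group arrays with np.stack. The DataFrame is rebuilt from the question's data so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Rebuild the DataFrame from the question.
index = pd.MultiIndex.from_product(
    [[1, 2], ["s1", "s2"]], names=["Action", "State"]
)
df = pd.DataFrame(
    [[0.0, 0, 0.8, 0.2],
     [0.1, 0, 0.9, 0.0],
     [0.0, 0, 0.9, 0.1],
     [0.0, 0, 1.0, 0.0]],
    index=index,
    columns=["s1", "s2", "s3", "s4"],
)

# One 2-D array per Action, stacked along a new leading axis.
# np.stack requires all groups to have the same shape.
stacked = np.stack([group.to_numpy() for _, group in df.groupby(level=0)])

print(stacked.shape)
print(stacked.dtype)
```

Because np.stack refuses arrays of different shapes, uneven groups fail loudly rather than producing a ragged object array.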