I have a pandas DataFrame with a two-level index (a MultiIndex). I want to get a NumPy matrix out of it with something like df.as_matrix(...), but that matrix has shape (n_rows, 1). I want a matrix of shape (n_index1_rows, n_index2_rows, 1).
Is there a way to use .groupby(...) followed by .values.tolist() or .as_matrix(...) to get the desired shape?
EDIT: Data
                                                                 value
current_date                  temp_date
1970-01-01 00:00:01.446237485 1970-01-01 00:00:01.446237489  30.497100
                              1970-01-01 00:00:01.446237494   9.584300
                              1970-01-01 00:00:01.446237455  10.134200
                              1970-01-01 00:00:01.446237494   7.803683
                              1970-01-01 00:00:01.446237400  10.678700
                              1970-01-01 00:00:01.446237373   9.700000
                              1970-01-01 00:00:01.446237180  15.000000
                              1970-01-01 00:00:01.446236961  12.928866
                              1970-01-01 00:00:01.446237032  10.458800
This is kind of the idea:
np.array([np.resize(x.as_matrix(["value"]).copy(), (500, 1)) for (i, x) in df.reset_index("current_date").groupby("current_date")])
DataFrame.to_numpy() converts the DataFrame to a NumPy array. By default, the dtype of the returned array is the common NumPy dtype of all column dtypes in the DataFrame. For example, if the dtypes are float16 and float32, the resulting dtype will be float32.
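As a small illustration of the dtype rule described above, here is a minimal sketch. The frame and its column names ("a", "b") are made up for this example and do not come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one float16 column and one float32 column.
frame = pd.DataFrame({
    "a": np.array([1.0, 2.0], dtype=np.float16),
    "b": np.array([3.0, 4.0], dtype=np.float32),
})

# to_numpy() returns a single 2-D array, so both columns are
# upcast to a dtype that can hold either of them.
arr = frame.to_numpy()
print(arr.dtype)   # float32
print(arr.shape)   # (2, 2)
```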
To revert a DataFrame from a MultiIndex to a single default index, use the pandas built-in method reset_index(). Returns: the DataFrame with the new index, or None if inplace=True.
pandas MultiIndex to columns: use the pandas DataFrame.reset_index() method to convert the levels of a MultiIndex (multi-level index) to columns. The default, drop=False, keeps the index values as columns and gives the DataFrame a new integer index starting from zero.
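The reset_index() behaviour described above can be sketched on a toy frame. The level names ("outer", "inner"), the labels, and the values are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical two-level index: 2 outer labels x 2 inner labels.
idx = pd.MultiIndex.from_product([["x", "y"], [1, 2]],
                                 names=["outer", "inner"])
frame = pd.DataFrame({"value": [10.0, 20.0, 30.0, 40.0]}, index=idx)

# Default (drop=False): every index level becomes an ordinary column
# and the frame gets a fresh 0..n-1 integer index.
flat = frame.reset_index()

# Move only one level out of the index; "inner" stays as the index.
partial = frame.reset_index(level="outer")

# drop=True discards the index values instead of keeping them as columns.
dropped = frame.reset_index(drop=True)

print(flat.columns.tolist())     # ['outer', 'inner', 'value']
print(partial.columns.tolist())  # ['outer', 'value']
print(dropped.columns.tolist())  # ['value']
```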
I think what you want is to unstack the multiindex, e.g.
df.unstack().values[:, :, np.newaxis]
Edit: if you have duplicate indices, unstacking won't work, and you probably want a pivot_table instead:
pivoted = df.reset_index().pivot_table(index='current_date',
                                       columns='temp_date',
                                       aggfunc='mean')
arr = pivoted.values[:, :, np.newaxis]
arr.shape
# (10, 50, 1)
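To make the duplicate-index case concrete, here is a minimal sketch on toy data. The level names follow the question, but the labels, values, and the name `frame` are made up; averaging the duplicates is just one possible choice of aggfunc.

```python
import numpy as np
import pandas as pd

# Toy index in which the pair ("a", 2) appears twice.
idx = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("a", 2), ("b", 1)],
    names=["current_date", "temp_date"],
)
frame = pd.DataFrame({"value": [1.0, 2.0, 4.0, 8.0]}, index=idx)

# unstack() cannot place two values into one cell, so it raises.
try:
    frame.unstack()
    unstack_failed = False
except ValueError:
    unstack_failed = True

# pivot_table aggregates the duplicates (here: their mean) and
# fills combinations that never occur with NaN.
pivoted = frame.reset_index().pivot_table(
    index="current_date", columns="temp_date",
    values="value", aggfunc="mean",
)
arr = pivoted.to_numpy()[:, :, np.newaxis]

print(unstack_failed)  # True
print(arr.shape)       # (2, 2, 1)
print(arr[0, 1, 0])    # 3.0, the mean of 2.0 and 4.0
```

Note that the ("b", 2) cell is NaN in this sketch because that combination is absent from the data.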
Here's a full example of unstack. First we'll create some data:
import numpy as np
import pandas as pd

current = pd.date_range('2015', periods=10, freq='D')
temp = pd.date_range('2015', periods=50, freq='D')
ind = pd.MultiIndex.from_product([current, temp],
                                 names=['current_date', 'temp_date'])
df = pd.DataFrame({'val': np.random.rand(len(ind))},
                  index=ind)
df.head()
#                               val
# current_date temp_date
# 2015-01-01   2015-01-01  0.309488
#              2015-01-02  0.697876
#              2015-01-03  0.621318
#              2015-01-04  0.308298
#              2015-01-05  0.936828
Now we unstack the MultiIndex; here is the first 4x4 slice of the result:
df.unstack().iloc[:4, :4]
#                     val
# temp_date    2015-01-01 2015-01-02 2015-01-03 2015-01-04
# current_date
# 2015-01-01     0.309488   0.697876   0.621318   0.308298
# 2015-01-02     0.323530   0.751486   0.507087   0.995565
# 2015-01-03     0.805709   0.101129   0.358664   0.501209
# 2015-01-04     0.360644   0.941200   0.727570   0.884314
Now extract the numpy array, and reshape to [nrows x ncols x 1] as you specified in the question:
vals = df.unstack().values[:, :, np.newaxis]
print(vals.shape)
# (10, 50, 1)
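As a cross-check on the unstack result, the sketch below compares it with a plain reshape of the value column. This shortcut is an assumption-laden alternative, not part of the answer above: it only gives the same array when every (current_date, temp_date) pair occurs exactly once and the rows are already sorted by the index, as they are for a from_product index. The sizes (3 and 4) and the name `frame` are chosen for illustration.

```python
import numpy as np
import pandas as pd

# Small complete grid: 3 outer dates x 4 inner dates, in sorted order.
current = pd.date_range("2015", periods=3, freq="D")
temp = pd.date_range("2015", periods=4, freq="D")
ind = pd.MultiIndex.from_product([current, temp],
                                 names=["current_date", "temp_date"])
frame = pd.DataFrame({"val": np.arange(len(ind), dtype=float)}, index=ind)

# Route 1: unstack the inner level into columns, then add a trailing axis.
via_unstack = frame.unstack().to_numpy()[:, :, np.newaxis]

# Route 2: reshape the flat column using the number of labels per level.
# levshape is (3, 4) here; valid only for a complete, sorted grid.
via_reshape = frame["val"].to_numpy().reshape(*frame.index.levshape, 1)

print(via_unstack.shape)                         # (3, 4, 1)
print(np.array_equal(via_unstack, via_reshape))  # True
```

If the index has gaps or duplicates, the reshape route will either fail or silently misalign values, so unstack or pivot_table remains the safer choice.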