
Pandas DataFrame with MultiIndex to Numpy Matrix

I have a pandas DataFrame with a two-level MultiIndex. I want to get a NumPy array out of it with something like df.as_matrix(...), but that gives an array of shape (n_rows, 1). I want an array of shape (n_index1_rows, n_index2_rows, 1).

Is there a way to use .groupby(...) then a .values.tolist() or .as_matrix(...) to get the desired shape?

EDIT: Data

                                                              value  
current_date                  temp_date                                        
1970-01-01 00:00:01.446237485 1970-01-01 00:00:01.446237489   30.497100   
                              1970-01-01 00:00:01.446237494    9.584300   
                              1970-01-01 00:00:01.446237455   10.134200   
                              1970-01-01 00:00:01.446237494    7.803683   
                              1970-01-01 00:00:01.446237400   10.678700   
                              1970-01-01 00:00:01.446237373    9.700000   
                              1970-01-01 00:00:01.446237180   15.000000   
                              1970-01-01 00:00:01.446236961   12.928866   
                              1970-01-01 00:00:01.446237032   10.458800

This is kind of the idea:

np.array([np.resize(x.as_matrix(["value"]).copy(), (500, 1)) for (i, x) in df.reset_index("current_date").groupby("current_date")])
Ty Pavicich asked Nov 03 '15 20:11




1 Answer

I think what you want is to unstack the multiindex, e.g.

df.unstack().values[:, :, np.newaxis]

Edit: if your index contains duplicate entries (as the sample data in the question does), unstack raises a ValueError, and you probably want a pivot_table instead, which aggregates the duplicates:

pivoted = df.reset_index().pivot_table(index='current_date',
                                       columns='temp_date',
                                       aggfunc='mean')
arr = pivoted.values[:, :, np.newaxis]
arr.shape
# (10, 50, 1)
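
To make the duplicate case concrete, here is a minimal self-contained sketch. The toy data, the string index labels, and the names df_dup and unstack_failed are made up for illustration and are not from the question. It uses to_numpy() in place of .values, and passes values= explicitly so the pivoted columns are a plain index:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the index pair ("a", "x") appears twice.
ind = pd.MultiIndex.from_tuples(
    [("a", "x"), ("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")],
    names=["current_date", "temp_date"],
)
df_dup = pd.DataFrame({"value": [1.0, 3.0, 5.0, 7.0, 9.0]}, index=ind)

# unstack cannot place two values in one cell, so it raises ValueError.
try:
    df_dup.unstack()
    unstack_failed = False
except ValueError:
    unstack_failed = True

# pivot_table aggregates the duplicates (here with the mean) before reshaping.
pivoted = df_dup.reset_index().pivot_table(
    index="current_date", columns="temp_date", values="value", aggfunc="mean"
)
arr = pivoted.to_numpy()[:, :, np.newaxis]
print(unstack_failed, arr.shape)
```

Whether the mean is the right way to combine duplicates depends on your data; aggfunc also accepts other reducers such as 'first' or 'sum'.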

Here's a full example of unstack. First we'll create some data:

import numpy as np
import pandas as pd

current = pd.date_range('2015', periods=10, freq='D')
temp = pd.date_range('2015', periods=50, freq='D')
ind = pd.MultiIndex.from_product([current, temp],
                                 names=['current_date', 'temp_date'])
df = pd.DataFrame({'val':np.random.rand(len(ind))},
                  index=ind)
df.head()
#                               val
# current_date temp_date           
# 2015-01-01   2015-01-01  0.309488
#              2015-01-02  0.697876
#              2015-01-03  0.621318
#              2015-01-04  0.308298
#              2015-01-05  0.936828

Now we unstack the MultiIndex; here is the first 4x4 slice of the result:

df.unstack().iloc[:4, :4]
#                     val                                 
# temp_date    2015-01-01 2015-01-02 2015-01-03 2015-01-04
# current_date                                            
# 2015-01-01     0.309488   0.697876   0.621318   0.308298
# 2015-01-02     0.323530   0.751486   0.507087   0.995565
# 2015-01-03     0.805709   0.101129   0.358664   0.501209
# 2015-01-04     0.360644   0.941200   0.727570   0.884314

Now extract the numpy array, and reshape to [nrows x ncols x 1] as you specified in the question:

vals = df.unstack().values[:, :, np.newaxis]
print(vals.shape)
# (10, 50, 1)
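
As a sanity check, here is a sketch that rebuilds the same kind of data and compares the unstacked array with a direct reshape. The reshape shortcut is my assumption and only holds when the index is a complete, sorted product of the two levels with no missing or duplicate pairs; unstack does not need that. The names n_current, n_temp, reshaped and same are illustrative. to_numpy() is used because as_matrix was removed in pandas 1.0:

```python
import numpy as np
import pandas as pd

# Same construction as the example above: a full 10 x 50 product index.
current = pd.date_range("2015-01-01", periods=10, freq="D")
temp = pd.date_range("2015-01-01", periods=50, freq="D")
ind = pd.MultiIndex.from_product([current, temp],
                                 names=["current_date", "temp_date"])
df = pd.DataFrame({"val": np.random.rand(len(ind))}, index=ind)

# Route 1: unstack the inner level into columns, then add a trailing axis.
vals = df.unstack().to_numpy()[:, :, np.newaxis]

# Route 2 (assumes a complete, sorted product index): reshape the column directly.
n_current = df.index.get_level_values("current_date").nunique()
n_temp = df.index.get_level_values("temp_date").nunique()
reshaped = df["val"].to_numpy().reshape(n_current, n_temp, 1)

same = np.array_equal(vals, reshaped)
print(vals.shape, same)
```

If any (current_date, temp_date) pair is missing, the reshape fails or silently misaligns values, whereas unstack fills the gap with NaN.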
jakevdp answered Oct 05 '22 19:10