Pandas DataFrame with MultiIndex to Numpy Matrix

Tags: pandas, matrix, numpy

I have a pandas DataFrame with a two-level MultiIndex. I want to get a NumPy matrix out of it with something like df.as_matrix(...), but that returns a matrix of shape (n_rows, 1). I want a matrix of shape (n_index1_rows, n_index2_rows, 1).

Is there a way to use .groupby(...) then a .values.tolist() or .as_matrix(...) to get the desired shape?

EDIT: Data

                                                              value  
current_date                  temp_date                                        
1970-01-01 00:00:01.446237485 1970-01-01 00:00:01.446237489   30.497100   
                              1970-01-01 00:00:01.446237494    9.584300   
                              1970-01-01 00:00:01.446237455   10.134200   
                              1970-01-01 00:00:01.446237494    7.803683   
                              1970-01-01 00:00:01.446237400   10.678700   
                              1970-01-01 00:00:01.446237373    9.700000   
                              1970-01-01 00:00:01.446237180   15.000000   
                              1970-01-01 00:00:01.446236961   12.928866   
                              1970-01-01 00:00:01.446237032   10.458800

This is kind of the idea:

np.array([np.resize(x[["value"]].to_numpy(), (500, 1))  # as_matrix() was removed in pandas 1.0
          for _, x in df.reset_index("current_date").groupby("current_date")])

asked Nov 03 '15 20:11

Ty Pavicich

1 Answer

I think what you want is to unstack the multiindex, e.g.

df.unstack().values[:, :, np.newaxis]

Edit: if you have duplicate indices, unstacking won't work, and you probably want a pivot_table instead:

pivoted = df.reset_index().pivot_table(index='current_date',
                                       columns='temp_date',
                                       aggfunc='mean')
arr = pivoted.values[:, :, np.newaxis]
arr.shape
# (n_current_dates, n_temp_dates, 1), e.g. (10, 50, 1) for the example data below
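
To make the duplicate-index case concrete, here is a minimal sketch on hypothetical toy data (the string labels "a", "b", "x", "y" and the numbers are made up for illustration and stand in for the timestamps in the question). It assumes the data column is called value, as in the question, and uses to_numpy(), which is equivalent to .values here:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the index pair ("a", "y") appears twice.
ind = pd.MultiIndex.from_tuples(
    [("a", "x"), ("a", "y"), ("a", "y"), ("b", "x")],
    names=["current_date", "temp_date"],
)
df = pd.DataFrame({"value": [1.0, 2.0, 4.0, 5.0]}, index=ind)

# unstack() cannot reshape when index entries are duplicated.
try:
    df.unstack()
    unstack_failed = False
except ValueError:
    unstack_failed = True

# pivot_table() aggregates the duplicates instead (here with the mean);
# combinations that never occur in the data become NaN.
pivoted = df.reset_index().pivot_table(index="current_date",
                                       columns="temp_date",
                                       values="value",
                                       aggfunc="mean")
arr = pivoted.to_numpy()[:, :, np.newaxis]
print(arr.shape)
```

Note that the aggregation changes the data: the two ("a", "y") rows collapse into their mean, so pick an aggfunc that matches what the duplicates mean in your case.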

Here's a full example of unstack. First we'll create some data:

import numpy as np
import pandas as pd

current = pd.date_range('2015', periods=10, freq='D')
temp = pd.date_range('2015', periods=50, freq='D')
ind = pd.MultiIndex.from_product([current, temp],
                                 names=['current_date', 'temp_date'])
df = pd.DataFrame({'val':np.random.rand(len(ind))},
                  index=ind)
df.head()
#                               val
# current_date temp_date           
# 2015-01-01   2015-01-01  0.309488
#              2015-01-02  0.697876
#              2015-01-03  0.621318
#              2015-01-04  0.308298
#              2015-01-05  0.936828

Now we unstack the multiindex: we'll show the first 4x4 slice of the data:

df.unstack().iloc[:4, :4]
#                     val                                 
# temp_date    2015-01-01 2015-01-02 2015-01-03 2015-01-04
# current_date                                            
# 2015-01-01     0.309488   0.697876   0.621318   0.308298
# 2015-01-02     0.323530   0.751486   0.507087   0.995565
# 2015-01-03     0.805709   0.101129   0.358664   0.501209
# 2015-01-04     0.360644   0.941200   0.727570   0.884314

Now extract the numpy array, and reshape to [nrows x ncols x 1] as you specified in the question:

vals = df.unstack().values[:, :, np.newaxis]
print(vals.shape)
# (10, 50, 1)
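
As a sanity check on the layout, the sketch below rebuilds the same kind of example data and confirms that element [i, j, 0] of the array is the value stored at (current[i], temp[j]). It also shows an alternative that only holds under an assumption: when the MultiIndex is a complete, sorted product of the two levels (as from_product produces), a plain reshape gives the same array. The variable names i, j, expected, reshaped and same are chosen here for illustration.

```python
import numpy as np
import pandas as pd

current = pd.date_range("2015-01-01", periods=10, freq="D")
temp = pd.date_range("2015-01-01", periods=50, freq="D")
ind = pd.MultiIndex.from_product([current, temp],
                                 names=["current_date", "temp_date"])
df = pd.DataFrame({"val": np.random.rand(len(ind))}, index=ind)

# Same operation as above, written with to_numpy().
vals = df.unstack().to_numpy()[:, :, np.newaxis]

# Spot check: axis 0 follows current_date, axis 1 follows temp_date.
i, j = 3, 7
expected = df.loc[(current[i], temp[j]), "val"]
print(vals[i, j, 0] == expected)

# Only valid for a complete, sorted grid: reshape the flat column directly.
reshaped = df["val"].to_numpy().reshape(len(current), len(temp), 1)
same = np.array_equal(vals, reshaped)
print(same)
```

If some (current_date, temp_date) combinations are missing, the reshape shortcut no longer applies, while unstack() still works and fills the gaps with NaN.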

answered Oct 05 '22 19:10

jakevdp
