
Add numpy array as column to Pandas data frame

I have a Pandas data frame object of shape (X,Y) that looks like this:

[[1, 2, 3], [4, 5, 6], [7, 8, 9]] 

and a SciPy sparse matrix (CSC format) of shape (X, Z) that looks something like this:

[[0, 1, 0], [0, 0, 1], [1, 0, 0]] 

How can I add the content from the matrix to the data frame in a new named column such that the data frame will end up like this:

[[1, 2, 3, [0, 1, 0]], [4, 5, 6, [0, 0, 1]], [7, 8, 9, [1, 0, 0]]] 

Notice the data frame now has shape (X, Y+1) and rows from the matrix are elements in the data frame.

Mihai Damian asked Sep 05 '13 21:09




2 Answers

    import numpy as np
    import pandas as pd
    import scipy.sparse as sparse

    df = pd.DataFrame(np.arange(1,10).reshape(3,3))
    arr = sparse.coo_matrix(([1,1,1], ([0,1,2], [1,2,0])), shape=(3,3))
    df['newcol'] = arr.toarray().tolist()
    print(df)

yields

       0  1  2     newcol
    0  1  2  3  [0, 1, 0]
    1  4  5  6  [0, 0, 1]
    2  7  8  9  [1, 0, 0]
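The question mentions a CSC matrix, while the snippet above builds a COO matrix. As a minimal sketch, assuming the same 3x3 data, the same approach should carry over to `scipy.sparse.csc_matrix`; the variable names `rows`, `cols`, and `csc` are illustrative only. Note that `toarray()` builds the full dense array, so this is only practical when the matrix fits in memory.

```python
import numpy as np
import pandas as pd
from scipy import sparse

df = pd.DataFrame(np.arange(1, 10).reshape(3, 3))

# CSC matrix with ones at positions (0, 1), (1, 2), (2, 0)
rows = [0, 1, 2]
cols = [1, 2, 0]
csc = sparse.csc_matrix(([1, 1, 1], (rows, cols)), shape=(3, 3))

# toarray() densifies the matrix; tolist() turns each matrix row into a
# plain Python list, so every cell of the new column holds one list
df["newcol"] = csc.toarray().tolist()

print(df.shape)                    # (3, 4)
print(df["newcol"].dtype)          # object
print(type(df.loc[0, "newcol"]))   # <class 'list'>
```

Because the new column has `object` dtype, vectorized numeric operations will not apply to it directly.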
unutbu answered Sep 27 '22 21:09


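Consider using a higher-dimensional data structure (a Panel) rather than storing an array in your column. (Panel was deprecated in pandas 0.20 and removed in 0.25, so this requires an older pandas version.)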

    In [11]: p = pd.Panel({'df': df, 'csc': csc})

    In [12]: p.df
    Out[12]:
       0  1  2
    0  1  2  3
    1  4  5  6
    2  7  8  9

    In [13]: p.csc
    Out[13]:
       0  1  2
    0  0  1  0
    1  0  0  1
    2  1  0  0

You can then look at cross-sections, and so on:

    In [14]: p.xs(0)
    Out[14]:
       csc  df
    0    0   1
    1    1   2
    2    0   3

See the docs for more on Panels.
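On current pandas, where `pd.Panel` is not available, one possible substitute is a single DataFrame with two-level (MultiIndex) columns built with `pd.concat`. This is a sketch of that swapped-in technique, not the Panel API itself; it assumes `csc` is a SciPy CSC matrix that is first converted to a dense DataFrame, and the names `csc_df`, `wide`, and `row0` are illustrative.

```python
import numpy as np
import pandas as pd
from scipy import sparse

df = pd.DataFrame(np.arange(1, 10).reshape(3, 3))
csc = sparse.csc_matrix(([1, 1, 1], ([0, 1, 2], [1, 2, 0])), shape=(3, 3))
csc_df = pd.DataFrame(csc.toarray())  # dense copy of the sparse matrix

# The dict keys become the outer level of the column MultiIndex
wide = pd.concat({"df": df, "csc": csc_df}, axis=1)

# Selecting an outer key returns the original frame, similar to p.df / p.csc
print(wide["df"])
print(wide["csc"])

# Rough equivalent of the cross-section p.xs(0): take row 0 and pivot
# the outer column level into columns
row0 = wide.loc[0].unstack(level=0).sort_index()
print(row0)
```

Here `row0["df"]` holds the first row of `df` and `row0["csc"]` the first row of the matrix, which mirrors the cross-section shown above.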

Andy Hayden answered Sep 27 '22 22:09