How to add numpy matrix as new columns for pandas dataframe?

I have an NxM dataframe and an NxL numpy matrix. I'd like to add the matrix to the dataframe as L new columns, keeping the rows and columns in the same order they appear. I tried merge() and join(), but I end up with errors:

assign() keywords must be strings

and

columns overlap but no suffix specified

respectively.

Is there a way I can add a numpy matrix as dataframe columns?

Booley asked Aug 14 '18 18:08


2 Answers

You can turn the matrix into a dataframe and use concat with axis=1.

For example, given a dataframe df and a numpy array mat:

>>> df
   a  b
0  5  5
1  0  7
2  1  0
3  0  4
4  6  4

>>> mat
array([[0.44926098, 0.29567859, 0.60728561],
       [0.32180566, 0.32499134, 0.94950085],
       [0.64958125, 0.00566706, 0.56473627],
       [0.17357589, 0.71053224, 0.17854188],
       [0.38348102, 0.12440952, 0.90359566]])

You can do:

>>> pd.concat([df, pd.DataFrame(mat)], axis=1)
   a  b         0         1         2
0  5  5  0.449261  0.295679  0.607286
1  0  7  0.321806  0.324991  0.949501
2  1  0  0.649581  0.005667  0.564736
3  0  4  0.173576  0.710532  0.178542
4  6  4  0.383481  0.124410  0.903596
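
One caveat worth noting: concat with axis=1 aligns rows by index label, not by position. pd.DataFrame(mat) gets a default 0..N-1 index, so if df carries any other index, the result is padded with NaN instead of being appended row by row. A minimal sketch, assuming a made-up df indexed 10, 20, 30 (not from the question); passing index=df.index should keep the rows in positional order:

```python
import numpy as np
import pandas as pd

# Hypothetical data: df has a non-default index, mat has none.
df = pd.DataFrame({"a": [5, 0, 1], "b": [5, 7, 0]}, index=[10, 20, 30])
mat = np.arange(6, dtype=float).reshape(3, 2)

# Labels 10/20/30 do not match 0/1/2, so rows are not paired up:
# the result has 6 rows, each half filled with NaN.
misaligned = pd.concat([df, pd.DataFrame(mat)], axis=1)

# Reusing df's index makes row i of mat line up with row i of df.
aligned = pd.concat([df, pd.DataFrame(mat, index=df.index)], axis=1)

print(misaligned.shape)
print(aligned)
```

With the default RangeIndex shown in the example above, both forms give the same result, so this only matters once df has been filtered, sorted, or re-indexed.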
sacuL answered Sep 17 '22 16:09


Setup

import numpy as np
import pandas as pd
df = pd.DataFrame({'a': [5,0,1,0,6], 'b': [5,7,0,4,4]})
mat = np.random.rand(5,3)

Using join:

df.join(pd.DataFrame(mat))

   a  b         0         1         2
0  5  5  0.884061  0.803747  0.727161
1  0  7  0.464009  0.447346  0.171881
2  1  0  0.353604  0.912781  0.199477
3  0  4  0.466095  0.136218  0.405766
4  6  4  0.764678  0.874614  0.310778

If the column names might overlap, supply a suffix; it is applied only to the overlapping columns from the right-hand frame:

df = pd.DataFrame({0: [5,0,1,0,6], 1: [5,7,0,4,4]})
mat = np.random.rand(5,3)

df.join(pd.DataFrame(mat), rsuffix='_')

   0  1        0_        1_         2
0  5  5  0.783722  0.976951  0.563798
1  0  7  0.946070  0.391593  0.273339
2  1  0  0.710195  0.827352  0.839212
3  0  4  0.528824  0.625430  0.465386
4  6  4  0.848423  0.467256  0.962953
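
An alternative to a suffix, sketched here under the assumption that you are free to choose the new labels (the m0, m1, ... names are made up for illustration), is to name the matrix columns when building the frame so that nothing overlaps in the first place:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({0: [5, 0, 1, 0, 6], 1: [5, 7, 0, 4, 4]})
mat = np.random.rand(5, 3)

# Hypothetical labels for the new columns: m0, m1, m2.
new_names = [f"m{i}" for i in range(mat.shape[1])]

# Explicit column names avoid the overlap; index=df.index keeps
# the rows paired by position even if df is re-indexed later.
mat_df = pd.DataFrame(mat, index=df.index, columns=new_names)
result = df.join(mat_df)

print(result.columns.tolist())
```

This keeps the original integer labels 0 and 1 untouched, whereas rsuffix renames the overlapping right-hand columns to strings such as 0_ and 1_.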
user3483203 answered Sep 17 '22 16:09