I have a Pandas data frame object of shape (X,Y) that looks like this:
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
and a SciPy sparse matrix (CSC) of shape (X,Z) that looks something like this:
[[0, 1, 0], [0, 0, 1], [1, 0, 0]]
How can I add the content from the matrix to the data frame in a new named column such that the data frame will end up like this:
[[1, 2, 3, [0, 1, 0]], [4, 5, 6, [0, 0, 1]], [7, 8, 9, [1, 0, 0]]]
Notice the data frame now has shape (X, Y+1) and rows from the matrix are elements in the data frame.
To convert a NumPy array to a DataFrame you need to 1) have your NumPy array (e.g., np_array), and 2) pass it to the pd.DataFrame() constructor like this: df = pd.DataFrame(np_array, columns=['Column1', 'Column2']). If you pass columns, supply exactly one name per column of the array; if you omit it, pandas uses integer labels.
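As a minimal sketch of the constructor call described above (the array values and the column names are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical 3x2 array; any 2-D array works the same way.
np_array = np.array([[1, 2], [3, 4], [5, 6]])

# One column name per array column.
df = pd.DataFrame(np_array, columns=["Column1", "Column2"])
print(df)
```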
Append a NumPy array as a row to a DataFrame. Sometimes we have to append a NumPy array to an existing DataFrame as a row. In older pandas versions this was done with the DataFrame.append() method, which was deprecated in pandas 1.4 and removed in 2.0; pd.concat() is its replacement. Here myarr is the array used to create the DataFrame, and arrtoappend is the array to append as a row.
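A minimal sketch of appending a row with pd.concat(), assuming a recent pandas where DataFrame.append() is no longer available. The names myarr and arrtoappend come from the text above; the values and column labels are invented for illustration:

```python
import numpy as np
import pandas as pd

# Array used to build the initial DataFrame (illustrative values).
myarr = np.array([[1, 2, 3], [4, 5, 6]])
df = pd.DataFrame(myarr, columns=["a", "b", "c"])

# 1-D array to append as a new row.
arrtoappend = np.array([7, 8, 9])

# Wrap the array in a one-row DataFrame with matching columns, then concatenate.
new_row = pd.DataFrame(arrtoappend.reshape(1, -1), columns=df.columns)
df = pd.concat([df, new_row], ignore_index=True)
print(df)
```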
When you have a DataFrame with columns of different datatypes, the NumPy array returned from it has a single datatype for all elements. NumPy picks a common datatype that can represent every column's values. For instance, a DataFrame with int64 and float64 columns converts to a float64 array.
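To illustrate the dtype behavior, here is a small sketch with one integer and one float column (the column names and values are invented):

```python
import numpy as np
import pandas as pd

# One integer column and one float column.
df = pd.DataFrame({"ints": [1, 2, 3], "floats": [0.5, 1.5, 2.5]})
print(df.dtypes)

# The resulting array has one dtype able to hold both columns.
arr = df.to_numpy()
print(arr.dtype)
```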
    import numpy as np
    import pandas as pd
    import scipy.sparse as sparse

    df = pd.DataFrame(np.arange(1,10).reshape(3,3))
    arr = sparse.coo_matrix(([1,1,1], ([0,1,2], [1,2,0])), shape=(3,3))
    df['newcol'] = arr.toarray().tolist()
    print(df)
yields
       0  1  2     newcol
    0  1  2  3  [0, 1, 0]
    1  4  5  6  [0, 0, 1]
    2  7  8  9  [1, 0, 0]
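Consider using a higher-dimensional data structure (a Panel) rather than storing an array in your column. Note that Panel was deprecated in pandas 0.20 and removed in 0.25, so the session below only runs on older versions: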
    In [11]: p = pd.Panel({'df': df, 'csc': csc})

    In [12]: p.df
    Out[12]:
       0  1  2
    0  1  2  3
    1  4  5  6
    2  7  8  9

    In [13]: p.csc
    Out[13]:
       0  1  2
    0  0  1  0
    1  0  0  1
    2  1  0  0
You can then look at cross-sections, and so on:
    In [14]: p.xs(0)
    Out[14]:
       csc  df
    0    0   1
    1    1   2
    2    0   3
See the docs for more on Panels.
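On pandas versions without Panel, a similar layout can be approximated with hierarchical (MultiIndex) columns. This is only a sketch of one possible substitute, not what the answer above used; it assumes csc is a SciPy CSC matrix holding the values from the question, and the name combined is made up here:

```python
import numpy as np
import pandas as pd
import scipy.sparse as sparse

df = pd.DataFrame(np.arange(1, 10).reshape(3, 3))
# Assumed stand-in for the question's CSC matrix.
csc = sparse.csc_matrix(np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]]))

# Densify the sparse matrix and place both frames side by side;
# the dict keys become the top level of the column MultiIndex.
combined = pd.concat({"df": df, "csc": pd.DataFrame(csc.toarray())}, axis=1)

print(combined["df"])   # the original frame
print(combined["csc"])  # the matrix values as a frame

# Rough analogue of the Panel cross-section for row 0.
row0 = combined.loc[0].unstack(level=0)
print(row0)
```

Densifying with toarray() is fine for small matrices like this one, but it gives up the memory savings of the sparse format.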