Store numpy.array in cells of a Pandas.DataFrame

I have a dataframe in which I would like to store a 'raw' numpy.array in each cell:

df['COL_ARRAY'] = df.apply(lambda r: np.array(do_something_with_r), axis=1) 

but it seems that pandas tries to 'unpack' the numpy.array.

Is there a workaround, other than using a wrapper (see edit below)?

I tried reduce=False with no success.

EDIT

This works, but I have to use the 'dummy' Data class to wrap around the array, which is unsatisfactory and not very elegant.

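import numpy as np
import pandas as pd

# Dummy wrapper so that apply() does not try to expand the array
class Data:
    def __init__(self, v):
        self.v = v

meas = pd.read_excel(DATA_FILE)
meas['DATA'] = meas.apply(
    lambda r: Data(np.array(pd.read_csv(r['filename']))),
    axis=1
)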
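A minimal sketch of a wrapper-free variant, assuming every cell should simply hold a NumPy array: collect the arrays in a plain list, then assign them through a `pd.Series` with `dtype=object`, so pandas stores each array as one cell value instead of trying to spread it out. The `FAKE_FILES` dict and the `load_array` helper are hypothetical stand-ins for `np.array(pd.read_csv(filename))`, used only so the sketch runs without the original files.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV files: file name -> 2-D data
FAKE_FILES = {
    "run1.csv": np.array([[1.0, 2.0], [3.0, 4.0]]),
    "run2.csv": np.array([[5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]),
}

def load_array(filename):
    """Hypothetical loader; real code would use np.array(pd.read_csv(filename))."""
    return FAKE_FILES[filename]

# Stand-in for pd.read_excel(DATA_FILE)
meas = pd.DataFrame({"filename": ["run1.csv", "run2.csv"]})

# One array per row, collected in a plain Python list
arrays = [load_array(f) for f in meas["filename"]]

# dtype=object makes pandas keep each array as a single cell value
meas["DATA"] = pd.Series(arrays, index=meas.index, dtype=object)

print(meas.at[1, "DATA"].shape)  # (3, 2)
```

Because the arrays never pass through `apply`, pandas has no opportunity to reinterpret their shape, and the arrays may differ in size from row to row.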
Cedric H. asked Aug 07 '17 13:08

1 Answer

Wrap the numpy array in a list, i.e. pass the array as the single element of a list:

import numpy as np
import pandas as pd

a = np.array([5, 6, 7, 8])
df = pd.DataFrame({"a": [a]})

Output:

              a
0  [5, 6, 7, 8]
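To see why the list matters, this small sketch compares the two constructor calls: passing the array directly makes pandas treat it as the column's values (one element per row), while wrapping it in a one-element list gives a single row whose cell is the whole array. The variable names `spread` and `wrapped` are illustrative only.

```python
import numpy as np
import pandas as pd

a = np.array([5, 6, 7, 8])

# Array passed directly: pandas uses it as column data, one value per row
spread = pd.DataFrame({"a": a})

# Array wrapped in a list: the list has one element, so there is one row
# and that row's cell holds the entire array
wrapped = pd.DataFrame({"a": [a]})

print(len(spread))               # 4
print(len(wrapped))              # 1
print(type(wrapped.at[0, "a"]))  # <class 'numpy.ndarray'>
```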

Alternatively, convert each row to a tuple and then apply np.array to the resulting Series of tuples, e.g.:

df = pd.DataFrame({'id': [1, 2, 3, 4],
                   'a': ['on', 'on', 'off', 'off'],
                   'b': ['on', 'off', 'on', 'off']})

df['new'] = df.apply(lambda r: tuple(r), axis=1).apply(np.array)

Output :

     a    b  id            new
0   on   on   1    [on, on, 1]
1   on  off   2   [on, off, 2]
2  off   on   3   [off, on, 3]
3  off  off   4  [off, off, 4]
df['new'][0] 

Output :

array(['on', 'on', '1'], dtype='<U2') 
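Note that np.array on a mixed tuple promotes all elements to a common type, which is why the id value comes back as the string '1'. If the original Python types matter, one option (a sketch, not part of the answer above) is to build each row array with `dtype=object` and assign the list through an object-dtype `Series`. The names `cols` and `rows` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4],
                   'a': ['on', 'on', 'off', 'off'],
                   'b': ['on', 'off', 'on', 'off']})

cols = ['id', 'a', 'b']

# dtype=object keeps each value's own type (ints stay ints, strings stay strings)
rows = [np.array(row, dtype=object)
        for row in df[cols].itertuples(index=False)]

# An object-dtype Series stores exactly one array per cell
df['new'] = pd.Series(rows, index=df.index, dtype=object)

print(df.at[0, 'new'].tolist())
```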
Bharath answered Sep 30 '22 03:09
