
Pandas: how to place an array in a single DataFrame cell?

So I currently have a dataframe that looks like:

[image: current dataframe]

And I want to add a completely new column called "Predictors" in which only one cell contains an array.

So [0, 'Predictors'] should contain an array, and everything below that cell in the same column should be empty.

Here's my attempt: I created a separate dataframe containing just the "Predictors" column and tried appending it to the current dataframe, but I get: 'Length mismatch: Expected axis has 3 elements, new values have 4 elements.'

How do I append a single cell containing an array to my dataframe?

# create a list and dataframe to hold the names of predictors
dataframe = dataframe.drop(['price', 'Date'], axis=1)
# Index.get_values() was removed in pandas 1.0; tolist() works on the Index directly
predictorsList = dataframe.columns.tolist()
predictorsList = np.array(predictorsList, dtype=object)

# Combine actual and forecasted lists to one dataframe
combinedResults = pd.DataFrame({'Actual': actual, 'Forecasted': forecasted})

predictorsDF = pd.DataFrame({'Predictors': [predictorsList]})

# Add Predictors to dataframe
#combinedResults.at[0, 'Predictors'] = predictorsList
pd.concat([combinedResults,predictorsDF], ignore_index=True, axis=1)
amadzebra asked Jul 06 '18 22:07



2 Answers

You can fill the rest of the cells in the desired column with NaN, but they will not be truly "empty". To do that, use pd.merge on both indexes:

Setup

import pandas as pd
import numpy as np

df = pd.DataFrame({
     'Actual': [18.442, 15.4233, 20.6217, 16.7, 18.185], 
     'Forecasted': [19.6377, 13.1665, 19.3992, 17.4557, 14.0053]
})

arr = np.zeros(3)
df_arr = pd.DataFrame({'Predictors': [arr]})

Merging df and df_arr

result = pd.merge(
    df,
    df_arr,
    how='left',
    left_index=True, # Merge on both indexes, since right only has 0...
    right_index=True # all the other rows will be NaN
)

Results

>>> print(result)
    Actual  Forecasted       Predictors
0  18.4420     19.6377  [0.0, 0.0, 0.0]
1  15.4233     13.1665              NaN
2  20.6217     19.3992              NaN
3  16.7000     17.4557              NaN
4  18.1850     14.0053              NaN

>>> result.loc[0, 'Predictors']
array([0., 0., 0.])

>>> result.loc[1, 'Predictors'] # actually contains a NaN value
nan 
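
As an alternative to the merge, and not part of the original answer, the same layout can probably be reached by assigning a one-element Series and letting pandas align it on the index. A minimal sketch, assuming the frame from the Setup above; the predictor names are made up for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Actual': [18.442, 15.4233, 20.6217, 16.7, 18.185],
    'Forecasted': [19.6377, 13.1665, 19.3992, 17.4557, 14.0053],
})

# Hypothetical predictor names, standing in for the asker's column list
predictors = np.array(['open', 'high', 'volume'], dtype=object)

# Wrapping the array in a list makes it a single element; index=[0] targets row 0
result = df.copy()
result['Predictors'] = pd.Series([predictors], index=[0])

print(result.at[0, 'Predictors'])          # the stored array
print(result['Predictors'].isna().sum())   # number of rows left as NaN
```

Rows whose index label is missing from the Series are filled with NaN, just as with the left merge.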
Tomas Farias answered Oct 05 '22 23:10


You need to change the dtype of the column (in your case Predictors) to object first:

import pandas as pd
import numpy as np


df = pd.DataFrame(np.arange(20).reshape(5, 4), columns=list('abcd'))
df = df.astype(object)  # this line allows the assignment of the array
df.iloc[1, 2] = np.array([99, 99, 99])
print(df)

gives

    a   b             c   d
0   0   1             2   3
1   4   5  [99, 99, 99]   7
2   8   9            10  11
3  12  13            14  15
4  16  17            18  19
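
If only one column needs to hold the array, it may be enough to cast just that column rather than the whole frame. A minimal sketch, assuming the same example frame and using .at for the single-cell assignment (a choice made here, not taken from the answer):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(20).reshape(5, 4), columns=list('abcd'))

# Cast only the target column to object; a, b and d keep their integer dtype
df['c'] = df['c'].astype(object)

# .at addresses exactly one cell by row label and column label
df.at[1, 'c'] = np.array([99, 99, 99])

print(df.dtypes)
print(type(df.at[1, 'c']))
```

Keeping the other columns numeric means arithmetic on them stays vectorized, which an all-object frame would give up.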
Markus Dutschke answered Oct 05 '22 22:10