Similar to the question "How to add an empty column to a dataframe?", I am interested in knowing the best way to add a column of empty lists to a DataFrame.
What I am trying to do is initialize a column and then, as I iterate over the rows and process some of them, replace the initialized value in this new column with a filled list.
For example, if below is my initial DataFrame:
    import pandas as pd

    df = pd.DataFrame({'a': [1, 2, 3], 'b': [5, 6, 7]})  # Sample DataFrame

    >>> df
       a  b
    0  1  5
    1  2  6
    2  3  7
Then I want to ultimately end up with something like this, where each row has been processed separately (sample results shown):
    >>> df
       a  b          c
    0  1  5     [5, 6]
    1  2  6     [9, 0]
    2  3  7  [1, 2, 3]
Of course, if I try to initialize with df['e'] = [], as I would with any other constant, pandas thinks I am trying to add a sequence of items with length 0, and hence fails.
If I try initializing a new column as None or NaN, I run into the following issues when trying to assign a list to a location.
    df['d'] = None

    >>> df
       a  b     d
    0  1  5  None
    1  2  6  None
    2  3  7  None
Issue 1 (it would be perfect if I can get this approach to work! Maybe something trivial I am missing):
    >>> df.loc[0,'d'] = [1,3]
    ...
    ValueError: Must have equal len keys and value when setting with an iterable
Issue 2 (this one works, but not without a warning because it is not guaranteed to work as intended):
    >>> df['d'][0] = [1,3]
    C:\Python27\Scripts\ipython:1: SettingWithCopyWarning:
    A value is trying to be set on a copy of a slice from a DataFrame
Hence I resort to initializing with empty lists and extending them as needed. There are a couple of methods I can think of to initialize this way, but is there a more straightforward way?
Method 1:
    df['empty_lists1'] = [list() for x in range(len(df.index))]

    >>> df
       a  b empty_lists1
    0  1  5           []
    1  2  6           []
    2  3  7           []
Method 2:
    df['empty_lists2'] = df.apply(lambda x: [], axis=1)

    >>> df
       a  b empty_lists1 empty_lists2
    0  1  5           []           []
    1  2  6           []           []
    2  3  7           []           []
Summary of questions:
Is there any minor syntax change to the approach in Issue 1 that would allow a list to be assigned to a None/NaN-initialized field?
If not, then what is the best way to initialize a new column with empty lists?
One more way is to use np.empty:

    df['empty_list'] = np.empty((len(df), 0)).tolist()
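To see why this works, here is a small sketch, assuming the three-row sample frame from the question. An array of shape (n, 0) has n rows with no elements in each, so .tolist() turns it into n separate empty lists. The variable name empty is only for illustration.

```python
import numpy as np
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({'a': [1, 2, 3], 'b': [5, 6, 7]})

# A (3, 0) array has three rows and zero columns, so tolist()
# produces one empty list per row.
empty = np.empty((len(df), 0)).tolist()
print(empty)

df['empty_list'] = empty

# Each row holds its own list object, so filling one row
# leaves the other rows untouched.
df.at[0, 'empty_list'].append(5)
print(df['empty_list'].tolist())
```

Since the lists are ordinary Python objects, they can be extended in place while iterating over the rows, which is the workflow described in the question.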
You could also drop .index in your "Method 1" when taking the len of df:

    df['empty_list'] = [[] for _ in range(len(df))]
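One caveat worth illustrating, as a sketch on the question's sample frame: the comprehension builds a new list for every row, whereas the shorter-looking [[]] * len(df) repeats a single list object, so appending through one row shows up in every row. The column names shared and separate are made up for this example.

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({'a': [1, 2, 3], 'b': [5, 6, 7]})

# Pitfall: list multiplication repeats references to ONE list.
df['shared'] = [[]] * len(df)

# The comprehension creates a distinct list for each row.
df['separate'] = [[] for _ in range(len(df))]

# Append to the first row of each column.
df.at[0, 'shared'].append(9)
df.at[0, 'separate'].append(9)

print(df['shared'].tolist())    # every row reflects the append
print(df['separate'].tolist())  # only the first row changed
```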
It turns out np.empty is faster:

    In [1]: import pandas as pd

    In [2]: df = pd.DataFrame(pd.np.random.rand(1000000, 5))

    In [3]: timeit df['empty1'] = pd.np.empty((len(df), 0)).tolist()
    10 loops, best of 3: 127 ms per loop

    In [4]: timeit df['empty2'] = [[] for _ in range(len(df))]
    10 loops, best of 3: 193 ms per loop

    In [5]: timeit df['empty3'] = df.apply(lambda x: [], axis=1)
    1 loops, best of 3: 5.89 s per loop
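On the first question in the summary, one option not timed here is the scalar accessor DataFrame.at, which addresses a single cell and so treats the list as one value rather than as an iterable to spread over several cells. A minimal sketch, assuming the column was initialized with None (which gives it object dtype); behavior may differ on older pandas releases or with a NaN-initialized float column.

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({'a': [1, 2, 3], 'b': [5, 6, 7]})

# Initializing with None yields an object-dtype column,
# which can hold arbitrary Python objects such as lists.
df['d'] = None

# .at targets exactly one cell (row label 0, column 'd'),
# so the list is stored as a single value.
df.at[0, 'd'] = [1, 3]

print(df)
```

Rows that are never processed simply keep None, so there is no need to pre-fill every row with an empty list if only some rows end up with data.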