Add column of empty lists to DataFrame

Tags: python, pandas

Similar to the question "How to add an empty column to a dataframe?", I would like to know the best way to add a column of empty lists to a DataFrame.

What I am trying to do is initialize a column and then, as I iterate over the rows and process some of them, replace the initialized value in this new column with a filled list.

For example, if below is my initial DataFrame:

import pandas as pd

df = pd.DataFrame(data={'a': [1,2,3], 'b': [5,6,7]}) # Sample DataFrame

>>> df
   a  b
0  1  5
1  2  6
2  3  7

Then I want to ultimately end up with something like this, where each row has been processed separately (sample results shown):

>>> df
   a  b          c
0  1  5     [5, 6]
1  2  6     [9, 0]
2  3  7  [1, 2, 3]

Of course, if I try to initialize with df['e'] = [] as I would with any other constant, pandas thinks I am trying to add a sequence of items with length 0, and hence the assignment fails.
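To make that failure concrete, here is a minimal sketch (assuming a reasonably recent pandas; the exact error message differs between versions). The variable name `failed` is only there for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [5, 6, 7]})

try:
    # A bare [] is read as a column of length 0, which does not match
    # the 3 rows of the frame, so pandas refuses the assignment.
    df['e'] = []
    failed = False
except ValueError as exc:
    failed = True
    print("Assignment failed:", exc)
```

A scalar such as 0 or None is broadcast to every row, but a list is always interpreted as the column's values, which is why the length check kicks in here.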

If I try initializing a new column as None or NaN, I run into the following issues when trying to assign a list to a location.

df['d'] = None

>>> df
   a  b     d
0  1  5  None
1  2  6  None
2  3  7  None

Issue 1 (it would be perfect if I can get this approach to work! Maybe something trivial I am missing):

>>> df.loc[0,'d'] = [1,3]
...
ValueError: Must have equal len keys and value when setting with an iterable

Issue 2 (this one works, but not without a warning because it is not guaranteed to work as intended):

>>> df['d'][0] = [1,3]
C:\Python27\Scripts\ipython:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame

Hence I resort to initializing with empty lists and extending them as needed. There are a couple of methods I can think of to initialize this way, but is there a more straightforward way?

Method 1:

df['empty_lists1'] = [list() for x in range(len(df.index))]

>>> df
   a  b   empty_lists1
0  1  5             []
1  2  6             []
2  3  7             []

Method 2:

df['empty_lists2'] = df.apply(lambda x: [], axis=1)

>>> df
   a  b   empty_lists1   empty_lists2
0  1  5             []             []
1  2  6             []             []
2  3  7             []             []

Summary of questions:

Is there a minor syntax change to Issue 1 that would allow a list to be assigned to a None/NaN-initialized field?

If not, then what is the best way to initialize a new column with empty lists?
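On the first question, one option that may be worth trying is the scalar accessor `.at` instead of `.loc`. The sketch below assumes a reasonably recent pandas and that the column has object dtype (which is what initializing with None produces); it is an illustration, not the only way to do this.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [5, 6, 7]})
df['d'] = None  # object-dtype column, so a cell can hold any Python object

# .at addresses exactly one cell, so the list is stored as a single value
# instead of being matched element by element against the selected keys,
# which is what .loc attempts in Issue 1.
df.at[0, 'd'] = [1, 3]

print(df)
```

Because `.at` works on the DataFrame itself rather than on an intermediate `df['d']` object, it also avoids the chained assignment behind the warning in Issue 2.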


asked Jul 17 '15 00:07

vk1011

1 Answer

One more way is to use np.empty:

df['empty_list'] = np.empty((len(df), 0)).tolist()

You could also drop .index in your "Method 1" when taking the length of df.

df['empty_list'] = [[] for _ in range(len(df))]
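One thing to watch out for with either approach: each row needs its own list object. The sketch below contrasts the tempting shortcut `[[]] * n`, which repeats a single list n times, with the two initializations above. The names `shared`, `separate`, and `from_numpy` are made up for this example.

```python
import numpy as np
import pandas as pd

n = 3

shared = [[]] * n                       # n references to ONE list
separate = [[] for _ in range(n)]       # n independent lists
from_numpy = np.empty((n, 0)).tolist()  # also n independent lists

shared[0].append('x')
separate[0].append('x')
from_numpy[0].append('x')

print(shared)      # [['x'], ['x'], ['x']]
print(separate)    # [['x'], [], []]
print(from_numpy)  # [['x'], [], []]

# With independent lists, extending one row's list leaves the others alone.
df = pd.DataFrame({'a': [1, 2, 3], 'b': [5, 6, 7]})
df['c'] = [[] for _ in range(len(df))]
df.at[0, 'c'].append(5)
print(df['c'].tolist())
```

So if the plan is to extend the lists row by row, the comprehension or the np.empty form is the safer choice.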

Turns out, np.empty is faster...

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: df = pd.DataFrame(np.random.rand(1000000, 5))

In [4]: timeit df['empty1'] = np.empty((len(df), 0)).tolist()
10 loops, best of 3: 127 ms per loop

In [5]: timeit df['empty2'] = [[] for _ in range(len(df))]
10 loops, best of 3: 193 ms per loop

In [6]: timeit df['empty3'] = df.apply(lambda x: [], axis=1)
1 loops, best of 3: 5.89 s per loop
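If you want to repeat the comparison outside IPython, a rough sketch with the standard-library timeit module could look like the following. The helper function names, the smaller frame size, and the repeat count are arbitrary choices for this example, and the numbers will depend on your machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Smaller than the 1,000,000 rows above so the script finishes quickly.
df = pd.DataFrame(np.random.rand(10_000, 5))

def with_numpy():
    df['empty1'] = np.empty((len(df), 0)).tolist()

def with_comprehension():
    df['empty2'] = [[] for _ in range(len(df))]

def with_apply():
    df['empty3'] = df.apply(lambda row: [], axis=1)

# Total seconds for 3 runs of each approach.
timings = {
    fn.__name__: timeit.timeit(fn, number=3)
    for fn in (with_numpy, with_comprehension, with_apply)
}

for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s for 3 runs")
```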

answered Sep 24 '22 00:09

ComputerFellow
