Suppose I have a DataFrame with a column of lists:
my_df = pd.DataFrame({'my_list':[[45,12,23],[20,46,78],[45,30,45]]})
which yields the following:
        my_list
0  [45, 12, 23]
1  [20, 46, 78]
2  [45, 30, 45]
How can I add an element, let's say 99, to my_list for each row?
Expected result:
            my_list
0  [45, 12, 23, 99]
1  [20, 46, 78, 99]
2  [45, 30, 45, 99]
In [90]: my_df['my_list'] += [99]
In [91]: my_df
Out[91]:
            my_list
0  [45, 12, 23, 99]
1  [20, 46, 78, 99]
2  [45, 30, 45, 99]
Sounds awfully boring, but just iterate over the values directly. This way you can call append and avoid whatever rebinding occurs with +=, making things significantly faster.
for val in my_df.my_list:
    val.append(99)
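A minimal, self-contained sketch of the loop above, using the DataFrame from the question. It relies on each cell holding a reference to an ordinary Python list, so append mutates the list in place. The second part is not from the answer: it is an assumed alternative for cases where you would rather not mutate the original lists, and the name new_col and the value 100 are only illustrative.

```python
import pandas as pd

my_df = pd.DataFrame({'my_list': [[45, 12, 23], [20, 46, 78], [45, 30, 45]]})

# In-place: iterating the column yields the list objects themselves,
# so append modifies the data stored in the DataFrame.
for val in my_df['my_list']:
    val.append(99)

# Assumed alternative (not from the answer): build new lists instead of
# mutating the existing ones. `lst + [100]` creates a fresh list per row.
new_col = my_df['my_list'].map(lambda lst: lst + [100])
```

The in-place loop changes my_df directly, while new_col is a separate Series that leaves the lists in my_df as they were after the loop.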
Demo
>>> import timeit
>>> setup = '''
... import pandas as pd; import numpy as np
... df = pd.DataFrame({'my_list': np.random.randint(0, 100, (500, 500)).tolist()})
... '''
>>> min(timeit.Timer('for val in df.my_list: val.append(90)',
...                  setup=setup).repeat(10, 1000))
0.05669815401779488
>>> min(timeit.Timer('df.my_list += [90]',
...                  setup=setup).repeat(10, 1000))
2.7741127769695595
Of course, if speed matters to you (or even if it doesn't), you should question whether you really need lists inside a DataFrame. Consider working on a NumPy array until you need pandas functionality, and doing something like
np.c_[arr, np.full(arr.shape[0], 90)]
or at least splitting the lists inside the DataFrame into separate columns and assigning a new column.
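Both suggestions could look roughly like the sketch below, assuming every list has the same length (as in the question). The names arr_ext and wide and the new column label 3 are illustrative choices, not part of the answer.

```python
import numpy as np
import pandas as pd

my_df = pd.DataFrame({'my_list': [[45, 12, 23], [20, 46, 78], [45, 30, 45]]})

# Option 1: work on a 2-D NumPy array and append a constant column.
arr = np.array(my_df['my_list'].tolist())          # shape (3, 3)
arr_ext = np.c_[arr, np.full(arr.shape[0], 99)]    # shape (3, 4)

# Option 2: split the lists into separate columns, then assign a new column.
wide = pd.DataFrame(my_df['my_list'].tolist(), index=my_df.index)
wide[3] = 99  # new column with an illustrative label
```

Either way the values live in a regular numeric layout rather than in Python lists, so adding an element to every row becomes a single column operation.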