I need to initialize the cells in a column of a DataFrame to lists.
df['some_col'] = [[] for _ in range(no_of_rows)]
Is there a better way to do this in terms of time efficiency?
You can create an empty list with an empty pair of square brackets, [], or with the type constructor list(), a built-in that returns an empty list when called with no arguments. Square brackets are more common in Python because they are faster and more concise.
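A small sketch of the two spellings, plus why the question's comprehension evaluates [] once per row. The variable names here are only illustrative, and the `[[]] * 3` line is included as a contrast, not as something the original post used:

```python
# Both spellings produce an empty list, but each evaluation is a new object.
a = []
b = list()
print(a == b)   # True: both are empty lists
print(a is b)   # False: two distinct list objects

# A comprehension evaluates [] once per iteration, so every row gets its own list.
rows = [[] for _ in range(3)]
rows[0].append("x")
print(rows)     # [['x'], [], []]

# Contrast: multiplying a list repeats the SAME inner list three times.
shared = [[]] * 3
shared[0].append("x")
print(shared)   # [['x'], ['x'], ['x']]
```

This is why the comprehension is the usual starting point when each cell needs to be modified independently later.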
Create an empty Series: an empty Series in pandas is one that holds no values. The syntax is: <series object> = pandas.Series()
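For illustration, a minimal sketch of creating an empty Series. Passing dtype explicitly is my own choice here, since some pandas versions warn about the default dtype of an empty Series; the name `s` is arbitrary:

```python
import pandas as pd

# An empty Series; dtype is given explicitly (an assumption, see note above).
s = pd.Series(dtype="object")

print(len(s))    # 0
print(s.empty)   # True
```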
Since you are looking for time efficiency, here are some benchmarks. I think a list comprehension is already quite fast at creating the empty list of list objects, but you can squeeze out a marginal improvement using itertools.repeat. On the insert piece, apply is about 3x slower because it loops:
import numpy as np
import pandas as pd
from itertools import repeat
df = pd.DataFrame({"A":np.arange(100000)})
%timeit df['some_col'] = [[] for _ in range(len(df))]
100 loops, best of 3: 8.75 ms per loop
%timeit df['some_col'] = [[] for i in repeat(None, len(df))]
100 loops, best of 3: 8.02 ms per loop
%%timeit
df['some_col'] = ''
df['some_col'] = df['some_col'].apply(list)
10 loops, best of 3: 25 ms per loop
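The %timeit lines above only work inside IPython or Jupyter. If you want to rerun the comparison as a plain script, something along these lines should work with the standard timeit module. The function names and the repeat/number settings are my own picks, and the timings will differ by machine and pandas version:

```python
import timeit
from itertools import repeat

import numpy as np
import pandas as pd

df = pd.DataFrame({"A": np.arange(100000)})


def with_range():
    # List comprehension driven by range().
    df["some_col"] = [[] for _ in range(len(df))]


def with_repeat():
    # Same comprehension, driven by itertools.repeat().
    df["some_col"] = [[] for _ in repeat(None, len(df))]


def with_apply():
    # Fill with empty strings, then convert each one with list().
    df["some_col"] = ""
    df["some_col"] = df["some_col"].apply(list)


for fn in (with_range, with_repeat, with_apply):
    # Best of 3 runs, 5 calls per run, reported per call.
    seconds = min(timeit.repeat(fn, number=5, repeat=3)) / 5
    print(f"{fn.__name__}: {seconds * 1e3:.2f} ms per call")
```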
Try apply:
df1['some_col'] = ''
df1['some_col'] = df1['some_col'].apply(list)
Sample:
df1 = pd.DataFrame({'a': pd.Series([1,2])})
print(df1)
   a
0  1
1  2
df1['some_col'] = ''
df1['some_col'] = df1['some_col'].apply(list)
print(df1)
   a some_col
0  1       []
1  2       []
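As a quick sanity check (not part of the original answer), you can confirm that apply(list) gives each row its own list by appending to one cell and looking at the others:

```python
import pandas as pd

df1 = pd.DataFrame({"a": pd.Series([1, 2])})
df1["some_col"] = ""
df1["some_col"] = df1["some_col"].apply(list)

# Mutate the list stored in the first row only.
df1.at[0, "some_col"].append("x")

print(df1["some_col"].tolist())   # [['x'], []]
```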