Speed difference between bracket notation and dot notation for accessing columns in pandas

Tags: performance, python, pandas

Consider a small DataFrame:

df = pd.DataFrame({'CID': [1, 2, 3, 4, 12345, 6]})

When I test for membership, the timing differs by roughly a factor of two depending on whether I search in df.CID or in df['CID'].

In [25]: %timeit 12345 in df.CID
Out[25]: 89.8 µs ± 254 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [26]: %timeit 12345 in df['CID']
Out[26]: 42.3 µs ± 334 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [27]: type(df.CID)
Out[27]: pandas.core.series.Series

In [28]: type(df['CID'])
Out[28]: pandas.core.series.Series

Why is that?

asked May 21 '19 14:05

dozyaustin

1 Answer

df['CID'] delegates to NDFrame.__getitem__, and it makes it more obvious that you are performing an indexing operation.

On the other hand, df.CID delegates to NDFrame.__getattr__, which has to do some additional work, mainly to determine whether 'CID' is an attribute, a method, or a column being accessed through attribute notation (a convenience, but not recommended for production code).
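The fallback behaviour described above can be sketched with a toy class. This is a minimal illustration, not pandas' actual implementation: ToyFrame, its _columns dict, and the getattr_calls counter are hypothetical names invented here. It relies only on a standard Python rule: __getattr__ is called only after normal attribute lookup has failed, so attribute-style access does extra work before it reaches the same lookup that bracket access performs directly.

```python
class ToyFrame:
    """Toy stand-in for a DataFrame (hypothetical, NOT pandas' real code)."""

    # Class-level default so the counter always resolves via normal lookup.
    getattr_calls = 0

    def __init__(self, columns):
        self._columns = dict(columns)

    def __getitem__(self, key):
        # Bracket access: one direct lookup in the column mapping.
        return self._columns[key]

    def __getattr__(self, name):
        # Reached only after Python's normal attribute lookup has failed.
        if name.startswith("_"):
            # Avoid recursing on private names such as _columns.
            raise AttributeError(name)
        self.getattr_calls += 1
        try:
            # Fall back to the same lookup that bracket access uses.
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


frame = ToyFrame({"CID": [1, 2, 3, 4, 12345, 6]})

by_item = frame["CID"]  # goes straight to __getitem__
by_attr = frame.CID     # normal lookup fails first, then __getattr__ runs

print(by_item is by_attr)
print(frame.getattr_calls)
```

Both forms return the same object; the attribute form simply takes a longer route to get there.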

Now, why is it not recommended? Consider,

df = pd.DataFrame({'A': [1, 2, 3]})
df.A

0    1
1    2
2    3
Name: A, dtype: int64

There are no issues referring to column "A" as df.A, because it does not conflict with any attribute or method name in pandas. However, consider the pop method (just as an example).

df.pop
# <bound method NDFrame.pop of ...>

df.pop is a bound method of df. Now suppose I want to create a column called "pop".

df['pop'] = [4, 5, 6]
df

   A  pop
0  1    4
1  2    5
2  3    6

Great, but,

df.pop
# <bound method NDFrame.pop of ...>

I cannot use attribute notation to access this column, because the existing pop method takes precedence over the column name. However...

df['pop']

0    4
1    5
2    6
Name: pop, dtype: int64

Bracket notation still works, which is why it is the safer choice.
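One way to spot such collisions ahead of time is to compare each column name against the attributes of the DataFrame class. The helper below is only a sketch: shadowed_columns is a hypothetical name, not part of pandas, and it checks class-level attributes only.

```python
import pandas as pd


def shadowed_columns(df):
    """Return string column names that match an attribute or method
    defined on the DataFrame class (hypothetical helper, not pandas API)."""
    return [
        col for col in df.columns
        if isinstance(col, str) and hasattr(type(df), col)
    ]


df = pd.DataFrame({'A': [1, 2, 3], 'pop': [4, 5, 6]})
clashes = shadowed_columns(df)
print(clashes)
```

Any column reported here can still be read with bracket notation, such as df['pop'].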

answered Oct 13 '22 06:10

cs95
