
When should I use dt.column vs dt['column'] in pandas?

Tags:

python

pandas

I was doing some calculations and row manipulations and realised that for some tasks, such as mathematical operations, attribute access (d.c1) and bracket access (d['c1']) both work, e.g.

d['c3'] = d.c1 / d.c2
d['c3'] = d['c1'] / d['c2']

I was wondering whether there are cases where one is better than the other, and which one most people use.


asked Jun 28 '17 08:06

Tank


1 Answer

You should really just stop accessing columns as attributes and get into the habit of using square brackets []. This avoids errors when a column name contains characters that are illegal in a Python identifier, contains embedded spaces, or shares its name with a built-in method, and it avoids ambiguity when, for instance, you have a column named index:

In[13]:
df = pd.DataFrame(np.random.randn(5,4), columns=[' a', 'mean', 'index', '2'])
df.columns.tolist()

Out[13]: [' a', 'mean', 'index', '2']

So if we now try to access column '2' as an attribute:

In[14]:
df.2
  File "<ipython-input-14-0490d6ae2ca0>", line 1
    df.2
       ^
SyntaxError: invalid syntax

It fails because 2 is not a valid Python identifier, so df.2 is a syntax error, whereas df['2'] works.
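As a minimal sketch of that point, reusing the column names from the example above (the random values are irrelevant here), the label '2' fails Python's own identifier check but is an ordinary string key for bracket access:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 4), columns=[' a', 'mean', 'index', '2'])

# '2' is not a valid Python identifier, so df.2 can never even parse.
valid_name = '2'.isidentifier()
print(valid_name)            # False

# Bracket access takes the column label as a plain string key.
col = df['2']
print(type(col).__name__)    # Series
```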

In[15]:

df.a
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-15-b9872a8755ac> in <module>()
----> 1 df.a

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
   3079             if name in self._info_axis:
   3080                 return self[name]
-> 3081             return object.__getattribute__(self, name)
   3082 
   3083     def __setattr__(self, name, value):

AttributeError: 'DataFrame' object has no attribute 'a'

Because the column is really ' a' with a leading space, there is no column called a and the lookup fails with an AttributeError. The same happens for a space anywhere in the column name, since such a name cannot be written as an attribute.
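A short sketch of the same situation, again with the column names from the example: attribute lookup finds nothing for a, while the exact label, leading space included, works with brackets.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 4), columns=[' a', 'mean', 'index', '2'])

# There is no column labelled 'a', only ' a', so attribute lookup fails.
has_attr = hasattr(df, 'a')
print(has_attr)              # False

# The exact label, including the leading space, works with brackets.
col = df[' a']
print(repr(col.name))        # ' a'
```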

In[16]:
df.mean

Out[16]: 
<bound method DataFrame.mean of           a      mean     index         2
0 -0.022122  1.858308  1.823314  0.238105
1 -0.461662  0.482116  1.848322  1.946922
2  0.615889 -0.285043  0.201804 -0.656065
3  0.159351 -1.151883 -1.858024  0.088460
4  1.066735  1.015585  0.586550 -1.898469>

This one is more subtle: it looks like it did something, but it just returns the bound method DataFrame.mean rather than the column 'mean'. IPython is simply printing the method's repr, which happens to include the whole DataFrame.
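To make the difference concrete, here is a small sketch using the same column names; the variable names method and column are just illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 4), columns=[' a', 'mean', 'index', '2'])

# Attribute access resolves to the DataFrame.mean method, not the column.
method = df.mean
print(callable(method))          # True

# Bracket access always means "the column with this label".
column = df['mean']
print(type(column).__name__)     # Series
```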

In[17]:
df.index

Out[17]: RangeIndex(start=0, stop=5, step=1)

Here the intent is ambiguous: because index is an attribute of every DataFrame, df.index returns the row index instead of the column 'index'.
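A quick sketch of the two lookups side by side, assuming the same example frame with its default row index:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 4), columns=[' a', 'mean', 'index', '2'])

# df.index is the row index of the frame (a RangeIndex by default).
row_labels = df.index
print(type(row_labels).__name__)   # RangeIndex

# df['index'] is the data column that happens to be labelled 'index'.
column = df['index']
print(type(column).__name__)       # Series
```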

So you should stop accessing columns as attributes and always use square brackets, as that avoids all of the problems above.
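One related case not shown above, sketched here with the c1/c2/c3 names from the question and made-up values: attribute syntax also cannot create a new column. Assigning to d.c3 only sets a plain Python attribute on the object (pandas emits a warning for this pattern, silenced below), whereas bracket assignment adds the column.

```python
import warnings

import pandas as pd

d = pd.DataFrame({'c1': [1.0, 2.0], 'c2': [4.0, 8.0]})

with warnings.catch_warnings():
    # pandas warns that columns cannot be created via a new attribute name
    warnings.simplefilter('ignore')
    d.c3 = d.c1 / d.c2          # sets an attribute; no column is created

created_by_attribute = 'c3' in d.columns
print(created_by_attribute)     # False

# Bracket assignment is what actually adds the column.
d['c3'] = d['c1'] / d['c2']
created_by_brackets = 'c3' in d.columns
print(created_by_brackets)      # True
```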

answered Oct 16 '22 05:10

EdChum
