I was doing some calculations and row manipulations and realised that for some tasks, such as mathematical operations, both attribute access and square-bracket access worked, e.g.
d['c3'] = d.c1 / d.c2
d['c3'] = d['c1'] / d['c2']
I was wondering whether there are instances where one is better than the other, or which one most people use.
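For reference, here is a minimal sketch of the two spellings from the question side by side. The frame d and its values are made up for illustration; only the column names c1, c2 and c3 come from the question.

```python
import pandas as pd

# Hypothetical data; only the column names come from the question.
d = pd.DataFrame({"c1": [1.0, 2.0, 3.0], "c2": [4.0, 8.0, 6.0]})

# Attribute access on the right-hand side.
by_attr = d.c1 / d.c2

# Square-bracket access on the right-hand side.
by_key = d["c1"] / d["c2"]

# Both spellings select the same Series, so the results are identical.
same = by_attr.equals(by_key)
print(same)

# Creating the new column uses square brackets in both versions.
d["c3"] = by_key
print(d.columns.tolist())
```

With ordinary column names like these, the two forms are interchangeable for reading a column.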
The dt accessor can be used to access the values of a Series as datetime-like and exposes several properties. The pandas Series.dt.year attribute returns a Series containing the year of each datetime in the underlying data of the given Series object.
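A small sketch of the dt accessor in use, assuming a Series of made-up dates (the values below are illustrative, not from the original post):

```python
import pandas as pd

# Hypothetical sample dates, converted to a datetime64 Series.
s = pd.Series(pd.to_datetime(["2019-03-01", "2020-07-15", "2021-12-31"]))

# .dt.year extracts the year of each element; the result is a Series
# aligned with the index of s.
years = s.dt.year
print(type(years).__name__)
print(years.tolist())
```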
The results show that apply massively outperforms iterrows. As mentioned previously, this is because apply is optimized for looping through DataFrame rows much more quickly than iterrows does. While slower than apply, itertuples is quicker than iterrows, so if looping is required, try itertuples instead.
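As a rough sketch of what looping with itertuples looks like, assuming a small made-up frame with columns c1 and c2 (no timings are claimed here). Plain column arithmetic is shown alongside it, since for simple maths like this no explicit loop is needed at all.

```python
import pandas as pd

# Hypothetical data for illustration.
df_demo = pd.DataFrame({"c1": [1.0, 4.0, 9.0], "c2": [2.0, 2.0, 3.0]})

# itertuples yields one namedtuple per row; fields are read as attributes.
# index=False leaves the row index out of each tuple.
ratios = [row.c1 / row.c2 for row in df_demo.itertuples(index=False)]

# The same computation as whole-column arithmetic, with no Python loop.
vectorized = (df_demo["c1"] / df_demo["c2"]).tolist()

print(ratios == vectorized)
```

Note that itertuples only exposes a column by name when that name is a valid Python identifier; other columns are renamed positionally.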
The process takes 16.62 seconds with Pandas but only 6.55 seconds with Datatable, so overall Datatable is roughly 2.5 times faster than Pandas here.
A DataFrame can be thought of as a dict-like container for Series objects. It is the primary data structure in Pandas. The DataFrame.columns attribute returns the column labels of the given DataFrame.
You should really just stop accessing columns as attributes and get into the habit of accessing them using square brackets []. This avoids errors where your column names contain characters that are illegal in Python identifiers or embedded spaces, where a column shares its name with a built-in method, and ambiguous usage where, for instance, you have a column named index:
In[13]:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(5,4), columns=[' a', 'mean', 'index', '2'])
df.columns.tolist()
Out[13]: [' a', 'mean', 'index', '2']
So if we now try to access column '2' as an attribute:
In[14]:
df.2
File "<ipython-input-14-0490d6ae2ca0>", line 1
df.2
^
SyntaxError: invalid syntax
It fails because 2 is not a valid Python identifier, but df['2'] would work. Accessing column a as an attribute also fails:
In[15]:
df.a
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-15-b9872a8755ac> in <module>()
----> 1 df.a
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
3079 if name in self._info_axis:
3080 return self[name]
-> 3081 return object.__getattribute__(self, name)
3082
3083 def __setattr__(self, name, value):
AttributeError: 'DataFrame' object has no attribute 'a'
Because the column is really ' a' with a leading space, attribute access fails with an AttributeError (this would also fail if there were spaces anywhere in the column name), whereas df[' a'] would work.
In[16]:
df.mean
Out[16]:
<bound method DataFrame.mean of a mean index 2
0 -0.022122 1.858308 1.823314 0.238105
1 -0.461662 0.482116 1.848322 1.946922
2 0.615889 -0.285043 0.201804 -0.656065
3 0.159351 -1.151883 -1.858024 0.088460
4 1.066735 1.015585 0.586550 -1.898469>
This is more subtle: it looks like it did something, but in fact it just returns the bound method DataFrame.mean rather than the column 'mean'. Here IPython is just pretty-printing the method's repr.
In[17]:
df.index
Out[17]: RangeIndex(start=0, stop=5, step=1)
Here the intent is ambiguous: because index is an attribute of the DataFrame, df.index returns the row index instead of the column 'index'.
So you should stop accessing columns as attributes and always use square brackets, as this avoids all the problems above.
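To round this off, a short sketch showing square-bracket access on the same four awkward column names used in the session above. The variable names col_a, col_mean, col_index and col_2 are only illustrative.

```python
import numpy as np
import pandas as pd

# Same column names as in the session above: a leading space, a method
# name, an attribute name, and a name that is not a valid identifier.
df = pd.DataFrame(np.random.randn(5, 4), columns=[" a", "mean", "index", "2"])

# Square brackets look the label up literally, so every column is reachable.
col_a = df[" a"]
col_mean = df["mean"]
col_index = df["index"]
col_2 = df["2"]

# Each result is a Series holding that column's five values.
for label, col in [(" a", col_a), ("mean", col_mean),
                   ("index", col_index), ("2", col_2)]:
    print(repr(label), type(col).__name__, len(col))

# Attribute access still resolves to the method and the row index.
print(callable(df.mean))
print(type(df.index).__name__)
```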