I've been exploring how to optimize my code and ran across the <code>pandas</code> <code>.at</code> method. Per the documentation: <blockquote> Fast label-based scalar accessor Similarly to loc, at provides label based scalar lookups. You can also set using these indexers. </blockquote> So I ran some samples: <h3>Setup</h3> <pre class="prettyprint"><code>import pandas as pd
import numpy as np
from string import ascii_letters, ascii_lowercase, ascii_uppercase

lt = list(ascii_letters)
lc = list(ascii_lowercase)
uc = list(ascii_uppercase)

def gdf(rows, cols, seed=None):
    """rows and cols are what you'd pass to pd.MultiIndex.from_product()"""
    gmi = pd.MultiIndex.from_product
    df = pd.DataFrame(index=gmi(rows), columns=gmi(cols))
    np.random.seed(seed)
    df.iloc[:, :] = np.random.rand(*df.shape)
    return df

seed = [3, 1415]
df = gdf([lc, uc], [lc, uc], seed)
print(df.head().T.head().T)
</code></pre> <code>df</code> looks like: <pre class="prettyprint"><code>            a
            A         B         C         D         E
a A  0.444939  0.407554  0.460148  0.465239  0.462691
  B  0.032746  0.485650  0.503892  0.351520  0.061569
  C  0.777350  0.047677  0.250667  0.602878  0.570528
  D  0.927783  0.653868  0.381103  0.959544  0.033253
  E  0.191985  0.304597  0.195106  0.370921  0.631576
</code></pre> Let's use <code>.at</code> and <code>.loc</code> and ensure I get the same thing: <pre class="prettyprint"><code>print("using .loc", df.loc[('a', 'A'), ('c', 'C')])
print("using .at ", df.at[('a', 'A'), ('c', 'C')])

using .loc 0.37374090276
using .at  0.37374090276
</code></pre> Test speed using <code>.loc</code>: <pre class="prettyprint"><code>%%timeit
df.loc[('a', 'A'), ('c', 'C')]

10000 loops, best of 3: 180 µs per loop
</code></pre> Test speed using <code>.at</code>: <pre class="prettyprint"><code>%%timeit
df.at[('a', 'A'), ('c', 'C')]

The slowest run took 6.11 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 8 µs per loop
</code></pre> This looks like a huge speed increase.
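For anyone who wants to reproduce the comparison outside IPython, here is a minimal sketch using the standard-library <code>timeit</code> module instead of the <code>%%timeit</code> magic. It builds a frame shaped like the one from <code>gdf</code> above (26 × 26 labels on each axis), but uses <code>np.random.default_rng</code> for the data, so the values will not match the printout above. The helper names <code>make_frame</code> and <code>time_accessors</code> are hypothetical, and absolute timings will depend on the pandas version and machine.

```python
import timeit
from string import ascii_lowercase, ascii_uppercase

import numpy as np
import pandas as pd


def make_frame(seed=0):
    """Hypothetical helper: 676 x 676 frame with two-level MultiIndexes on both axes."""
    idx = pd.MultiIndex.from_product([list(ascii_lowercase), list(ascii_uppercase)])
    rng = np.random.default_rng(seed)
    return pd.DataFrame(rng.random((len(idx), len(idx))), index=idx, columns=idx)


def time_accessors(df, row, col, number=1000):
    """Hypothetical helper: mean seconds per call for .loc and .at on the same cell."""
    loc_t = timeit.timeit(lambda: df.loc[row, col], number=number) / number
    at_t = timeit.timeit(lambda: df.at[row, col], number=number) / number
    return {"loc": loc_t, "at": at_t}


if __name__ == "__main__":
    df = make_frame()
    row, col = ("a", "A"), ("c", "C")
    assert df.loc[row, col] == df.at[row, col]  # same cell, same value
    for name, secs in time_accessors(df, row, col).items():
        print(f"{name}: {secs * 1e6:.1f} µs per call")
```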
Even allowing for caching, the slowest run (<code>6.11 * 8 ≈ 49</code> µs) is still a lot faster than <code>180</code> µs. <h3>Question</h3> What are the limitations of <code>.at</code>? I'm motivated to use it. The documentation says it's similar to <code>.loc</code>, but it doesn't behave similarly. Example: <pre class="prettyprint"><code># small df
sdf = gdf([lc[:2]], [uc[:2]], seed)
print(sdf.loc[:, :])

          A         B
a  0.444939  0.407554
b  0.460148  0.465239
</code></pre> whereas <code>print(sdf.at[:, :])</code> results in <code>TypeError: unhashable type</code>. So the two are clearly not interchangeable, even if the intent is for them to be similar. That said, can anyone provide guidance on what can and cannot be done with the <code>.at</code> method?
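The difference boils down to what each accessor accepts as a key. A minimal sketch, assuming a small single-level frame in place of <code>sdf</code>: <code>.loc</code> takes scalars, slices and lists on either axis, while <code>.at</code> only accepts exactly one row label and one column label. The exception raised for a non-scalar key has varied between pandas (and Python) versions, so the hypothetical helper <code>at_rejects</code> just checks that <em>some</em> exception is raised.

```python
import numpy as np
import pandas as pd

sdf = pd.DataFrame(np.arange(4.0).reshape(2, 2), index=["a", "b"], columns=["A", "B"])

# .loc accepts scalars, slices and lists on either axis.
block = sdf.loc[:, :]         # the whole frame (a DataFrame)
col = sdf.loc[:, "A"]         # one column (a Series)
cell_loc = sdf.loc["a", "B"]  # one cell (a scalar)

# .at accepts only a single row label plus a single column label.
cell_at = sdf.at["a", "B"]


def at_rejects(key):
    """Hypothetical helper: True if sdf.at[key] raises.

    The exact exception type (TypeError, InvalidIndexError, ...) depends
    on the pandas/Python version, so any exception counts here.
    """
    try:
        sdf.at[key]
    except Exception:
        return True
    return False
```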

Update: <code>df.get_value</code> is deprecated as of version 0.21.0; <code>df.at</code> or <code>df.iat</code> is the recommended method going forward. <hr> <code>df.at</code> can only access a single value at a time. <code>df.loc</code> can select multiple rows and/or columns. Note that there is also <code>df.get_value</code>, which may be even quicker at accessing single values: <pre class="prettyprint"><code>In [25]: %timeit df.loc[('a', 'A'), ('c', 'C')]
10000 loops, best of 3: 187 µs per loop

In [26]: %timeit df.at[('a', 'A'), ('c', 'C')]
100000 loops, best of 3: 8.33 µs per loop

In [35]: %timeit df.get_value(('a', 'A'), ('c', 'C'))
100000 loops, best of 3: 3.62 µs per loop
</code></pre> <hr> Under the hood (in the pandas versions timed here), <code>df.at[...]</code> calls <code>df.get_value</code>, but it also does some type checking on the keys.
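Since <code>get_value</code> was removed entirely in pandas 1.0.0, here is a minimal sketch of the two replacements mentioned in the update, on a small made-up frame: <code>.at</code> for label-based scalar access and <code>.iat</code> for position-based scalar access. <code>Index.get_loc</code> translates a (unique) label into the integer position that <code>.iat</code> expects. Both accessors can also set a single cell.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(12).reshape(3, 4),
    index=["x", "y", "z"],
    columns=["A", "B", "C", "D"],
)

# Label-based scalar access (what df.get_value("y", "C") used to do).
by_label = df.at["y", "C"]

# Position-based scalar access: translate the labels to integer positions first.
r = df.index.get_loc("y")
c = df.columns.get_loc("C")
by_position = df.iat[r, c]

# Both accessors can also set a single cell.
df.at["z", "D"] = -1
df.iat[0, 0] = -2
```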

As you asked about the limitations of <code>.at</code>, here is one thing I recently ran into (using pandas 0.22). Let's use the example from the documentation: <pre class="prettyprint"><code>df = pd.DataFrame([[0, 2, 3], [0, 4, 1], [10, 20, 30]],
                  index=[4, 5, 6], columns=['A', 'B', 'C'])
df2 = df.copy()

    A   B   C
4   0   2   3
5   0   4   1
6  10  20  30
</code></pre> If I now do <pre class="prettyprint"><code>df.at[4, 'B'] = 100
</code></pre> the result looks as expected: <pre class="prettyprint"><code>    A    B   C
4   0  100   3
5   0    4   1
6  10   20  30
</code></pre> However, when I try to do <pre class="prettyprint"><code>df.at[4, 'C'] = 10.05
</code></pre> it seems that <code>.at</code> tries to preserve the datatype (here: <code>int</code>): <pre class="prettyprint"><code>    A    B   C
4   0  100  10
5   0    4   1
6  10   20  30
</code></pre> That seems to be a difference from <code>.loc</code>: <pre class="prettyprint"><code>df2.loc[4, 'C'] = 10.05
</code></pre> yields the desired <pre class="prettyprint"><code>    A   B      C
4   0   2  10.05
5   0   4   1.00
6  10  20  30.00
</code></pre> The risky thing in the example above is that the conversion from <code>float</code> to <code>int</code> happens silently. When one tries the same with strings, it throws an error: <pre class="prettyprint"><code>df.at[5, 'A'] = 'a_string'
</code></pre> <blockquote> ValueError: invalid literal for int() with base 10: 'a_string' </blockquote> It will work, however, if one uses a string on which <code>int()</code> actually works, as noted by @n1k31t4 in the comments, e.g. <pre class="prettyprint"><code>df.at[5, 'A'] = '123'

     A   B   C
4    0   2   3
5  123   4   1
6   10  20  30
</code></pre>
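How an incompatible assignment through <code>.at</code> is handled has changed across pandas releases since 0.22, so one way to avoid depending on it is to check the value against the column dtype yourself and widen the column explicitly when needed. A minimal sketch, assuming NumPy-backed columns (extension dtypes such as categoricals are not handled); <code>set_cell_checked</code> is a hypothetical helper, not a pandas API. It uses <code>np.can_cast(..., casting="safe")</code>, which rejects both <code>10.05</code> and the string <code>'123'</code> for an <code>int64</code> column.

```python
import numpy as np
import pandas as pd


def set_cell_checked(df, row, col, value):
    """Hypothetical helper: refuse a .at assignment that the column's dtype
    cannot hold without loss, instead of relying on pandas' implicit casting."""
    target = df[col].dtype
    source = np.asarray(value).dtype
    if not np.can_cast(source, target, casting="safe"):
        raise TypeError(f"cannot store {value!r} ({source}) in column {col!r} ({target})")
    df.at[row, col] = value


df = pd.DataFrame([[0, 2, 3], [0, 4, 1], [10, 20, 30]],
                  index=[4, 5, 6], columns=["A", "B", "C"])

set_cell_checked(df, 4, "B", 100)        # int into an int64 column: accepted

try:
    set_cell_checked(df, 4, "C", 10.05)  # float into an int64 column: rejected
except TypeError as exc:
    print(exc)

# To actually store the float, widen the column explicitly, then assign.
df["C"] = df["C"].astype("float64")
set_cell_checked(df, 4, "C", 10.05)
```

The explicit <code>astype</code> makes the dtype change visible in the code rather than leaving it to whichever casting rule the installed pandas version applies.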

Adding to the above, the pandas documentation for the <code>at</code> function states: <blockquote> Access a single value for a row/column label pair. Similar to loc, in that both provide label-based lookups. Use at if you only need to get or set a single value in a DataFrame or Series. </blockquote> For setting data, <code>loc</code> and <code>at</code> are similar, for example: <pre class="prettyprint"><code>df = pd.DataFrame({'A': [1, 2, 3], 'B': [11, 22, 33]}, index=[0, 0, 1])
</code></pre> Both <code>loc</code> and <code>at</code> produce the same result: <pre class="prettyprint"><code>df.at[0, 'A'] = [101, 102]
df.loc[0, 'A'] = [101, 102]

     A   B
0  101  11
0  102  22
1    3  33

df.at[0, 'A'] = 103
df.loc[0, 'A'] = 103

     A   B
0  103  11
0  103  22
1    3  33
</code></pre> Also, for accessing a single value, both are the same: <pre class="prettyprint"><code>df.loc[1, 'A']  # returns a single value (<class 'numpy.int64'>)
df.at[1, 'A']   # returns a single value (<class 'numpy.int64'>)

3
</code></pre> However, when matching multiple values, <code>loc</code> returns a group of rows/columns from the DataFrame, while <code>at</code> returns an array of values: <pre class="prettyprint"><code>df.loc[0, 'A']  # returns a Series (<class 'pandas.core.series.Series'>)

0    103
0    103
Name: A, dtype: int64

df.at[0, 'A']   # returns array of values (<class 'numpy.ndarray'>)

array([103, 103])
</code></pre> Moreover, <code>loc</code> can match a group of rows/columns and can be given only a row label, while <code>at</code> must also receive a column label: <pre class="prettyprint"><code>df.loc[0]  # returns a DataFrame view (<class 'pandas.core.frame.DataFrame'>)

     A   B
0  103  11
0  103  22

# df.at[0]  # ERROR: must receive column
</code></pre>
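The type that <code>.at</code> returns for a duplicated label has not been stable: depending on the pandas version it has been an <code>ndarray</code> (as shown above) or a <code>Series</code>. A minimal sketch, using the same three-row frame before any assignment, of two defensive habits: normalize the result with <code>np.asarray</code>, and check label uniqueness when you rely on <code>.at</code> returning a scalar. The helper name <code>scalar_safe</code> is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [11, 22, 33]}, index=[0, 0, 1])

# With a duplicated row label, .loc returns every matching row.
rows = df.loc[0]        # DataFrame with two rows
col_a = df.loc[0, "A"]  # Series with two values

# .at with a duplicated label: ndarray or Series depending on the pandas
# version; np.asarray() turns either into a plain NumPy array.
values = np.asarray(df.at[0, "A"])

# A unique label gives a plain scalar.
single = df.at[1, "A"]


def scalar_safe(frame):
    """Hypothetical helper: True when all row and column labels are unique,
    so every .at[row, col] lookup yields a single scalar."""
    return frame.index.is_unique and frame.columns.is_unique
```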

pandas .at versus .loc


piRSquared


unutbu

Cleb

emem

Vikranth Inti
