Indexing Pandas data frames: integer rows, named columns

Tags: python, pandas, dataframe

Say df is a pandas dataframe.

df.loc[] only accepts names
df.iloc[] only accepts integers (actual placements)
df.ix[] accepts both names and integers:

When referencing rows, df.ix[row_idx, ] only wants to be given names. e.g.

    df = pd.DataFrame({'a' : ['one', 'two', 'three','four', 'five', 'six'],
                       '1' : np.arange(6)})
    df = df.ix[2:6]
    print(df)
       1      a
    2  2  three
    3  3   four
    4  4   five
    5  5    six

    df.ix[0, 'a']

throws an error; it doesn't return 'three'.

When referencing columns, ix prefers integers, not names. e.g.

df.ix[2, 1]

returns 'three', not 2. (Although df.ix[2, '1'] does return 2.)

Oddly, I'd like the exact opposite functionality. Usually my column names are very meaningful, so in my code I reference them directly. But due to a lot of observation cleaning, the row names in my pandas data frames don't usually correspond to range(len(df)).

I realize I can use:

df.iloc[0].loc['a'] # returns three

But it seems ugly! Does anyone know of a better way to do this, so that the code would look like this?

df.foo[0, 'a'] # returns three

In fact, is it possible to add on my own new method to pandas.core.frame.DataFrames, so e.g. df.idx(rows, cols) is in fact df.iloc[rows].loc[cols]?

587

asked Feb 26 '15 23:02

Hillary Sanders

2 Answers

It's a late answer, but @unutbu's comment is still valid and a great solution to this problem.

To index a DataFrame with integer rows and named columns (labeled columns):

df.loc[df.index[#], 'NAME'] where # is a valid integer index and NAME is the name of the column.
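
As a minimal sketch of this approach, assuming a frame like the one in the question (built here with .iloc, since .ix was removed in pandas 1.0; the variable names first_label and value are only illustrative):

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame; slicing by position leaves row labels 2..5.
df = pd.DataFrame({'a': ['one', 'two', 'three', 'four', 'five', 'six'],
                   '1': np.arange(6)})
df = df.iloc[2:6]

# Translate integer position 0 into its row label, then index by label.
first_label = df.index[0]
value = df.loc[first_label, 'a']
print(first_label, value)
```

This relies on the row labels being unique; with repeated labels the .loc lookup can match more than one row.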

128

answered Sep 19 '22 13:09

brunston

The existing answers seem short-sighted to me.

Problematic Solutions

1. df.loc[df.index[0], 'a']
   The strategy here is to get the row label of the 0th row and then use .loc as normal. I see two issues.
   1. If df has repeated row labels, df.loc[df.index[0], 'a'] could return multiple rows.
   2. .loc is slower than .iloc so you're sacrificing speed here.
2. df.reset_index(drop=True).loc[0, 'a']
   The strategy here is to reset the index so the row labels become 0, 1, 2, ... thus .loc[0] gives the same result as .iloc[0]. Still, the problem here is runtime, as .loc is slower than .iloc and you'll incur a cost for resetting the index.
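
To make the repeated-label issue concrete, here is a small sketch. The frame dup and its labels are made up for illustration and are not from the question:

```python
import pandas as pd

# Hypothetical frame in which the row label 'x' appears twice.
dup = pd.DataFrame({'a': ['one', 'two', 'three']}, index=['x', 'x', 'y'])

# Label-based lookup: df.index[0] is 'x', which matches two rows.
by_label = dup.loc[dup.index[0], 'a']

# Purely positional lookup: column 'a' is the only column, so position 0.
by_position = dup.iloc[0, 0]

print(type(by_label).__name__, len(by_label))
print(by_position)
```

The label-based lookup comes back as a Series holding both 'x' rows, while the positional lookup gives the single value from the first row.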

Better Solution

I suggest following @Landmaster's comment:

df.iloc[0, df.columns.get_loc("a")]

Essentially, this is the same as df.iloc[0, 0] (when 'a' is the first column) except we get the column index dynamically using df.columns.get_loc("a").
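
A runnable sketch of that lookup, assuming the question's frame (rebuilt with .iloc because .ix no longer exists in current pandas; col_pos and value are illustrative names):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['one', 'two', 'three', 'four', 'five', 'six'],
                   '1': np.arange(6)})
df = df.iloc[2:6]  # row labels are now 2, 3, 4, 5

# Resolve the column name to its integer position once...
col_pos = df.columns.get_loc('a')

# ...then index purely by position: first row, column 'a'.
value = df.iloc[0, col_pos]
print(col_pos, value)
```

Note that get_loc returns a plain integer only when the column name is unique; with duplicate column names it can return a slice or boolean mask instead.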

To index multiple columns such as ['a', 'b', 'c'], use:

df.iloc[0, [df.columns.get_loc(c) for c in ['a', 'b', 'c']]]
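
If you want the call to read like the df.foo[0, 'a'] asked for in the question, one option is to wrap the lookup in a custom accessor. The sketch below uses pandas' register_dataframe_accessor. The accessor name pos and the class PosAccessor are my own invention, not part of pandas, and this assumes unique column names:

```python
import numpy as np
import pandas as pd


@pd.api.extensions.register_dataframe_accessor("pos")  # "pos" is a made-up name
class PosAccessor:
    """Index rows by integer position and columns by name."""

    def __init__(self, pandas_obj):
        self._obj = pandas_obj

    def __getitem__(self, key):
        rows, cols = key
        columns = self._obj.columns
        if isinstance(cols, list):
            # Several column names: resolve each to its position.
            col_pos = [columns.get_loc(c) for c in cols]
        else:
            # A single column name.
            col_pos = columns.get_loc(cols)
        return self._obj.iloc[rows, col_pos]


df = pd.DataFrame({'a': ['one', 'two', 'three', 'four', 'five', 'six'],
                   '1': np.arange(6)})
df = df.iloc[2:6]

single = df.pos[0, 'a']           # first row, column 'a'
several = df.pos[0, ['a', '1']]   # first row, two named columns
sliced = df.pos[0:2, 'a']         # first two rows, column 'a'
print(single)
```

An unknown column name raises a KeyError from get_loc, which is arguably safer than a silent wrong selection.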

Update

This is discussed here as part of my course on Pandas.

answered Sep 21 '22 13:09

Ben
