Say df
is a pandas dataframe.
df.loc[]
only accepts names df.iloc[]
only accepts integers (actual placements)df.ix[]
accepts both names and integers:When referencing rows, df.ix[row_idx, ]
only wants to be given names. e.g.
df = pd.DataFrame({'a' : ['one', 'two', 'three','four', 'five', 'six'], '1' : np.arange(6)}) df = df.ix[2:6] print(df) 1 a 2 2 three 3 3 four 4 4 five 5 5 six df.ix[0, 'a']
throws an error, it doesn't give return 'two'.
When referencing columns, iloc is prefers integers, not names. e.g.
df.ix[2, 1]
returns 'three', not 2. (Although df.idx[2, '1']
does return 2
).
Oddly, I'd like the exact opposite functionality. Usually my column names are very meaningful, so in my code I reference them directly. But due to a lot of observation cleaning, the row names in my pandas data frames don't usually correspond to range(len(df))
.
I realize I can use:
df.iloc[0].loc['a'] # returns three
But it seems ugly! Does anyone know of a better way to do this, so that the code would look like this?
df.foo[0, 'a'] # returns three
In fact, is it possible to add on my own new method to pandas.core.frame.DataFrame
s, so e.g. df.idx(rows, cols)
is in fact df.iloc[rows].loc[cols]
?
Select Rows & Columns by Name or Index in Pandas DataFrame using [ ], loc & iloc. Indexing in Pandas means selecting rows and columns of data from a Dataframe.
Indexing in Pandas : Indexing in pandas means simply selecting particular rows and columns of data from a DataFrame. Indexing could mean selecting all the rows and some of the columns, some of the rows and all of the columns, or some of each of the rows and columns. Indexing can also be known as Subset Selection.
Construction. pandas can represent integer data with possibly missing values using arrays.
It's a late answer, but @unutbu's comment is still valid and a great solution to this problem.
To index a DataFrame with integer rows and named columns (labeled columns):
df.loc[df.index[#], 'NAME']
where #
is a valid integer index and NAME
is the name of the column.
The existing answers seem short-sighted to me.
df.loc[df.index[0], 'a']
The strategy here is to get the row label of the 0th row and then use .loc
as normal. I see two issues.
df.loc[df.index[0], 'a']
could return multiple rows..loc
is slower than .iloc
so you're sacrificing speed here.df.reset_index(drop=True).loc[0, 'a']
The strategy here is to reset the index so the row labels become 0, 1, 2, ... thus .loc[0]
gives the same result as .iloc[0]
. Still, the problem here is runtime, as .loc
is slower than .iloc
and you'll incur a cost for resetting the index.
I suggest following @Landmaster's comment:
df.iloc[0, df.columns.get_loc("a")]
Essentially, this is the same as df.iloc[0, 0]
except we get the column index dynamically using df.columns.get_loc("a")
.
To index multiple columns such as ['a', 'b', 'c']
, use:
df.iloc[0, [df.columns.get_loc(c) for c in ['a', 'b', 'c']]]
This is discussed here as part of my course on Pandas.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With