How are iloc and loc different?

Tags:

python

indexing

pandas

dataframe

Can someone explain how these two methods of slicing are different?
I've seen the docs, and I've seen these answers, but I still find myself unable to understand how the two are different. To me, they seem interchangeable in large part, because they are at the lower levels of slicing.

For example, say we want to get the first five rows of a DataFrame. How is it that these two work?

df.loc[:5]
df.iloc[:5]

Can someone present three cases where the distinction in use is clearer?

Once upon a time, I also wanted to know how these two functions differ from df.ix[:5], but ix has been removed in pandas 1.0, so I don't care anymore.

686

asked Jul 23 '15 16:07

AZhao

2 Answers

Label vs. Location

The main distinction between the two methods is:

loc gets rows (and/or columns) with particular labels.
iloc gets rows (and/or columns) at integer locations.

To demonstrate, consider a series s of characters with a non-monotonic integer index:

>>> import pandas as pd
>>> s = pd.Series(list("abcdef"), index=[49, 48, 47, 0, 1, 2])
>>> s
49    a
48    b
47    c
0     d
1     e
2     f
>>> s.loc[0]    # value at index label 0
'd'
>>> s.iloc[0]   # value at index location 0
'a'
>>> s.loc[0:1]  # rows at index labels between 0 and 1 (inclusive)
0    d
1    e
>>> s.iloc[0:1] # rows at index location between 0 and 1 (exclusive)
49    a

Here are some of the differences/similarities between s.loc and s.iloc when passed various objects:

<object>	description	`s.loc[<object>]`	`s.iloc[<object>]`
`0`	single item	Value at index label `0` (the string `'d'`)	Value at index location 0 (the string `'a'`)
`0:1`	slice	Two rows (labels `0` and `1`)	One row (first row at location 0)
`1:47`	slice with out-of-bounds end	Zero rows (empty Series)	Five rows (location 1 onwards)
`1:47:-1`	slice with negative step	Three rows (labels `1` back to `47`)	Zero rows (empty Series)
`[2, 0]`	integer list	Two rows with given labels	Two rows with given locations
`s > 'e'`	Bool series (indicating which values have the property)	One row (containing `'f'`)	`NotImplementedError`
`(s>'e').values`	Bool array	One row (containing `'f'`)	Same as `loc`
`999`	int object not in index	`KeyError`	`IndexError` (out of bounds)
`-1`	int object not in index	`KeyError`	Returns last value in `s`
`lambda x: x.index[3]`	callable applied to series (here returning the label at position 3 of the index)	`s.loc[s.index[3]]`	`s.iloc[s.index[3]]`
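The rows of the table can be checked with a short script. This is a sketch that rebuilds the same Series `s`; the helper `raises` is a hypothetical name introduced only for this check. Because the exact exception for a boolean Series passed to `iloc` is a version detail, the script accepts either NotImplementedError or ValueError for that row.

```python
import pandas as pd

# Same Series as above: labels deliberately differ from positions
s = pd.Series(list("abcdef"), index=[49, 48, 47, 0, 1, 2])


def raises(func, exc):
    """Hypothetical helper: return True if calling func() raises exc."""
    try:
        func()
    except exc:
        return True
    return False


# Single item: label 0 vs position 0
label_item = s.loc[0]
position_item = s.iloc[0]

# Slices: loc includes the end label, iloc excludes the end position
loc_slice = s.loc[0:1]
iloc_slice = s.iloc[0:1]

# End bound 47: a label that comes *before* 1 for loc, out of range for iloc
loc_oob = s.loc[1:47]
iloc_oob = s.iloc[1:47]

# Negative step walks backwards from label 1 to label 47
loc_rev = s.loc[1:47:-1]
iloc_rev = s.iloc[1:47:-1]

# A plain boolean NumPy array is accepted by both accessors
mask = (s > 'e').values
loc_mask = s.loc[mask]
iloc_mask = s.iloc[mask]

# A boolean Series is rejected by iloc (integer-typed mask index here)
iloc_bool_series_fails = raises(lambda: s.iloc[s > 'e'],
                                (NotImplementedError, ValueError))

# Missing label vs out-of-range position
loc_missing = raises(lambda: s.loc[999], KeyError)
iloc_missing = raises(lambda: s.iloc[999], IndexError)
loc_negative = raises(lambda: s.loc[-1], KeyError)
iloc_negative = s.iloc[-1]

# Callables receive the Series and return the indexer to use
loc_call = s.loc[lambda x: x.index[3]]
iloc_call = s.iloc[lambda x: x.index[3]]
```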

loc's label-querying capabilities extend well beyond integer indexes, and it's worth highlighting a couple of additional examples.

Here's a Series where the index contains string objects:

>>> s2 = pd.Series(s.index, index=s.values)
>>> s2
a    49
b    48
c    47
d     0
e     1
f     2

Since loc is label-based, it can fetch the first value in the Series using s2.loc['a']. It can also slice with non-integer objects:

>>> s2.loc['c':'e']  # all rows lying between 'c' and 'e' (inclusive)
c    47
d     0
e     1
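To see how the inclusive label slice lines up with an exclusive positional one, here is a small sketch that builds an equivalent `s2` directly. The last line relies on the index being sorted: under that assumption, loc can also slice between labels ('bb', 'dd') that are not in the index at all.

```python
import pandas as pd

# Equivalent to s2 above: string labels a..f with the integer values
s2 = pd.Series([49, 48, 47, 0, 1, 2], index=list("abcdef"))

by_label = s2.loc['c':'e']        # inclusive of both 'c' and 'e'
by_position = s2.iloc[2:5]        # positions 2, 3, 4 (end excluded)

# get_loc translates a label into its integer position
start = s2.index.get_loc('c')     # 2
stop = s2.index.get_loc('e') + 1  # +1 because iloc excludes the end
translated = s2.iloc[start:stop]

# Sorted index: bounds that are not labels are located by sort order
between = s2.loc['bb':'dd']
```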

For DateTime indexes, we don't need to pass the exact date/time to fetch by label. For example:

>>> s3 = pd.Series(list('abcde'), pd.date_range('now', periods=5, freq='M'))
>>> s3
2021-01-31 16:41:31.879768    a
2021-02-28 16:41:31.879768    b
2021-03-31 16:41:31.879768    c
2021-04-30 16:41:31.879768    d
2021-05-31 16:41:31.879768    e

Then to fetch the row(s) for March/April 2021 we only need:

>>> s3.loc['2021-03':'2021-04']
2021-03-31 16:41:31.879768    c
2021-04-30 16:41:31.879768    d
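Because `'now'` makes the timestamps change on every run, here is a reproducible sketch that uses fixed month-end dates instead (the dates are chosen for illustration). It shows the same partial-string slice, a single-month lookup, and the positional slice that selects the same rows.

```python
import pandas as pd

# Fixed month-end dates instead of 'now', so the result is reproducible
dates = pd.to_datetime(['2021-01-31', '2021-02-28', '2021-03-31',
                        '2021-04-30', '2021-05-31'])
s3 = pd.Series(list('abcde'), index=dates)

# Partial date strings are matched against the DatetimeIndex
march_april = s3.loc['2021-03':'2021-04']  # everything in March or April
march_only = s3.loc['2021-03']             # everything in March

# The positional equivalent has to spell out the positions
same_rows = s3.iloc[2:4]
```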

Rows and Columns

loc and iloc work the same way with DataFrames as they do with Series. It's useful to note that both methods can address columns and rows together.

When given a tuple, the first element is used to index the rows and, if it exists, the second element is used to index the columns.

Consider the DataFrame defined below:

>>> import numpy as np
>>> df = pd.DataFrame(np.arange(25).reshape(5, 5),
...                   index=list('abcde'),
...                   columns=['x','y','z', 8, 9])
>>> df
    x   y   z   8   9
a   0   1   2   3   4
b   5   6   7   8   9
c  10  11  12  13  14
d  15  16  17  18  19
e  20  21  22  23  24

Then for example:

>>> df.loc['c':, :'z']  # rows 'c' and onwards AND columns up to 'z'
    x   y   z
c  10  11  12
d  15  16  17
e  20  21  22
>>> df.iloc[:, 3]        # all rows, but only the column at index location 3
a     3
b     8
c    13
d    18
e    23
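Here is a sketch that checks those two selections against their counterparts in the other accessor. It also shows why this DataFrame's mixed column labels are a useful test: the integer 8 is a column *label* for loc but would be read as a *position* by iloc, and here label 8 happens to sit at position 3.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(25).reshape(5, 5),
                  index=list('abcde'),
                  columns=['x', 'y', 'z', 8, 9])

# Rows from 'c' onwards and columns up to and including 'z' ...
by_label = df.loc['c':, :'z']
# ... are rows 2 onwards and columns 0..2 by position (end excluded)
by_position = df.iloc[2:, :3]

# A single cell: the tuple is (row, column) for both accessors
cell_label = df.loc['c', 'x']
cell_position = df.iloc[2, 0]

# Integer column label 8 vs integer column position 3
col_label = df.loc[:, 8]
col_position = df.iloc[:, 3]
```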

Sometimes we want to mix label and positional indexing methods for the rows and columns, somehow combining the capabilities of loc and iloc.

For example, using the same DataFrame, what is the best way to select the rows up to and including 'c' together with the first four columns?

We can achieve this result using iloc and the help of another method:

>>> df.iloc[:df.index.get_loc('c') + 1, :4]
    x   y   z   8
a   0   1   2   3
b   5   6   7   8
c  10  11  12  13

get_loc() is an index method meaning "get the position of the label in this index". Note that since slicing with iloc is exclusive of its endpoint, we must add 1 to this value if we want row 'c' as well.
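The same mix can also be approached from the other direction. In the sketch below, which is an alternative and not the answer's own method, the column positions are turned into labels with `df.columns[:4]` so that `loc` handles everything. Since loc slices include their end label, no +1 is needed for the row bound.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(25).reshape(5, 5),
                  index=list('abcde'),
                  columns=['x', 'y', 'z', 8, 9])

# Approach from the text: row label -> position, then iloc
via_iloc = df.iloc[:df.index.get_loc('c') + 1, :4]

# Alternative: column positions -> labels, then loc
via_loc = df.loc[:'c', df.columns[:4]]
```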

115

answered Sep 27 '22 22:09

Alex Riley

iloc works based on integer positioning. So no matter what your row labels are, you can always, e.g., get the first row by doing

df.iloc[0]

or the last five rows by doing

df.iloc[-5:]

You can also use it on the columns. This retrieves the 3rd column:

df.iloc[:, 2]    # the : in the first position indicates all rows

You can combine them to get intersections of rows and columns:

df.iloc[:3, :3] # The upper-left 3 X 3 entries (assuming df has 3+ rows and columns)
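To make the "no matter what your row labels are" point concrete, here is a sketch on a frame whose row labels are neither zero-based nor ascending. The column names and values are made up for illustration.

```python
import pandas as pd

# Row labels that are deliberately unrelated to the row positions
df = pd.DataFrame({'a': range(8), 'b': range(10, 18), 'c': range(20, 28)},
                  index=[70, 60, 50, 40, 30, 20, 10, 0])

first_row = df.iloc[0]     # first row, whose label happens to be 70
last_five = df.iloc[-5:]   # the final five rows
third_col = df.iloc[:, 2]  # the third column, 'c'
corner = df.iloc[:3, :3]   # the upper-left 3 x 3 block
```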

On the other hand, .loc uses labels (named indices). Let's set up a data frame with strings as row and column labels:

df = pd.DataFrame(index=['a', 'b', 'c'], columns=['time', 'date', 'name'])

Then we can get the first row by

df.loc['a']     # equivalent to df.iloc[0]

and the last two rows of the 'date' column by

df.loc['b':, 'date']   # equivalent to df.iloc[1:, 1]
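The "equivalent to" comments can be checked directly. The frame in the answer is all NaN, so this sketch fills the same labels with made-up values to make the comparison meaningful.

```python
import pandas as pd

# Same row and column labels as above, with illustrative values
df = pd.DataFrame([['09:00', '2021-01-01', 'Ann'],
                   ['10:30', '2021-01-02', 'Bob'],
                   ['12:15', '2021-01-03', 'Cy']],
                  index=['a', 'b', 'c'],
                  columns=['time', 'date', 'name'])

row_by_label = df.loc['a']
row_by_position = df.iloc[0]

dates_by_label = df.loc['b':, 'date']  # rows 'b' and 'c', column 'date'
dates_by_position = df.iloc[1:, 1]     # rows 1.., column position 1
```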

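and so on. Now, it's probably worth pointing out that the default row and column indices for a DataFrame are integers from 0, so with a default index a label and a position coincide, and single lookups such as df.loc[0] and df.iloc[0] return the same row. This is why your examples look interchangeable. Slices still differ, though: loc includes the end label, so df.loc[:5] returns six rows (labels 0 to 5), while df.iloc[:5] returns five (positions 0 to 4). If you had a non-numeric index such as strings or datetimes, df.loc[:5] would raise an error.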
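Here is a sketch of the question's two expressions on a default integer index, followed by the same loc slice on a string index. Which exception the last case raises is treated as an implementation detail, so both TypeError and KeyError are accepted.

```python
import pandas as pd

# Default RangeIndex: labels 0..9 coincide with positions 0..9
df = pd.DataFrame({'value': range(10)})

loc_rows = df.loc[:5]    # labels 0 through 5, end included
iloc_rows = df.iloc[:5]  # positions 0 through 4, end excluded

# Single-row lookups agree because label 3 sits at position 3
same_row = df.loc[3].equals(df.iloc[3])

# With string labels, the integer 5 is not a valid slice bound for loc
df_str = pd.DataFrame({'value': range(3)}, index=['a', 'b', 'c'])
try:
    df_str.loc[:5]
    loc_str_failed = False
except (TypeError, KeyError):
    loc_str_failed = True
```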

Also, you can do column retrieval just by using the data frame's __getitem__:

df['time']    # equivalent to df.loc[:, 'time']
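Here is a quick sketch of that equivalence, with made-up values. It also shows that passing a list of labels to `__getitem__` selects several columns at once.

```python
import pandas as pd

df = pd.DataFrame({'time': ['09:00', '10:30'],
                   'date': ['2021-01-01', '2021-01-02'],
                   'name': ['Ann', 'Bob']},
                  index=['a', 'b'])

col_getitem = df['time']             # __getitem__ with one column label
col_loc = df.loc[:, 'time']          # the same column through loc
cols_getitem = df[['time', 'name']]  # a list of labels: several columns
```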

Now suppose you want to mix positional and label-based indexing, that is, select rows by position and columns by name (to clarify, I mean selecting from our data frame, rather than creating a data frame with strings in the row index and integers in the column index). This is where .ix used to come in, although it was deprecated in pandas 0.20 and removed in pandas 1.0:

df.ix[:2, 'time']    # the first two rows of the 'time' column (pandas < 1.0 only)
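On pandas 1.0 or later, where .ix no longer exists, the same selection can be sketched by converting one axis so that a single accessor handles both: either turn the row positions into labels for loc, or turn the column label into a position for iloc. The frame's values are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'time': ['09:00', '10:30', '12:15'],
                   'date': ['2021-01-01', '2021-01-02', '2021-01-03'],
                   'name': ['Ann', 'Bob', 'Cy']},
                  index=['a', 'b', 'c'])

# Option 1: row positions -> row labels, then loc
via_loc = df.loc[df.index[:2], 'time']

# Option 2: column label -> column position, then iloc
via_iloc = df.iloc[:2, df.columns.get_loc('time')]
```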

I think it's also worth mentioning that you can pass boolean vectors to the loc method as well. For example:

b = [True, False, True]
df.loc[b]

This returns the 1st and 3rd rows of df. It is equivalent to df[b] for selection, but it can also be used for assigning via boolean vectors:

df.loc[b, 'name'] = 'Mary', 'John'
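Here is a sketch of the selection and the assignment on the answer's empty frame. It passes a list with one value per selected row; rows where the mask is False are left untouched (still NaN here).

```python
import pandas as pd

# The answer's frame: labelled rows and columns, all values missing
df = pd.DataFrame(index=['a', 'b', 'c'], columns=['time', 'date', 'name'])

b = [True, False, True]
selected = df.loc[b]  # rows 'a' and 'c'

# One value per True entry in the mask, written into the 'name' column
df.loc[b, 'name'] = ['Mary', 'John']
```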

answered Sep 27 '22 22:09

JoeCondron
