Equivalent of LIMIT and OFFSET of SQL in pandas?

Tags:

python

pandas

I have a dataframe like this:

   id type city
0   2    d    H
1   7    c    J
2   7    x    Y
3   2    o    G
4   6    i    F
5   5    b    E
6   6    v    G
7   8    u    L
8   1    g    L
9   8    k    U

I would like to get the same output in pandas as from this SQL query:

select id,type
from df
order by type desc
limit 4
offset 2

The required result is:

   id type
0   8    u
1   2    o
2   8    k
3   6    i

I tried to follow the official tutorial https://pandas.pydata.org/pandas-docs/stable/comparison_with_sql.html#top-n-rows-with-offset

df.nlargest(4+2, columns='type').tail(4)

But this fails.

How to solve the problem?
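For context, here is a minimal sketch, rebuilt on the sample frame above, of why the tutorial recipe breaks here. `DataFrame.nlargest` only accepts numeric (and datetime-like) columns. Because `type` holds strings, pandas raises a `TypeError` instead of ranking them, while the same call works on the numeric `id` column.

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({
    "id": [2, 7, 7, 2, 6, 5, 6, 8, 1, 8],
    "type": list("dcxoibvugk"),
    "city": list("HJYGFEGLLU"),
})

# nlargest refuses non-numeric columns such as the string column 'type'
try:
    df.nlargest(4 + 2, columns="type").tail(4)
    nlargest_on_type_failed = False
except TypeError as exc:
    nlargest_on_type_failed = True
    print("TypeError:", exc)

# On a numeric column the tutorial's "top N rows with offset" recipe runs fine
numeric = df.nlargest(4 + 2, columns="id").tail(4)
print(numeric)
```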

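UPDATE: on a larger dataset, the SQL query (run through pandasql) and the pandas version return different rows: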

import pandas as pd
from pandasql import sqldf

# run SQL queries against DataFrames defined in the global namespace
pysqldf = lambda q: sqldf(q, globals())
df = pd.read_csv('http://ourairports.com/data/airports.csv')


q = '''
select id,type
from df
order by type desc
limit 4
offset 2
'''

print(pysqldf(q))

Output:

       id           type
0    6525  small_airport
1  322127  small_airport
2    6527  small_airport
3    6528  small_airport

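Using pandas, the same query returns different rows. Many rows share the type small_airport, and the tied rows come out of the sort in a different order:

print(df.sort_values('type', ascending=False).iloc[2:2+4][['id','type']])

Output:
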
           id           type
43740   37023  small_airport
43739   37022  small_airport
24046  308281  small_airport
24047  309587  small_airport
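To check a pandas translation against real SQL without installing pandasql, one option is the standard-library `sqlite3` module together with `DataFrame.to_sql` and `pd.read_sql_query`. This is a swapped-in approach, not the one used above, and it runs on the small sample frame rather than the airport data. In the sample, `type` has no ties, so both sides return the same four rows. On data with many ties, SQL does not define an order for the tied rows, and pandas' default sort is not stable. That is why the two outputs above can differ.

```python
import sqlite3

import pandas as pd

df = pd.DataFrame({
    "id": [2, 7, 7, 2, 6, 5, 6, 8, 1, 8],
    "type": list("dcxoibvugk"),
    "city": list("HJYGFEGLLU"),
})

# Load the frame into an in-memory SQLite database and run the SQL query
conn = sqlite3.connect(":memory:")
df.to_sql("df", conn, index=False)
sql_result = pd.read_sql_query(
    "select id, type from df order by type desc limit 4 offset 2", conn
)
conn.close()

# pandas translation: sort, skip `offset` rows, keep `limit` rows
offset, limit = 2, 4
pandas_result = (df.sort_values("type", ascending=False)
                   .iloc[offset:offset + limit][["id", "type"]]
                   .reset_index(drop=True))

print(sql_result)
print(pandas_result)
```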

asked Dec 26 '18 16:12

BhishanPoudel

2 Answers

Yes: use integer-location slicing with iloc, where the slice start is the offset and the slice end is offset + limit:

df.sort_values('type', ascending=False).iloc[2:6]

Output:

   id type city
7   8    u    L
3   2    o    G
9   8    k    U
4   6    i    F
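The mapping generalizes: after sorting, `OFFSET o LIMIT n` becomes `.iloc[o:o + n]`. Below is a minimal sketch that wraps this in a helper. The name `limit_offset` and its parameters are hypothetical, not part of pandas.

```python
import pandas as pd


def limit_offset(frame, columns, order_by, limit, offset=0, ascending=True):
    """Hypothetical helper mirroring
    SELECT <columns> FROM frame ORDER BY <order_by> LIMIT <limit> OFFSET <offset>.
    """
    ordered = frame.sort_values(order_by, ascending=ascending)
    # skip the first `offset` rows, then keep at most `limit` rows
    return ordered.iloc[offset:offset + limit][columns].reset_index(drop=True)


df = pd.DataFrame({
    "id": [2, 7, 7, 2, 6, 5, 6, 8, 1, 8],
    "type": list("dcxoibvugk"),
    "city": list("HJYGFEGLLU"),
})

result = limit_offset(df, ["id", "type"], "type", limit=4, offset=2, ascending=False)
print(result)

# Like SQL, a page that runs past the end returns fewer rows instead of raising
tail_page = limit_offset(df, ["id", "type"], "type", limit=4, offset=8, ascending=False)
print(tail_page)
```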

You can also add reset_index(drop=True) to renumber the rows from 0:

print(df.sort_values('type', ascending=False).iloc[2:6].reset_index(drop=True))

Output:

   id type city
0   8    u    L
1   2    o    G
2   8    k    U
3   6    i    F

Update: to break ties in type the same way as the SQL result, sort by type and then by the original index:

df.index.name = 'index'
# sort by type descending, breaking ties by the original row label ascending
(df[['id','type']]
   .sort_values(['type','index'], ascending=[False,True])
   .iloc[2:6]
   .reset_index())

Output:

   index      id           type
0      3    6525  small_airport
1      5  322127  small_airport
2      6    6527  small_airport
3      7    6528  small_airport
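Here is an alternative sketch for the tie problem, assuming the goal is to keep tied rows in their original order. It asks `sort_values` for a stable sort with `kind="stable"`. pandas only applies `kind` when sorting on a single column, and in that case tied rows keep their original relative order even with `ascending=False`. The toy frame below uses invented values with many ties in `type`, and both approaches select the same rows. SQL itself does not define an order for tied rows. Either pandas version therefore matches a particular SQL engine only if that engine happens to return ties in table order.

```python
import pandas as pd

# Toy data (invented values) with many ties in 'type'
df = pd.DataFrame({
    "id": [101, 102, 103, 104, 105, 106, 107, 108],
    "type": ["small_airport", "heliport", "small_airport", "small_airport",
             "heliport", "small_airport", "small_airport", "small_airport"],
})
df.index.name = "index"
offset, limit = 2, 4

# Option 1: stable single-key sort; tied rows stay in their original order
stable = (df.sort_values("type", ascending=False, kind="stable")
            .iloc[offset:offset + limit])

# Option 2: explicit tie-breaker on the index level, as in the update above
tiebreak = (df.sort_values(["type", "index"], ascending=[False, True])
              .iloc[offset:offset + limit])

print(stable)
print(tiebreak)
```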

answered Sep 22 '22 18:09

Scott Boston

You could use sort_values with ascending=False, reset the index, and then use .loc to select the rows and columns of interest:

offset = 2
limit = 4
# sort descending, renumber rows 0..n-1, then take rows offset..offset+limit-1
(df.sort_values(by='type', ascending=False).reset_index(drop=True)
   .loc[offset : offset+limit-1, ['id','type']])

Output:

   id type
2   8    u
3   2    o
4   8    k
5   6    i
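The `-1` in the `.loc` version is needed for two reasons. Label-based `.loc` slices include the stop label, while positional `.iloc` slices exclude the stop position. The `reset_index(drop=True)` step is what makes labels equal positions here. A quick check on the question's sample frame that both spellings pick the same rows:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [2, 7, 7, 2, 6, 5, 6, 8, 1, 8],
    "type": list("dcxoibvugk"),
    "city": list("HJYGFEGLLU"),
})
offset, limit = 2, 4

# After reset_index(drop=True) the row labels are 0..n-1, i.e. equal to positions
ordered = df.sort_values(by="type", ascending=False).reset_index(drop=True)

# .loc is label-based and includes the stop label -> offset + limit - 1
via_loc = ordered.loc[offset:offset + limit - 1, ["id", "type"]]

# .iloc is position-based and excludes the stop position -> offset + limit
via_iloc = ordered.iloc[offset:offset + limit][["id", "type"]]

print(via_loc)
print(via_loc.equals(via_iloc))
```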

answered Sep 20 '22 18:09

yatu
