Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Equivalent of LIMIT and OFFSET of SQL in pandas?

Tags:

python

pandas

I have a dataframe like this:

   id type city
0   2    d    H
1   7    c    J
2   7    x    Y
3   2    o    G
4   6    i    F
5   5    b    E
6   6    v    G
7   8    u    L
8   1    g    L
9   8    k    U

I would like to get the similar output using pandas as in SQL command:

select id,type
from df
order by type desc
limit 4
offset 2

The required result is:

   id type
0   8    u
1   2    o
2   8    k
3   6    i

I tried to follow the official tutorial https://pandas.pydata.org/pandas-docs/stable/comparison_with_sql.html#top-n-rows-with-offset

df.nlargest(4+2, columns='type').tail(4)

But, this fails.

How to solve the problem?

UPDATE

import numpy as np
import pandas as pd
import pandasql as pdsql
from pandasql import sqldf
pysqldf = lambda q: sqldf(q, globals())
df = pd.read_csv('http://ourairports.com/data/airports.csv')


q = '''
select id,type
from df
order by type desc
limit 4
offset 2
'''

print(pysqldf(q))

```
       id           type
0    6525  small_airport
1  322127  small_airport
2    6527  small_airport
3    6528  small_airport
```

Using pandas:

print(df.sort_values('type', ascending=False).iloc[2:2+4][['id','type']])
           id           type
43740   37023  small_airport
43739   37022  small_airport
24046  308281  small_airport
24047  309587  small_airport
like image 282
BhishanPoudel Avatar asked Dec 26 '18 16:12

BhishanPoudel


People also ask

Is there a limit to pandas Dataframe?

The long answer is the size limit for pandas DataFrames is 100 gigabytes (GB) of memory instead of a set number of cells.

Do pandas have a limit rows?

Set Max Number of Rows The default number is 60. As shown, if we have a data frame with more than 60 rows, 50 rows in the middle will be truncated. If we set the option larger than the number of rows of our data frame, all the rows will be displayed.

Is Panda faster than SQL?

This main difference can mean that the two tools are separate, however, you can also perform several of the same functions in each respective tool, for example, you can create new features from existing columns in pandas, perhaps easier and faster than in SQL.


2 Answers

Yes, integer location, where iloc starting index is the 'offset' and ending index is incremented by 'limit':

df.sort_values('type', ascending=False).iloc[2:6]

Output:

   id type city
7   8    u    L
3   2    o    G
9   8    k    U
4   6    i    F

And you can add reset_index to clean up indexing.

print(df.sort_values('type', ascending=False).iloc[2:6].reset_index(drop=True))

Output:

   id type city
0   8    u    L
1   2    o    G
2   8    k    U
3   6    i    F

Update let's sort by type and index:

df.index.name = 'index'
df[['id','type']].sort_values(['type','index'], ascending=[False,True]).iloc[2:6]

Output:

   index      id           type
0      3    6525  small_airport
1      5  322127  small_airport
2      6    6527  small_airport
3      7    6528  small_airport
like image 138
Scott Boston Avatar answered Sep 22 '22 18:09

Scott Boston


You could use sort_values with ascending=False, and use .loc() to slice the result (having reset the index) with the rows and columns of interest:

offset = 2
limit = 4
(df.sort_values(by='type', ascending=False).reset_index(drop=True)
               .loc[offset : offset+limit-1, ['id','type']])

   id type
2   8    u
3   2    o
4   8    k
5   6    i
like image 42
yatu Avatar answered Sep 20 '22 18:09

yatu