I have a dataframe like this:
id type city
0 2 d H
1 7 c J
2 7 x Y
3 2 o G
4 6 i F
5 5 b E
6 6 v G
7 8 u L
8 1 g L
9 8 k U
I would like to get the similar output using pandas as in SQL command:
select id,type
from df
order by type desc
limit 4
offset 2
The required result is:
id type
0 8 u
1 2 o
2 8 k
3 6 i
I tried to follow the official tutorial https://pandas.pydata.org/pandas-docs/stable/comparison_with_sql.html#top-n-rows-with-offset
df.nlargest(4+2, columns='type').tail(4)
But, this fails.
How to solve the problem?
UPDATE
import numpy as np
import pandas as pd
import pandasql as pdsql
from pandasql import sqldf
pysqldf = lambda q: sqldf(q, globals())
df = pd.read_csv('http://ourairports.com/data/airports.csv')
q = '''
select id,type
from df
order by type desc
limit 4
offset 2
'''
print(pysqldf(q))
```
id type
0 6525 small_airport
1 322127 small_airport
2 6527 small_airport
3 6528 small_airport
```
Using pandas:
print(df.sort_values('type', ascending=False).iloc[2:2+4][['id','type']])
id type
43740 37023 small_airport
43739 37022 small_airport
24046 308281 small_airport
24047 309587 small_airport
The long answer is the size limit for pandas DataFrames is 100 gigabytes (GB) of memory instead of a set number of cells.
Set Max Number of Rows The default number is 60. As shown, if we have a data frame with more than 60 rows, 50 rows in the middle will be truncated. If we set the option larger than the number of rows of our data frame, all the rows will be displayed.
This main difference can mean that the two tools are separate, however, you can also perform several of the same functions in each respective tool, for example, you can create new features from existing columns in pandas, perhaps easier and faster than in SQL.
Yes, integer location, where iloc starting index is the 'offset' and ending index is incremented by 'limit':
df.sort_values('type', ascending=False).iloc[2:6]
Output:
id type city
7 8 u L
3 2 o G
9 8 k U
4 6 i F
And you can add reset_index
to clean up indexing.
print(df.sort_values('type', ascending=False).iloc[2:6].reset_index(drop=True))
Output:
id type city
0 8 u L
1 2 o G
2 8 k U
3 6 i F
df.index.name = 'index'
df[['id','type']].sort_values(['type','index'], ascending=[False,True]).iloc[2:6]
Output:
index id type
0 3 6525 small_airport
1 5 322127 small_airport
2 6 6527 small_airport
3 7 6528 small_airport
You could use sort_values
with ascending=False
, and use .loc()
to slice the result (having reset the index) with the rows and columns of interest:
offset = 2
limit = 4
(df.sort_values(by='type', ascending=False).reset_index(drop=True)
.loc[offset : offset+limit-1, ['id','type']])
id type
2 8 u
3 2 o
4 8 k
5 6 i
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With