SQL-like window functions in PANDAS: Row Numbering in Python Pandas Dataframe

You can also use sort_values(), groupby() and finally cumcount() + 1:

df['RN'] = df.sort_values(['data1','data2'], ascending=[True,False]) \
             .groupby(['key1']) \
             .cumcount() + 1
print(df)

yields:

   data1  data2 key1  RN
0      1      1    a   1
1      2     10    a   2
2      2      2    a   3
3      3      3    b   1
4      3     30    a   4

P.S. Tested with pandas 0.18.
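For readers who want to run the snippet above end to end, here is a minimal self-contained sketch. The DataFrame contents are reconstructed from the printed output in this answer, and the SQL in the comment is an assumed equivalent, included only for orientation.

```python
import pandas as pd

# Sample data reconstructed from the output shown in this answer.
df = pd.DataFrame({
    'data1': [1, 2, 2, 3, 3],
    'data2': [1, 10, 2, 3, 30],
    'key1': ['a', 'a', 'a', 'b', 'a'],
})

# Roughly: ROW_NUMBER() OVER (PARTITION BY key1 ORDER BY data1 ASC, data2 DESC)
df['RN'] = (
    df.sort_values(['data1', 'data2'], ascending=[True, False])
      .groupby('key1')
      .cumcount() + 1
)
print(df)
```

The assignment aligns on the index, so the rows of df keep their original order even though the numbering is computed on the sorted frame.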

You can do this by using groupby twice along with the rank method:

In [11]: g = df.groupby('key1')

Pass method='min' so that rows sharing the same data1 value receive the same RN:

In [12]: g['data1'].rank(method='min')
Out[12]:
0    1
1    2
2    2
3    1
4    4
dtype: float64

In [13]: df['RN'] = g['data1'].rank(method='min')

Then group by these results and add the rank with respect to data2:

In [14]: g1 = df.groupby(['key1', 'RN'])

In [15]: g1['data2'].rank(ascending=False) - 1
Out[15]:
0    0
1    0
2    1
3    0
4    0
dtype: float64

In [16]: df['RN'] += g1['data2'].rank(ascending=False) - 1

In [17]: df
Out[17]:
   data1  data2 key1  RN
0      1      1    a   1
1      2     10    a   2
2      2      2    a   3
3      3      3    b   1
4      3     30    a   4

It feels like there ought to be a native way to do this (there may well be!).
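The interactive session above can be condensed into a short script. This is only a sketch: the sample data is reconstructed from the output shown, the variable names rn and offset are made up for illustration, and it assumes no two rows in a group share both data1 and data2 (rank defaults to method='average', so such ties would give fractional offsets).

```python
import pandas as pd

# Sample data reconstructed from the output shown in this answer.
df = pd.DataFrame({
    'data1': [1, 2, 2, 3, 3],
    'data2': [1, 10, 2, 3, 30],
    'key1': ['a', 'a', 'a', 'b', 'a'],
})

# Step 1: rank data1 within each key1; tied rows share the lowest rank.
rn = df.groupby('key1')['data1'].rank(method='min')

# Step 2: within each (key1, rank) bucket, order by data2 descending
# and turn that into a zero-based offset.
offset = (
    df.assign(RN=rn)
      .groupby(['key1', 'RN'])['data2']
      .rank(ascending=False) - 1
)

# rank() returns floats, so convert the final row number to int.
df['RN'] = (rn + offset).astype(int)
print(df)
```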

Use the groupby.rank function. Here is a working example.

import pandas as pd

df = pd.DataFrame({'C1':['a', 'a', 'a', 'b', 'b'], 'C2': [1, 2, 3, 4, 5]})
df

C1 C2
a  1
a  2
a  3
b  4
b  5

df["RANK"] = df.groupby("C1")["C2"].rank(method="first", ascending=True)
df

C1 C2 RANK
a  1  1
a  2  2
a  3  3
b  4  1
b  5  2
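The method argument decides how ties are handled, which is what separates the SQL functions ROW_NUMBER, RANK and DENSE_RANK. The sketch below uses hypothetical data with a tie in group 'a' (the example above has none) to show the three variants side by side; the column names are chosen only to mirror the SQL names.

```python
import pandas as pd

# Hypothetical data: the value 20 appears twice in group 'a'.
scores = pd.DataFrame({'C1': ['a', 'a', 'a', 'a', 'b', 'b'],
                       'C2': [10, 20, 20, 30, 5, 7]})
g = scores.groupby('C1')['C2']

scores = scores.assign(
    row_number=g.rank(method='first').astype(int),  # ties broken by order of appearance
    rank=g.rank(method='min').astype(int),          # ties share the lowest rank, gaps follow
    dense_rank=g.rank(method='dense').astype(int),  # ties share a rank, no gaps
)
print(scores)
```

rank returns floats, so the astype(int) calls are there only to get integer row numbers.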

You can use transform and rank together. Here is an example:

df = pd.DataFrame({'C1': ['a', 'a', 'a', 'b', 'b'],
                   'C2': [1, 2, 3, 4, 5]})
df['Rank'] = df.groupby(by=['C1'])['C2'].transform(lambda x: x.rank())
df

Have a look at the pandas rank method for more information.
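One thing to keep in mind with this approach: x.rank() with no arguments uses method='average', so tied values receive fractional ranks instead of distinct row numbers. The sketch below uses hypothetical data with a tie to show this, and also checks that the transform form gives the same values as calling rank on the groupby directly.

```python
import pandas as pd

# Hypothetical data: the value 2 appears twice in group 'a'.
ties = pd.DataFrame({'C1': ['a', 'a', 'a', 'b', 'b'],
                     'C2': [1, 2, 2, 4, 5]})

via_transform = ties.groupby('C1')['C2'].transform(lambda x: x.rank())
via_rank = ties.groupby('C1')['C2'].rank()

print(via_transform.tolist())  # the tied rows get the average of ranks 2 and 3
print(via_rank.tolist())
```

If distinct row numbers are needed, passing method='first' to rank avoids the fractional values.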

Tags:

python

pandas

dataframe

numpy
