
Update index after sorting data-frame

Tags:

python

pandas

People also ask

How do you change the index value of a data frame?

To change the index values, use the set_index method, which lets you specify the column(s) or array(s) to use as the new index. Its inplace parameter accepts True or False and controls where the change lands: with inplace=True the DataFrame itself is modified, while the default inplace=False returns a new DataFrame and leaves the original unchanged.
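A minimal sketch of the two inplace modes, using a small hypothetical DataFrame (the column names key and value are made up for illustration):

```python
import pandas as pd

# Hypothetical example data: column "key" will become the index.
df = pd.DataFrame({"key": ["a", "b", "c"], "value": [10, 20, 30]})

# Default inplace=False: a new DataFrame is returned, df itself is unchanged.
indexed = df.set_index("key")
print(indexed.index.tolist())  # ['a', 'b', 'c']
print(df.index.tolist())       # [0, 1, 2]

# inplace=True: df is modified directly and the call returns None.
result = df.set_index("key", inplace=True)
print(result)                  # None
print(df.index.tolist())       # ['a', 'b', 'c']
```

Because inplace=True returns None, writing df = df.set_index("key", inplace=True) would replace df with None; use one style or the other, not both.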

How do I reindex a DataFrame in Python?

You can reindex with the reindex() method, passing the new labels and specifying which axis (rows or columns) should conform to them. Labels in the new index that are not present in the DataFrame are filled with NaN by default.
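A short sketch of reindex along each axis; the frame and its labels are hypothetical:

```python
import pandas as pd

# Hypothetical frame with string row labels.
df = pd.DataFrame({"value": [10, 20, 30]}, index=["a", "b", "c"])

# Rows (the default axis): "d" is not in df, so its value becomes NaN.
rows = df.reindex(["c", "a", "d"])

# Columns: "extra" is not in df, so the whole new column is NaN.
cols = df.reindex(["value", "extra"], axis="columns")

print(rows)
print(cols)
```

Note that reindex only reorders or adds labels; unlike reset_index, it does not renumber the rows.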


You can reset the index using reset_index to get back a default index of 0, 1, 2, ..., n-1 (and use drop=True to indicate you want to drop the existing index instead of adding it as an additional column to your dataframe):

In [19]: df2 = df2.reset_index(drop=True)

In [20]: df2
Out[20]:
   x  y
0  0  0
1  0  1
2  0  2
3  1  0
4  1  1
5  1  2
6  2  0
7  2  1
8  2  2
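A self-contained sketch of the sort-then-reset pattern, with hypothetical x/y data standing in for the question's frame; it also shows what drop=False would do:

```python
import pandas as pd

# Hypothetical unsorted data, similar in shape to the question's x/y frame.
df = pd.DataFrame({"x": [2, 0, 1, 0], "y": [1, 2, 0, 0]})

# Sorting keeps the original row labels attached to each row.
df2 = df.sort_values(by=["x", "y"])
print(df2.index.tolist())     # [3, 1, 2, 0]

# drop=False (the default) keeps the old labels as a new "index" column.
kept = df2.reset_index()
print(kept.columns.tolist())  # ['index', 'x', 'y']

# drop=True discards the old labels and gives a fresh 0..n-1 index.
df2 = df2.reset_index(drop=True)
print(df2.index.tolist())     # [0, 1, 2, 3]
```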

Since pandas 1.0.0, df.sort_values has an ignore_index parameter that does exactly what you need:

In [1]: df2 = df.sort_values(by=['x','y'],ignore_index=True)

In [2]: df2
Out[2]:
   x  y
0  0  0
1  0  1
2  0  2
3  1  0
4  1  1
5  1  2
6  2  0
7  2  1
8  2  2
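A quick check, assuming pandas 1.0 or later and the same kind of hypothetical x/y data, that ignore_index=True gives the same result as the two-step sort_values(...).reset_index(drop=True):

```python
import pandas as pd

# Hypothetical unsorted data.
df = pd.DataFrame({"x": [2, 0, 1, 0], "y": [1, 2, 0, 0]})

# One step (pandas >= 1.0): the result is relabelled 0..n-1 directly.
a = df.sort_values(by=["x", "y"], ignore_index=True)

# Two steps: the equivalent that also works on older pandas versions.
b = df.sort_values(by=["x", "y"]).reset_index(drop=True)

print(a.equals(b))  # True
```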

df.sort() was deprecated and is no longer available in current pandas; use df.sort_values(...) instead: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sort_values.html

Then reset the index as in joris' answer: df = df.reset_index(drop=True)


You can set new indices by using set_index (it returns a new DataFrame, so assign the result back):

import numpy as np
df2 = df2.set_index(np.arange(len(df2.index)))

Output:

   x  y
0  0  0
1  0  1
2  0  2
3  1  0
4  1  1
5  1  2
6  2  0
7  2  1
8  2  2
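A runnable sketch of the set_index approach on a hypothetical frame whose labels are out of order after sorting. The last line shows a different technique, not from the answer above: assigning a new index directly, which modifies df2 in place.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose labels are out of order (as after a sort).
df2 = pd.DataFrame({"x": [0, 0, 1], "y": [0, 2, 1]}, index=[3, 1, 2])

# set_index accepts an array as long as the frame; the old labels are discarded
# (they are not a column), and a new DataFrame is returned.
by_array = df2.set_index(np.arange(len(df2.index)))
print(by_array.index.tolist())  # [0, 1, 2]

# Alternative technique: overwrite the index attribute directly.
df2.index = pd.RangeIndex(len(df2))
print(df2.index.tolist())       # [0, 1, 2]
```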

The following works!

  1. If you want to modify the existing dataframe itself, sort and reset the index in place:

     df.sort_values(by=['col1'], inplace=True)
     df.reset_index(drop=True, inplace=True)
    
     df
     >>     col1  col2  col3 col4
         0    A     2     0    a
         1    A     1     1    B
         2    B     9     9    c
         3    C     4     3    F
         4    D     7     2    e
         5  NaN     8     4    D
    
  2. Otherwise, if you want to leave the existing dataframe unchanged and store the sorted result in a separate variable, use:

     df_sorted = df.sort_values(by=['col1']).reset_index(drop=True)

     df_sorted
     >>     col1  col2  col3 col4
         0    A     2     0    a
         1    A     1     1    B
         2    B     9     9    c
         3    C     4     3    F
         4    D     7     2    e
         5  NaN     8     4    D

     df
     >>     col1  col2  col3 col4
         0    A     2     0    a
         1    A     1     1    B
         2    B     9     9    c
         3  NaN     8     4    D
         4    D     7     2    e
         5    C     4     3    F
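To reproduce both options, here is a sketch that rebuilds the example frame by hand (the values are copied from the printed output; the construction code itself is an assumption). The in-place option is run on a copy so the two results can be compared:

```python
import numpy as np
import pandas as pd

# The example frame, rebuilt from the printed output (NaN is a missing col1 value).
df = pd.DataFrame({
    "col1": ["A", "A", "B", np.nan, "D", "C"],
    "col2": [2, 1, 9, 8, 7, 4],
    "col3": [0, 1, 9, 4, 2, 3],
    "col4": ["a", "B", "c", "D", "e", "F"],
})

# Option 2: sorted copy with a fresh 0..n-1 index; df itself is untouched.
df_sorted = df.sort_values(by=["col1"]).reset_index(drop=True)

# Option 1: sort and reset in place (on a copy here, to keep df for comparison).
df_inplace = df.copy()
df_inplace.sort_values(by=["col1"], inplace=True)
df_inplace.reset_index(drop=True, inplace=True)

# Missing values go last by default (na_position="last").
print(df_sorted)
print(df_inplace.equals(df_sorted))  # True
```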