
When using Pandas DataFrame.sort(), can I make it actually renumber the rows?

Tags: python, pandas

I am always surprised by this:

> data = DataFrame({'x':[1, 2], 'y':[2, 1]})
> data = data.sort('y')
> data
   x  y
1  2  1
0  1  2

> data['x'][0]
1

Is there a way I can cause the indices to be reassigned to fit the new ordering?

Owen asked Jul 13 '13 15:07




1 Answer

For my part, I'm glad that sort doesn't throw away the index information. If it did, there wouldn't be much point to having an index in the first place, as opposed to another column.

If you want to reset the index to a range, you could use reset_index(drop=True):

>>> data
   x  y
1  2  1
0  1  2
>>> data.reset_index(drop=True)
   x  y
0  2  1
1  1  2

You could assign the result back to data or pass inplace=True, whichever you prefer. If the real issue is instead that you want to access rows by position, independent of the index, you could use iloc:

>>> data['x']
1    2
0    1
Name: x, dtype: int64
>>> data['x'][0]
1
>>> data['x'].iloc[0]
2
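
For readers on a newer pandas, here is a minimal end-to-end sketch of both approaches. It assumes a release where DataFrame.sort has been removed in favor of sort_values, and the ignore_index argument at the end assumes pandas 1.0 or later. The variable names (sorted_data, renumbered, and so on) are only illustrative.

```python
import pandas as pd

data = pd.DataFrame({'x': [1, 2], 'y': [2, 1]})

# Assumption: sort_values stands in for the data.sort('y') call in the question.
sorted_data = data.sort_values('y')

# The original labels travel with the rows, so the index is now [1, 0].
labels_after_sort = list(sorted_data.index)

# Option 1: discard the old labels and renumber the rows 0..n-1.
renumbered = sorted_data.reset_index(drop=True)

# Option 2: keep the labels and choose the lookup style explicitly.
first_by_label = sorted_data['x'].loc[0]      # row labeled 0 -> 1
first_by_position = sorted_data['x'].iloc[0]  # first row in sorted order -> 2

# After renumbering, label 0 and position 0 refer to the same row.
first_after_reset = renumbered['x'].loc[0]    # -> 2

# Assumption: pandas >= 1.0, where sort_values can renumber in one step.
renumbered_in_one_step = data.sort_values('y', ignore_index=True)
```

Using .loc and .iloc explicitly, rather than data['x'][0], makes it clear at the call site whether a label or a position is meant.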
DSM answered Sep 29 '22 19:09