
Check whether non-index column sorted in Pandas

Tags: python, pandas

Is there a way to test whether a dataframe is sorted by a given column that's not an index (i.e. is there an equivalent to is_monotonic() for non-index columns) without calling a sort all over again, and without converting a column into an index?

asked Feb 09 '15 21:02 by nick_eu




2 Answers

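Since pandas 0.19.0 there are pandas.Series.is_monotonic_increasing, pandas.Series.is_monotonic_decreasing, and pandas.Series.is_monotonic, which can be used directly on a column without making it the index. (Series.is_monotonic is an alias of is_monotonic_increasing; it was deprecated in pandas 1.5 and removed in 2.0.)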
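As a minimal sketch of how these properties answer the question, assuming a reasonably recent pandas: selecting a column returns a Series, so the check needs neither a sort nor an index conversion. The DataFrame and its column names here are made up for illustration.

```python
import pandas as pd

# Illustrative data; the column names are arbitrary.
df = pd.DataFrame({"A": [1, 2, 2], "B": [2, 3, 1]})

# "Increasing" here means non-decreasing, so repeated values are allowed.
a_sorted = df["A"].is_monotonic_increasing
b_sorted = df["B"].is_monotonic_increasing
b_sorted_desc = df["B"].is_monotonic_decreasing

print(a_sorted, b_sorted, b_sorted_desc)  # True False False
```

If a strictly increasing column is required, one option is to combine the check with `df["A"].is_unique`.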

answered Sep 17 '22 12:09 by Konstantin


There are a handful of functions in pd.algos which might be of use. They're all undocumented implementation details, so they might change from release to release:

    >>> pd.algos.is[TAB]
    pd.algos.is_lexsorted          pd.algos.is_monotonic_float64  pd.algos.is_monotonic_object
    pd.algos.is_monotonic_bool     pd.algos.is_monotonic_int32
    pd.algos.is_monotonic_float32  pd.algos.is_monotonic_int64

The is_monotonic_* functions take an array of the specified dtype and a "timelike" boolean that should be False for most use cases. (Pandas sets it to True for a case involving times represented as integers.) The return value is a tuple whose first element represents whether the array is monotonically non-decreasing, and whose second element represents whether the array is monotonically non-increasing. Other tuple elements are version-dependent:

    >>> df = pd.DataFrame({"A": [1,2,2], "B": [2,3,1]})
    >>> pd.algos.is_monotonic_int64(df.A.values, False)[0]
    True
    >>> pd.algos.is_monotonic_int64(df.B.values, False)[0]
    False

All these functions assume a specific input dtype, even is_lexsorted, which assumes the input is a list of int64 arrays. Pass it the wrong dtype, and it gets really confused:

    In [32]: pandas.algos.is_lexsorted([np.array([-2, -1], dtype=np.int64)])
    Out[32]: True

    In [33]: pandas.algos.is_lexsorted([np.array([-2, -1], dtype=float)])
    Out[33]: False

    In [34]: pandas.algos.is_lexsorted([np.array([-1, -2, 0], dtype=float)])
    Out[34]: True
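Because the pd.algos functions are private and dtype-specific, a rough public-API stand-in for the two flags described above can be sketched with NumPy, which handles the dtype dispatch itself. The helper name `monotonic_flags` is hypothetical and not part of pandas. Unlike the internals, it always scans the whole array and makes no special provision for NaN or the "timelike" case.

```python
import numpy as np
import pandas as pd


def monotonic_flags(values):
    """Return (is_non_decreasing, is_non_increasing) for a 1-D array.

    Hypothetical helper, not a pandas function. It compares every element
    with its successor, so it does not short-circuit.
    """
    arr = np.asarray(values)
    if arr.size < 2:
        # Empty and single-element arrays are trivially monotonic.
        return True, True
    left, right = arr[:-1], arr[1:]
    return bool(np.all(left <= right)), bool(np.all(left >= right))


df = pd.DataFrame({"A": [1, 2, 2], "B": [2, 3, 1]})
print(monotonic_flags(df["A"].to_numpy()))  # (True, False)
print(monotonic_flags(df["B"].to_numpy()))  # (False, False)
```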

I'm not entirely sure why Series don't already have some kind of short-circuiting is_sorted. There might be something which makes it trickier than it seems.
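For illustration, a short-circuiting check could look roughly like the sketch below. The function `is_sorted` is a hypothetical helper, not a pandas method. It is a plain Python loop, so it will be slow on long arrays that really are sorted, and it assumes that any comparison involving NaN counts as "not sorted".

```python
import numpy as np
import pandas as pd


def is_sorted(values):
    """Return True if `values` is non-decreasing, stopping at the first inversion.

    Hypothetical helper. all() stops consuming the generator as soon as
    one adjacent pair is out of order.
    """
    arr = np.asarray(values)
    # arr[:-1] and arr[1:] are views, so no data is copied here.
    return all(a <= b for a, b in zip(arr[:-1], arr[1:]))


df = pd.DataFrame({"A": [1, 2, 2], "B": [2, 3, 1]})
print(is_sorted(df["A"]))  # True
print(is_sorted(df["B"]))  # False
```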

answered Sep 19 '22 12:09 by DSM