Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I sort within partitions defined by one column but leave the partitions where they are?

Consider the dataframe df

df = pd.DataFrame(dict(
        A=list('XXYYXXYY'),
        B=range(8, 0, -1)
    ))

print(df)

   A  B
0  X  8
1  X  7
2  Y  6
3  Y  5
4  X  4
5  X  3
6  Y  2
7  Y  1

With the 'X' group defined by column 'A', I want to sort [8, 7, 4, 3] to the expected [3, 4, 7, 8]. However, I want to leave those rows where they are.

   A  B
5  X  3  <-- Notice all X are in same positions
4  X  4  <-- However, `[3, 4, 7, 8]` have shifted
7  Y  1
6  Y  2
1  X  7  <-- 
0  X  8  <-- 
3  Y  5
2  Y  6
like image 240
piRSquared Avatar asked Apr 04 '17 00:04

piRSquared


2 Answers

You can use transform to get back your new desired index order, then use reindex to reorder your DataFrame:

# Use transform to return the new ordered index values.
new_idx = df.groupby('A')['B'].transform(lambda grp: grp.sort_values().index)

# Reindex.
df = df.reindex(new_idx.rename(None))

You could combine the two lines above into one long line, if so desired.

The resulting output:

   A  B
5  X  3
4  X  4
7  Y  1
6  Y  2
1  X  7
0  X  8
3  Y  5
2  Y  6

Note that if you don't care about maintaing your old index, you can directly reassign from the transform:

df['B'] = df.groupby('A')['B'].transform(lambda grp: grp.sort_values())

Which yields:

   A  B
0  X  3
1  X  4
2  Y  1
3  Y  2
4  X  7
5  X  8
6  Y  5
7  Y  6
like image 89
root Avatar answered Oct 19 '22 00:10

root


The only way I figured how to solve this efficiently was to sort twice and unwind once.

v = df.values

# argsort just first column with kind='mergesort' to preserve subgroup order
a1 = v[:, 0].argsort(kind='mergesort')

# Fill in an un-sort array to unwind the `a1` argsort
a_ = np.empty_like(a1)
a_[a1] = np.arange(len(a1))

# argsort by both columns... not exactly what I want, yet.
a2 = np.lexsort(v.T[::-1])

# Sort with `a2` then unwind the first layer with `a_`
pd.DataFrame(v[a2][a_], df.index[a2][a_], df.columns)

   A  B
5  X  3
4  X  4
7  Y  1
6  Y  2
1  X  7
0  X  8
3  Y  5
2  Y  6

Testing

Code

def np_intra_sort(df):
    v = df.values
    a1 = v[:, 0].argsort(kind='mergesort')
    a_ = np.empty_like(a1)
    a_[a1] = np.arange(len(a1))
    a2 = np.lexsort(v.T[::-1])
    return pd.DataFrame(v[a2][a_], df.index[a2][a_], df.columns)

def pd_intra_sort(df):

    def sub_sort(x):
        return x.sort_values().index

    idx = df.groupby('A').B.transform(sub_sort).values

    return df.reindex(idx)

Small data

Enter image description here

Large data

df = pd.DataFrame(dict(
        A=list('XXYYXXYY') * 10000,
        B=range(8 * 10000, 0, -1)
    ))

Enter image description here

like image 42
piRSquared Avatar answered Oct 19 '22 00:10

piRSquared