I have a dataframe like:

The above df is just a very small sample. Actually I have around 8K + columns , I want to sort my dataframe such that all the columns ending with "_t1" comes in the end.

I can definitely filter out a subset of the dataframe with code like:
data = data [data.columns[data .columns.str.endswith("_t1")]]
and then merging. Can there be any other simpler way of sorting a panda dataframe based on regex pattern of column names?
You can create masks:
m = data.columns.str.endswith("_t1")
Or mask by regex:
m = data.columns.str.contains("_t1$")
Then join together:
cols = data.columns[~m].append(data.columns[m])
Or:
cols = data.columns[~m].tolist() + data.columns[m].tolist()
And change order of columns by subset:
df = data[cols]
Another option is to use np.lexsort to sort columns then reorder by positional index; to separate columns that end with "_t1" to those that don't.
df = pd.DataFrame(columns=['abc', 'abc_t1', 'abcd', 'abcd_t1', 'xyz', 'xyz_t1'])
df
# Empty DataFrame
# Columns: [abc, abc_t1, abcd, abcd_t1, xyz, xyz_t1]
# Index: []
df.iloc[:, np.lexsort((df.columns.str.endswith('_t1'), ))]
# Alternatively,
df.iloc[:, np.argsort(df.columns.str.endswith('_t1'))]
# Empty DataFrame
# Columns: [abc, abcd, xyz, abc_t1, abcd_t1, xyz_t1]
# Index: []
If you need to handle more complicated regex suffixes then you can always extend the input to .endswith with an appropriate regex.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With