I'm having trouble finding the solution to a fairly simple problem.
I would like to alphabetically arrange certain columns of a pandas dataframe that has over 100 columns (i.e. so many that I don't want to list them manually).
Example df:
import pandas as pd
subject = [1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,4,4,4,4,4,4]
timepoint = [1,2,3,4,5,6,1,2,3,4,5,6,1,2,4,1,2,3,4,5,6]
c = [2,3,4,5,6,7,3,4,1,2,3,4,5,4,5,8,4,5,6,2,3]
d = [2,3,4,5,6,7,3,4,1,2,3,4,5,4,5,8,4,5,6,2,3]
a = [2,3,4,5,6,7,3,4,1,2,3,4,5,4,5,8,4,5,6,2,3]
b = [2,3,4,5,6,7,3,4,1,2,3,4,5,4,5,8,4,5,6,2,3]
df = pd.DataFrame({'subject':subject,
                   'timepoint':timepoint,
                   'c':c,
                   'd':d,
                   'a':a,
                   'b':b})
df.head()
   subject  timepoint  c  d  a  b
0        1          1  2  2  2  2
1        1          2  3  3  3  3
2        1          3  4  4  4  4
3        1          4  5  5  5  5
4        1          5  6  6  6  6
How could I rearrange the column names to generate a df.head() that looks like this:
   subject  timepoint  a  b  c  d
0        1          1  2  2  2  2
1        1          2  3  3  3  3
2        1          3  4  4  4  4
3        1          4  5  5  5  5
4        1          5  6  6  6  6
i.e. keep the first two columns where they are and then alphabetically arrange the remaining column names.
Thanks in advance.
You can split your your dataframe based on column names, using normal indexing operator [], sort alphabetically the other columns using sort_index(axis=1), and concat back together:
>>> pd.concat([df[['subject','timepoint']],
           df[df.columns.difference(['subject', 'timepoint'])]\
               .sort_index(axis=1)],ignore_index=False,axis=1)
    subject  timepoint  a  b  c  d
0         1          1  2  2  2  2
1         1          2  3  3  3  3
2         1          3  4  4  4  4
3         1          4  5  5  5  5
4         1          5  6  6  6  6
5         1          6  7  7  7  7
6         2          1  3  3  3  3
7         2          2  4  4  4  4
8         2          3  1  1  1  1
9         2          4  2  2  2  2
10        2          5  3  3  3  3
11        2          6  4  4  4  4
12        3          1  5  5  5  5
13        3          2  4  4  4  4
14        3          4  5  5  5  5
15        4          1  8  8  8  8
16        4          2  4  4  4  4
17        4          3  5  5  5  5
18        4          4  6  6  6  6
19        4          5  2  2  2  2
20        4          6  3  3  3  3
Specify the first two columns you want to keep (or determine them from the data), then sort all of the other columns. Use .loc with the correct list to then "sort" the DataFrame.
import numpy as np
first_cols = ['subject', 'timepoint']
#first_cols = df.columns[0:2].tolist()  # OR determine first two
other_cols = np.sort(df.columns.difference(first_cols)).tolist()
df = df.loc[:, first_cols+other_cols]
print(df.head())
   subject  timepoint  a  b  c  d
0        1          1  2  2  2  2
1        1          2  3  3  3  3
2        1          3  4  4  4  4
3        1          4  5  5  5  5
4        1          5  6  6  6  6
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With