Adding up columns and selecting columns with largest sum

I am looking to sort a dataframe. I have this dataframe:

Y    X1  X2  X3
Y1   1   0   1
Y2   1   0   0
Y3   1   0   0
Y4   0   1   0

There are a lot of columns. I want to select the X columns with the largest sums, where each sum is taken down a column.

I have been trying to do this by adding a row like so:

Y    X1  X2  X3
Y1   1   0   1
Y2   1   0   0
Y3   1   0   0
Y4   0   1   1
sum  3   1   2

and then I would sort it by the sum row

Y    X1  X3  X2
Y1   1   1   0
Y2   1   0   0
Y3   1   0   0
Y4   0   1   1
sum  3   2   1

and select 30 columns to use. However, I can only get a sum of the rows like so:

Y    X1  X3  X2  sum
Y1   1   1   0    2
Y2   1   0   0    1
Y3   1   0   0    1
Y4   0   1   1    2

using

pivot_table['sum'] = pivot_table.sum(axis=1)

I also tried

pivot_table['sum'] = pivot_table.sum(axis=0)

and attempted to add .transpose(), but this isn't working. I also think there is probably a faster way to do this than my step-by-step attempt.

asked Feb 22 '26 16:02 by jenryb


1 Answer

You can call sum on the df, which returns a Series of column totals. Sort that Series, then use its index to reorder the columns of your df:

In [249]:
import io
import pandas as pd

# note that column 'X3' will produce a sum value of 2
t="""Y    X1  X2  X3
Y1   1   0   1
Y2   1   0   1
Y3   1   0   0
Y4   0   1   0"""
# load the data (raw string so that '\s' is not treated as a string escape)
df = pd.read_csv(io.StringIO(t), sep=r'\s+', index_col=[0])
df

Out[249]:
    X1  X2  X3
Y             
Y1   1   0   1
Y2   1   0   1
Y3   1   0   0
Y4   0   1   0

sum returns a Series of column totals. Sort it in descending order with sort_values(ascending=False), which returns a sorted copy rather than sorting in place:

In [250]:
# now calculate the column sums and sort the series, largest first
s = df.sum().sort_values(ascending=False)
s

Out[250]:
X1    3
X3    2
X2    1
dtype: int64

In [251]:
# now use label-based indexing to reorder the df
df.loc[:, s.index]

Out[251]:
    X1  X3  X2
Y             
Y1   1   1   0
Y2   1   1   0
Y3   1   0   0
Y4   0   0   1
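
If you do want the totals as an extra row, as in the question, something like the following sketch may work. It assumes a reasonably recent pandas, where DataFrame.sort_values accepts axis=1 with a row label passed to by. The names data and with_sum are only illustrative.

```python
import io

import pandas as pd

data = """Y    X1  X2  X3
Y1   1   0   1
Y2   1   0   1
Y3   1   0   0
Y4   0   1   0"""
df = pd.read_csv(io.StringIO(data), sep=r"\s+", index_col=0)

# Work on a copy so the original frame keeps only the data rows.
with_sum = df.copy()

# Append the column totals as a new row labelled 'sum'.
with_sum.loc["sum"] = df.sum()

# With axis=1, `by` names a row label, so the columns are ordered by that row.
with_sum = with_sum.sort_values(by="sum", axis=1, ascending=False)
print(with_sum)
```

Keeping the totals in a separate Series, as above, is usually easier, because a 'sum' row has to be dropped again before the data is used for anything else.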

You can slice the index if you want just the top n columns:

In [254]:
df = df[s.index[:2]]
df

Out[254]:
    X1  X3
Y         
Y1   1   1
Y2   1   1
Y3   1   0
Y4   0   0
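
As a possible shortcut, Series.nlargest can do the sorting and the slicing in one step. The sketch below wraps this in a helper, top_columns_by_sum, which is a hypothetical name chosen for illustration and not part of pandas. It assumes every column is numeric.

```python
import io

import pandas as pd


def top_columns_by_sum(frame, n):
    """Return `frame` restricted to the n columns with the largest sums.

    Hypothetical helper for illustration; assumes all columns are numeric.
    """
    # nlargest returns the n biggest totals, already sorted largest first.
    top = frame.sum().nlargest(n)
    return frame[top.index]


data = """Y    X1  X2  X3
Y1   1   0   1
Y2   1   0   1
Y3   1   0   0
Y4   0   1   0"""
df = pd.read_csv(io.StringIO(data), sep=r"\s+", index_col=0)

top2 = top_columns_by_sum(df, 2)
print(top2)
```

For the 30 columns mentioned in the question, the call would be top_columns_by_sum(pivot_table, 30). When sums tie, nlargest keeps the columns that appear first.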
answered Feb 25 '26 08:02 by EdChum


