
Sum only numeric columns in pandas

Tags:

python

pandas

I have a df like the one below (the first two rows are text and the first column is a date):

In [4]: df
Out[4]: 
           test          bs         dv         if          ir         md         qb         sy          tb
0       TESTacc      a10900     a10900     a10900    IJJMKK11     a10900     a10900     a10900      a10900
1   01-Feb-2019  18.8668013  4.6021207  0.9330807  13.9766832  2.9002571  0.2824343  0.8280988   0.8587644
2   04-Feb-2019   16.187526  3.1000162  0.4145835  14.6465183   2.848472  0.2516608  0.8618771    0.218063

I need to write this to a CSV with 3-decimal precision. I also need to add a "Total" column (as the rightmost column). I have tried the approaches below, but neither feels right.

To add the total column I did:

ndf=df.iloc[2:,1:] #take only numerics in ndf
ndf = ndf.apply(pd.to_numeric)
ndf=ndf.round(3)
df['total']=ndf.sum(axis=1)

This does not seem like a proper way to do something as simple as adding a total column.

So I tried df=df.apply(pd.to_numeric,errors='ignore'), but round still won't work on df. My intent is just to add a Total column and have all numbers rounded to 3 decimals. Additionally, once this is done, I would like to add a last row holding the median of each column.

pythonRcpp asked Mar 11 '19 05:03




2 Answers

According to the pandas documentation (1.0.3 at the time of writing), you can sum only the numeric columns with the following code:

df_sum = df.sum(numeric_only = True)

This sums each numeric column in df and assigns the result (a Series with one total per column) to the variable df_sum.
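One caveat for the DataFrame in the question: every column there holds strings, so numeric_only=True has nothing to sum until the values are converted. The sketch below is a minimal illustration, assuming a cut-down two-column copy of the question's data (the names numeric, col_sums and row_totals are illustrative, not from the original posts). It also shows axis=1, which gives the per-row total the question asks for.

```python
import pandas as pd

# Cut-down copy of the question's data: every cell is a string.
df = pd.DataFrame({
    "test": ["TESTacc", "01-Feb-2019", "04-Feb-2019"],
    "bs": ["a10900", "18.8668013", "16.187526"],
    "dv": ["a10900", "4.6021207", "3.1000162"],
})

# No column has a numeric dtype yet, so this result is empty.
col_sums_raw = df.sum(numeric_only=True)

# Convert the value columns; cells that are not numbers become NaN.
numeric = df.drop(columns="test").apply(pd.to_numeric, errors="coerce")

col_sums = numeric.sum()          # default axis=0: one total per column
row_totals = numeric.sum(axis=1)  # axis=1: one total per row, NaN skipped

print(col_sums_raw)
print(col_sums)
print(row_totals)
```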

Decision Scientist answered Oct 21 '22 13:10


IIUC, you may need:

df['sum']=df.apply(lambda x: pd.to_numeric(x,errors='coerce')).sum(axis=1).round(3)
#for median: df.apply(lambda x: pd.to_numeric(x,errors='coerce')).median(axis=1).round(3)
print(df)

          test          bs         dv         if          ir         md  \
0      TESTacc      a10900     a10900     a10900    IJJMKK11     a10900   
1  01-Feb-2019  18.8668013  4.6021207  0.9330807  13.9766832  2.9002571   
2  04-Feb-2019   16.187526  3.1000162  0.4145835  14.6465183   2.848472   

          qb         sy         tb     sum  
0     a10900     a10900     a10900   0.000  
1  0.2824343  0.8280988  0.8587644  43.248  
2  0.2516608  0.8618771   0.218063  38.529  
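The question also asks for a final row holding the median of each column, whereas the commented median line above uses axis=1 and so gives one median per row. A possible sketch of the per-column version follows, assuming a cut-down copy of the question's data and an illustrative "median" label in the test column (neither the label nor the variable names come from the original posts).

```python
import pandas as pd

# Cut-down copy of the question's data: every cell is a string.
df = pd.DataFrame({
    "test": ["TESTacc", "01-Feb-2019", "04-Feb-2019"],
    "bs": ["a10900", "18.8668013", "16.187526"],
    "dv": ["a10900", "4.6021207", "3.1000162"],
})

value_cols = [c for c in df.columns if c != "test"]
numeric = df[value_cols].apply(pd.to_numeric, errors="coerce")

# Default axis=0 gives one median per column; NaN cells are ignored.
medians = numeric.median().round(3)

# Build a one-row frame and append it below the existing rows.
median_row = pd.DataFrame([{"test": "median", **medians.to_dict()}])
out = pd.concat([df, median_row], ignore_index=True)

print(out)
```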

EDIT: you can use df.where() to round all numerics:

df['sum']=df.apply(lambda x: pd.to_numeric(x,errors='coerce')).sum(axis=1)
df=(df.where(df.apply(lambda x: pd.to_numeric(x,errors='coerce')).isna(),
         df.apply(lambda x: pd.to_numeric(x,errors='coerce')).round(3)))
print(df)

          test      bs      dv      if        ir      md      qb      sy  \
0      TESTacc  a10900  a10900  a10900  IJJMKK11  a10900  a10900  a10900   
1  01-Feb-2019  18.867   4.602   0.933    13.977     2.9   0.282   0.828   
2  04-Feb-2019  16.188     3.1   0.415    14.647   2.848   0.252   0.862   

       tb     sum  
0  a10900   0.000  
1   0.859  86.496  
2   0.218  77.057  
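Note that the sum values in this second output (86.496 and 77.057) are twice the earlier ones (43.248 and 38.529). This is what happens if the snippet is run on a df that already has a sum column, since that column is then added into the new total. A minimal sketch that avoids this by converting once and computing the total from the converted frame before attaching it, assuming a cut-down copy of the question's data (num, total and out are illustrative names):

```python
import pandas as pd

# Cut-down copy of the question's data: every cell is a string.
df = pd.DataFrame({
    "test": ["TESTacc", "01-Feb-2019", "04-Feb-2019"],
    "bs": ["a10900", "18.8668013", "16.187526"],
    "dv": ["a10900", "4.6021207", "3.1000162"],
})

# Convert once; dates and other text become NaN.
num = df.apply(pd.to_numeric, errors="coerce")

# Row totals come from the original value columns only.
total = num.sum(axis=1).round(3)

# Keep the original text where conversion failed, rounded numbers elsewhere.
# astype(object) lets text and floats share a column.
out = df.astype(object).where(num.isna(), num.round(3))
out["total"] = total

print(out)
```

Running this block a second time on the same df gives the same totals, because df itself is never given a total column.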
anky answered Oct 21 '22 13:10