Sum only numeric columns in pandas

Tags: python, pandas

I have a df like the one below (the initial 2 rows are text and the 1st column is a date):

In [4]: df
Out[4]: 
           test          bs         dv         if          ir         md         qb         sy          tb
0       TESTacc      a10900     a10900     a10900    IJJMKK11     a10900     a10900     a10900      a10900
1   01-Feb-2019  18.8668013  4.6021207  0.9330807  13.9766832  2.9002571  0.2824343  0.8280988   0.8587644
2   04-Feb-2019   16.187526  3.1000162  0.4145835  14.6465183   2.848472  0.2516608  0.8618771    0.218063

I need to write this to a CSV with 3-decimal precision. I also need to add a "Total" column (as the rightmost column). I have tried the things below, but they do not seem like the proper way.

To add the total column I did:

ndf=df.iloc[2:,1:] #take only numerics in ndf
ndf = ndf.apply(pd.to_numeric)
ndf=ndf.round(3)
df['total']=ndf.sum(axis=1)

This does not seem like a proper way to do something as simple as adding a total column.

So I tried df=df.apply(pd.to_numeric,errors='ignore'), but round still won't work on df. My intent is just to add a Total column and have all numbers rounded to 3 decimals. Additionally, once this is done, I would like to add a last row holding the median of each column.


asked Mar 11 '19 05:03

pythonRcpp

2 Answers

According to the pandas documentation (version 1.0.3 at the time of writing), you can sum only the numeric columns with the following code:

df_sum = df.sum(numeric_only = True)

This sums each numeric column in df and assigns the result (one value per column) to the variable df_sum.
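One caveat is worth illustrating: numeric_only=True selects columns by dtype, so columns that hold numbers as strings (as in the question's frame, where the text row forces every column to object dtype) are skipped. The following is a minimal sketch on a small hypothetical frame (the name demo and its contents are assumptions made for illustration), showing per-column sums, a row-wise total, and the skipped-column case:

```python
import pandas as pd

# Hypothetical frame: one text column and two float columns.
demo = pd.DataFrame({
    "test": ["01-Feb-2019", "04-Feb-2019"],
    "bs": [18.8668013, 16.187526],
    "dv": [4.6021207, 3.1000162],
})

# Default axis=0: one sum per numeric column (a Series indexed by column name).
col_sums = demo.sum(numeric_only=True)

# axis=1: one sum per row, which is what a "Total" column needs.
demo["total"] = demo.sum(axis=1, numeric_only=True).round(3)

# Numbers stored as strings are not numeric dtype, so they are skipped.
as_text = demo[["bs", "dv"]].astype(str)
skipped = as_text.sum(numeric_only=True)  # empty result

print(col_sums)
print(demo)
print(skipped)
```

If the values arrive as strings, they would need to be converted first (for example with pd.to_numeric) before numeric_only=True has anything to select.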


answered Oct 21 '22 13:10

Decision Scientist

IIUC, you may need:

df['sum']=df.apply(lambda x: pd.to_numeric(x,errors='coerce')).sum(axis=1).round(3)
#for median: df.apply(lambda x: pd.to_numeric(x,errors='coerce')).median(axis=1).round(3)
print(df)

          test          bs         dv         if          ir         md  \
0      TESTacc      a10900     a10900     a10900    IJJMKK11     a10900   
1  01-Feb-2019  18.8668013  4.6021207  0.9330807  13.9766832  2.9002571   
2  04-Feb-2019   16.187526  3.1000162  0.4145835  14.6465183   2.848472   

          qb         sy         tb     sum  
0     a10900     a10900     a10900   0.000  
1  0.2824343  0.8280988  0.8587644  43.248  
2  0.2516608  0.8618771   0.218063  38.529
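Note that the commented median line above uses axis=1, which yields one median per row, while the question asks for a last row holding the median of each column. A minimal sketch of the column-wise variant follows, assuming a cut-down copy of the question's frame; the names num, med, median_row and out, and the "median" label, are choices made here for illustration:

```python
import pandas as pd

# Cut-down version of the question's frame; every cell is a string.
df = pd.DataFrame({
    "test": ["TESTacc", "01-Feb-2019", "04-Feb-2019"],
    "bs": ["a10900", "18.8668013", "16.187526"],
    "dv": ["a10900", "4.6021207", "3.1000162"],
})

# Coerce once: anything non-numeric (labels, dates) becomes NaN.
num = df.apply(pd.to_numeric, errors="coerce")

# Default axis=0 gives one median per column; NaN cells are skipped.
med = num.median().round(3)

# Turn the medians into a one-row frame and label the text column.
median_row = med.astype(object)
median_row["test"] = "median"
out = pd.concat([df, median_row.to_frame().T], ignore_index=True)

print(out)
```

Coercing once and reusing num avoids repeating the conversion for the sum, the median, and the rounding steps.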

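EDIT: you can use df.where() to round all numeric values, as shown below. Note that the totals in the output are doubled (86.496 instead of 43.248) because the snippet was run on a frame that already contained the sum column from the previous snippet; run it on the original df to get the correct totals.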

df['sum']=df.apply(lambda x: pd.to_numeric(x,errors='coerce')).sum(axis=1)
df=(df.where(df.apply(lambda x: pd.to_numeric(x,errors='coerce')).isna(),
         df.apply(lambda x: pd.to_numeric(x,errors='coerce')).round(3)))
print(df)

          test      bs      dv      if        ir      md      qb      sy  \
0      TESTacc  a10900  a10900  a10900  IJJMKK11  a10900  a10900  a10900   
1  01-Feb-2019  18.867   4.602   0.933    13.977     2.9   0.282   0.828   
2  04-Feb-2019  16.188     3.1   0.415    14.647   2.848   0.252   0.862   

       tb     sum  
0  a10900   0.000  
1   0.859  86.496  
2   0.218  77.057
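The rounded output above prints 2.9 and 3.1 rather than 2.900 and 3.100, because rounding changes the value but not how it is displayed. If the CSV must show exactly three decimals, one option is to format each numeric cell as text before writing. The sketch below reuses the same where() pattern under that assumption; the frame is a cut-down copy of the question's data, the names num, formatted, out and csv_text are illustrative, and leaving the label row's total blank (via min_count=1) is a design choice made here, not something the question specifies:

```python
import pandas as pd

# Cut-down version of the question's frame; every cell is a string.
df = pd.DataFrame({
    "test": ["TESTacc", "01-Feb-2019", "04-Feb-2019"],
    "bs": ["a10900", "18.8668013", "16.187526"],
    "md": ["a10900", "2.9002571", "2.848472"],
})

# Coerce once; text cells (labels, dates) become NaN.
num = df.apply(pd.to_numeric, errors="coerce")

# min_count=1 keeps the total as NaN for rows with no numeric cell at all.
num["total"] = num.sum(axis=1, min_count=1)

# Format every numeric cell with exactly three decimals; NaN is left alone.
formatted = num.apply(lambda col: col.map("{:.3f}".format, na_action="ignore"))

# Keep the original text where the cell was not numeric, else the formatted number.
out = df.assign(total="").where(num.isna(), formatted)

csv_text = out.to_csv(index=False)
print(csv_text)
```

Computing the total from the coerced frame before adding any total column to df also avoids the double counting seen in the output above.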

answered Oct 21 '22 13:10

anky
