I have a df like the one below (the initial 2 rows are text and the 1st column is a date):
In [4]: df
Out[4]:
          test          bs         dv         if          ir         md         qb         sy         tb
0      TESTacc      a10900     a10900     a10900    IJJMKK11     a10900     a10900     a10900     a10900
1  01-Feb-2019  18.8668013  4.6021207  0.9330807  13.9766832  2.9002571  0.2824343  0.8280988  0.8587644
2  04-Feb-2019   16.187526  3.1000162  0.4145835  14.6465183   2.848472  0.2516608  0.8618771   0.218063
I need to write this out as a csv with 3-decimal precision. I also need to add a "Total" column (as the rightmost column). I have tried the things below, but they don't feel right.
To add the total column I did:
ndf=df.iloc[2:,1:] #take only numerics in ndf
ndf = ndf.apply(pd.to_numeric)
ndf=ndf.round(3)
df['total']=ndf.sum(axis=1)
This does not seem like a proper way to do something as simple as adding a total column.
So I tried
df=df.apply(pd.to_numeric,errors='ignore')
but round still won't work on df.
My intent is to just add a Total column and have all numbers rounded to 3 decimals.
Additional: once this is done, I would like to add a last row holding the median of each column.
The sum() function returns the sum of the values along the requested axis. If the axis is the index axis (axis=0), it adds up all the values in each column and returns a Series containing one total per column.
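To make the axis behaviour concrete, here is a minimal sketch on a made-up two-column frame (the name `toy` and its values are purely illustrative, not the asker's data). It shows that axis=0 gives one total per column while axis=1 gives one total per row, which is what a "Total" column needs.

```python
import pandas as pd

# Hypothetical toy frame, not the asker's data.
toy = pd.DataFrame({"a": [1.0, 2.0], "b": [10.0, 20.0]})

col_sums = toy.sum(axis=0)  # one value per column, indexed by column name
row_sums = toy.sum(axis=1)  # one value per row, indexed like the frame

print(col_sums)
print(row_sums)
```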
Sum all columns in a Pandas DataFrame into a new column: if we want to total across all the columns, we can simply use the DataFrame sum() method.
To select only the columns of numeric dtype from a Pandas DataFrame, call the DataFrame.select_dtypes() method and pass np.number or 'number' as the include argument.
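As a rough sketch of that approach, assuming a frame whose numeric columns already have a numeric dtype (the frame `mixed` below is made up from two of the question's columns). Note that in the asker's frame the text row makes every column object dtype, so select_dtypes would find nothing there until the values are converted.

```python
import pandas as pd

# Hypothetical frame: one text column and two float columns.
mixed = pd.DataFrame({
    "test": ["01-Feb-2019", "04-Feb-2019"],
    "bs": [18.8668013, 16.187526],
    "dv": [4.6021207, 3.1000162],
})

numeric = mixed.select_dtypes(include="number")  # keeps bs and dv, drops test
mixed["Total"] = numeric.sum(axis=1).round(3)    # per-row total, 3 decimals
print(mixed)
```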
Sum of a single column: you can use the pandas Series sum() function to get the sum of the values in an individual column (each column is essentially a pandas Series).
According to the pandas documentation (1.0.3 at the time of writing), you can sum only the numeric columns with the following code:
df_sum = df.sum(numeric_only = True)
This will sum all numeric columns in df and assign the result to the variable df_sum.
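For what it's worth, the same flag also works row-wise. A small sketch, again on a made-up frame (`mixed` and `row_total` are illustrative names), assuming the value column is already stored as floats:

```python
import pandas as pd

# Hypothetical frame with a text column and a float column.
mixed = pd.DataFrame({
    "test": ["01-Feb-2019", "04-Feb-2019"],
    "bs": [18.8668013, 16.187526],
})

df_sum = mixed.sum(numeric_only=True)             # column totals, 'test' is skipped
row_total = mixed.sum(axis=1, numeric_only=True)  # per-row totals over numeric columns

print(df_sum)
print(row_total)
```

With axis=1 the result has one entry per row, so it can be assigned straight to a new column.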
IIUC, you may need:
df['sum']=df.apply(lambda x: pd.to_numeric(x,errors='coerce')).sum(axis=1).round(3)
#for median: df.apply(lambda x: pd.to_numeric(x,errors='coerce')).median(axis=1).round(3)  # per row; use axis=0 for a median per column
print(df)
          test          bs         dv         if          ir         md  \
0      TESTacc      a10900     a10900     a10900    IJJMKK11     a10900
1  01-Feb-2019  18.8668013  4.6021207  0.9330807  13.9766832  2.9002571
2  04-Feb-2019   16.187526  3.1000162  0.4145835  14.6465183   2.848472

          qb         sy         tb     sum
0     a10900     a10900     a10900   0.000
1  0.2824343  0.8280988  0.8587644  43.248
2  0.2516608  0.8618771   0.218063  38.529
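EDIT: you can use df.where() to round all the numerics, as shown here. Run it on the original df: if the 'sum' column from the snippet above is still present, it gets added in again, which is why the totals printed below are roughly double the earlier ones.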
df['sum']=df.apply(lambda x: pd.to_numeric(x,errors='coerce')).sum(axis=1)
df=(df.where(df.apply(lambda x: pd.to_numeric(x,errors='coerce')).isna(),
df.apply(lambda x: pd.to_numeric(x,errors='coerce')).round(3)))
print(df)
          test      bs      dv      if        ir      md      qb      sy  \
0      TESTacc  a10900  a10900  a10900  IJJMKK11  a10900  a10900  a10900
1  01-Feb-2019  18.867   4.602   0.933    13.977     2.9   0.282   0.828
2  04-Feb-2019  16.188     3.1   0.415    14.647   2.848   0.252   0.862

       tb     sum
0  a10900   0.000
1   0.859  86.496
2   0.218  77.057
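Putting the pieces together, one possible end-to-end sketch for the original request (Total column, 3-decimal rounding, a final median row, then CSV) could look like the following. It rebuilds the question's frame with every cell as a string, which is an assumption about how the data was loaded; the names `num`, `out`, `median_row` and `csv_text` are illustrative only. The numeric conversion is done once, so the total is not counted twice, and the medians are taken from the unrounded values.

```python
import pandas as pd

# Frame rebuilt from the question; all cells are strings because of the text row.
df = pd.DataFrame({
    "test": ["TESTacc", "01-Feb-2019", "04-Feb-2019"],
    "bs": ["a10900", "18.8668013", "16.187526"],
    "dv": ["a10900", "4.6021207", "3.1000162"],
    "if": ["a10900", "0.9330807", "0.4145835"],
    "ir": ["IJJMKK11", "13.9766832", "14.6465183"],
    "md": ["a10900", "2.9002571", "2.848472"],
    "qb": ["a10900", "0.2824343", "0.2516608"],
    "sy": ["a10900", "0.8280988", "0.8618771"],
    "tb": ["a10900", "0.8587644", "0.218063"],
})

# Convert the value columns once; text cells become NaN.
num = df.iloc[:, 1:].apply(pd.to_numeric, errors="coerce")
# min_count=1 leaves the all-text row as NaN instead of 0.
num["Total"] = num.sum(axis=1, min_count=1)

# Keep the text where there is no number, otherwise use the rounded number.
out = df.assign(Total="").astype(object)
cols = num.columns
out[cols] = out[cols].where(num.isna(), num.round(3))

# Per-column median (NaN cells are skipped), appended as the last row.
median_row = {"test": "median", **num.median().round(3).to_dict()}
out = pd.concat([out, pd.DataFrame([median_row])], ignore_index=True)

csv_text = out.to_csv(index=False)  # pass a path instead to write a file
print(out)
```

The text row keeps an empty Total rather than 0.000, which may or may not be what you want in the final file.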