pandas to_csv: suppress scientific notation in csv file when writing pandas to csv

I am writing a pandas DataFrame to a CSV file. When I do, some of the elements in one of the columns are incorrectly converted to scientific notation. For example, col_1 contains strings such as '104D59'. Most of these appear as strings in the CSV file, as they should. However, occasional strings such as '104E59' are converted to scientific notation (e.g., 1.04E+61) and treated as numbers in the resulting CSV file.

I am trying to export the csv file into a software package (i.e., pandas -> csv -> software_new) and this change in data type is causing problems with that export.

Is there a way to write the df to a csv, ensuring that all elements in df['problem_col'] are represented as string in the resulting csv or not converted to scientific notation?

Here is the code I have used to write the pandas df to a csv:

df.to_csv('df.csv', encoding='utf-8')

I also checked the dtype of the problem column:

df.dtypes reports df['problem_column'] as object
ansonw asked Apr 10 '14 18:04


2 Answers

For Python 3.x (tested with Python 3.7.2 and pandas 0.23.4).

Options and settings

Display options for the DataFrame are set with pandas.set_option:

import numpy as np   # needed for np.random.randn below
import pandas as pd

# Display options: these control how floats are shown once the data is loaded

pd.set_option('display.html.table_schema', True)  # publish a Table Schema representation for frontends that support it
pd.set_option('display.precision', 5)  # show 5 digits after the decimal point
df = pd.DataFrame(np.random.randn(20, 4) * 10 ** -12)  # random DataFrame with very small values

Output of the data:

df.dtypes # check datatype for columns

[output]:
0    float64
1    float64
2    float64
3    float64
dtype: object

Dataframe:

df # output of the dataframe

[output]:
               0             1             2             3
0   -2.01082e-12   1.25911e-12   1.05556e-12  -5.68623e-13
1   -6.87126e-13   1.91950e-12   5.25925e-13   3.72696e-13
2   -1.48068e-12   6.34885e-14  -1.72694e-12   1.72906e-12
3   -5.78192e-14   2.08755e-13   6.80525e-13   1.49018e-12
4   -9.52408e-13   1.61118e-13   2.09459e-13   2.10940e-13
5   -2.30242e-13  -1.41352e-13   2.32575e-12  -5.08936e-13
6    1.16233e-12   6.17744e-13   1.63237e-12   1.59142e-12
7    1.76679e-13  -1.65943e-12   2.18727e-12  -8.45242e-13
8    7.66469e-13   1.29017e-13  -1.61229e-13  -3.00188e-13
9    9.61518e-13   9.71320e-13   8.36845e-14  -6.46556e-13
10  -6.28390e-13  -1.17645e-12  -3.59564e-13   8.68497e-13
11   3.12497e-13   2.00065e-13  -1.10691e-12  -2.94455e-12
12  -1.08365e-14   5.36770e-13   1.60003e-12   9.19737e-13
13  -1.85586e-13   1.27034e-12  -1.04802e-12  -3.08296e-12
14   1.67438e-12   7.40403e-14   3.28035e-13   5.64615e-14
15  -5.31804e-13  -6.68421e-13   2.68096e-13   8.37085e-13
16  -6.25984e-13   1.81094e-13  -2.68336e-13   1.15757e-12
17   7.38247e-13  -1.76528e-12  -4.72171e-13  -3.04658e-13
18  -1.06099e-12  -1.31789e-12  -2.93676e-13  -2.40465e-13
19   1.38537e-12   9.18101e-13   5.96147e-13  -2.41401e-12
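
The display options above only change how the frame is printed, not what to_csv writes. A minimal sketch of that distinction, assuming a hypothetical one-column frame (the column name x and its value are illustrative and not part of the original answer):

```python
import pandas as pd

# Hypothetical data, chosen only to make the difference visible.
df = pd.DataFrame({"x": [1.23456789e-12]})

# Display options apply to the printed representation only.
with pd.option_context("display.precision", 3):
    shown = repr(df)

# With no path argument, to_csv returns the CSV text as a string.
written = df.to_csv(index=False)

print(shown)    # rounded for display
print(written)  # full value, unaffected by display.precision
```

This is why the CSV output has to be controlled through arguments of to_csv itself.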

Now write the CSV with float_format='%.15f':

df.to_csv('estc.csv', sep=',', float_format='%.15f')  # fixed-point notation, 15 decimal places

file output:

,0,1,2,3
0,-0.000000000002011,0.000000000001259,0.000000000001056,-0.000000000000569
1,-0.000000000000687,0.000000000001919,0.000000000000526,0.000000000000373
2,-0.000000000001481,0.000000000000063,-0.000000000001727,0.000000000001729
3,-0.000000000000058,0.000000000000209,0.000000000000681,0.000000000001490
4,-0.000000000000952,0.000000000000161,0.000000000000209,0.000000000000211
5,-0.000000000000230,-0.000000000000141,0.000000000002326,-0.000000000000509
6,0.000000000001162,0.000000000000618,0.000000000001632,0.000000000001591
7,0.000000000000177,-0.000000000001659,0.000000000002187,-0.000000000000845
8,0.000000000000766,0.000000000000129,-0.000000000000161,-0.000000000000300
9,0.000000000000962,0.000000000000971,0.000000000000084,-0.000000000000647
10,-0.000000000000628,-0.000000000001176,-0.000000000000360,0.000000000000868
11,0.000000000000312,0.000000000000200,-0.000000000001107,-0.000000000002945
12,-0.000000000000011,0.000000000000537,0.000000000001600,0.000000000000920
13,-0.000000000000186,0.000000000001270,-0.000000000001048,-0.000000000003083
14,0.000000000001674,0.000000000000074,0.000000000000328,0.000000000000056
15,-0.000000000000532,-0.000000000000668,0.000000000000268,0.000000000000837
16,-0.000000000000626,0.000000000000181,-0.000000000000268,0.000000000001158
17,0.000000000000738,-0.000000000001765,-0.000000000000472,-0.000000000000305
18,-0.000000000001061,-0.000000000001318,-0.000000000000294,-0.000000000000240
19,0.000000000001385,0.000000000000918,0.000000000000596,-0.000000000002414

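Alternatively, write the CSV with float_format='%f':

df.to_csv('estc.csv', sep=',', float_format='%f')  # fixed-point notation with the default 6 decimal places

Note that '%f' keeps only six decimal places, so values as small as the ones above are written as 0.000000 or -0.000000. Use it only when six decimals are enough.

For more details, see the documentation for pandas.DataFrame.to_csv.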
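
To compare the float_format choices side by side, here is a minimal sketch. The column names a and b and their values are hypothetical, picked only so that the effect of each format string is easy to see:

```python
import pandas as pd

# Hypothetical small values, for illustration only.
df = pd.DataFrame({"a": [1.5e-12, -2.25e-13], "b": [3.0e-12, 4.75e-14]})

# With no path argument, to_csv returns the CSV text instead of writing a file.
default_csv = df.to_csv(index=False)
fixed_csv = df.to_csv(index=False, float_format="%.15f")
short_csv = df.to_csv(index=False, float_format="%f")

print(default_csv)  # exponent notation is kept
print(fixed_csv)    # fixed-point, 15 decimal places
print(short_csv)    # only 6 decimal places, so these values collapse to zero
```

The number of decimals in the format string should be chosen from the smallest magnitude you need to preserve.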

n1tk answered Oct 20 '22 07:10


Use the float_format argument:

In [11]: df = pd.DataFrame(np.random.randn(3, 3) * 10 ** 12)

In [12]: df
Out[12]:
              0             1             2
0  1.757189e+12 -1.083016e+12  5.812695e+11
1  7.889034e+11  5.984651e+11  2.138096e+11
2 -8.291878e+11  1.034696e+12  8.640301e+08

In [13]: print(df.to_string(float_format='{:f}'.format))
                     0                     1                   2
0 1757188536437.788086 -1083016404775.687134 581269533538.170288
1  788903446803.216797   598465111695.240601 213809584103.112457
2 -829187757358.493286  1034695767987.889160    864030095.691202

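The same approach works for to_csv:

df.to_csv('df.csv', float_format='{:f}'.format, encoding='utf-8')

If your pandas version does not accept a callable for float_format in to_csv, pass the equivalent format string instead: float_format='%f'.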
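
float_format only affects floating-point columns, so it leaves an object column of strings like the one in the question untouched. The sketch below is one way to check what pandas actually writes for such a column. The frame is hypothetical, and it is an assumption that the conversion happens in whatever program opens the CSV. Whether quoting helps depends on that program.

```python
import csv
import io

import pandas as pd

# Hypothetical frame mirroring the question: a string column plus a float column.
df = pd.DataFrame({"problem_col": ["104D59", "104E59"], "value": [1.5, 2.0]})

# Default output: the strings are written exactly as stored.
plain = df.to_csv(index=False)

# QUOTE_NONNUMERIC wraps every non-numeric field in double quotes.
quoted = df.to_csv(index=False, quoting=csv.QUOTE_NONNUMERIC)

print(plain)
print(quoted)

# Reading back with an explicit string dtype keeps the text intact.
back = pd.read_csv(io.StringIO(plain), dtype={"problem_col": str})
print(back["problem_col"].tolist())
```

If the text is intact in the raw file, the place to fix the conversion is the import settings of the consuming program.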
Andy Hayden answered Oct 20 '22 08:10