pandas to_csv: suppress scientific notation in csv file when writing pandas to csv

Tags: python, type-conversion, pandas, csv, scientific-notation

I am writing a pandas df to a csv. When I write it to a csv file, some of the elements in one of the columns are being incorrectly converted to scientific notation/numbers. For example, col_1 has strings such as '104D59' in it. The strings are mostly represented as strings in the csv file, as they should be. However, occasional strings, such as '104E59', are being converted into scientific notation (e.g., 1.04 E 61) and represented as integers in the ensuing csv file.

I am trying to export the csv file into a software package (i.e., pandas -> csv -> software_new) and this change in data type is causing problems with that export.

Is there a way to write the df to a csv, ensuring that all elements in df['problem_col'] are represented as string in the resulting csv or not converted to scientific notation?

Here is the code I have used to write the pandas df to a csv:

df.to_csv('df.csv', encoding='utf-8')

I also checked the dtype of the problem column:

df['problem_column'].dtype  # object

asked Apr 10 '14 18:04

ansonw

2 Answers

Tested with Python 3.7.2 and pandas 0.23.4 (pd.__version__ returns '0.23.4').

To control how the DataFrame is displayed, use pandas.set_option (see "Options and Settings" in the pandas documentation):

import numpy as np
import pandas as pd

# Display options change how floats are shown on screen, not what to_csv writes:

pd.set_option('display.html.table_schema', True)  # publish a Table Schema representation for front ends that support it
pd.set_option('display.precision', 5)  # show 5 digits after the decimal point
df = pd.DataFrame(np.random.randn(20, 4) * 10 ** -12)  # random DataFrame of very small floats

Check the column dtypes:

df.dtypes  # data type of each column

[output]:
0    float64
1    float64
2    float64
3    float64
dtype: object

Dataframe:

df # output of the dataframe

[output]:
               0             1             2             3
0   -2.01082e-12   1.25911e-12   1.05556e-12  -5.68623e-13
1   -6.87126e-13   1.91950e-12   5.25925e-13   3.72696e-13
2   -1.48068e-12   6.34885e-14  -1.72694e-12   1.72906e-12
3   -5.78192e-14   2.08755e-13   6.80525e-13   1.49018e-12
4   -9.52408e-13   1.61118e-13   2.09459e-13   2.10940e-13
5   -2.30242e-13  -1.41352e-13   2.32575e-12  -5.08936e-13
6    1.16233e-12   6.17744e-13   1.63237e-12   1.59142e-12
7    1.76679e-13  -1.65943e-12   2.18727e-12  -8.45242e-13
8    7.66469e-13   1.29017e-13  -1.61229e-13  -3.00188e-13
9    9.61518e-13   9.71320e-13   8.36845e-14  -6.46556e-13
10  -6.28390e-13  -1.17645e-12  -3.59564e-13   8.68497e-13
11   3.12497e-13   2.00065e-13  -1.10691e-12  -2.94455e-12
12  -1.08365e-14   5.36770e-13   1.60003e-12   9.19737e-13
13  -1.85586e-13   1.27034e-12  -1.04802e-12  -3.08296e-12
14   1.67438e-12   7.40403e-14   3.28035e-13   5.64615e-14
15  -5.31804e-13  -6.68421e-13   2.68096e-13   8.37085e-13
16  -6.25984e-13   1.81094e-13  -2.68336e-13   1.15757e-12
17   7.38247e-13  -1.76528e-12  -4.72171e-13  -3.04658e-13
18  -1.06099e-12  -1.31789e-12  -2.93676e-13  -2.40465e-13
19   1.38537e-12   9.18101e-13   5.96147e-13  -2.41401e-12
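As a minimal sketch (the one-column frame and the names shown and written are made up for illustration), the display options above appear to change only what is printed; to_csv does not consult them:

```python
import pandas as pd

# Hypothetical one-value frame, just to compare display vs. file output.
df = pd.DataFrame({"x": [1.23456789e-12]})

with pd.option_context("display.precision", 3):
    shown = df.to_string()            # console rendering, honors display.precision
    written = df.to_csv(index=False)  # CSV text, unaffected by display options

print(shown)
print(written)
```

So rounding seen on screen does not mean digits were lost in the file, and scientific notation in the file has to be handled through to_csv itself.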

Now write the file with to_csv, passing float_format='%.15f':

df.to_csv('estc.csv', sep=',', float_format='%.15f')  # fixed-point notation with 15 decimal places

file output:

,0,1,2,3
0,-0.000000000002011,0.000000000001259,0.000000000001056,-0.000000000000569
1,-0.000000000000687,0.000000000001919,0.000000000000526,0.000000000000373
2,-0.000000000001481,0.000000000000063,-0.000000000001727,0.000000000001729
3,-0.000000000000058,0.000000000000209,0.000000000000681,0.000000000001490
4,-0.000000000000952,0.000000000000161,0.000000000000209,0.000000000000211
5,-0.000000000000230,-0.000000000000141,0.000000000002326,-0.000000000000509
6,0.000000000001162,0.000000000000618,0.000000000001632,0.000000000001591
7,0.000000000000177,-0.000000000001659,0.000000000002187,-0.000000000000845
8,0.000000000000766,0.000000000000129,-0.000000000000161,-0.000000000000300
9,0.000000000000962,0.000000000000971,0.000000000000084,-0.000000000000647
10,-0.000000000000628,-0.000000000001176,-0.000000000000360,0.000000000000868
11,0.000000000000312,0.000000000000200,-0.000000000001107,-0.000000000002945
12,-0.000000000000011,0.000000000000537,0.000000000001600,0.000000000000920
13,-0.000000000000186,0.000000000001270,-0.000000000001048,-0.000000000003083
14,0.000000000001674,0.000000000000074,0.000000000000328,0.000000000000056
15,-0.000000000000532,-0.000000000000668,0.000000000000268,0.000000000000837
16,-0.000000000000626,0.000000000000181,-0.000000000000268,0.000000000001158
17,0.000000000000738,-0.000000000001765,-0.000000000000472,-0.000000000000305
18,-0.000000000001061,-0.000000000001318,-0.000000000000294,-0.000000000000240
19,0.000000000001385,0.000000000000918,0.000000000000596,-0.000000000002414

You can also pass float_format='%f':

df.to_csv('estc.csv', sep=',', float_format='%f')  # '%f' keeps only 6 decimal places, so values this small are written as 0.000000
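To compare the format strings side by side without touching the disk, here is a small sketch. The two-row frame and the variable names are illustrative only; calling to_csv without a path returns the CSV text as a string.

```python
import pandas as pd

# Hypothetical data: two very small floats.
df = pd.DataFrame({"x": [1.5e-12, -2.25e-13]})

default_csv = df.to_csv(index=False)                        # no float_format
fixed15_csv = df.to_csv(index=False, float_format="%.15f")  # 15 decimal places
fixed6_csv = df.to_csv(index=False, float_format="%f")      # 6 decimal places

for label, text in [("default", default_csv),
                    ("%.15f", fixed15_csv),
                    ("%f", fixed6_csv)]:
    print(label, text.splitlines()[1:])
```

Pick a precision large enough for the smallest magnitude you need to preserve; with too few decimal places the values collapse to zero.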

For more details, see the pandas.DataFrame.to_csv documentation.

answered Oct 20 '22 07:10

n1tk

Use the float_format argument:

In [11]: df = pd.DataFrame(np.random.randn(3, 3) * 10 ** 12)

In [12]: df
Out[12]:
              0             1             2
0  1.757189e+12 -1.083016e+12  5.812695e+11
1  7.889034e+11  5.984651e+11  2.138096e+11
2 -8.291878e+11  1.034696e+12  8.640301e+08

In [13]: print(df.to_string(float_format='{:f}'.format))
                     0                     1                   2
0 1757188536437.788086 -1083016404775.687134 581269533538.170288
1  788903446803.216797   598465111695.240601 213809584103.112457
2 -829187757358.493286  1034695767987.889160    864030095.691202

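This works similarly for to_csv (passing a callable as float_format is only supported in newer pandas releases; on older versions pass a format string such as '%f' instead):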

df.to_csv('df.csv', float_format='{:f}'.format, encoding='utf-8')
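Note that float_format only affects float columns, while the question is about a string (object) column. The sketch below uses a made-up two-column frame and suggests that pandas writes a string such as '104E59' verbatim, so the conversion probably happens in whatever program opens the file. Quoting non-numeric fields with csv.QUOTE_NONNUMERIC may help if that program honors quotes; that is an assumption about the downstream tool, not something the question confirms.

```python
import csv

import pandas as pd

# Hypothetical frame: one string column that looks numeric, one float column.
df = pd.DataFrame({"problem_col": ["104D59", "104E59"],
                   "value": [1.5, 2.0]})

# Default quoting: strings are written exactly as stored.
plain_csv = df.to_csv(index=False)

# QUOTE_NONNUMERIC wraps every non-numeric field (including the header) in quotes.
# float_format is left unset here: the pandas docs note that once floats are
# formatted to strings, QUOTE_NONNUMERIC treats them as non-numeric too.
quoted_csv = df.to_csv(index=False, quoting=csv.QUOTE_NONNUMERIC)

print(plain_csv)
print(quoted_csv)
```

If the file is read back with pandas, passing dtype={'problem_col': str} to read_csv keeps the column as text.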

answered Oct 20 '22 08:10

Andy Hayden
