I have a CSV file containing some float data. The code is simple:
df = pd.read_csv(my_csv_file)
print(df.iloc[:2,:4])
600663.XSHG 000877.XSHE 600523.XSHG 601311.XSHG
2016-01-04 09:31:00 49.40 8.05 22.79 21.80
2016-01-04 09:32:00 49.55 8.03 22.79 21.75
Then I convert it to float32 to reduce memory usage:
short_df = df.astype(np.float32)
print(short_df.iloc[:2,:4])
600663.XSHG 000877.XSHE 600523.XSHG 601311.XSHG
2016-01-04 09:31:00 49.400002 8.05 22.790001 21.799999
2016-01-04 09:32:00 49.549999 8.03 22.790001 21.750000
The values changed! How can I keep the data unchanged?
(I also tried short_df.round(2), but print still gives the same output.)
Many decimal floating-point numbers cannot be represented exactly as a float64 or float32. Review e.g. The Floating-Point Guide if you are unfamiliar with that issue.
By default, pandas displays floats with a precision of 6 decimal places and drops trailing zeros.
float64 can represent the example numbers accurately up to (and beyond) 6 decimal places, whereas float32 cannot:
>>> print("%.6f" % np.float64(49.40))
49.400000
>>> print("%.6f" % np.float32(49.40))
49.400002
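As a rough sketch of why this happens (and why short_df.round(2) changes nothing), the snippet below widens the float32 value back to float64, which is an exact conversion, so the extra digits are the value float32 really stores. The variable names are only illustrative.

```python
import numpy as np

a32 = np.array([49.40], dtype=np.float32)
a64 = np.array([49.40], dtype=np.float64)

# Widening float32 -> float64 is exact, so this exposes the stored float32 value.
stored = float(a32[0])
print(stored)            # slightly above 49.4
print(float(a64[0]))     # 49.4

# Rounding cannot help: the rounded result is stored as float32 again,
# so it snaps back to the same nearest representable value.
rounded = a32.round(2)
print(rounded.dtype, rounded[0] == a32[0])

# Approximate number of decimal digits each type can be trusted for.
print(np.finfo(np.float32).precision)   # 6
print(np.finfo(np.float64).precision)   # 15
```

So the data is not being corrupted by the conversion; float32 carries roughly 6 to 7 significant decimal digits in total, and with two digits before the decimal point the sixth decimal place is already beyond that.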
If you are not interested in the precision beyond the 2nd digit when printing the df, you can set the display precision:
pd.set_option('display.precision', 2)
Then you get the same output even with float32s:
>>> df.astype(np.float32)
600663.XSHG 000877.XSHE 600523.XSHG 601311.XSHG
2016-01-04 09:31:00 49.40 8.05 22.79 21.80
09:32:00 49.55 8.03 22.79 21.75
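If you would rather not change the display setting globally, a minimal sketch using pd.option_context limits it to one block. The small frame here is made up from two of the columns in the question, so treat it as a stand-in for the real data.

```python
import numpy as np
import pandas as pd

# Stand-in for the real data: two columns taken from the question.
df = pd.DataFrame({"600663.XSHG": [49.40, 49.55],
                   "000877.XSHE": [8.05, 8.03]})
short_df = df.astype(np.float32)

# The precision applies only inside the with-block.
with pd.option_context("display.precision", 2):
    text = short_df.to_string()
print(text)

# Only the display changed; the stored values are still float32.
print(short_df.dtypes)
```

Note that this affects only how the numbers are printed. The underlying float32 values are the same as before.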
If you want to drop everything beyond the 2nd digit when writing back the csv file, use float_format:
df.to_csv(file_name, float_format="%.2f")
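To check what ends up in the file, here is a small sketch that writes to a string instead of a file (to_csv returns the CSV text when no path is given) and reads it back. The one-column frame is an assumption standing in for the real data.

```python
import io

import numpy as np
import pandas as pd

# Stand-in for the real data, stored as float32.
short_df = pd.DataFrame({"600663.XSHG": [49.40, 49.55]}, dtype=np.float32)

# With no path, to_csv returns the CSV content as a string.
clean = short_df.to_csv(index=False, float_format="%.2f")
print(clean)

# Reading it back parses the two-decimal text as float64.
restored = pd.read_csv(io.StringIO(clean))
print(restored)
```

The written text contains exactly two decimals per value, so the float32 rounding noise never reaches the file.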