How do I suppress scientific notation in the output of dataframe.describe()?
contrib_df["AMNT"].describe() count 1.979680e+05 mean 5.915134e+02 std 1.379618e+04 min -1.750000e+05 25% 4.000000e+01 50% 1.000000e+02 75% 2.500000e+02 max 3.000000e+06 Name: AMNT, dtype: float64
My data is of type float64:
contrib_df["AMNT"].dtypes dtype('float64')
Approach: If you are using an older version of Python (before f-strings), use the %-style specifier %.nf to format the value as fixed-point notation with n decimal places instead of scientific notation.
Within an f-string, the {...:f} format specifier tells Python to use fixed-point notation for the value before the :f. Thus, to print the number my_float = 0.00001 non-scientifically, use the expression print(f'{my_float:f}').
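A minimal sketch of both styles (my_float is the value from the example above; five decimal places is just an illustrative choice):

my_float = 0.00001
print('%.5f' % my_float)    # old-style %-formatting -> 0.00001
print(f'{my_float:.5f}')    # f-string equivalent    -> 0.00001
print(f'{my_float:f}')      # :f defaults to six decimal places -> 0.000010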
There is no direct display option to stop scientific notation in Spark; however, you can apply the format_number function to display a number in plain decimal format rather than exponential format.
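A minimal sketch, assuming a PySpark DataFrame named sdf with the same AMNT column (the name sdf is illustrative):

from pyspark.sql.functions import format_number

# format_number returns the value as a string with the requested number of decimals
sdf.select(format_number("AMNT", 2).alias("AMNT")).show()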
Here’s how the set_option() method can be used to suppress scientific notation in a pandas DataFrame: it changes only how floats are displayed, so a DataFrame whose numbers would otherwise print in scientific form is shown in plain decimal format instead.
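A minimal sketch of the display option (two decimal places is only one possible choice):

import pandas as pd

# Show every float with two decimal places instead of scientific notation
pd.set_option('display.float_format', '{:.2f}'.format)
print(contrib_df["AMNT"].describe())

# Restore the default display behaviour when you are done
pd.reset_option('display.float_format')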
Python automatically represents very small (and very large) floating-point values in scientific notation. Suppressing scientific notation simply means displaying these numbers in full decimal format; the stored values do not change, although the printed output takes up more space.
Or, to almost completely suppress scientific notation without losing precision, use the general 'g' format shown further below. If you would like to use the values as formatted strings in a list, say as a row passed to csv.writer, the numbers can be formatted before the list is created.
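A minimal sketch of pre-formatting values before handing them to csv.writer (the file name and values are illustrative):

import csv

values = [1.979680e+05, 5.915134e+02, 3.000000e+06]
row = [f'{v:.2f}' for v in values]      # fixed-point strings, no exponent

with open('out.csv', 'w', newline='') as fh:
    csv.writer(fh).writerow(row)        # writes: 197968.00,591.51,3000000.00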
Scientific notation (numbers with e) is a way of writing very large or very small numbers. A number is written in scientific notation when a number between 1 and 10 is multiplied by a power of 10.
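For example, reading two of the values from the output above:

# 5.915134e+02 is 5.915134 × 10**2
print(5.915134e+02)    # 591.5134
print(3.000000e+06)    # 3000000.0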
For a single column:
contrib_df["AMNT"].describe().apply(lambda x: format(x, 'f'))
For the entire DataFrame, with a fixed number of decimal places (as suggested by @databyte):
df.describe().apply(lambda s: s.apply('{0:.5f}'.format))
For the whole DataFrame, using the general-purpose 'g' format (as suggested by @Jayen):
contrib_df.describe().apply(lambda s: s.apply(lambda x: format(x, 'g')))
Since describe() returns a DataFrame (or a Series for a single column), all the expression above does is format each value in regular decimal notation. I wrote this answer because it seemed pointless to see a count of 95 reported as 9.500000e+01; values in regular notation are also easier to compare.
Before applying the above formatting, we were getting:
count    9.500000e+01
mean     5.621943e+05
std      2.716369e+06
min      4.770000e+02
25%      2.118160e+05
50%      2.599960e+05
75%      3.121170e+05
max      2.670423e+07
Name: salary, dtype: float64
After applying it, we get:
count          95.000000
mean       562194.294737
std       2716369.154553
min           477.000000
25%        211816.000000
50%        259996.000000
75%        312117.000000
max      26704229.000000
Name: salary, dtype: object