I have a dataframe of floats and I need to make a function that will take a column and round all the values to N significant figures.
So the column might look something like:
123.949
23.87
1.9865
0.0129500
If I wanted to round to 3 significant figures, I would pass the column and 3 to the function to get this:
124.0
23.9
1.99
0.013
How can I do this efficiently without looping through the column?
I have an expression that rounds a single number to N significant figures:
round(x, N - 1 - int(floor(log10(abs(x)))))
but it doesn't work on a Series or DataFrame.
Rounding-off rules: If the first non-significant digit is less than 5, then the least significant digit remains unchanged. If the first non-significant digit is greater than 5, the least significant digit is incremented by 1.
You can use pandas.Series.apply, which applies a function element-wise to the values of a Series:
from math import floor, log10
df.col.apply(lambda x: round(x, N - 1 - int(floor(log10(abs(x))))))
Note that you can't really use pandas.DataFrame.apply here, since the round function has to work element-wise, not on an entire axis. The difference is that your function receives a single float as input instead of a whole column or row.
Another option would be applymap (renamed to DataFrame.map in pandas 2.1), which applies a function element-wise to every value in a pandas.DataFrame:
df.applymap(lambda x: round(x, N - 1 - int(floor(log10(abs(x))))))
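As a minimal sketch of the element-wise approach, the lambda can be pulled out into a named helper. The name round_sig is hypothetical, and passing zeros, NaN and infinities through unchanged is an assumption (log10(0) raises an error, so a bare lambda fails on zeros). The number of decimal places is n - 1 - floor(log10(|x|)), which leaves exactly n significant digits.

```python
from math import floor, isfinite, log10

import pandas as pd


def round_sig(x, n):
    """Round one number to n significant figures (hypothetical helper)."""
    # Assumption: zeros, NaN and infinities are returned unchanged,
    # because log10 is undefined or meaningless for them.
    if x == 0 or not isfinite(x):
        return x
    # Position of the leading digit decides how many decimals to keep.
    return round(x, n - 1 - floor(log10(abs(x))))


s = pd.Series([123.949, 23.87, 1.9865, 0.0, float("nan")])
rounded = s.apply(lambda x: round_sig(x, 3))
print(rounded)
```

The same helper can be passed to applymap (or DataFrame.map) to cover every column of a DataFrame.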
Here is another take on applying your custom function to a Series in a DataFrame. However, the built-in round() can round down when the last digit is 5 (it rounds half to even, and many decimal fractions have no exact binary representation), so in your example you'd actually get 0.0129 instead of 0.013. I tried to remedy this. I also added the ability to set the number of significant figures as an argument, which gives you back the rounder you want to apply.
import pandas as pd
from math import floor, log10

df = pd.DataFrame({'floats': [123.949, 23.87, 1.9865, 0.0129500]})

def smarter_round(sig):
    def rounder(x):
        offset = sig - floor(log10(abs(x)))
        initial_result = round(x, offset)
        if str(initial_result)[-1] == '5' and initial_result == x:
            return round(x, offset - 2)
        else:
            return round(x, offset - 1)
    return rounder

print(df['floats'].apply(smarter_round(3)))
Out:
0    124.000
1     23.900
2      1.990
3      0.013
Name: floats, dtype: float64
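The trailing-5 check above depends on how a float happens to print, so it can misfire on other inputs (for example, 1.125 at 3 significant figures comes back as 1.1). As an alternative sketch, not the answer's method, the standard-library decimal module can round half up on the decimal text of each value. The name round_sig_half_up is hypothetical, and treating repr(x) as the intended decimal value is an assumption.

```python
from decimal import ROUND_HALF_UP, Decimal
from math import isfinite

import pandas as pd


def round_sig_half_up(sig):
    """Build a rounder that keeps `sig` significant figures, rounding half up."""
    def rounder(x):
        # Assumption: zeros, NaN and infinities pass through unchanged.
        if x == 0 or not isfinite(x):
            return x
        # Work on the shortest decimal text of the float, not its binary value.
        d = Decimal(repr(float(x)))
        # adjusted() is the exponent of the leading digit (2 for 123.949).
        quantum = Decimal(1).scaleb(d.adjusted() - sig + 1)
        return float(d.quantize(quantum, rounding=ROUND_HALF_UP))
    return rounder


df = pd.DataFrame({'floats': [123.949, 23.87, 1.9865, 0.0129500, 1.125]})
result = df['floats'].apply(round_sig_half_up(3))
print(result)
```

This is slower than plain round() because every value goes through a string and a Decimal, so it suits cases where half-up behaviour matters more than speed.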
With large DataFrames, .apply can be slow. The best solution I have seen came from Scott Gigante, who addressed the same question directly for NumPy.
Here is a lightly modified version of his answer that simply adds some pandas wrapping. The solution is fast and robust.
from typing import Union
import pandas as pd
import numpy as np

def significant_digits(df: Union[pd.DataFrame, pd.Series],
                       significance: int,
                       inplace: bool = False) -> Union[pd.DataFrame, pd.Series, None]:
    # Create a positive data vector with a place holder for NaN / inf data
    data = df.values
    data_positive = np.where(np.isfinite(data) & (data != 0), np.abs(data), 10**(significance-1))

    # Align data by magnitude, round, and scale back to original
    magnitude = 10 ** (significance - 1 - np.floor(np.log10(data_positive)))
    data_rounded = np.round(data * magnitude) / magnitude

    # Place back into Series or DataFrame
    if inplace:
        df.loc[:] = data_rounded
    else:
        if isinstance(df, pd.DataFrame):
            return pd.DataFrame(data=data_rounded, index=df.index, columns=df.columns)
        else:
            return pd.Series(data=data_rounded, index=df.index)
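To show the vectorized idea in isolation, here is a compact, self-contained sketch that works on a plain array and is then wrapped back into a Series. The name round_sig_vectorized is hypothetical. Note that np.round rounds exact halves to the nearest even value, so values sitting right on a rounding boundary may not follow the half-up rule.

```python
import numpy as np
import pandas as pd


def round_sig_vectorized(values, sig):
    """Round every element to `sig` significant figures without a Python loop."""
    arr = np.asarray(values, dtype=float)
    # Substitute 1.0 for zeros and non-finite entries so log10 stays defined;
    # those entries still come out as 0, NaN or inf after the final division.
    safe = np.where(np.isfinite(arr) & (arr != 0), np.abs(arr), 1.0)
    # Scale each value so its `sig` leading digits sit left of the decimal point.
    scale = 10.0 ** (sig - 1 - np.floor(np.log10(safe)))
    return np.round(arr * scale) / scale


s = pd.Series([123.949, 23.87, 1.9865, 0.0129500, 0.0, np.nan], name='floats')
out = pd.Series(round_sig_vectorized(s, 3), index=s.index, name=s.name)
print(out)
```

Passing the Series name through keeps the result easy to assign back as a column.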