Round a series to N number of significant figures

I have a dataframe of floats and I need to make a function that will take a column and round all the values to N significant figures.

So the column might look something like:

123.949
23.87 
1.9865
0.0129500

If I wanted to round to 3 significant figures, I would pass the column and 3 to the function to get this:

124.0
23.9
1.99
0.013

How can I do this efficiently without looping through the column?

I have an expression that rounds a single number to N significant figures:

round(x, N - 1 - int(floor(log10(abs(x)))))

but it doesn't work on a Series or DataFrame.

Gingerhaze asked Aug 21 '19 10:08



3 Answers

You can use pandas.Series.apply, which calls a function on each value of a Series (here with floor and log10 imported from math):

df.col.apply(lambda x: round(x, N - 1 - int(floor(log10(abs(x))))))

Note that pandas.DataFrame.apply is not a drop-in replacement here: it passes each whole column (or row) to the function as a Series, whereas round and math.log10 expect a single number.

With pandas.Series.apply, the function receives one float at a time instead of an array.
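To make that difference concrete, here is a small sketch that records what each apply variant hands to the function. The frame and its column names (a, b) are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical two-column frame, just for illustration
df = pd.DataFrame({"a": [123.949, 23.87], "b": [1.9865, 0.01295]})

# Series.apply calls the function once per value
series_arg_types = df["a"].apply(lambda x: type(x).__name__).tolist()

# DataFrame.apply (default axis=0) calls the function once per column
frame_arg_types = df.apply(lambda col: type(col).__name__).tolist()

print(series_arg_types)  # scalar float types
print(frame_arg_types)   # Series objects
```

This is why a scalar-only expression built on math.log10 fits Series.apply but not DataFrame.apply.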

Another option is pandas.DataFrame.applymap, which applies a function element-wise to every cell of the DataFrame (pandas 2.1 renamed it to DataFrame.map and deprecated applymap):

df.applymap(lambda x: round(x, N - 1 - int(floor(log10(abs(x))))))

iDrwish answered Oct 02 '22 23:10


Here is another take on applying your custom function to a column of a dataframe. Note that the built-in round() does not always round up when the last digit is 5: Python 3 rounds exact ties to the nearest even digit, and most decimal fractions are stored only approximately as binary floats. So in your example you'd actually get 0.0129 instead of 0.013. I tried to remedy this. I also made the number of significant figures an argument, so you can build whichever rounder you want to apply.

import pandas as pd
from math import floor, log10

df = pd.DataFrame({'floats':[123.949, 23.87, 1.9865, 0.0129500]})

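def smarter_round(sig):
    # Build a rounder that keeps `sig` significant figures
    def rounder(x):
        # Number of decimal places that keeps sig + 1 significant figures of x
        offset = sig - floor(log10(abs(x)))
        initial_result = round(x, offset)
        # Tie workaround: if x has exactly sig + 1 figures and ends in 5,
        # round one decimal place coarser than the normal case
        if str(initial_result)[-1] == '5' and initial_result == x:
            return round(x, offset - 2)
        else:
            return round(x, offset - 1)
    return rounder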

print(df['floats'].apply(smarter_round(3)))

Out:
    0    124.000
    1     23.900
    2      1.990
    3      0.013
    Name: floats, dtype: float64
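
One caveat with the tie check above: it rounds to one digit fewer whenever it fires, so an input such as 1.235 with sig=3 comes back as 1.2. If you want ties to always round away from zero, one possible alternative is the standard-library decimal module. The following is only a sketch of that idea, not part of the original answer: round_half_up_sig is a made-up name, and it assumes the short decimal string from str(x) is the value you meant to round.

```python
import math
from decimal import Decimal, ROUND_HALF_UP

import pandas as pd


def round_half_up_sig(sig):
    """Return a rounder keeping `sig` significant figures, ties away from zero."""
    def rounder(x):
        if x == 0 or not math.isfinite(x):
            return x  # leave zero, NaN and infinities untouched
        d = Decimal(str(x))                    # e.g. Decimal('0.01295')
        exponent = d.adjusted() - (sig - 1)    # position of the last kept digit
        quantum = Decimal(1).scaleb(exponent)  # e.g. Decimal('0.0001')
        return float(d.quantize(quantum, rounding=ROUND_HALF_UP))
    return rounder


df = pd.DataFrame({'floats': [123.949, 23.87, 1.9865, 0.0129500, 1.235]})
result = df['floats'].apply(round_half_up_sig(3))
print(result.tolist())  # [124.0, 23.9, 1.99, 0.013, 1.24]
```

This still goes through .apply, so it trades speed for predictable tie handling.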

Ivan Popov answered Oct 03 '22 23:10


With large dataframes, .apply can be slow. The best solution I have seen came from Scott Gigante addressing the same question directly for numpy.

Here is a lightly modified version of his answer simply adding some pandas wrapping. The solution is fast and robust.

from typing import Union
import pandas as pd
import numpy as np

def significant_digits(df: Union[pd.DataFrame, pd.Series], 
                       significance: int, 
                       inplace: bool = False) -> Union[pd.DataFrame, pd.Series, None]:
    
    # Create a positive data vector with a place holder for NaN / inf data
    data = df.values
    data_positive = np.where(np.isfinite(data) & (data != 0), np.abs(data), 10**(significance-1))

    # Align data by magnitude, round, and scale back to original
    magnitude = 10 ** (significance - 1 - np.floor(np.log10(data_positive)))
    data_rounded = np.round(data * magnitude) / magnitude

    # Place back into Series or DataFrame
    if inplace:
        df.loc[:] = data_rounded
    else:
        if isinstance(df, pd.DataFrame):
            return pd.DataFrame(data=data_rounded, index=df.index, columns=df.columns)
        else:
            return pd.Series(data=data_rounded, index=df.index, name=df.name)
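
To see what the "align by magnitude" step does, here is a stripped-down sketch of the same arithmetic in plain numpy on the values from the question. It assumes every value is finite and non-zero, so it leaves out the placeholder handling, and the variable names are mine.

```python
import numpy as np

values = np.array([123.949, 23.87, 1.9865, 0.0129500])
sig = 3

# Power of ten of the leading digit of each value: 2, 1, 0, -2
exponent = np.floor(np.log10(np.abs(values)))

# Scale factor that moves the last kept digit to the ones place
magnitude = 10.0 ** (sig - 1 - exponent)

# Round at the ones place, then scale back to the original magnitude
rounded = np.round(values * magnitude) / magnitude
print(rounded)
```

The first three values come out as 124.0, 23.9 and 1.99. The last one sits on a rounding tie (129.5 after scaling), so whether it ends up as 0.0129 or 0.013 depends on floating-point representation and on np.round rounding halves to the nearest even integer.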

RSamson78 answered Oct 03 '22 23:10