
How to create Pandas Series with Decimal?

I'm calculating some standard deviations which are giving FloatingPointErrors. I wanted to try converting the data series to Decimal (using https://docs.python.org/3/library/decimal.html), to see if this fixes my issue.

I can't seem to create a pandas Series of Decimal values.

How can I take a normal pd.Series of float64 and convert it to a pd.Series of Decimal, such that I can do:

Series.pct_change().ewm(span=35, min_periods=35).std()
cjm2671 asked Jun 29 '16 08:06



2 Answers

I think you can create the DataFrame directly from Decimal objects and operate on the values:

import pandas as pd
from decimal import Decimal

# Build the columns from Decimal objects; pandas stores them with dtype object
df = pd.DataFrame({
    'DECIMAL_1': [Decimal('2342.2345234'), Decimal('564.5678'), Decimal('76867.8923892')],
    'DECIMAL_2': [Decimal('67867.43534534323'), Decimal('67876.345345'), Decimal('234234.2345345')]
})
# Element-wise addition is carried out with Decimal arithmetic
df['DECIMAL_3'] = df['DECIMAL_1'] + df['DECIMAL_2']
print(df.dtypes)

The drawback is that the columns' dtype will be object, so I am afraid performance will suffer. In any case, I think any operation on Decimal values requires more computation than the same operation on floats.
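
To illustrate the dtype point for a Series rather than a DataFrame, here is a minimal sketch that converts a float64 Series to Decimal. The sample values and the variable names (floats, decimals, total) are assumptions made for illustration, and going through str() is just one possible way to do the conversion.

```python
from decimal import Decimal

import pandas as pd

# Hypothetical sample data standing in for a float64 Series
floats = pd.Series([0.1, 0.2, 0.3])

# Convert via str() so each Decimal takes the short printed value of the float
# rather than its full binary expansion
decimals = floats.map(lambda x: Decimal(str(x)))

print(floats.dtype)    # float64
print(decimals.dtype)  # object

# Python's built-in sum keeps the arithmetic in Decimal
total = sum(decimals, Decimal("0"))
print(total)  # 0.6
```

The converted Series holds ordinary Python Decimal objects, which is why pandas reports object instead of a numeric dtype.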

Maybe the best solution is to keep two copies of the DataFrame: one with floats and one with Decimal. When you need fast operations, use the float copy; when you need to compare values or assign new values to cells with a specific precision, use the Decimal copy.
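
A rough sketch of the two-copy idea, using a Series as in the question. The price values and the names prices, prices_dec, vol and step are hypothetical, and the span is shortened to 3 only so that the small sample produces some non-NaN results.

```python
from decimal import Decimal

import pandas as pd

# Hypothetical float64 data used for the fast numeric work
prices = pd.Series([100.10, 100.25, 100.05, 100.40, 100.30, 100.55])

# Decimal copy kept alongside it for exact comparisons and assignments
prices_dec = prices.map(lambda x: Decimal(str(x)))

# Fast path: the statistic from the question, computed on the float copy
vol = prices.pct_change().ewm(span=3, min_periods=3).std()
print(vol)

# Exact path: a difference computed with Decimal arithmetic
step = prices_dec.iloc[1] - prices_dec.iloc[0]
print(step)  # 0.15
```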

Tell me what you think about my suggestions.

Note: I made my example with a DataFrame, but each DataFrame column is a Series, so the same approach applies to a Series.

ChesuCR answered Sep 21 '22 09:09



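Would something like this work?

from functools import partial
from pandas import Series

def column_round(decimals):
    # Returns a function that rounds a Series to the given number of decimals
    return partial(Series.round, decimals=decimals)

# df is an existing DataFrame with numeric columns
df.apply(column_round(2))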

Alternatively, let's use np.vectorize so that we can do the rounding with decimal.Decimal.quantize. This leaves the values as Decimal instead of np.float64:

npquantize = np.vectorize(decimal.Decimal.quantize)
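
As a hedged usage sketch, the vectorized function could be applied to a Series of Decimal values as follows. The sample values, the 0.01 exponent and the names s and rounded are assumptions for illustration only.

```python
import decimal

import numpy as np
import pandas as pd

# Wrap the unbound method so it is called as quantize(value, exponent) per element
npquantize = np.vectorize(decimal.Decimal.quantize)

# Hypothetical Series that already holds Decimal objects
s = pd.Series([decimal.Decimal("1.2345"), decimal.Decimal("2.3456")])

# Quantize every element to two decimal places; the exponent is broadcast
rounded = pd.Series(
    npquantize(s.to_numpy(), decimal.Decimal("0.01")),
    index=s.index,
)
print(rounded)
```

The result is again an object Series, with each element still a Decimal.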

I have been looking into it, and this seems to solve the issue with pct_change:

ts.diff().div(ts.shift(1))
SerialDev answered Sep 17 '22 09:09
