Create multiple columns in Pandas Dataframe from one function

Tags: python, pandas, multiple-columns, jit, numba

I'm a Python newbie, so I hope my two questions are clear and complete. I've posted the actual code and a test data set in CSV format below.

I've been able to construct the following code (mostly with help from Stack Overflow contributors) to calculate the implied volatility of an option contract using the Newton-Raphson method. The process calculates Vega while determining the implied volatility. Although I'm able to create a new DataFrame column for implied volatility using the pandas DataFrame apply method, I'm unable to create a second column for Vega. Is there a way to create two separate DataFrame columns when the function returns IV and Vega together?
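For reference, the iteration that newtonRap performs can be written out as follows. This is read directly off the code below, with the dividend placeholder d set to 0 (as the code does), c = +1 for calls and -1 for puts, N and φ the standard normal CDF and PDF, and mark the bid/ask midpoint; σ_0 is the Brenner and Subrahmanyam (1988) starting estimate the code uses.

```latex
\begin{aligned}
d_1 &= \frac{\ln(S/K) + T\,(r_f + \sigma_n^2/2)}{\sigma_n\sqrt{T}}, \qquad d_2 = d_1 - \sigma_n\sqrt{T} \\
V(\sigma_n) &= c\,S\,N(c\,d_1) - c\,K e^{-r_f T} N(c\,d_2) \\
\nu(\sigma_n) &= \frac{\partial V}{\partial \sigma} = S\,\varphi(d_1)\sqrt{T} \quad \text{(Vega)} \\
\sigma_{n+1} &= \sigma_n - \frac{V(\sigma_n) - \text{mark}}{\nu(\sigma_n)}, \qquad
\sigma_0 = \sqrt{\frac{2\pi}{T}}\,\frac{\text{mark}}{S}
\end{aligned}
```

Vega is the derivative used in every Newton step, which is why it is already available when the IV converges.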

I tried returning iv, vega from the function and then:

df[['myIV', 'Vega']] = df.apply(newtonRap, axis=1)

This raised ValueError: Shape of passed values is (56, 2), indices imply (56, 13)

I also tried returning iv, vega from the function and then:

df['myIV'], df['Vega'] = df.apply(newtonRap, axis=1)

This raised the same ValueError: Shape of passed values is (56, 2), indices imply (56, 13)

Additionally, the calculation is slow. I imported numba and applied the @jit(nogil=True) decorator, but I only see a performance improvement of about 25%. The data set used for the performance test has almost 900,000 records. The run time is 2 hours and 9 minutes both without numba and with numba but without nogil=True. With numba and @jit(nogil=True) the run time is 1 hour and 32 minutes. Can I do better?

from datetime import datetime
from math import sqrt, pi, log, exp, isnan

import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
from scipy.stats import norm
from numba import jit


# dff = Daily Fed Funds (Posted rate is usually one day behind)
dff = pd.read_csv('https://research.stlouisfed.org/fred2/data/DFF.csv', parse_dates=[0], index_col='DATE')
rf = float('%.4f' % (dff['VALUE'].iloc[-1] / 100))    # latest posted rate, as a decimal
# rf = .0015                        # Get Fed Funds Rate https://research.stlouisfed.org/fred2/data/DFF.csv
tradingMinutesDay = 450             # 7.5 hours per day * 60 minutes per hour
tradingMinutesAnnum = 113400        # trading minutes per day * 252 trading days per year
cal = USFederalHolidayCalendar()    # Load US Federal holiday calendar


@jit(nogil=True)                                # nogil=True arg improves performance by 25%
def newtonRap(row):
    """Estimate Implied Volatility (IV) using Newton-Raphson method

    :param row (pandas.Series):  One DataFrame row holding the option contract params
        TimeStamp (datetime): Close date
        Expiry (datetime): Option contract expiration date
        Strike (float): Option strike
        OptType (object): 'C' for call; 'P' for put
        RootPrice (float): Underlying close price
        Bid (float): Option contract closing bid
        Ask (float): Option contract closing ask

    :return:
        float: Estimated implied volatility
    """
    if row['Bid'] == 0.0 or row['Ask'] == 0.0 or row['RootPrice'] == 0.0 or row['Strike'] == 0.0 or \
       row['TimeStamp'] == row['Expiry']:
        iv, vega = 0.0, 0.0         # Set iv and vega to zero if option contract is invalid or expired
    else:
        # dte (Days to expiration) uses pandas bdate_range method to determine the number of business days to expiration
        #   minus USFederalHolidays minus constant of 1 for the TimeStamp date
        dte = float(len(pd.bdate_range(row['TimeStamp'], row['Expiry'])) -
                    len(cal.holidays(row['TimeStamp'], row['Expiry']).to_pydatetime()) - 1)
        mark = (row['Bid'] + row['Ask']) / 2
        cp = 1 if row['OptType'] == 'C' else -1
        S = row['RootPrice']
        K = row['Strike']
        # T = the number of trading minutes to expiration divided by the number of trading minutes in year
        T = (dte * tradingMinutesDay) / tradingMinutesAnnum
        # TODO get dividend value
        d = 0.00
        iv = sqrt(2 * pi / T) * mark / S        # Closed form estimate of IV Brenner and Subrahmanyam (1988)
        vega = 0.0
        for i in range(1, 100):
            d1 = (log(S / K) + T * (rf - d + iv ** 2 / 2)) / (iv * sqrt(T))
            d2 = d1 - iv * sqrt(T)
            vega = S * norm.pdf(d1) * sqrt(T)
            model = cp * S * norm.cdf(cp * d1) - cp * K * exp(-rf * T) * norm.cdf(cp * d2)
            iv -= (model - mark) / vega
            if abs(model - mark) < 1.0e-9:
                break
        if isnan(iv) or isnan(vega):
            iv, vega = 0.0, 0.0
    # TODO Return vega with iv if add'l pandas column possible
    # return iv, vega
    return iv


if __name__ == "__main__":
    # test function from baseline data
    get_csv = True

    if get_csv:
        csvHeaderList = ['TimeStamp', 'OpraSymbol', 'RootSymbol', 'Expiry', 'Strike', 'OptType', 'RootPrice', 'Last',
                         'Bid', 'Ask', 'Volume', 'OpenInt', 'IV']
        fileName = 'C:/tmp/test-20150930-56records.csv'
        df = pd.read_csv(fileName, parse_dates=[0, 3], names=csvHeaderList)
    else:
        pass

    start = datetime.now()
    # TODO Create add'l pandas dataframe column, if possible, for vega
    # df[['myIV', 'Vega']] = df.apply(newtonRap, axis=1)
    # df['myIV'], df['Vega'] = df.apply(newtonRap, axis=1)
    df['myIV'] = df.apply(newtonRap, axis=1)
    end = datetime.now()
    print(end - start)

Test Data: C:/tmp/test-20150930-56records.csv

2015-09-30 16:00:00,AAPL151016C00109000,AAPL,2015-10-16 16:00:00,109,C,109.95,3.46,3.6,3.7,1565,1290,0.3497
2015-09-30 16:00:00,AAPL151016P00109000,AAPL,2015-10-16 16:00:00,109,P,109.95,2.4,2.34,2.42,3790,3087,0.3146
2015-09-30 16:00:00,AAPL151016C00110000,AAPL,2015-10-16 16:00:00,110,C,109.95,3,2.86,3,10217,28850,0.3288
2015-09-30 16:00:00,AAPL151016P00110000,AAPL,2015-10-16 16:00:00,110,P,109.95,2.81,2.74,2.8,12113,44427,0.3029
2015-09-30 16:00:00,AAPL151016C00111000,AAPL,2015-10-16 16:00:00,111,C,109.95,2.35,2.44,2.45,6674,2318,0.3187
2015-09-30 16:00:00,AAPL151016P00111000,AAPL,2015-10-16 16:00:00,111,P,109.95,3.2,3.1,3.25,2031,3773,0.2926
2015-09-30 16:00:00,AAPL151120C00110000,AAPL,2015-11-20 16:00:00,110,C,109.95,5.9,5.7,5.95,5330,17112,0.3635
2015-09-30 16:00:00,AAPL151120P00110000,AAPL,2015-11-20 16:00:00,110,P,109.95,6.15,6.1,6.3,3724,15704,0.3842

asked Oct 09 '15 17:10

vlmercado

2 Answers

If I understand you right, what you should be doing is returning a Series from your function. Something like:

return pandas.Series({"IV": iv, "Vega": vega})

If you want to put the result into new columns of the same input DataFrame, then just do:

df[["IV", "Vega"]] = df.apply(newtonRap, axis=1)
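A small, self-contained sketch of that pattern, assuming a recent pandas release. The functions two_values and two_values_tuple and the Mid/Spread columns are hypothetical stand-ins for newtonRap and its IV/Vega outputs. It also shows a variant that keeps a plain tuple return and asks apply to split it into columns with result_type="expand".

```python
import pandas as pd


def two_values(row):
    # Hypothetical stand-in for newtonRap: compute two results for one row
    mid = (row["Bid"] + row["Ask"]) / 2
    spread = row["Ask"] - row["Bid"]
    # A labeled Series makes apply(axis=1) return a DataFrame with these columns
    return pd.Series({"Mid": mid, "Spread": spread})


def two_values_tuple(row):
    # Same calculation, returned as a plain tuple
    return (row["Bid"] + row["Ask"]) / 2, row["Ask"] - row["Bid"]


df = pd.DataFrame({"Bid": [3.6, 2.34], "Ask": [3.7, 2.42]})

# Option 1: the function returns a Series; apply builds a two-column DataFrame
df[["Mid", "Spread"]] = df.apply(two_values, axis=1)

# Option 2: the function returns a tuple; result_type="expand" splits it into columns
df2 = pd.DataFrame({"Bid": [3.6, 2.34], "Ask": [3.7, 2.42]})
df2[["Mid", "Spread"]] = df2.apply(two_values_tuple, axis=1, result_type="expand")

print(df)
```

Applied to the question, this would mean ending newtonRap with return pd.Series({'myIV': iv, 'Vega': vega}) and calling df[['myIV', 'Vega']] = df.apply(newtonRap, axis=1). numba cannot compile pandas objects in nopython mode, so the @jit decorator is simplest to drop for this version.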

answered Sep 18 '22 20:09

BrenBarn

As far as performance with numba is concerned, numba doesn't know anything about pandas DataFrames and cannot compile operations on them down to fast machine code. Your best bet is to profile which part of your method is slow (using line_profiler, for example) and then offload that part to a separate function whose inputs you build from the .values attributes of the DataFrame columns, which give you access to the underlying NumPy arrays. Otherwise numba will operate mostly in "object mode" (see the numba glossary) and won't improve performance drastically.
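A minimal sketch of that idea, under several assumptions. The names iv_and_vega, business_days_to_expiry and add_iv_vega are hypothetical. The normal PDF/CDF are written with math.exp and math.erf because numba does not compile scipy.stats.norm calls. The per-row pd.bdate_range / cal.holidays calls, which run as Python-level pandas calls once per row, are replaced by a single numpy.busday_count call over whole columns. busday_count counts the start date but not the end date, so it should reproduce the question's dte (business days minus holidays minus 1) when the expiry falls on a business day. Unlike the original, contracts that fail to converge are left at 0.0. The sketch falls back to plain Python if numba is not installed, and no speed-up figure is claimed, so time it on your own data.

```python
import math

import numpy as np
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

try:
    from numba import njit  # compile the loop in nopython mode if numba is installed
except ImportError:  # otherwise run the same function as plain Python
    def njit(func):
        return func

TRADING_MINUTES_DAY = 450      # same constants as in the question
TRADING_MINUTES_ANNUM = 113400


@njit
def iv_and_vega(S, K, T, mark, cp, r):
    """Newton-Raphson IV and vega for 1-D float64 arrays (cp: +1.0 call, -1.0 put)."""
    n = S.shape[0]
    iv_out = np.zeros(n)
    vega_out = np.zeros(n)
    for j in range(n):
        if S[j] <= 0.0 or K[j] <= 0.0 or T[j] <= 0.0 or mark[j] <= 0.0:
            continue  # invalid or expired contract: leave iv and vega at 0.0
        sqrt_t = math.sqrt(T[j])
        iv = math.sqrt(2.0 * math.pi / T[j]) * mark[j] / S[j]  # Brenner-Subrahmanyam start
        vega = 0.0
        converged = False
        for _ in range(100):
            d1 = (math.log(S[j] / K[j]) + T[j] * (r + iv * iv / 2.0)) / (iv * sqrt_t)
            d2 = d1 - iv * sqrt_t
            # Standard normal pdf/cdf via math.exp and math.erf instead of scipy.stats.norm
            vega = S[j] * math.exp(-0.5 * d1 * d1) / math.sqrt(2.0 * math.pi) * sqrt_t
            n1 = 0.5 * (1.0 + math.erf(cp[j] * d1 / math.sqrt(2.0)))
            n2 = 0.5 * (1.0 + math.erf(cp[j] * d2 / math.sqrt(2.0)))
            model = cp[j] * (S[j] * n1 - K[j] * math.exp(-r * T[j]) * n2)
            diff = model - mark[j]
            if abs(diff) < 1.0e-9:
                converged = True
                break
            if vega < 1.0e-12:
                break  # Newton step undefined when vega is ~0
            iv -= diff / vega
            if iv <= 0.0:
                break  # step left the valid domain
        if converged:
            iv_out[j] = iv
            vega_out[j] = vega
    return iv_out, vega_out


def business_days_to_expiry(df):
    """Vectorized dte: business days in [TimeStamp, Expiry) minus federal holidays."""
    start = df["TimeStamp"].to_numpy().astype("datetime64[D]")
    expiry = df["Expiry"].to_numpy().astype("datetime64[D]")
    hol = USFederalHolidayCalendar().holidays(df["TimeStamp"].min(), df["Expiry"].max())
    return np.busday_count(start, expiry, holidays=hol.to_numpy().astype("datetime64[D]"))


def add_iv_vega(df, rf):
    """Return a copy of df with myIV and Vega columns computed in one kernel call."""
    T = business_days_to_expiry(df) * TRADING_MINUTES_DAY / TRADING_MINUTES_ANNUM
    bid = df["Bid"].to_numpy(dtype=np.float64)
    ask = df["Ask"].to_numpy(dtype=np.float64)
    mark = np.where((bid > 0.0) & (ask > 0.0), (bid + ask) / 2.0, 0.0)
    cp = np.where(df["OptType"].to_numpy() == "C", 1.0, -1.0)
    iv, vega = iv_and_vega(df["RootPrice"].to_numpy(dtype=np.float64),
                           df["Strike"].to_numpy(dtype=np.float64),
                           T.astype(np.float64), mark, cp, float(rf))
    out = df.copy()
    out["myIV"] = iv
    out["Vega"] = vega
    return out


# Two rows from the question's test data; rf = 0.0015 as in the commented-out line
sample = pd.DataFrame({
    "TimeStamp": pd.to_datetime(["2015-09-30 16:00:00", "2015-09-30 16:00:00"]),
    "Expiry": pd.to_datetime(["2015-10-16 16:00:00", "2015-11-20 16:00:00"]),
    "Strike": [109.0, 110.0],
    "OptType": ["C", "P"],
    "RootPrice": [109.95, 109.95],
    "Bid": [3.6, 6.1],
    "Ask": [3.7, 6.3],
})
result = add_iv_vega(sample, rf=0.0015)
print(result[["OptType", "Strike", "myIV", "Vega"]])
```

Because the kernel returns two arrays, both columns are assigned directly and no row-wise apply is needed at all.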

answered Sep 18 '22 20:09

JoshAdel
