Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to apply the Hurst Exponent in Python in a rolling window

I am trying to apply the Hurst Exponent on SPY closing prices on a rolling window. The below code (which I got from here: https://www.quantstart.com/articles/Basics-of-Statistical-Mean-Reversion-Testing) works well if I apply it on the closing prices column. However this gives me a static value. I would like to apply the Hurst Exponent on a rolling window considering the last 200 closing prices. My objective is to get a column in which the Hurst Exponent is updated in each row considering the last 200 closing prices.

from numpy import cumsum, log, polyfit, sqrt, std, subtract
from numpy.random import randn
import pandas_datareader as dr
from datetime import date

df = dr.data.get_data_yahoo('SPY',start='23-01-1991',end=date.today())

def hurst(ts):
    """Returns the Hurst Exponent of the time series vector ts"""
    # Create the range of lag values
    lags = range(2, 100)

    # Calculate the array of the variances of the lagged differences
    tau = [sqrt(std(subtract(ts[lag:], ts[:-lag]))) for lag in lags]

    # Use a linear fit to estimate the Hurst Exponent
    poly = polyfit(log(lags), log(tau), 1)

    # Return the Hurst exponent from the polyfit output
    return poly[0]*2.0

print ("Hurst(SPY): %s" % hurst(df['Close']))

## I've tried the next lines of code but unfortunately they are not working:
df['Hurst_Column']= [0]
for aRowINDEX in range( 1, 200 ):
    df['Hurst_Column'][-aRowINDEX] = hurst (df[u'Close'][:-aRowINDEX]) 

I am very new at Python, and I have tried different things with no luck. Anyone could help me with it, please? Any help would be more than welcome. Thank you!

like image 567
Martingale Avatar asked Jul 18 '19 14:07

Martingale


People also ask

How do you calculate Hurst exponent in Python?

To calculate the Hurst exponent, we first calculate the standard deviation of the differences between a series and its lagged version, for a range of possible lags. We then estimate the Hurst exponent as the slope of the log-log plot of the number of lags versus the mentioned standard deviations.

Can Hurst exponent be greater than 1?

It is used to measure long range dependence in a time series. While the significant Hurst Exponent value is between 0 and 1, it is possible for DFA to produce Hurst Exponent values greater than 1. Hurst values greater than 1 indicate non-stationarity or unsuccessful detrending (Bryce et al., 2001).


2 Answers

Let me offer you a two step way forwards:

Step 1: a bit more robust Hurst Exponent implementation with test data

Step 2: a simple way to produce a "sliding-window"-alike calculation

Step 3: a bit more complicated way - if a ROLLING WINDOW is a MUST ...

Bonus: What should I write under the code of my question to have it done?


Step 1: a bit more robust Hurst Exponent implementation with test data :

Here, I will post a function implementation, taken from QuantFX module, as-is ( Py2.7 will not make troubles in most places, yet any xrange() ought be replaced with range() in Py3.x ).

This code contains a few improvements and some sort of self-healing, if tests show, that there are problems with data-segment ( QuantFX uses a convention of a natural flow of the time, where data[0] is the "oldest" time-series cell and data[-1] being the "most recent" one ).

Calling the HurstEXP() without any parameter will yield a demo-run, showing some tests and explanations of the subject matter.

Also the print( HurstEXP.__doc__ ) is self-explanatory:

def HurstEXP( ts = [ None, ] ):                                         # TESTED: HurstEXP()                Hurst exponent ( Browninan Motion & other observations measure ) 100+ BARs back(!)
            """                                                         __doc__
            USAGE:
                        HurstEXP( ts = [ None, ] )

                        Returns the Hurst Exponent of the time series vector ts[]

            PARAMETERS:
                        ts[,]   a time-series, with 100+ elements
                                ( or [ None, ] that produces a demo run )

            RETURNS:
                        float - a Hurst Exponent approximation,
                                as a real value
                                or
                                an explanatory string on an empty call
            THROWS:
                        n/a
            EXAMPLE:
                        >>> HurstEXP()                                        # actual numbers will vary, as per np.random.randn() generator used
                        HurstEXP( Geometric Browian Motion ):    0.49447454
                        HurstEXP(    Mean-Reverting Series ):   -0.00016013
                        HurstEXP(          Trending Series ):    0.95748937
                        'SYNTH series demo ( on HurstEXP( ts == [ None, ] ) ) # actual numbers vary, as per np.random.randn() generator'

                        >>> HurstEXP( rolling_window( aDSEG[:,idxC], 100 ) )
            REF.s:
                        >>> www.quantstart.com/articles/Basics-of-Statistical-Mean-Reversion-Testing
            """
            #---------------------------------------------------------------------------------------------------------------------------<self-reflective>
            if ( ts[0] == None ):                                       # DEMO: Create a SYNTH Geometric Brownian Motion, Mean-Reverting and Trending Series:

                 gbm = np.log( 1000 + np.cumsum(     np.random.randn( 100000 ) ) )  # a Geometric Brownian Motion[log(1000 + rand), log(1000 + rand + rand ), log(1000 + rand + rand + rand ),... log(  1000 + rand + ... )]
                 mr  = np.log( 1000 +                np.random.randn( 100000 )   )  # a Mean-Reverting Series    [log(1000 + rand), log(1000 + rand        ), log(1000 + rand               ),... log(  1000 + rand       )]
                 tr  = np.log( 1000 + np.cumsum( 1 + np.random.randn( 100000 ) ) )  # a Trending Series          [log(1001 + rand), log(1002 + rand + rand ), log(1003 + rand + rand + rand ),... log(101000 + rand + ... )]

                                                                        # Output the Hurst Exponent for each of the above SYNTH series
                 print ( "HurstEXP( Geometric Browian Motion ):   {0: > 12.8f}".format( HurstEXP( gbm ) ) )
                 print ( "HurstEXP(    Mean-Reverting Series ):   {0: > 12.8f}".format( HurstEXP( mr  ) ) )
                 print ( "HurstEXP(          Trending Series ):   {0: > 12.8f}".format( HurstEXP( tr  ) ) )

                 return ( "SYNTH series demo ( on HurstEXP( ts == [ None, ] ) ) # actual numbers vary, as per np.random.randn() generator" )
            """                                                         # FIX:
            ===================================================================================================================
            |
            |>>> QuantFX.HurstEXP( QuantFX.DATA[ :1000,QuantFX.idxH].tolist() )
            0.47537688039105963
            |
            |>>> QuantFX.HurstEXP( QuantFX.DATA[ :101,QuantFX.idxH].tolist() )
            -0.31081076640420308
            |
            |>>> QuantFX.HurstEXP( QuantFX.DATA[ :100,QuantFX.idxH].tolist() )
            nan
            |
            |>>> QuantFX.HurstEXP( QuantFX.DATA[ :99,QuantFX.idxH].tolist() )

            Intel MKL ERROR: Parameter 6 was incorrect on entry to DGELSD.
            C:\Python27.anaconda\lib\site-packages\numpy\lib\polynomial.py:594: RankWarning: Polyfit may be poorly conditioned
            warnings.warn(msg, RankWarning)
            0.026867491053098096
            """
            pass;     too_short_list = 101 - len( ts )                  # MUST HAVE 101+ ELEMENTS
            if ( 0 <  too_short_list ):                                 # IF NOT:
                 ts = too_short_list * ts[:1] + ts                      #    PRE-PEND SUFFICIENT NUMBER of [ts[0],]-as-list REPLICAS TO THE LIST-HEAD
            #---------------------------------------------------------------------------------------------------------------------------
            lags = range( 2, 100 )                                                              # Create the range of lag values
            tau  = [ np.sqrt( np.std( np.subtract( ts[lag:], ts[:-lag] ) ) ) for lag in lags ]  # Calculate the array of the variances of the lagged differences
            #oly = np.polyfit( np.log( lags ), np.log( tau ), 1 )                               # Use a linear fit to estimate the Hurst Exponent
            #eturn ( 2.0 * poly[0] )                                                            # Return the Hurst exponent from the polyfit output
            """ ********************************************************************************************************************************************************************* DONE:[MS]:ISSUE / FIXED ABOVE
            |>>> QuantFX.HurstEXP( QuantFX.DATA[ : QuantFX.aMinPTR,QuantFX.idxH] )
            C:\Python27.anaconda\lib\site-packages\numpy\core\_methods.py:82: RuntimeWarning: Degrees of freedom <= 0 for slice
              warnings.warn("Degrees of freedom <= 0 for slice", RuntimeWarning)
            C:\Python27.anaconda\lib\site-packages\numpy\core\_methods.py:94: RuntimeWarning: invalid value encountered in true_divide
              arrmean, rcount, out=arrmean, casting='unsafe', subok=False)
            C:\Python27.anaconda\lib\site-packages\numpy\core\_methods.py:114: RuntimeWarning: invalid value encountered in true_divide
              ret, rcount, out=ret, casting='unsafe', subok=False)
            QuantFX.py:23034: RuntimeWarning: divide by zero encountered in log
              return ( 2.0 * np.polyfit( np.log( lags ), np.log( tau ), 1 )[0] )                  # Return the Hurst exponent from the polyfit output ( a linear fit to estimate the Hurst Exponent )

            Intel MKL ERROR: Parameter 6 was incorrect on entry to DGELSD.
            C:\Python27.anaconda\lib\site-packages\numpy\lib\polynomial.py:594: RankWarning: Polyfit may be poorly conditioned
              warnings.warn(msg, RankWarning)
            0.028471879418359915
            |
            |
            |# DATA:
            |
            |>>> QuantFX.DATA[ : QuantFX.aMinPTR,QuantFX.idxH]
            memmap([ 1763.31005859,  1765.01000977,  1765.44995117,  1764.80004883,
                     1765.83996582,  1768.91003418,  1771.04003906,  1769.43994141,
                     1771.4699707 ,  1771.61999512,  1774.76000977,  1769.55004883,
                     1773.4699707 ,  1773.32995605,  1770.08996582,  1770.20996094,
                     1768.34997559,  1768.02001953,  1767.59997559,  1767.23999023,
                     1768.41003418,  1769.06994629,  1769.56994629,  1770.7800293 ,
                     1770.56994629,  1769.7800293 ,  1769.90002441,  1770.44995117,
                     1770.9699707 ,  1771.04003906,  1771.16003418,  1769.81005859,
                     1768.76000977,  1769.39001465,  1773.23999023,  1771.91003418,
                     1766.92004395,  1765.56994629,  1762.65002441,  1760.18005371,
                     1755.        ,  1756.67004395,  1753.48999023,  1753.7199707 ,
                     1751.92004395,  1745.44995117,  1745.44995117,  1744.54003906,
                     1744.54003906,  1744.84997559,  1744.84997559,  1744.34997559,
                     1744.34997559,  1743.75      ,  1743.75      ,  1745.23999023,
                     1745.23999023,  1745.15002441,  1745.31005859,  1745.47998047,
                     1745.47998047,  1749.06994629,  1749.06994629,  1748.29003906,
                     1748.29003906,  1747.42004395,  1747.42004395,  1746.98999023,
                     1747.61999512,  1748.79003906,  1748.79003906,  1748.38000488,
                     1748.38000488,  1744.81005859,  1744.81005859,  1736.80004883,
                     1736.80004883,  1735.43005371,  1735.43005371,  1737.9699707
                     ], dtype=float32
                    )
            |
            |
            | # CONVERTED .tolist() to avoid .memmap-type artifacts:
            |
            |>>> QuantFX.DATA[ : QuantFX.aMinPTR,QuantFX.idxH].tolist()
            [1763.31005859375, 1765.010009765625, 1765.449951171875, 1764.800048828125, 1765.8399658203125, 1768.9100341796875, 1771.0400390625, 1769.43994140625, 1771.469970703125, 1771.6199951171875, 1774.760
            859375, 1743.75, 1743.75, 1745.239990234375, 1745.239990234375, 1745.1500244140625, 1745.31005859375, 1745.47998046875, 1745.47998046875, 1749.0699462890625, 1749.0699462890625, 1748.2900390625, 174
            |
            |>>> QuantFX.HurstEXP( QuantFX.DATA[ : QuantFX.aMinPTR,QuantFX.idxH].tolist() )
            C:\Python27.anaconda\lib\site-packages\numpy\core\_methods.py:116: RuntimeWarning: invalid value encountered in double_scalars
              ret = ret.dtype.type(ret / rcount)

            Intel MKL ERROR: Parameter 6 was incorrect on entry to DGELSD.
            C:\Python27.anaconda\lib\site-packages\numpy\lib\polynomial.py:594: RankWarning: Polyfit may be poorly conditioned
              warnings.warn(msg, RankWarning)
            0.028471876494884543
            ===================================================================================================================
            |
            |>>> QuantFX.HurstEXP( QuantFX.DATA[ :1000,QuantFX.idxH].tolist() )
            0.47537688039105963
            |
            |>>> QuantFX.HurstEXP( QuantFX.DATA[ :101,QuantFX.idxH].tolist() )
            -0.31081076640420308
            |
            |>>> QuantFX.HurstEXP( QuantFX.DATA[ :100,QuantFX.idxH].tolist() )
            nan
            |
            |>>> QuantFX.HurstEXP( QuantFX.DATA[ :99,QuantFX.idxH].tolist() )

            Intel MKL ERROR: Parameter 6 was incorrect on entry to DGELSD.
            C:\Python27.anaconda\lib\site-packages\numpy\lib\polynomial.py:594: RankWarning: Polyfit may be poorly conditioned
            warnings.warn(msg, RankWarning)
            0.026867491053098096
            """
            return ( 2.0 * np.polyfit( np.log( lags ), np.log( tau ), 1 )[0] )                  # Return the Hurst exponent from the polyfit output ( a linear fit to estimate the Hurst Exponent )

Step 2: a simple way to produce a "sliding-window" calculation :

 [ ( -i, HurstEXP( ts = df['Close'][:-i] ) ) for i in range( 1, 200 ) ] # should call the HurstEXP for the last 200 days

TEST-ME:

>>> df[u'Close']
Date
1993-01-29     43.937500
1993-02-01     44.250000
...
2019-07-17    297.739990
2019-07-18    297.429993
Name: Close, Length: 6665, dtype: float64
>>> 

>>> [ (                          -i,
         HurstEXP( df[u'Close'][:-i] )
         )                   for  i in range( 1, 10 )
         ]
[ ( -1, 0.4489364467179827  ),
  ( -2, 0.4489306967683502  ),
  ( -3, 0.44892205577752986 ),
  ( -4, 0.448931424819551   ),
  ( -5, 0.44895272101162326 ),
  ( -6, 0.44896713741862954 ),
  ( -7, 0.44898211557287204 ),
  ( -8, 0.4489941656580211  ),
  ( -9, 0.4490116318052649  )
  ]

Step 3: a bit more complicated way - if a ROLLING WINDOW is a MUST ... :

While not much memory / processing efficient, the "rolling window" trick may get injected into the game, whereas there is no memory, the less a processing efficiency benefit from doing so ( you spend a lot on syntactically plausible code, yet the processing efficiency does not get here any plus from doing it right this way, as convolved nature of HurstEXP() cannot help, without an attempt to re-vectorise also its internal code (why and what for ever?) any better from this ... just if professor or boss still wants you to do so ... ):

def rolling_window( aMatrix, aRollingWindowLENGTH ):                    #
            """                                                                 __doc__
            USAGE:   rolling_window( aMatrix, aRollingWindowLENGTH )

            PARAMS:  aMatrix                a numpy array
                     aRollingWindowLENGTH   a LENGTH of a rolling window

            RETURNS: a stride_trick'ed numpy array with rolling windows

            THROWS:  n/a

            EXAMPLE: >>> x = np.arange( 10 ).reshape( ( 2, 5 ) )

                     >>> rolling_window( x, 3 )
                     array([[[0, 1, 2], [1, 2, 3], [2, 3, 4]],
                            [[5, 6, 7], [6, 7, 8], [7, 8, 9]]])

                     >>> np.mean( rolling_window( x, 3 ), -1 )
                     array([[ 1.,  2.,  3.],
                            [ 6.,  7.,  8.]])
            """
            new_shape   = aMatrix.shape[:-1] + ( aMatrix.shape[-1] - aRollingWindowLENGTH + 1, aRollingWindowLENGTH )
            new_strides = aMatrix.strides    + ( aMatrix.strides[-1], )
            return np.lib.stride_tricks.as_strided( aMatrix,
                                                    shape   = new_shape,
                                                    strides = new_strides
                                                    )

>>> rolling_window( df[u'Close'], 100 ).shape
(6566, 100)

>>> rolling_window( df[u'Close'], 100 ).flags
    C_CONTIGUOUS    : False
    F_CONTIGUOUS    : False
    OWNDATA         : False <---------------- a VIEW, not a replica
    WRITEABLE       : True
    ALIGNED         : True
    WRITEBACKIFCOPY : False
    UPDATEIFCOPY    : False

You get an array of 6566 vectors with "rolling_window"-ed 100-day blocks of of SPY[Close]-s

>>> rolling_window( df[u'Close'], 100 )
array([[ 43.9375    ,  44.25      ,  44.34375   , ...,  44.5       ,  44.59375   ,  44.625     ],
       [ 44.25      ,  44.34375   ,  44.8125    , ...,  44.59375   ,  44.625     ,  44.21875   ],
       [ 44.34375   ,  44.8125    ,  45.        , ...,  44.625     ,  44.21875   ,  44.8125    ],
       ...,
       [279.14001465, 279.51998901, 279.32000732, ..., 300.6499939 , 300.75      , 299.77999878],
       [279.51998901, 279.32000732, 279.20001221, ..., 300.75      , 299.77999878, 297.73999023],
       [279.32000732, 279.20001221, 278.67999268, ..., 299.77999878, 297.73999023, 297.42999268]])

Q: What should I write under the code of my question to have it done?

for                         aRowINDEX in range( 1, 200 ):
    df[u'HurstEXP_COLUMN'][-aRowINDEX] = HurstEXP( df[u'Close'][:-aRowINDEX] )
    print( "[{0:>4d}]: DIFF( hurst() - HurstEXP() ) == {1:}".format( aRowINDEX,
                           ( hurst(    df[u'Close'][:-aRowINDEX] )
                           - HurstEXP( df[u'Close'][:-aRowINDEX] )
                             )
            )
like image 134
user3666197 Avatar answered Oct 24 '22 07:10

user3666197


A bit late but there doesn't seem to be a clean solution online. This seems to work for me - you need to install the hurst package first though.

from hurst import compute_Hc

H = lambda x: compute_Hc(x)[0]

window = 200
df['Hurst_Columns'] = df['Close'].rolling(window).apply(H)

like image 1
Neb Avatar answered Oct 24 '22 08:10

Neb