Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find two consecutive quarters of GDP decline, and ending with two consecutive quarters of GDP growth

I have the following df with data about the American quarterly GDP in billions of chained 2009 dollars, from 1947q1 to 2016q2:

df = pd.DataFrame(data = [1934.5, 1932.3, 1930.3, 1960.7, 1989.5, 2021.9, 2033.2, 2035.3, 2007.5, 2000.8, 2022.8, 2004.7, 2084.6, 2147.6, 2230.4, 2273.4, 2304.5, 2344.5, 2392.8, 2398.1, 2423.5, 2428.5, 2446.1, 2526.4, 2573.4, 2593.5, 2578.9, 2539.8, 2528.0, 2530.7, 2559.4, 2609.3, 2683.8, 2727.5, 2764.1, 2780.8, 2770.0, 2792.9, 2790.6, 2836.2, 2854.5, 2848.2, 2875.9, 2846.4, 2772.7, 2790.9, 2855.5, 2922.3, 2976.6, 3049.0, 3043.1, 3055.1, 3123.2, 3111.3, 3119.1, 3081.3, 3102.3, 3159.9, 3212.6, 3277.7, 3336.8, 3372.7, 3404.8, 3418.0, 3456.1, 3501.1, 3569.5, 3595.0, 3672.7, 3716.4, 3766.9, 3780.2, 3873.5, 3926.4, 4006.2, 4100.6, 4201.9, 4219.1, 4249.2, 4285.6, 4324.9, 4328.7, 4366.1, 4401.2, 4490.6, 4566.4, 4599.3, 4619.8, 4691.6, 4706.7, 4736.1, 4715.5, 4707.1, 4715.4, 4757.2, 4708.3, 4834.3, 4861.9, 4900.0, 4914.3, 5002.4, 5118.3, 5165.4, 5251.2, 5380.5, 5441.5, 5411.9, 5462.4, 5417.0, 5431.3, 5378.7, 5357.2, 5292.4, 5333.2, 5421.4, 5494.4, 5618.5, 5661.0, 5689.8, 5732.5, 5799.2, 5913.0, 6017.6, 6018.2, 6039.2, 6274.0, 6335.3, 6420.3, 6433.0, 6440.8, 6487.1, 6503.9, 6524.9, 6392.6, 6382.9, 6501.2, 6635.7, 6587.3, 6662.9, 6585.1, 6475.0, 6510.2, 6486.8, 6493.1, 6578.2, 6728.3, 6860.0, 7001.5, 7140.6, 7266.0, 7337.5, 7396.0, 7469.5, 7537.9, 7655.2, 7712.6, 7784.1, 7819.8, 7898.6, 7939.5, 7995.0, 8084.7, 8158.0, 8292.7, 8339.3, 8449.5, 8498.3, 8610.9, 8697.7, 8766.1, 8831.5, 8850.2, 8947.1, 8981.7, 8983.9, 8907.4, 8865.6, 8934.4, 8977.3, 9016.4, 9123.0, 9223.5, 9313.2, 9406.5, 9424.1, 9480.1, 9526.3, 9653.5, 9748.2, 9881.4, 9939.7, 10052.5, 10086.9, 10122.1, 10208.8, 10281.2, 10348.7, 10529.4, 10626.8, 10739.1, 10820.9, 10984.2, 11124.0, 11210.3, 11321.2, 11431.0, 11580.6, 11770.7, 11864.7, 11962.5, 12113.1, 12323.3, 12359.1, 12592.5, 12607.7, 12679.3, 12643.3, 12710.3, 12670.1, 12705.3, 12822.3, 12893.0, 12955.8, 12964.0, 13031.2, 13152.1, 13372.4, 13528.7, 13606.5, 13706.2, 13830.8, 13950.4, 14099.1, 14172.7, 14291.8, 14373.4, 14546.1, 14589.6, 14602.6, 14716.9, 14726.0, 14838.7, 14938.5, 14991.8, 14889.5, 14963.4, 14891.6, 14577.0, 14375.0, 14355.6, 14402.5, 14541.9, 14604.8, 14745.9, 14845.5, 14939.0, 14881.3, 14989.6, 15021.1, 15190.3, 15291.0, 15362.4, 15380.8, 15384.3, 15491.9, 15521.6, 15641.3, 15793.9, 15747.0, 15900.8, 16094.5, 16186.7, 16269.0, 16374.2, 16454.9, 16490.7, 16525.0, 16583.1],
index = ['1947q1', '1947q2', '1947q3', '1947q4', '1948q1', '1948q2', '1948q3',
   '1948q4', '1949q1', '1949q2', '1949q3', '1949q4', '1950q1', '1950q2',
   '1950q3', '1950q4', '1951q1', '1951q2', '1951q3', '1951q4', '1952q1',
   '1952q2', '1952q3', '1952q4', '1953q1', '1953q2', '1953q3', '1953q4',
   '1954q1', '1954q2', '1954q3', '1954q4', '1955q1', '1955q2', '1955q3',
   '1955q4', '1956q1', '1956q2', '1956q3', '1956q4', '1957q1', '1957q2',
   '1957q3', '1957q4', '1958q1', '1958q2', '1958q3', '1958q4', '1959q1',
   '1959q2', '1959q3', '1959q4', '1960q1', '1960q2', '1960q3', '1960q4',
   '1961q1', '1961q2', '1961q3', '1961q4', '1962q1', '1962q2', '1962q3',
   '1962q4', '1963q1', '1963q2', '1963q3', '1963q4', '1964q1', '1964q2',
   '1964q3', '1964q4', '1965q1', '1965q2', '1965q3', '1965q4', '1966q1',
   '1966q2', '1966q3', '1966q4', '1967q1', '1967q2', '1967q3', '1967q4',
   '1968q1', '1968q2', '1968q3', '1968q4', '1969q1', '1969q2', '1969q3',
   '1969q4', '1970q1', '1970q2', '1970q3', '1970q4', '1971q1', '1971q2',
   '1971q3', '1971q4', '1972q1', '1972q2', '1972q3', '1972q4', '1973q1',
   '1973q2', '1973q3', '1973q4', '1974q1', '1974q2', '1974q3', '1974q4',
   '1975q1', '1975q2', '1975q3', '1975q4', '1976q1', '1976q2', '1976q3',
   '1976q4', '1977q1', '1977q2', '1977q3', '1977q4', '1978q1', '1978q2',
   '1978q3', '1978q4', '1979q1', '1979q2', '1979q3', '1979q4', '1980q1',
   '1980q2', '1980q3', '1980q4', '1981q1', '1981q2', '1981q3', '1981q4',
   '1982q1', '1982q2', '1982q3', '1982q4', '1983q1', '1983q2', '1983q3',
   '1983q4', '1984q1', '1984q2', '1984q3', '1984q4', '1985q1', '1985q2',
   '1985q3', '1985q4', '1986q1', '1986q2', '1986q3', '1986q4', '1987q1',
   '1987q2', '1987q3', '1987q4', '1988q1', '1988q2', '1988q3', '1988q4',
   '1989q1', '1989q2', '1989q3', '1989q4', '1990q1', '1990q2', '1990q3',
   '1990q4', '1991q1', '1991q2', '1991q3', '1991q4', '1992q1', '1992q2',
   '1992q3', '1992q4', '1993q1', '1993q2', '1993q3', '1993q4', '1994q1',
   '1994q2', '1994q3', '1994q4', '1995q1', '1995q2', '1995q3', '1995q4',
   '1996q1', '1996q2', '1996q3', '1996q4', '1997q1', '1997q2', '1997q3',
   '1997q4', '1998q1', '1998q2', '1998q3', '1998q4', '1999q1', '1999q2',
   '1999q3', '1999q4', '2000q1', '2000q2', '2000q3', '2000q4', '2001q1',
   '2001q2', '2001q3', '2001q4', '2002q1', '2002q2', '2002q3', '2002q4',
   '2003q1', '2003q2', '2003q3', '2003q4', '2004q1', '2004q2', '2004q3',
   '2004q4', '2005q1', '2005q2', '2005q3', '2005q4', '2006q1', '2006q2',
   '2006q3', '2006q4', '2007q1', '2007q2', '2007q3', '2007q4', '2008q1',
   '2008q2', '2008q3', '2008q4', '2009q1', '2009q2', '2009q3', '2009q4',
   '2010q1', '2010q2', '2010q3', '2010q4', '2011q1', '2011q2', '2011q3',
   '2011q4', '2012q1', '2012q2', '2012q3', '2012q4', '2013q1', '2013q2',
   '2013q3', '2013q4', '2014q1', '2014q2', '2014q3', '2014q4', '2015q1',
   '2015q2', '2015q3', '2015q4', '2016q1', '2016q2'])
df.columns = ['GDP in billions of chained 2009 dollars']
df.index.rename('quarter', inplace = True)

A recession period is defined as starting with two consecutive quarters of GDP decline, and ending with two consecutive quarters of GDP growth. The goal is to create a function 'get_recession_periods()' that returns all of the recession periods between 1947q1 and 2016q2. The output could a dataframe with two columns (start and end) or a list of tuples [(start and end), ...] with all the recession periods found.

Here is my try:

get_recession_periods()
    lst_start = []
    for i in range(0,len(df['GDP in billions of chained 2009 dollars'])-2):
    if df['GDP in billions of chained 2009 dollars'][i] < df['GDP in billions of chained 2009 dollars'][i-1] and df['GDP in billions of chained 2009 dollars'][i+1] < df['GDP in billions of chained 2009 dollars'][i]:
        lst_start.append(df.index[i])
    start = lst_start[0]
    lst_end = []
    for j in range(df.index.get_loc(start),len(df['GDP in billions of chained 2009 dollars'])-2):
        if df['GDP in billions of chained 2009 dollars'][j] > df['GDP in billions of chained 2009 dollars'][j-1] and df['GDP in billions of chained 2009 dollars'][j+1] > df['GDP in billions of chained 2009 dollars'][j]:
            lst_end.append(df.index[j])
    return (lst_start[0], lst_end[0])

But with the function above, I am only able to get the start and end quarter of the first recession in 1947.

Any idea?

like image 220
Antonio Serrano Avatar asked Dec 28 '16 11:12

Antonio Serrano


1 Answers

This is probably overkill for this particular example... In a nutshell this is a bit more complicated than @zaq's answer but also much faster (about 9x here, and the difference would be much bigger on larger datasets) because it's vectorized instead of looped. But for this very small dataset here, clearly you would go with the simpler answer since even the slower way is fast enough. Finally, it stores the data in the dataframe itself rather than as a tuple (which could be an advantage or disadvantage, depending on the situation).

Thanks to @zaq for pointing out that I misread the question initially. I believe this now gives the same answer as zaq's except we have different implicit assumptions about the initial state of the world (beginning in recession or not) which is indeterminate in the data provided.

df['change'] = df.diff()        # change in GDP from prior quarter

start = (df.change<0) & (df.change.shift(-1)<0)  # potential start
end   = (df.change>0) & (df.change.shift(-1)>0)  # potential end

df['recess' ] = np.nan
df.loc[ start, 'recess' ] = -1
df.loc[ end,   'recess' ] =  1

df['recess'] = df.recess.ffill()    # if the current row doesn't fit the 
                                    # definition of a start or end, then  
                                    # fill it with the prior row value                                           
df['startend'] = np.nan
df.loc[ (df.recess==-1) & (df.recess.shift()== 1), 'startend'] = -1  # start
df.loc[ (df.recess== 1) & (df.recess.shift()==-1), 'startend'] =  1  # end

df[df.startend.notnull()] 

             GDP  change  recess  startend
quarter                                   
1947q4    1960.7    30.4     1.0       1.0
1949q1    2007.5   -27.8    -1.0      -1.0
1950q1    2084.6    79.9     1.0       1.0
1953q3    2578.9   -14.6    -1.0      -1.0
1954q2    2530.7     2.7     1.0       1.0
1957q4    2846.4   -29.5    -1.0      -1.0
1958q2    2790.9    18.2     1.0       1.0
1969q4    4715.5   -20.6    -1.0      -1.0
1970q2    4715.4     8.3     1.0       1.0
1974q3    5378.7   -52.6    -1.0      -1.0
1975q2    5333.2    40.8     1.0       1.0
1980q2    6392.6  -132.3    -1.0      -1.0
1980q4    6501.2   118.3     1.0       1.0
1981q4    6585.1   -77.8    -1.0      -1.0
1982q4    6493.1     6.3     1.0       1.0
1990q4    8907.4   -76.5    -1.0      -1.0
1991q2    8934.4    68.8     1.0       1.0
2008q3   14891.6   -71.8    -1.0      -1.0
2009q3   14402.5    46.9     1.0       1.0
like image 75
JohnE Avatar answered Sep 29 '22 21:09

JohnE