Analysis of the manufacturing new orders indicator and cyclical-defensive stocks spread in Python

Discussion in 'App Development' started by Kust, Dec 18, 2020.

  1. Kust

    Kust

    I tried to investigate the relationship between the manufacturing durable goods new orders indicator and the spread of cyclical and defensive stocks (consumer discretionary XLY and consumer staples XLP). Following is the Python code and textual comments.

    Durable goods orders monthly data is one of the leading economic indicators. In theory, we can use it to predict the future, such as the behavior of the cyclical and defensive stocks spread. See the details here. I have downloaded the indicator data from the FRED here.

    First, let's create the spread.

    Code:
    import pandas as pd
    import pandas_datareader.data as web
    import numpy as np
    from IPython.core.debugger import set_trace
    import plotly.graph_objects as go
    # from scipy import stats
    import plotly.io as pio
    
    symbol1 = 'XLY' # consumer discretionary
    symbol2 = 'XLP' # consumer staples
    
    stock1 = web.DataReader(symbol1, 'yahoo', '1-1-1993')
    stock2 = web.DataReader(symbol2, 'yahoo', '1-1-1993')
    
    s2 = pd.DataFrame(columns = stock1.columns)
    # set_trace()
    # s2['Open'] = stock1['Open']/stock2['Open']
    # s2['High'] = stock1['High']/stock2['High']
    # s2['Low'] = stock1['Low']/stock2['Low']
    s2['Close'] = stock1['Close']/stock2['Close']
    s2['Adj Close'] = stock1['Adj Close']/stock2['Adj Close']
    s2 = s2.drop('Volume', axis=1)
    s2 = s2.dropna()
    Now let's read the indicator data from the csv file.

    Code:
    newOrders = pd.read_csv("DGORDER.csv", index_col='DATE', parse_dates=True)
    Now we create the explanatory variables. I assume that in the first days of the current month, you can find out the data for the last month. This data will have an index of the first day of the previous month. For example, on the first day of December, they publish the November data. This data is indexed by November 1.

    Let's say the number of durable goods orders dropped from October to November. You can find it out no earlier than December 1st, or even a few days later. It would be a data analysis mistake to assume that this information became known earlier. That is why I use newOrders['DGORDER'].shift(1) in the code.

    Code:
    newOrders['Trend1months'] = (newOrders['DGORDER'].shift(1) - newOrders['DGORDER'].shift(2))/newOrders['DGORDER'].shift(2)
    newOrders['Trend2months'] = (newOrders['DGORDER'].shift(1) - newOrders['DGORDER'].shift(3))/newOrders['DGORDER'].shift(3)
    newOrders['Trend3months'] = (newOrders['DGORDER'].shift(1) - newOrders['DGORDER'].shift(4))/newOrders['DGORDER'].shift(4)
    Now let's create several resulting variables, i.e. the returns of the spread in the coming 1, 2, and 3 months.

    We have the spread data in a daily format. The resulting variables should be in a monthly format, just like our explanatory variables. So we have to use the Pandas resample function.

    Code:
    spreadReturn1Month = (s2['Adj Close'].resample('BMS').first().shift(-1) - s2['Adj Close'].resample('BMS').first())/s2['Adj Close'].resample('BMS').first()
    spreadReturn2Month = (s2['Adj Close'].resample('BMS').first().shift(-2) - s2['Adj Close'].resample('BMS').first())/s2['Adj Close'].resample('BMS').first()
    spreadReturn3Month = (s2['Adj Close'].resample('BMS').first().shift(-3) - s2['Adj Close'].resample('BMS').first())/s2['Adj Close'].resample('BMS').first()
    Now let's create the full DataFrame with all the explanatory variables and results.

    Code:
    full = pd.concat([newOrders['Trend1months'], newOrders['Trend2months'], newOrders['Trend3months']], axis=1)
    
    spreads = pd.concat([spreadReturn1Month, spreadReturn2Month, spreadReturn3Month], axis=1)
    spreads.columns = ['R1M', 'R2M', 'R3M']
    spreads.index = spreads.index.map(lambda t: t.replace(day=1)) # essential operation
    
    full = pd.concat([full, spreads], axis=1)
    full.dropna(inplace=True)
    First of all, I tried the linear regression.

    Code:
    import numpy as np
    import statsmodels.api as sm
    
    x = full['Trend3months']
    y = full['R3M']
    x, y = np.array(x), np.array(y)
    x = sm.add_constant(x) # essential operation to build the design matrix before fitting the model
    model = sm.OLS(y, x)
    results = model.fit()
    print(results.summary())
    The results were as follows.

    You can try all combinations of explanatory and result variables yourself and make sure none of them work. A different approach is needed. I tried to investigate situations where the indicator falls for 2, 3, or 4 months in a row.

    Code:
    full['Fall2Months'] = (full['Trend1months'] < 0) & (full['Trend1months'].shift(1) < 0)
    full['Fall3Months'] = (full['Trend1months'] < 0) & (full['Trend1months'].shift(1) < 0) & (full['Trend1months'].shift(2) < 0)
    full['Fall4Months'] = (full['Trend1months'] < 0) & (full['Trend1months'].shift(1) < 0) & (full['Trend1months'].shift(2) < 0) & (full['Trend1months'].shift(3) < 0)
    
    print('Fall 2 Months, Return 1 month')
    print('True', full[full['Fall2Months'] == True]['R1M'].mean())
    print('False', full[full['Fall2Months'] == False]['R1M'].mean())
    print()
    print('Fall 2 Months, Return 2 month')
    print('True', full[full['Fall2Months'] == True]['R2M'].mean())
    print('False', full[full['Fall2Months'] == False]['R2M'].mean())
    print()
    print('Fall 2 Months, Return 3 month')
    print('True', full[full['Fall2Months'] == True]['R3M'].mean())
    print('False', full[full['Fall2Months'] == False]['R3M'].mean())
    
    The results were as follows.

    After the indicator fell for two consecutive months, cyclical stocks are performing worse than defensive stocks. It is consistent with the hypothesis.

    Then I used the t-test for the means of two independent samples to determine if this result was statistically significant.

    The full code follows.

    Code:
    from scipy import stats
    
    print('Fall 2 Months, Return 1 month')
    print('True', full[full['Fall2Months'] == True]['R1M'].mean())
    print('False', full[full['Fall2Months'] == False]['R1M'].mean())
    t,p = stats.ttest_ind(full[full['Fall2Months'] == True]['R1M'], full[full['Fall2Months'] == False]['R1M'], equal_var = False)
    print('p-value', p)
    print()
    print('Fall 2 Months, Return 2 month')
    print('True', full[full['Fall2Months'] == True]['R2M'].mean())
    print('False', full[full['Fall2Months'] == False]['R2M'].mean())
    t,p = stats.ttest_ind(full[full['Fall2Months'] == True]['R2M'], full[full['Fall2Months'] == False]['R2M'], equal_var = False)
    print('p-value', p)
    print()
    print('Fall 2 Months, Return 3 month')
    print('True', full[full['Fall2Months'] == True]['R3M'].mean())
    print('False', full[full['Fall2Months'] == False]['R3M'].mean())
    t,p = stats.ttest_ind(full[full['Fall2Months'] == True]['R3M'], full[full['Fall2Months'] == False]['R3M'], equal_var = False)
    print('p-value', p)
    print('----------------------')
    
    print('Fall 3 Months, Return 1 month')
    print('True', full[full['Fall3Months'] == True]['R1M'].mean())
    print('False', full[full['Fall3Months'] == False]['R1M'].mean())
    t,p = stats.ttest_ind(full[full['Fall3Months'] == True]['R1M'], full[full['Fall3Months'] == False]['R1M'], equal_var = False)
    print('p-value', p)
    print()
    print('Fall 3 Months, Return 2 month')
    print('True', full[full['Fall3Months'] == True]['R2M'].mean())
    print('False', full[full['Fall3Months'] == False]['R2M'].mean())
    t,p = stats.ttest_ind(full[full['Fall3Months'] == True]['R2M'], full[full['Fall3Months'] == False]['R2M'], equal_var = False)
    print('p-value', p)
    print()
    print('Fall 3 Months, Return 3 month')
    print('True', full[full['Fall3Months'] == True]['R3M'].mean())
    print('False', full[full['Fall3Months'] == False]['R3M'].mean())
    t,p = stats.ttest_ind(full[full['Fall3Months'] == True]['R3M'], full[full['Fall3Months'] == False]['R3M'], equal_var = False)
    print('p-value', p)
    print('----------------------')
    
    print('Fall 4 Months, Return 1 month')
    print('True', full[full['Fall4Months'] == True]['R1M'].mean())
    print('False', full[full['Fall4Months'] == False]['R1M'].mean())
    t,p = stats.ttest_ind(full[full['Fall4Months'] == True]['R1M'], full[full['Fall4Months'] == False]['R1M'], equal_var = False)
    print('p-value', p)
    print()
    print('Fall 4 Months, Return 2 month')
    print('True', full[full['Fall4Months'] == True]['R2M'].mean())
    print('False', full[full['Fall4Months'] == False]['R2M'].mean())
    t,p = stats.ttest_ind(full[full['Fall4Months'] == True]['R2M'], full[full['Fall4Months'] == False]['R2M'], equal_var = False)
    print('p-value', p)
    print()
    print('Fall 4 Months, Return 3 month')
    print('True', full[full['Fall4Months'] == True]['R3M'].mean())
    print('False', full[full['Fall4Months'] == False]['R3M'].mean())
    t,p = stats.ttest_ind(full[full['Fall4Months'] == True]['R3M'], full[full['Fall4Months'] == False]['R3M'], equal_var = False)
    print('p-value', p)
    print('----------------------')
    The results.

    When preparing scientific articles, we want the p-value to be below 0.05. However, for practical applications, higher values are also appropriate. A drop in the indicator for two months in a row looks like a reliable signal to run away from cyclical stocks.

    At the same time, when the indicator falls for four consecutive months, stocks behave strangely. Why? Because the Fed immediately floods the system with liquidity? Or is the sample size just too small?

    I have also investigated the situations where the indicator has been growing for several months in a row, but I found nothing interesting except this.

    Questions:
    1. Am I wrong somewhere in the analysis?
    2. Was the use of the t-test justified?
    3. Is it worth using the obtained result for practical trading?
     
  2. 2rosy

    2rosy

  3. Kust

    Kust

    XLY is a proxy for cyclical stocks. XLP is a proxy for defensive stocks. If I change XLP to XLU, the results are very much the same.

    Manufacturing new orders is a leading indicator. Theory says that it can predict the state of the economy in the future. If it behaves good, cyclical stocks should outperform defensive stocks. I could not prove or disprove it. If manufacturing new orders begin to fall, one should go away from cyclical stocks before they fall. It seems to be true but I am not sure about the correctness of my methodology.

    Code:
    full['Fall2MonthsAfterGrowth'] = (full['Trend1months'] < 0) & (full['Trend1months'].shift(1) < 0) & (full['Trend1months'].shift(2) > 0)
    
    print('Fall 2 Months After Growth, Return 1 month')
    print('True', full[full['Fall2MonthsAfterGrowth'] == True]['R1M'].mean())
    print('False', full[full['Fall2MonthsAfterGrowth'] == False]['R1M'].mean())
    t,p = stats.ttest_ind(full[full['Fall2MonthsAfterGrowth'] == True]['R1M'], full[full['Fall2MonthsAfterGrowth'] == False]['R1M'], equal_var = False)
    print('p-value', p)
    print()
    print('Fall 2 Months After Growth, Return 2 month')
    print('True', full[full['Fall2MonthsAfterGrowth'] == True]['R2M'].mean())
    print('False', full[full['Fall2MonthsAfterGrowth'] == False]['R2M'].mean())
    t,p = stats.ttest_ind(full[full['Fall2MonthsAfterGrowth'] == True]['R2M'], full[full['Fall2MonthsAfterGrowth'] == False]['R2M'], equal_var = False)
    print('p-value', p)
    print()
    print('Fall 2 Months After Growth, Return 3 month')
    print('True', full[full['Fall2MonthsAfterGrowth'] == True]['R3M'].mean())
    print('False', full[full['Fall2MonthsAfterGrowth'] == False]['R3M'].mean())
    t,p = stats.ttest_ind(full[full['Fall2MonthsAfterGrowth'] == True]['R3M'], full[full['Fall2MonthsAfterGrowth'] == False]['R3M'], equal_var = False)
    print('p-value', p)
    print('----------------------')
    Seems to work fine.

    Do you see anything wrong with such approach?

    Currently, I see no benefits in using more complicated models. Maybe those benefits exist.
     
  4. The ETFs you selected are both related to consumers and consumer spending patterns. I'm not so sure that the manufacturing new orders indicator is strongly correlated to consumer spending. My guess is that a substantial portion of the indicator is influenced by business-to-business orders (e.g. factories buying new machines, navies buying new warships, or airline companies buying new planes). That is why I was asking you for the background of the choices you made. Anyway, I do hope that you'll find something that gives you an edge.