#Part 9

Macroeconomic Indicators Affecting Stock Market

You may have heard terms like 'rising inflation', 'decelerating growth', 'central bank interventions', and 'unemployment' many times without being sure whether they matter for your investment process.
You're not alone: not everyone has taken courses in Economics and understands the potential links, and even fewer can estimate the connection between macro indicators and the stock market.
In this article we'll show how to download and process the most popular macroeconomic time series, find their correlation with the growth of major indexes, and build a naive explanatory system to identify the most important signals.

Introduction

A number of different aspects to stock investments inform the approaches that investors take: value (growth and dividends), technical indicators, daily vs. long-term, arbitrage, as well as many others.
"Bear" and "Bull" stock markets, which can often be mapped to periods of economic growth or contraction, add complexity to the strategies, as all the aforementioned approaches must respond to these basic cycles.

History teaches us that everything is interdependent, especially when something big and unexpected happens. One only has to look at any of the major financial events of recent decades to appreciate the extent to which these cycles have an impact on investments. Any event can trigger a big chain reaction in the world, resulting in a recession (e.g. the oil price shock in 1973, the default on Russian government bond obligations in 1998, the Dot-Com bubble crisis in 2000, the US subprime mortgage crisis in 2008). Investors can anticipate a drawdown of 5% (3 times a year on average), and even 10% once a year, but not a 30% drop in one day. There have been at least 23 deep and long crashes in the last 35 years (source: "Stock market crashes and Bear Markets"), most of which caused structural market shifts and panic in financial markets.

It is, therefore, vital to have a good understanding of economic development, as well as potential changes in behaviour (confidence in the future, financial stress, expectations, etc.) of individual investors and institutions (ETF funds, Mutual funds, etc.) in order to react quickly to adverse scenarios when they arise.
Executive Summary
In this article we've examined more than 65 macro indicators (166 transformed time series) and found various correlations with stock market growth. It is important to understand that Macroeconomics can't accurately predict stock market dynamics by itself, but it can have a big impact in some (adverse) scenarios. Thus, you should look at factors like GDP growth, Inflation and Consumer Prices, Debt, Unemployment, Financial Stress and Market Volatility, and many others, and draw your conclusions before the news appears and the stock market reacts.

It is quite straightforward to get the macro data with Python using Pandas Datareader, but the data transformation and merge steps need some care. Check out our Github page for the full implementation (Part 9 "Macro Indicators vs. Stock Indexes Growth").

Summary of Results

  • There are a number of macro series that are widely used in conjunction with stock markets
    GDP, CPI, Interest rates, Unemployment, Consumer confidence and debt burden, etc.
  • Most of the data sources are FREELY available on FRED and other databases
    We've chosen a good set of 65 metrics from FRED, Nasdaq Data (formerly Quandl), and STOOQ to start with (not a comprehensive set).
  • Several macro series are highly correlated with the CURRENT growth of the S&P 500 and DJI indexes
    The most important ones: Gold Volatility Index, Financial Stress Index, Industrial Production, Shiller P/E ratio, etc.
  • The list of macro stats most correlated with the FUTURE growth of the S&P 500 and DJI is different, and the correlation is weaker
    The most important ones: Velocity of Money, Personal Saving Rate, US Dollar Index, Total Public Debt, 10- and 5-Year Breakeven Inflation Rate, etc.
  • We've estimated the marginal impact of each indicator using the Decision Tree model
    Top 5 indicators: Velocity of Money (M1), Dividend Ratio, Volatility index VIX, Corp. profits after tax, Non-cyclical Rate of Unemployment

Types of Economic Factors and Their Potential Influence

Here we list the fundamental macro factors that can potentially influence financial market performance. There is no clear consensus on the direction of impact of each individual factor, but, combined, they tend to play a big overall role.

Growth (link): when the economy is in expansion mode, consumers are confident in the future and spend more, which leads to increased profits for producers and higher earnings and dividends at the end of the year. Companies can more easily raise capital by issuing stock in an IPO (when there is strong demand from investors eager to buy new stocks).

Prices and Inflation (link): when prices go up, the real purchasing power of money goes down. Consumers try to preserve their capital by investing in riskier assets to 'cover' the potential loss of value from inflation.

Money Supply (link): an increase in money supply leads to lower interest rates (as the Fed mostly prints more money by buying T-bills) and makes stocks more attractive.

Interest Rates (link): Interest rates refer to the cost someone pays for the use of someone else's money. In a broad sense, they regulate the price at which banks lend money to other banks, as well as interest rates on personal loans and mortgages. High interest rates also raise the yields on bonds, which then earn more risk-free income.

Unemployment: rising unemployment can be a good sign during a rising economy (business is growing too fast and needs more people to hire) and a bad sign during a contraction. There are good indicators like weekly unemployment claims that can show early signs of massive changes before the quarterly GDP stats are collected and published.

Income and Expenditure (link): when disposable income increases, households have more money and either save or spend, the latter of which naturally leads to a growth in consumption. More consumption has a knock-on effect: the stronger the demand for goods, the more profitable companies are, the more jobs they create, the higher the wages and the greater the potential income to invest etc.

Government Debt (link, link2): countries issue more debt to cover the yearly budget deficit (current US debt is >100% of its GDP) or finance unexpected "force majeure"-type scenarios (like Covid-19). "In the future, countries will be forced to pay debt either through raising taxes or by printing more money to pay for that debt, which could end up slowing growth or risking higher inflation. Both of those things can impact equity and bond markets," [Rhea Thomas, senior economist at Wilmington Trust in Wilmington, Delaware].

FX Rates (e.g. USD/EUR): a strong national currency makes domestic production appealing to investors and may attract more foreign investors.

Alternative investments (prices of oil/gold/bitcoin/etc.): growing returns in alternative classes of investments can shift money away from stocks.

    2020-2021 Macro Snapshot

    Here, we will briefly describe the latest changes in the macro setup for 2020 and 2021, and try to apply the forces from the previous paragraph to an analysis of the current situation.


    The World eagerly awaits (maybe too optimistically) the bounce-back from COVID-19 (49% of the global population has received at least one dose of a COVID-19 vaccine) as it hopes to get back on track with GDP growth (US GDP: from -9% in Q2'20 to +3% in Q2'21). During the pandemic, central banks (the Fed and the ECB) launched massive support programs ("Quantitative Easing (QE)"), printing money to help businesses and populations struggling with the lockdowns, while Government debt rose from 107% to 125% of GDP. High money supply is driving expectations of inflation, and more retail investors are considering investing in stocks to protect their capital from inflation. The household savings rate went up from 10% to 30% during the first months of COVID-19 in 2020, and it was back to 10% in Aug-2021 (probably, the population is overly relaxed now with less risk from the deadly virus, hoping that Governments have the situation under control). Market volatility is low, and the financial stress index (for ordinary people) is returning to its normal level. Weekly unemployment claims have dropped sharply to a minimum value, showing a quick recovery from last year's COVID-related peak. The stock market indexes S&P 500 and DJI reached all-time highs this year in Feb'21 and end-Aug'21 ("Closing milestones of the S&P 500"), despite a massive shock in GDP the previous year. There is a worrying uptrend in the Shiller P/E ratio, which has almost reached its previous peak from the 2000 "DotCom Bubble" (the ratio divides the price of the largest stocks by their earnings).



    An Overview on YouTube

    Here is a quick hands-on video, where you can learn how to get the macro data from The Federal Reserve Bank of St.Louis' database (FRED) and other data sources with Python, merge them all together into one Pandas dataframe, and find correlations with S&P500 and DJI indexes growth. We also build a simple ML-model to understand the importance of the individual features and rank them accordingly.

    Getting the Macro Data

    'The St. Louis Fed's "FRED" database is arguably the most amazing economics datasite on the internet.' [Business Insider, 2013]

    While the website contains more than 148,000 economic data time series overall, we've focused on the most popular ones (in relation to the stock market), which are coupled together in several groups: "Growth", "Prices and Inflation", "Money Supply", "Interest Rates", "Employment", "Income and Expenditure", "Government Debt", "Other (Uncategorised) Indicators".

    Here is the full list of FRED indicators used:
    Py1. Macro Indicators from FRED
    # Macro economic indicators (mostly US) from the FRED database
    # For details on each indicator, see: https://fred.stlouisfed.org/series/<indicator_name>
    # DOC with the metrics and external exploratory Colab: https://docs.google.com/document/d/1Cf4C3Xz4_yitlzPaLEknHoDlw7KMXey4c49kZ7ucQEE/edit?usp=sharing
    
    FRED_INDICATORS = ['GDP', 'GDPC1', 'GDPPOT', 'NYGDPMKTPCDWLD',         # 1. Growth
                       'CPIAUCSL', 'CPILFESL', 'GDPDEF',                   # 2. Prices and Inflation
                       'M1SL', 'WM1NS', 'WM2NS', 'M1V', 'M2V', 'WALCL',    # 3. Money Supply
                       'DFF', 'DTB3', 'DGS5', 'DGS10', 'DGS30', 'T5YIE',   # 4. Interest Rates
                       'T10YIE', 'T5YIFR', 'TEDRATE', 'DPRIME',            # 4. Interest Rates
                       'UNRATE', 'NROU', 'CIVPART', 'EMRATIO',             # 5. Employment
                       'UNEMPLOY', 'PAYEMS', 'MANEMP', 'ICSA', 'IC4WSA',   # 5. Employment
                       'CDSP', 'MDSP', 'FODSP', 'DSPIC96', 'PCE', 'PCEDG', # 6. Income and Expenditure
                       'PSAVERT', 'DSPI', 'RSXFS',                         # 6. Income and Expenditure
                       'INDPRO', 'TCU', 'HOUST', 'GPDI', 'CP', 'STLFSI2',  # 7. Other indicators
                       'DCOILWTICO', 'DTWEXAFEGS', 'DTWEXBGS',             # 7. Other indicators
                       'GFDEBTN', 'GFDEGDQ188S',                           # 8. Gov-t debt
                       # 9. Additional indicators from IVAN
                       'DEXUSEU', 'GVZCLS', 'VIXCLS', 'DIVIDEND',
                       'MORTGAGE30US', 'SPCS20RSA'
                       ]
    You can view the full description and historical values of any indicator on its corresponding Web-page by typing in the following address and putting the indicator name you are interested in at the end: https://fred.stlouisfed.org/series/<indicator_name>. E.g. for GDP, type the following link: https://fred.stlouisfed.org/series/GDP.

    As not all the indicators were available on FRED, I used another data source, Nasdaq Data (formerly Quandl), to get a few more daily indicators, and STOOQ to get the historical S&P 500 (SPX) and Dow Jones Industrial Average (DJI) index values.
    Py2. Nasdaq data (Quandl) and Stooq
    # Macro Indicators from QUANDL
    # 9. Additional indicators from IVAN
    QUANDL_INDICATORS = {'BCHAIN/MKPRU', 'USTREASURY/YIELD', 'USTREASURY/REALYIELD',
                         'MULTPL/SHILLER_PE_RATIO_MONTH', 'LBMA/GOLD'
                         }
    
    # Stock market indexes
    # All indexes: https://stooq.com/t/
    STOOQ_INDICATORS = {'^DJI','^SPX'}
    We utilised those three databases to construct a dictionary of disconnected time series, which we then transformed to relative levels and joined together at a later stage to produce a unified dataset.
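The download step described above can be sketched roughly as follows. This is a minimal illustration, not the article's actual implementation: `download_indicators` and `fake_reader` are hypothetical helpers, and the only library call assumed is `pandas_datareader.data.DataReader(name, data_source, start=...)` with the `'fred'` and `'stooq'` sources. Nasdaq Data (Quandl) needs an API key and is left out here. The reader is injectable so the sketch can be run offline with a stub.

```python
import pandas as pd


def download_indicators(names, source, start, reader=None):
    """Return {clean_name: DataFrame} for each indicator name.

    `reader` is any callable reader(name, source, start); when omitted,
    pandas_datareader's DataReader is used (this needs network access).
    """
    if reader is None:
        # Imported lazily so the function can be exercised offline with a stub.
        from pandas_datareader import data as pdr

        def reader(name, source, start):
            return pdr.DataReader(name, source, start=start)

    result = {}
    for name in names:
        # 'BCHAIN/MKPRU' -> 'BCHAIN_MKPRU', '^SPX' -> 'SPX', as in the printed feature list
        clean_name = name.replace("/", "_").replace("^", "")
        result[clean_name] = reader(name, source, start)
    return result


# Offline stub standing in for a real data source (placeholder values, not real data).
def fake_reader(name, source, start):
    index = pd.date_range(start, periods=3, freq="D")
    return pd.DataFrame({name: [1.0, 2.0, 3.0]}, index=index)


macro_indicators = {}
macro_indicators.update(download_indicators(["GDP", "UNRATE"], "fred", "2020-01-01", reader=fake_reader))
macro_indicators.update(download_indicators(["^SPX"], "stooq", "2020-01-01", reader=fake_reader))
print(list(macro_indicators.keys()))
```

Dropping the `reader=fake_reader` argument switches to the real download, assuming `pandas_datareader` is installed and the sources are reachable.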

    Here is the result for the 65 features downloaded (it's a simple union of the three collections above):
    Py3. Full list of 65 macro features
    for i,value in enumerate(macro_indicators.keys()):
      if i%6==0:
        print('\n')
      print(value, end =", ")  
    
    # OUTPUT:
    # GDP, GDPC1, GDPPOT, NYGDPMKTPCDWLD, CPIAUCSL, CPILFESL, 
    # GDPDEF, M1SL, WM1NS, WM2NS, M1V, M2V, 
    # WALCL, DFF, DTB3, DGS5, DGS10, DGS30, 
    # T5YIE, T10YIE, T5YIFR, TEDRATE, DPRIME, UNRATE, 
    # NROU, CIVPART, EMRATIO, UNEMPLOY, PAYEMS, MANEMP, 
    # ICSA, IC4WSA, CDSP, MDSP, FODSP, DSPIC96, 
    # PCE, PCEDG, PSAVERT, DSPI, RSXFS, INDPRO, 
    # TCU, HOUST, GPDI, CP, STLFSI2, DCOILWTICO, 
    # DTWEXAFEGS, DTWEXBGS, GFDEBTN, GFDEGDQ188S, DEXUSEU, GVZCLS, 
    # VIXCLS, DIVIDEND, MORTGAGE30US, SPCS20RSA, BCHAIN_MKPRU, USTREASURY_YIELD, 
    # MULTPL_SHILLER_PE_RATIO_MONTH, USTREASURY_REALYIELD, LBMA_GOLD, SPX, DJI, 
    

    Data Transformations

    There is one major pain point for many of the 'always growing' factors like GDP ($b), SPX (points): we can't include them unchanged in the dataframe.

    Unlike 'stationary' indicators like Mortgage rates (usually, 1-5% rate) or Savings rates (usually, 10%-30% of the total household income), other indicators may grow higher and higher for years to come (non-stationary time series).

    If you add those factors without any transformation, you will receive bad results during the prediction phase, as any model will learn on small levels during the first years and won't know what to predict when the input data is too high (due to it never seeing those high levels during the training phase).

    That's why the 'growth' transformations are introduced for all non-stationary time series. These are Day-on-Day (DoD), Week-on-Week (WoW), Month-on-Month (MoM), Quarter-on-Quarter (QoQ), Year-on-Year (YoY) growth rates in percentage.

    Check out the Colab function get_macro_shift_transformation(macro_indicators_dict) for more details.
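As a rough illustration of such growth transformations (not the Colab function itself), the sketch below uses pandas' `pct_change` on a synthetic monthly series. The helper name `add_growth_columns` and the suffix-to-lag mapping are assumptions made for this example; the column suffixes follow the `_mom` / `_yoy` naming seen in the feature list.

```python
import pandas as pd


def add_growth_columns(df, column, periods):
    """Append percentage growth columns for `column`.

    `periods` maps a suffix to the number of rows to look back,
    e.g. {"mom": 1, "yoy": 12} for a monthly series.
    """
    out = df.copy()
    for suffix, lag in periods.items():
        # pct_change(lag) = value / value `lag` rows earlier - 1
        out[f"{column}_{suffix}"] = out[column].pct_change(lag) * 100
    return out


# Synthetic monthly series growing 1% per month (not real data).
index = pd.date_range("2019-01-01", periods=14, freq="MS")
levels = [100 * 1.01 ** i for i in range(14)]
monthly = pd.DataFrame({"CPIAUCSL": levels}, index=index)

monthly = add_growth_columns(monthly, "CPIAUCSL", {"mom": 1, "yoy": 12})
print(monthly.tail(2).round(2))
```

The first rows of each growth column are empty (NaN) because there is no earlier value to compare against; for a daily series the same helper could be called with lags such as 1, 7, or 30 rows.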

    Here are several examples (the original 'non-stationary' indicators are removed from the dataset in most cases):

    Joining All Together In a Single Dataframe

    The next problem you're likely to face is how to join up all the time series you're using in one dataframe. Here is the list of the potential difficulties, which you need to take into consideration:
    • The first time series subset is updated daily Monday-Friday (no weekend stats when the stock market is closed). The second is updated weekly on Wednesday or Sunday (or another day of the week), the third arrives monthly, and the fourth is updated every quarter or even yearly;
    • The data in the time series is not updated exactly when the period is over, but only after several days, weeks, or even months have passed (you may want to forward-fill the missing values with the latest available value);
    • Some periods are indexed by a period-end (e.g. weekly stats from FRED), and others with a period-start (e.g. monthly stats from FRED).
    Here is what you can do to overcome this:
    • Start with all the records of daily data (e.g. SPX and DJI indexes, Gold prices, most of the interest rates, etc.) and join them together by Date;
    • Add the weekly, monthly, quarterly, and yearly series to the daily dataset from the step above by broadcasting the latest available value to all days until it is refreshed.
    You can find more in the function get_daily_macro_stats_df(daily_df, macro_ind_df, regime='LAST'), which 'intelligently' aligns time series of all frequencies to a daily basis. There is 'merge' code afterwards that stacks all the sets of series into a single dataframe.
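The broadcasting idea can be sketched with plain pandas as below. This is a simplified stand-in for get_daily_macro_stats_df, not its actual code: `broadcast_to_daily` and its `publication_lag_days` parameter are hypothetical, and the values are synthetic. It relies on `Series.reindex(..., method="ffill")`, which carries the latest available value forward to each daily row.

```python
import pandas as pd


def broadcast_to_daily(daily_df, low_freq_series, publication_lag_days=0):
    """Add a lower-frequency series to a daily frame, repeating the latest known value.

    publication_lag_days delays each observation to mimic late publication
    (an assumption of this sketch).
    """
    shifted = low_freq_series.copy()
    shifted.index = shifted.index + pd.Timedelta(days=publication_lag_days)
    out = daily_df.copy()
    # For each daily date take the last observation dated on or before it.
    out[low_freq_series.name] = shifted.reindex(out.index, method="ffill")
    return out


# Synthetic daily rows on business days (no weekends), not real data.
daily_index = pd.bdate_range("2021-01-01", "2021-03-10")
daily = pd.DataFrame({"SPX": range(len(daily_index))}, index=daily_index)

# Synthetic monthly indicator indexed by period start.
monthly = pd.Series(
    [6.4, 6.2, 6.1],
    index=pd.to_datetime(["2021-01-01", "2021-02-01", "2021-03-01"]),
    name="UNRATE",
)

aligned = broadcast_to_daily(daily, monthly)
lagged = broadcast_to_daily(daily, monthly, publication_lag_days=7)
print(aligned.loc["2021-01-28":"2021-02-02"])
```

With a lag of 7 days, the February value only becomes visible from 8 February onwards, which is closer to what an investor would have known on each date.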


    You will end up with about 2.5x more indicators (166 in total); all the new ones are growth transformations of the existing ones:

    Py4. The full list of 166 features
    i=1
    for value in macro_df.keys():
      if not ('future' in value):
        print(value, end =", ")
        if i%8==0:
          print('\n')
        i+=1
    
    # OUTPUT:
    # WM1NS_wow, WM1NS_mom, WM2NS_wow, WM2NS_mom, WALCL_wow, WALCL_mom, DFF, DTB3, 
    # DGS5, DGS10, DGS30, T5YIE, T10YIE, T5YIFR, TEDRATE, DPRIME, 
    # ICSA_wow, ICSA_mom, IC4WSA_wow, IC4WSA_mom, STLFSI2, STLFSI2_wow, STLFSI2_mom, DCOILWTICO, 
    # DCOILWTICO_growth_1d, DCOILWTICO_growth_3d, DCOILWTICO_growth_7d, 
    # DCOILWTICO_growth_30d, DCOILWTICO_growth_90d, DCOILWTICO_growth_365d, DTWEXAFEGS, DTWEXBGS, 
    # DEXUSEU, GVZCLS, VIXCLS, MORTGAGE30US, MORTGAGE30US_wow, MORTGAGE30US_mom, BCHAIN_MKPRU, BCHAIN_MKPRU_growth_1d, 
    # BCHAIN_MKPRU_growth_3d, BCHAIN_MKPRU_growth_7d, BCHAIN_MKPRU_growth_30d, BCHAIN_MKPRU_growth_90d, BCHAIN_MKPRU_growth_365d, LBMA_GOLD, LBMA_GOLD_growth_1d, LBMA_GOLD_growth_3d, 
    # LBMA_GOLD_growth_7d, LBMA_GOLD_growth_30d, LBMA_GOLD_growth_90d, LBMA_GOLD_growth_365d, SPX, SPX_growth_1d, SPX_growth_3d, SPX_growth_7d, 
    # SPX_growth_30d, SPX_growth_90d, SPX_growth_365d, DJI, DJI_growth_1d, DJI_growth_3d, DJI_growth_7d, DJI_growth_30d, 
    # DJI_growth_90d, DJI_growth_365d, GDP_qoq, GDP_yoy, GDPC1_qoq, GDPC1_yoy, GDPPOT_qoq, GDPPOT_yoy, 
    # NYGDPMKTPCDWLD_yoy, CPIAUCSL_mom, CPIAUCSL_yoy, CPILFESL_mom, CPILFESL_yoy, GDPDEF, GDPDEF_qoq, GDPDEF_yoy, 
    # M1SL_mom, M1SL_yoy, M1V, M1V_qoq, M1V_yoy, M2V, M2V_qoq, M2V_yoy, 
    # UNRATE, UNRATE_mom, UNRATE_yoy, NROU, NROU_qoq, NROU_yoy, CIVPART, CIVPART_mom, 
    # CIVPART_yoy, EMRATIO, EMRATIO_mom, EMRATIO_yoy, UNEMPLOY_mom, UNEMPLOY_yoy, PAYEMS_mom, PAYEMS_yoy, 
    # MANEMP_mom, MANEMP_yoy, CDSP, CDSP_qoq, CDSP_yoy, MDSP, MDSP_qoq, MDSP_yoy, 
    # FODSP, FODSP_qoq, FODSP_yoy, DSPIC96_mom, DSPIC96_yoy, PCE_mom, PCE_yoy, PCEDG_mom, 
    # PCEDG_yoy, PSAVERT, PSAVERT_mom, PSAVERT_yoy, DSPI_mom, DSPI_yoy, RSXFS_mom, RSXFS_yoy, 
    # INDPRO, INDPRO_mom, INDPRO_yoy, TCU, TCU_mom, TCU_yoy, HOUST_mom, HOUST_yoy, 
    # GPDI_qoq, GPDI_yoy, div_ratio, CP_qoq, CP_yoy, GFDEBTN_qoq, GFDEBTN_yoy, GFDEGDQ188S, 
    # GFDEGDQ188S_qoq, GFDEGDQ188S_yoy, DIVIDEND_qoq, DIVIDEND_yoy, SPCS20RSA, SPCS20RSA_mom, SPCS20RSA_yoy, MULTPL_SHILLER_PE_RATIO_MONTH, 
    # MULTPL_SHILLER_PE_RATIO_MONTH_mom, MULTPL_SHILLER_PE_RATIO_MONTH_yoy, 

    Correlation Analysis

    Now we're ready to move on to the exciting stuff.
    As you've already read in the Introduction, (macro-)economics and financial markets are interdependent, but it's not clear how tightly, or which time series are the most correlated. Here are the top results for S&P 500 365-day growth:

    Fig. 1-1. Top correlated features with S&P 500 365d growth
    Most negatively correlated factors: GVZCLS (CBOE Gold ETF Volatility Index), STLFSI2 (St. Louis Fed Financial Stress Index), FODSP (Household Financial Obligations as a Percent of Disposable Personal Income), VIXCLS (CBOE Volatility Index VIX), CDSP (Consumer Debt Service Payments as a Percent of Disposable Personal Income).

    Most positively correlated factors: INDPRO_yoy (Industrial Production YoY growth), MANEMP_yoy (All Employees in Manufacturing YoY), MULTPL_SHILLER_PE_RATIO_MONTH_yoy (Shiller PE Ratio YoY growth), DJI_growth_365d (Dow Jones Industrial Average 365d growth)

    You can find similar patterns in the top correlated factors for DJI 365-day growth (DJI is another popular stock market index):
    Fig. 1-2. Top correlated features with DJI 365d growth
    There is only one new top negative factor: UNEMPLOY_yoy (Unemployment Level YoY growth).
    You can also check the same correlation stats for S&P 500 90d and 30d growth in the Colab notebook.

    Result (correlations with the CURRENT growth of an index): as you can see from the numbers above, there is a strong correlation (from -0.6..-0.4 on the negative side to +0.6..+0.75 on the positive side) between some of the factors and the growth of the stock market indexes (SPX and DJI). Unfortunately, we can't easily retrieve the marginal impact of each indicator in order to find the most significant ones and those causing the 'chain' reaction for other indicators to follow. The good thing is that the correlated factors are mostly the same for SPX_growth (30d, 90d, or 365d) and DJI_growth_365d, which supports the robustness of the results.
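A ranking like the one in the figures above can be produced with `DataFrame.corr()`. The sketch below runs on synthetic data so that it is self-contained; `top_correlated` and the feature names `pos_feature`, `neg_feature`, and `noise` are made up for the illustration and do not come from the article's notebook.

```python
import numpy as np
import pandas as pd


def top_correlated(df, target, n=3):
    """Return the n most negative and n most positive correlations with `target`."""
    corr = df.corr()[target].drop(target).sort_values()
    return corr.head(n), corr.tail(n)


# Synthetic data: one feature moves with the target, one against it, one is noise.
rng = np.random.default_rng(0)
target = rng.normal(size=500)
df = pd.DataFrame({
    "SPX_growth_365d": target,
    "pos_feature": target + rng.normal(scale=0.5, size=500),
    "neg_feature": -target + rng.normal(scale=0.5, size=500),
    "noise": rng.normal(size=500),
})

most_negative, most_positive = top_correlated(df, "SPX_growth_365d", n=1)
print(most_negative)
print(most_positive)
```

On the real dataset the same call would be made on macro_df with a target such as SPX_growth_365d, assuming all columns are numeric.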

    All of this is quite appealing, but not very helpful if you want to predict what will happen in the future with the stock market by knowing all recent changes in the macroeconomic indicators. The cause of the problem is the 'simultaneous' correlation values we were looking at (current macro stats correlated against the current growth of an index, and not forward-looking growth).

    One step forward is to start looking for the correlation of the same macro features with the changes in stock market indexes in the future (link to Github - check the section "4) Correlation Analysis").

    But before doing that, let's exclude all but one _future_growth_* indicators from the dataset (filter named macro_df_no_future_ind), as all future indicators are mostly correlated with each other, while we want to predict the future indicator with the current values:
      Py5. Exclude _future_indicators from the keys
      # Future growth indicators are mostly correlated with each other
      future_ind = []
      for ind in macro_df.keys():
        if 'future' in ind:
          future_ind.append(ind)
        
      print(future_ind)
      # OUTPUT:
      # ['SPX_future_growth_1d', 'SPX_future_growth_3d', 'SPX_future_growth_7d', 'SPX_future_growth_30d', 'SPX_future_growth_90d', 'SPX_future_growth_365d', 'DJI_future_growth_1d', 'DJI_future_growth_3d', 'DJI_future_growth_7d', 'DJI_future_growth_30d', 'DJI_future_growth_90d', 'DJI_future_growth_365d']
      
      
      # include all features 
      macro_df_no_future_ind = macro_df.keys()
      # do not use future_ind in the list to find correlations with the label (which is a future_indicator)
      macro_df_no_future_ind = macro_df_no_future_ind.drop(future_ind)
      Fig. 1-3. Top correlated features with FUTURE S&P 500 365d growth
      Fig. 1-4. Top correlated features with FUTURE S&P 500 90d growth
      Fig. 1-5. Top correlated features with FUTURE S&P 500 30d growth
      Result (correlations with the FUTURE growth of an index): the top correlated indicators that appear in 2-3 of the experiments are: M1V (Velocity of Money), T10YIE and T5YIE (10- and 5-Year Breakeven Inflation Rate), PSAVERT (Personal Saving Rate), DTWEXAFEGS (Nominal Advanced Foreign Economies U.S. Dollar Index).

      Another interesting fact is that the general level of the top correlations goes down (from +-0.4 for 365 days to +-0.15 for 30 days) when we try to predict growth over a shorter period.
      This makes perfect sense, as stock markets are chaotic and volatile in the short term and very hard to predict. Also, you shouldn't expect the macro inputs to have an immediate impact (their influence should grow over time).

      Overall, the correlations of macro factors with forward-looking data (growth of a stock index in the future) are weaker (|corr| in [0.15; 0.45]) than their correlations with simultaneous data (|corr| in [0.5; 0.7]).
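The forward-looking labels used above (columns such as SPX_future_growth_90d) can be built by shifting the index level backwards in time. The sketch below is one plausible construction, not necessarily the one in the notebook: `add_future_growth` is a hypothetical helper, and it treats each row as one day, whereas the original may count calendar days.

```python
import pandas as pd


def add_future_growth(df, column, days):
    """Add `<column>_future_growth_<N>d`: % change from the current row to N rows ahead."""
    out = df.copy()
    for n in days:
        # shift(-n) moves the value from n rows ahead onto the current row
        out[f"{column}_future_growth_{n}d"] = (out[column].shift(-n) / out[column] - 1) * 100
    return out


# Synthetic index levels (not real data); rows are treated as consecutive days.
prices = pd.DataFrame(
    {"SPX": [100.0, 102.0, 101.0, 105.0, 110.0]},
    index=pd.date_range("2021-01-04", periods=5, freq="D"),
)
labelled = add_future_growth(prices, "SPX", days=[1, 3])
print(labelled)
```

The last N rows of each label are empty because the future value is not known yet, which is why such rows cannot be used for training or for the correlation analysis.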

        Decision Trees for Features Importance

        The previous results gave us a head start, but this may not be good enough for the simple reason that those features can be correlated between each other and, thus, an analyst can't identify what causes the change of an index in the first place. With this problem in mind, we'll find the marginal impact of each individual indicator and rank them accordingly.

        A Decision Tree is one of the most convenient methods (see the Scikit-learn library documentation for full details) for quickly fitting a model of the future growth of the S&P 500 (90 days) and getting the features' importance.

        The model is not very precise (check the code and picture below), as it is nearly impossible to predict future stock market movement well only by looking at the macro indicators (they have limited explanation power). So any conclusions regarding the importance/ranking of the features should be taken with a grain of salt.

        Anyway, it's better to have something and hope that those relations and sorted order will persist in the 'full' model that has all relevant features.

        Here is the sample code and the graph for actual vs. predicted values:
        Py6. Decision Tree model to get the features importance
        # imports
        from collections import OrderedDict
        from sklearn.tree import DecisionTreeRegressor
        from matplotlib import pyplot
        
        # all features should be numeric
        for key in macro_df.keys():
          macro_df[key] = macro_df[key].astype(float)
        
        # include all features 
        X_keys = macro_df.keys()
        # do not use future ind to predict
        X_keys = X_keys.drop(future_ind)
        
        # deep copy of the dataframe not to change the original df
        macro_copy = macro_df.copy(deep=True)
        
        macro_copy.fillna(0,inplace=True)
        # macro_copy.dropna(inplace=True)
        
        #get all features in X and dependent variable in y
        X = macro_copy[X_keys]
        y = macro_copy['SPX_future_growth_90d']
        
        # define a function that returns a list of (feature, importance) tuples, sorted by importance (ascending)
        def get_importance_features(model):
          importance = model.feature_importances_
          feat_imp = OrderedDict()
          # map each feature name to its importance score
          for name, score in zip(X.keys(), importance):
            feat_imp[name] = score
        
          # https://stackoverflow.com/questions/613183/how-do-i-sort-a-dictionary-by-value  
          sorted_feat_imp = sorted(feat_imp.items(), key=lambda kv: kv[1])
        
          return sorted_feat_imp
        
        # init the class and fit the model
        decision_tree_model = DecisionTreeRegressor()
        decision_tree_model.fit(X, y)
        
        decision_feat_imp = get_importance_features(decision_tree_model)
        Fig. 1-6. Actual (blue) vs. predicted (orange) S&P 500 future 90-day growth for the DecisionTree model
        Here are the 10 most important features when predicting the 90-day future growth of the S&P 500:
        Fig. 1-7. Top 10 macro stats by their (marginal) importance (the bigger the coefficient, the better)
        As you can see, the most important indicators look similar to what we saw during the correlation analysis, but some are new. They may not be highly correlated with the index growth directly, but can have an important marginal impact.
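Since the importance list returned by the code above is sorted in ascending order, the top entries sit at its end. A small helper like the following (hypothetical, with placeholder feature names and made-up scores rather than the article's results) picks the N largest:

```python
def top_features(feature_importance_pairs, n=10):
    """Return the n (feature, importance) pairs with the largest importance, biggest first."""
    return sorted(feature_importance_pairs, key=lambda kv: kv[1], reverse=True)[:n]


# Placeholder pairs in ascending order, mimicking the shape of decision_feat_imp.
example_feat_imp = [
    ("feature_d", 0.02),
    ("feature_c", 0.11),
    ("feature_b", 0.15),
    ("feature_a", 0.30),
]

for name, score in top_features(example_feat_imp, n=2):
    print(f"{name}: {score:.2f}")
```

Sorting explicitly, rather than slicing the tail, keeps the helper correct even if the input order changes.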
        Conclusion
        In this article we walked through the whole process of getting macroeconomic factors and embedding them in the stock market analysis. We started from identifying the most important (and widely acknowledged) macro series and then explained their potential influence on the stock market. We continued by using Python's Pandas Datareader to get the data in a set of arrays, generated derived statistics, and converted them to one dataframe.
        Finally, we conducted a correlation analysis and built a Decision Tree to select the most powerful indicators and sort them by magnitude of influence.
