#Part 5

Developing a Short Term Investment Strategy Based on Earnings-Per-Share (EPS) Data

The main idea is to check the long-term performance of EPS predicted vs. actual EPS and its influence on the stock price


If you follow stock market news, you will encounter financial terms such as revenue, net income, and earnings-per-share (EPS). They often appear right after quarterly and annual earnings calls. These metrics reflect the company's immediate operational results, and analysts use them to estimate the long-term profitability trend, which in turn feeds into a fair stock price.

In this article, we take a closer look at the historical EPS for ~200 companies, selected as the most active stocks during the last trading day (these normally include the largest "blue chips" and some less popular companies). We get the stock price data around the dates of quarterly reports and calculate the price increase or decrease just after the results announcement.

In the course of this article, we'll try to find answers to the following questions:

  • Do all companies try to report very close to estimates? How big does the Surprise(%) need to be to cause a shock in the short-term stock price?
  • If the surprise caused a rapid change in the stock price, does the change persist for several days, and can we predict the direction of the movement (so that one can use the knowledge to make short-term investments)?
  • Does consistent reporting of a slight positive surprise lead to long-term stock growth?
  • Does a gradual increase in EPS increase the valuation of a company?
The ultimate goal is to find a segment of stocks that has a high probability of growth during the several days after the announcement of quarterly results.
This article is the fifth part in the series. The previous part 4 was an introduction to the topic: it showed how to scrape one page of earnings-per-share data and concentrated on one reporting period (earnings reports in Aug'20, which covered Q2'20 revenues). This part is much more advanced: it takes the whole available history for the selected set of stocks and aims to find investment opportunities rather than perform a simpler exploratory analysis.
Executive Summary
This article extends the previous chapter about scraping EPS for one reporting quarter (Q2'20). This time we take about 200 of the most traded companies on a recent date and get the long-term history of their EPS (actual/predicted/surprise(%)). We generate metrics of a stock's growth 1 and 7 days after the report and analyse the patterns in which a stock shows better results after 7 days than after 1 day. We start from several examples of large stocks and then try to generalise by doing the analysis at scale. We conclude with the 7 insights that appeared throughout the chapter.

Preparation Steps

As in the previous parts of this series, we will use a Google Colab notebook to implement the Python code needed for the research.

To start with, we'll need a standard set of analytical, scraping, and finance imports:
Py1. Imports

import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np

Then, we create a set of tickers to be used in our analysis:
Py2. Additional tickers


Functions For Scraping

Since we're going to obtain some data for our analysis by scraping, we need to create some code for this. Reusing the approach from Part 4 for scraping one page with one table and cleaning it up (we need to handle the missing values), we create the get_scraped_yahoo_finance_page() function that works as follows:

  • Takes URL and URL_PARAMs as input arguments
  • Finds the single table on the HTML page
  • Saves the table column names (there is one hack for the Most-active page, as one column is not visible)
  • Retrieves the values row-by-row
  • Returns a dataframe with all fields stored as objects (strings)
We also need to create clean_earnings_history_df(), which removes all rows with missing values ('-').

Here is the implementation of these functions:
Py3. Scrape one page from Yahoo Finance

# Example of a full URL with one stock Symbol
# url = ""

def get_scraped_yahoo_finance_page(url,url_params):
  # url = ""
  # url_params = {'symbol':ticker}
  r = requests.get(url,params=url_params)
  # DEBUG:
  # print('Response OK? :',r.ok)
  # print('Status code:', r.status_code )
  # print('Headers:', r.headers)
  # print('Content:', r.content)
  soup = BeautifulSoup(r.text)
  table = soup.find_all('table')
  # DEBUG: If we found just 1 table -> it is good
  print('We\'ve found tables:', len(table))
  if len(table)!=1:
    return None
  # Get all column names
  if len(soup.table.find_all('thead'))==0:
    return None

  spans = soup.table.thead.find_all('span')
  columns = []
  for span in spans:
    columns.append(span.text)

  # Hack: for some reason one of the columns is not a <span> tag for calendar/earnings and is not discovered in the columns list,
  # so we add this column manually
  if url.find('') != -1 :
    columns.insert(len(columns)-2,"Market Cap")

  rows = soup.table.tbody.find_all('tr')
  # Read row by row; the rows are collected in a list first,
  # because DataFrame.append was removed in pandas 2.0
  records = []
  for row in rows:
    elems = row.find_all('td')
    dict_to_add = {}
    for i,elem in enumerate(elems):
      dict_to_add[columns[i]] = elem.text
    records.append(dict_to_add)

  stocks_df = pd.DataFrame(records, columns=columns)
  return stocks_df

# Each stock has only one record with this pattern: the next earnings date
def get_next_earnings_records(stocks_df):
  filter1 = stocks_df['EPS Estimate']!='-'
  filter2 = stocks_df['Surprise(%)']=='-'
  filter3 = stocks_df['Reported EPS']=='-'
  rez_df = stocks_df[filter1 & filter2 & filter3]
  return rez_df

# Remove all records with unfilled stats and cast the values to float
def clean_earnings_history_df(stocks_df):
  filter1 = stocks_df['EPS Estimate']!='-'
  filter2 = stocks_df['Surprise(%)']!='-'
  filter3 = stocks_df['Reported EPS']!='-'
  # .copy() avoids pandas' SettingWithCopyWarning when the columns are reassigned below
  stocks_df_noMissing = stocks_df[filter1 & filter2 & filter3].copy()
  stocks_df_noMissing['EPS Estimate'] = stocks_df_noMissing['EPS Estimate'].astype(float)
  stocks_df_noMissing['Reported EPS'] = stocks_df_noMissing['Reported EPS'].astype(float)
  stocks_df_noMissing['Surprise(%)'] = stocks_df_noMissing['Surprise(%)'].astype(float)
  return stocks_df_noMissing
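
The header-then-rows parsing steps can be tried offline before hitting a real page. The sketch below is only an illustration: it swaps BeautifulSoup for the standard library's html.parser so that it runs without network access or third-party packages, and the HTML snippet, the ticker XYZ, and the TableParser class are all hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical, hand-written HTML standing in for a scraped page.
SAMPLE_HTML = """
<table>
  <thead><tr>
    <th><span>Symbol</span></th><th><span>EPS Estimate</span></th><th><span>Reported EPS</span></th>
  </tr></thead>
  <tbody>
    <tr><td>XYZ</td><td>0.19</td><td>0.65</td></tr>
    <tr><td>XYZ</td><td>0.35</td><td>-</td></tr>
  </tbody>
</table>
"""

class TableParser(HTMLParser):
    """Collect column names from <thead> spans and cell texts from <tbody> rows."""

    def __init__(self):
        super().__init__()
        self.columns = []
        self.rows = []
        self._in_thead = False
        self._cell_tag = None  # 'span' inside a header cell, 'td' inside a body cell

    def handle_starttag(self, tag, attrs):
        if tag == "thead":
            self._in_thead = True
        elif tag == "tr" and not self._in_thead:
            self.rows.append([])  # start a new body row
        elif (tag == "span" and self._in_thead) or (tag == "td" and not self._in_thead):
            self._cell_tag = tag

    def handle_endtag(self, tag):
        if tag == "thead":
            self._in_thead = False
        elif tag == self._cell_tag:
            self._cell_tag = None

    def handle_data(self, data):
        if self._cell_tag == "span":
            self.columns.append(data.strip())
        elif self._cell_tag == "td":
            self.rows[-1].append(data.strip())

parser = TableParser()
parser.feed(SAMPLE_HTML)

# One dict per row, keyed by column name; every value is still a string.
records = [dict(zip(parser.columns, row)) for row in parser.rows]
print(records)
```

As in the scraper, every value comes back as text, including the '-' placeholder, which is why a separate cleaning step is needed afterwards.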

Let's now try out the above functions, getting the stats for F (Ford):
Py4. Test scraping result for Ford (F)

ticker = 'F'

f_df = get_scraped_yahoo_finance_page(url = "",url_params = {'symbol':ticker})

f_df_clean = clean_earnings_history_df(f_df)

The result set generated for Ford shows that Ford had great results recently:
Fig. 1 Ford company (F) last 5 EPS results

Getting the Top 200 Traded Stocks

Here we look at the 200 most active stocks during the last trading day (mid-Nov 2020). The list (together with the additional selected tickers) will be used to get the historical EPS values and the future returns around the EPS dates (to test the hypothesis that good or bad EPS results predict good or bad returns). We will apply a regex approach to convert all text values to numeric values (% values and the magnitudes M, B, T).
Py5. Scrape the most active stocks (another page) from Yahoo Finance

num_stocks = 200
most_active_stocks = get_scraped_yahoo_finance_page(url = "",url_params = {'count':num_stocks})
Fig. 2 Top 10 stocks for the last trading day (20 Nov. 2020)
The problem is that all the values in the dataframe are strings (dtype object), not integers or floats, so you can't perform arithmetic operations on them.
Fig. 3 Scraped dataframe fields types
Py6. Convert string values with magnitude to float

import re

POWERS = {'T': 10 ** 12,'B': 10 ** 9, 'M': 10 ** 6, '%': 0.01, '1':1}

def convert_str_to_num(num_str):
  """Read a string (with M/B/T/% values)
     Return a correct numeric value"""
  # Drop thousands separators first, so that '1,234.5' is not read as 1
  match = re.search(r"([0-9\.-]+)(M|B|T|%)?", num_str.replace(',', ''))
  if match is None:
    return None
  quantity = match.group(1)
  if match.group(2) is None:
    magnitude = '1' # no modifier at the end -> don't multiply by anything
  else:
    magnitude = match.group(2)
  # print(quantity, magnitude)
  return float(quantity) * POWERS[magnitude]
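
A quick way to check the conversion rules is to run them on a handful of strings. The block below is a compact, self-contained sketch of the same idea; the helper name to_number, the MULTIPLIERS table, and the sample strings are illustrative assumptions, not values taken from the scraped page.

```python
import re

# Suffix -> multiplier; None means "no suffix".
MULTIPLIERS = {"T": 1e12, "B": 1e9, "M": 1e6, "%": 0.01, None: 1.0}

def to_number(text):
    """Convert strings such as '1.5B' or '-0.4%' to floats; return None if no digits are found."""
    match = re.search(r"(-?[0-9.]+)(M|B|T|%)?", text.replace(",", ""))
    if match is None:
        return None
    return float(match.group(1)) * MULTIPLIERS[match.group(2)]

samples = ["1.5B", "2.17T", "-0.4%", "12,345", "N/A"]
converted = {s: to_number(s) for s in samples}
for text, value in converted.items():
    print(f"{text!r:>10} -> {value}")
```

Strings without any digits (such as 'N/A') map to None, which pandas later treats as a missing value.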
We apply the above function column-by-column to the dataframe to convert its object values to numbers, transform the 'Market Cap' to log10 values, and build a histogram:
Py7. Get most active stocks in the dataframe and visualise it

columns_to_apply = [ "Price (Intraday)", "Change", "% Change", "Volume", "Avg Vol (3 month)", "Market Cap", "PE Ratio (TTM)"]

for col in columns_to_apply:
  most_active_stocks[col] = most_active_stocks[col].apply(convert_str_to_num)

most_active_stocks["log_market_cap"] = np.log10(most_active_stocks["Market Cap"])

most_active_stocks.log_market_cap.hist(bins = 10)
Fig.4 Most traded stocks are large on Market Cap: $B (10⁹) to $T (10¹²)
Let's now divide the active stocks into 3 equally sized groups:
Py8. Create 3 buckets (by the market cap) for the most traded stocks

most_active_stocks["log_market_cap_binned"] = pd.qcut(most_active_stocks.log_market_cap,3)
Fig.5 Three equal size clusters of the most traded stocks
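
To see why pd.qcut produces equally sized groups, here is a minimal sketch on nine made-up log10(market cap) values. The numbers and the labels passed to qcut are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical log10(market cap) values for nine stocks.
log_caps = pd.Series([9.1, 9.4, 9.8, 10.2, 10.5, 10.9, 11.3, 11.6, 12.0])

# qcut chooses the bin edges at quantiles, so each bucket gets the same number of rows.
buckets = pd.qcut(log_caps, 3, labels=["medium", "large", "largest"])
counts = buckets.value_counts()
print(counts)
```

This differs from pd.cut, which would split the value range into equal-width intervals and could leave some buckets nearly empty.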
We can now find the average values for these groups:
  1. % Change: the largest stocks change by -0.4%, while the smaller stocks change by 1.3%
  2. Total volume is comparable across the clusters (the same power of 10), but the market cap differs by roughly 4–8x between neighbouring groups (6.4*10⁹ vs. 2.7*10¹⁰ vs. 2.17*10¹¹)
  3. P/E Ratio (where filled): high for the largest stocks (78), moderate for the middle group (51), and highest for the smallest group (104)
Fig6. Average values for 3 clusters of stocks by market cap
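
The gap between the groups is easier to judge as ratios. The sketch below uses the three average market caps quoted above; the bucket names are labels chosen here for readability.

```python
import math

# Average market cap per bucket, as quoted in the list above (USD).
avg_caps = {"smallest": 6.4e9, "middle": 2.7e10, "largest": 2.17e11}

names = list(avg_caps)
ratios = {}
for lower, upper in zip(names, names[1:]):
    ratios[(upper, lower)] = avg_caps[upper] / avg_caps[lower]
    print(f"{upper} / {lower}: {ratios[(upper, lower)]:.1f}x")

# The same values on the log10 scale used for the histogram and the binning.
for name, cap in avg_caps.items():
    print(f"log10({name}) = {math.log10(cap):.2f}")
```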
You might also want to remove outliers: the top and bottom xx% of the values. This can be implemented with the following function:
Py9. Cutting the top %% of data from both sides (removing outliers)

def remove_outliers(df, column_name, quantile_threshold):
  q_low = df[column_name].quantile(quantile_threshold)
  q_hi = df[column_name].quantile(1-quantile_threshold)
  rez = df[(df[column_name] < q_hi) & (df[column_name] > q_low)]
  return rez
And then put it in use as follows:
Py10. Build a histogram of change % on the data without outliers

tmp = remove_outliers(df = most_active_stocks, column_name = "% Change", quantile_threshold = 0.02)
tmp["% Change"].hist(bins=100)
Fig.7 One-day % Change happens to be mostly between -5% and 5% (can differ for other trading days)
Py11. Data without outliers: 1-day change in abs. values

tmp = remove_outliers(df = most_active_stocks, column_name = "Change", quantile_threshold = 0.02)
# The difference vs. previous graph: we draw the abs. daily change here vs. relative change in % in the previous graph
Fig.8 Abs. daily change is concentrated around 0 on normal trading days
Py12. Generating a new feature "relative_volume" (compared to 3-months avg. volume)

# We want to get some idea of how actively the stock was traded today vs. its 3-month average volume
most_active_stocks["relative_volume"] = most_active_stocks["Volume"]/most_active_stocks["Avg Vol (3 month)"]

Let's now summarise what we have so far, listing our findings:

Result 1

We split the companies into medium, large, and largest (by log_market_cap_binned). In most cases the one-day change lies between -5% and +5%, while the smaller companies can drop or rise further, to -10%/+10%. This is just one trading day, but it gives a general sense of how much a stock can change its value in one day. On volume, the largest companies rarely trade at more than 3x their 3-month average, while smaller companies can reach 5–6x.
Let's build a diagram to illustrate the point:
Py 13. Diagram (kdeplot) of % Change vs. relative_volume of trade

import seaborn as sns
sns.kdeplot(
 data = most_active_stocks, x="% Change", y="relative_volume", hue="log_market_cap_binned", fill=True,
)
Fig.9 One day % Change vs. relative_volume (1 day volume/3-months avg. volume)

Result 2

The medium and large companies tend to have a wider range of negative and positive returns and form a 'wider' bell, with a higher standard deviation from the mean. As an investor you can earn more with smaller companies, but you also risk more. So the return-per-risk can be anything: smaller or bigger across the medium, large, and largest stocks.
Py 14. Histogram of '% Change' for 3 bins of market cap companies

 data = most_active_stocks, x="% Change", hue="log_market_cap_binned",
 fill=True, common_norm=False, 
 # palette="rocket",
Fig.10 Histogram of 1-day %Change for three classes of stocks

Getting a Dataframe with All EPS Historical for the Most Traded Stocks

We downloaded all available dates for EPS (earnings-per-share) for the most traded stocks:
Fig.11 200 most active stocks for Friday, 20 Nov. 2020
There are also tickers in the ADDITIONAL_TICKERS list defined at the beginning of this article. These tickers might not appear in the most traded stocks list:
Py 15. Extend the tickers list

NEW_TICKERS = [x for x in ADDITIONAL_TICKERS if x not in set(most_active_stocks.Symbol)]
# Series.append was removed in pandas 2.0; pd.concat is the equivalent
TICKERS_LIST = pd.concat([most_active_stocks.Symbol, pd.Series(NEW_TICKERS)])
In the following code, we scrape info for each ticker from
Py 16. Scrape info about every ticker

from random import randint
from time import sleep
# Empty dataframe
all_tickers_info = pd.DataFrame({'A' : []})
for i,ticker in enumerate(TICKERS_LIST):
  current_ticker_info = get_scraped_yahoo_finance_page(url = "",url_params = {'symbol':ticker})
  print(f'Finished with ticker {ticker}, record no {i}')
  # The scraper returns None when no single table is found: skip such tickers
  if current_ticker_info is None:
    continue
  if all_tickers_info.empty:
    all_tickers_info = current_ticker_info
  else:
    all_tickers_info = pd.concat([all_tickers_info,current_ticker_info], ignore_index=True)
  # Random sleep 1–3 sec
  sleep(randint(1,3))
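
The loop logic (fetch, skip failures, pause, accumulate) can be tested without touching the network by injecting the fetch and sleep functions. This is only a sketch under that assumption: collect_pages and fake_fetch are hypothetical helpers, and a real fetcher would wrap the scraping function.

```python
import random
import time

def collect_pages(tickers, fetch, min_pause=1.0, max_pause=3.0, sleep=time.sleep):
    """Call fetch(ticker) for each ticker, skip failed scrapes, pause between requests."""
    pages = []
    for ticker in tickers:
        page = fetch(ticker)
        if page is not None:  # a failed scrape is skipped, not stored
            pages.append(page)
        sleep(random.uniform(min_pause, max_pause))
    return pages

# Stand-in fetcher so the sketch runs offline.
def fake_fetch(ticker):
    return None if ticker == "BAD" else {"Symbol": ticker}

# The pause is replaced by a no-op here to keep the demo instant.
result = collect_pages(["F", "BAD", "GE"], fake_fetch, sleep=lambda seconds: None)
print(result)
```

Collecting the pages in a list and concatenating once at the end also avoids re-copying a growing dataframe on every iteration.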
Before proceeding, we need to calculate the closest future earnings dates:
Py 17. Next earnings dates for all the tickers

next_earnings_dates = get_next_earnings_records(all_tickers_info)
We know when the next earnings date is and want to know what EPS to expect:
Py 18. Sorted list of the closest next earnings dates

next_earnings_dates.sort_values(by='Earnings Date').tail(30)
Fig.12 Next reporting dates are only in January 2021
In the next step, we remove all cells with missing values ('-'):
Py 19. Clean dataframe with all tickers historical earnings

all_tickers_info_clean = clean_earnings_history_df(all_tickers_info)
Fig.13 All tickers EPS history
Now let's check what we have at the aggregate level:
Fig.14 Aggregate stats for EPS

Result 3

The average EPS estimate is only 0.52, which is not far from the reported average of 0.50; the average surprise is only 1.2%. The standard deviation of the Surprise is 386%, which indicates that there are many outliers in the dataset:
  • companies with small EPS values (<0.13) tend to under-report on EPS (-1% surprise),
  • companies with medium EPS (0.38) slightly over-report, 4% higher than the estimate,
  • companies in the highest quantile (EPS>0.75) over-report by 13% on average
Fig.15 Actual EPS figures are close to the estimates during the whole period of history
Probably (to be checked), these "normal" points, where the EPS Estimate is close to the Reported EPS, won't give us outstanding returns, since everything happens as predicted. The box plot below also shows that there are many outliers with highly positive or negative EPS Estimate/Reported EPS values.
Py 20. Boxplot of EPS Estimate and Reported EPS

all_tickers_info_clean[['EPS Estimate','Reported EPS']]
Fig. 16 EPS Estimate/Reported EPS have many outliers in both positive and negative directions
If we remove 360 entries (out of 12636 values) with extreme values (abs. z-score>2), we get a better view on the box plot, with fewer outliers remaining in both directions:
Py 21. Filtering the dataframe to remove the outliers

from scipy import stats
# calculate z-scores of `df`
z_scores = stats.zscore(all_tickers_info_clean[["EPS Estimate","Reported EPS"]])
abs_z_scores = np.abs(z_scores)
filtered_entries = (abs_z_scores < 2).all(axis=1)
new_df = all_tickers_info_clean[["EPS Estimate","Reported EPS"]][filtered_entries]
Fig.17 Box Plot with removed outliers
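
The z-score rule itself is simple enough to verify by hand. Below is a minimal single-column sketch using only the standard library; the EPS values are made up, and pstdev (population standard deviation) is used because scipy.stats.zscore defaults to ddof=0.

```python
from statistics import mean, pstdev

def zscores(values):
    """Population z-scores (ddof=0), the default convention of scipy.stats.zscore."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

# Hypothetical EPS values with one extreme entry.
eps = [0.4, 0.5, 0.45, 0.55, 0.5, 0.48, 0.52, 0.47, 0.51, 25.0]

# Keep only the entries whose absolute z-score is below 2.
kept = [v for v, z in zip(eps, zscores(eps)) if abs(z) < 2]
print(kept)
```

In the article's two-column version, a row is kept only when both the EPS Estimate and the Reported EPS pass this test, which is what .all(axis=1) expresses.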
Now we need to convert dates in the dataframe to simplify further analysis:
Py 22. Read date from string

from datetime import datetime 
from datetime import timedelta
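# %I (12-hour clock) is required for %p (AM/PM) to take effect when parsing; the date part is unaffected
all_tickers_info_clean["Earnings Date 2"] = all_tickers_info_clean["Earnings Date"].apply(lambda x:datetime.strptime(x[:-3], "%b %d, %Y, %I %p") )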
Next, we can generate the PRIMARY KEY (PK) to be used in merge operations afterwards, using string of the earnings date without time and the ticker symbol:
Py 23. Generate the primary key (PK) field

all_tickers_info_clean["PK"] = all_tickers_info_clean.Symbol + "|"+ all_tickers_info_clean["Earnings Date 2"].apply(lambda x : x.strftime("%Y-%m-%d"))
Fig.18 Historical EPS for top traded stocks and the Primary Key (PK)
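
The date parsing and key construction can be checked on a single string. The sketch below assumes the earnings date looks like 'Nov 05, 2020, 4 PMEST' (a three-letter suffix after AM/PM); the ticker XYZ and the helper make_pk are hypothetical.

```python
from datetime import datetime

def make_pk(symbol, earnings_date_text):
    """Build '<Symbol>|<YYYY-MM-DD>' from a date string with a 3-character suffix."""
    # Drop the last 3 characters, then parse; %I with %p honours AM/PM.
    parsed = datetime.strptime(earnings_date_text[:-3], "%b %d, %Y, %I %p")
    return symbol + "|" + parsed.strftime("%Y-%m-%d")

pk = make_pk("XYZ", "Nov 05, 2020, 4 PMEST")
print(pk)
```

Because the key keeps only the calendar date, the time of day of the announcement does not affect the later join with daily prices.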

Getting All Available History of Stock Prices for the Selected Tickers

If you're in Colab, start with installing yfinance in your notebook:
Py 24. Install and import the module 'yfinance'

!pip install yfinance
import yfinance as yf
In the following code, we generate a table with daily prices and future returns (in 1–7, 30, 90, 365 days):
Py 25. Get historical prices from yfinance

# Start from an empty dataframe
df_stocks_prices = pd.DataFrame({'A' : []})
# Download all history of stock prices and calculate the future returns for 1–7 days, 30d, 90d, 365d
# That is: if we buy a stock at some date (e.g. after a high EPS), is it going to be a profitable decision?
for i,ticker in enumerate(TICKERS_LIST):
  yf_ticker = yf.Ticker(ticker)
  historyPrices = yf_ticker.history(period='max')
  historyPrices['Ticker'] = ticker
  # Sometimes there is a problem with .index value → use try
  try:
    historyPrices['Year']= historyPrices.index.year
    historyPrices['Month'] = historyPrices.index.month
    historyPrices['Weekday'] = historyPrices.index.weekday
    # Assumption: the right-hand side is cut off in the source; the calendar date of the index is used
    historyPrices['Date'] = historyPrices.index.date
  except AttributeError:
    # Assumed handling: the index is not a DatetimeIndex (e.g. no data returned), so skip the ticker
    print(f'No valid price history for ticker {ticker}')
    continue
  # !!! Important: historyPrices['Close'].shift(1) gives the Close market price 1 trading day BEFORE the current row
  # !!! Important: historyPrices['Close'].shift(-days) gives the Close market price `days` trading days AFTER the current row
  # Dividing the second by the first gives the return from holding for `days` days a stock bought the day before the financial reporting occurred
  for days in [1,2,3,4,5,6,7,30,90,365]:
    historyPrices['r_future'+str(days)] = np.log(historyPrices['Close'].shift(-days) / historyPrices['Close'].shift(1) )

  historyPrices['years_from_now'] = historyPrices['Year'].max()- historyPrices['Year']
  historyPrices['ln_volume']= np.log(historyPrices['Volume'])

  if df_stocks_prices.empty:
    df_stocks_prices = historyPrices
  else:
    df_stocks_prices = pd.concat([df_stocks_prices,historyPrices], ignore_index=True)
We generate the same PRIMARY KEY to (inner) join with the dataframe of EPS : <Symbol | Date>:
Py 26. Generating the primary key (PK) for the historical prices dataframe

df_stocks_prices["PK"] = df_stocks_prices.Ticker + "|"+ df_stocks_prices["Date"].apply(lambda x : x.strftime('%Y-%m-%d'))
We've generated a lot of daily stats (1.2M records for 200 stocks) for the financial performance of the selected stocks.

Let's now look at an example of how r_future1 and r_future2 are calculated for one stock:
  • let's select the second row, for the date 2020-10-02: the GE Close price for 2020-10-01 was 6.24, for 10-05 (the next trading day after 10-02) it was 6.41, and for 10-06 it was 6.17
  • r_future1 = log(6.41 / 6.24) = log(1.027) ≈ 0.027, that is, approximately a 2.7% return from buying GE stock on 1 Oct and selling it on 5 Oct (1 trading day after the selected date)
  • r_future2 = log(6.17 / 6.24) = log(0.989) ≈ -0.011, that is, approximately a -1.1% return (loss) from buying GE stock on 1 Oct and selling it on 6 Oct (2 trading days after the selected date)
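
The same arithmetic can be reproduced in a few lines. The sketch below uses the rounded GE closes quoted above and mimics the shift(1) / shift(-days) logic with plain list indexing; the helper r_future and the list layout are illustrative, and the close for 2020-10-02 is left empty because the calculation never uses it.

```python
import math

# (date, close) rows; the close of the selected row itself is not needed.
closes = [
    ("2020-10-01", 6.24),
    ("2020-10-02", None),  # selected row
    ("2020-10-05", 6.41),
    ("2020-10-06", 6.17),
]

def r_future(prices, row, days):
    """Log return from the close one row before `row` to the close `days` rows after it."""
    before = prices[row - 1][1]    # plays the role of Close.shift(1)
    after = prices[row + days][1]  # plays the role of Close.shift(-days)
    return math.log(after / before)

row = 1
r1 = r_future(closes, row, 1)
r2 = r_future(closes, row, 2)
simple_r1 = math.exp(r1) - 1  # log return converted back to a simple return

print(f"r_future1 = {r1:.4f}, r_future2 = {r2:.4f}, simple 1-day return = {simple_r1:.2%}")
```

For small moves the log return and the simple percentage return are nearly identical, which is why the log values can be read directly as approximate percentages.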
Py 27. Checking the stats for one stock GE

filter1 = df_stocks_prices.Ticker=='GE'
filter2 = df_stocks_prices.Year == 2020
filter3 = df_stocks_prices.Month == 10
df_stocks_prices[filter1 & filter2 & filter3].head(2)
Fig 19. Truncated (on rows) dataset for a daily stock prices and their future returns
We may still have duplicates in the all_tickers_info_clean dataset due to duplicated records on the original website:
Py 28. Dropping the duplicates

all_tickers_info_clean = all_tickers_info_clean.drop_duplicates(subset=['PK'], keep='first')
We also need to remove duplicates from df_stocks_prices in case we have them:
Py 29. Dropping the duplicates

df_stocks_prices = df_stocks_prices.drop_duplicates(subset=['PK'], keep='first')
Now we can try to merge these dataframes. We use one-to-one validation to make sure there are no duplicates (which were supposed to be removed in the previous steps):
Py 30. Joining the dataframes (one-to-one, using the PK field)

merged_df = pd.merge(all_tickers_info_clean, df_stocks_prices, on="PK", validate="one_to_one")
The resulting dataframe should have the following structure:
Fig 20. Merged Dataframe: earnings dates, stock prices, and returns
The merged dataframe is back to a manageable size of 12k rows, because we keep only the records for the financial reporting days, reducing the volume 100x: from 1.2M to 12k rows.
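
To illustrate what validate="one_to_one" protects against, here is a small sketch with made-up frames: the merge raises an error while a key is duplicated and succeeds once the duplicates are dropped. The ticker XYZ and all values are hypothetical.

```python
import pandas as pd

eps = pd.DataFrame({
    "PK": ["XYZ|2020-11-05", "XYZ|2020-08-06"],
    "Reported EPS": [0.5, 0.4],
})
prices = pd.DataFrame({
    "PK": ["XYZ|2020-11-05", "XYZ|2020-11-05", "XYZ|2020-08-06"],  # first key is duplicated
    "Close": [10.0, 10.0, 9.0],
})

# With a duplicated key on the right side, the validation fails loudly.
try:
    pd.merge(eps, prices, on="PK", validate="one_to_one")
    duplicate_detected = False
except pd.errors.MergeError:
    duplicate_detected = True

# After dropping duplicates, the same merge passes the check.
deduped = prices.drop_duplicates(subset=["PK"], keep="first")
merged = pd.merge(eps, deduped, on="PK", validate="one_to_one")
print(duplicate_detected, len(merged))
```

Without the validation, a duplicated key would silently multiply rows in the merged result and bias every average computed later.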

Individual Stocks Examples: Recent Spikes in Q3 and Q2 Reports

In particular, we'll cover the following tickers, taking the respective indicators from the merged_df dataframe:

• [GE] shares jump after company posts surprise adjusted third-quarter profit, revenue tops expectations
• [MSFT] Microsoft's stock rises after company reports 15% sales jump and says coronavirus had 'minimal' impact on revenue

To perform the analysis, we'll use the following function:
Py 32. Visualisation function for one stock

def draw_plot(symbol):
  # Rows for one ticker since 2010 (named row_filter to avoid shadowing the built-in filter)
  row_filter = (merged_df.Symbol == symbol) & (merged_df.Year >= 2010)
  df = merged_df[row_filter][["EPS Estimate","Reported EPS","Surprise(%)","r_future1","r_future7","Date"]]
  with pd.option_context('display.max_rows', None, 'display.max_columns', None): # more options can be specified also
    print(df)  # assumed body (missing in the source): show the full table without truncation
  #Graph1: EPS estimate vs. Reported EPS
  df[["EPS Estimate","Reported EPS","Date"]].plot.line(x="Date", figsize=(20,6), title="EPS Estimate vs. Reported")
  #Graph2: Surprise in %
  df[["Surprise(%)","Date"]].plot.line(x="Date", figsize=(20,6), title="Surprise % (=Reported EPS/EPS Estimate)")
  #Graph3: 1- and 7-days returns
  df[["r_future1","r_future7","Date"]].sort_values(by='Date').plot(x="Date", kind='bar', figsize=(20,6), title="Stock jump")
Let's take a glimpse at the GE (General Electric) stock:

  • In most of the periods the Reported EPS is higher than the Predicted EPS
  • Four datapoints (quarters) show a negative Surprise, which didn't cause a big shock to the stock price the first time but caused a 10–20% dip the other times
  • The last report (Q3'20) showed positive dynamics, moving EPS from negative to positive values, which resulted in a 3.7% rise on the first day and a 13% rise in 7 days
  • The last quarter's results suggest the following investment idea: if the Surprise (actual EPS vs. predicted EPS) is positive and big, the stock can jump in one day (r_future1) and continue its growth for the whole week after that (r_future7). An investor can monitor such occasions and buy a stock just after a very successful reporting date, aiming to sell it within a short period of time
Py 33. Building plots for GE (General Electric)

draw_plot('GE')
Fig 21. GE Stock: EPS Estimate vs. Reported EPS
Fig.22 GE Stock: Surprise %
Fig.23 GE Stock: Stock price jump 1 and 7 days after the reporting date
Py 34. Building plots for MSFT (Microsoft)

draw_plot('MSFT')
Let's look at the MSFT stock:
  • Microsoft (MSFT) tends to have an actual EPS that differs from the estimate by -10% to +20%
  • There was one date at the end of 2017 with a surprise of 40%, which did cause a positive spike in returns, although r1 and r7 are not very different (which probably happened due to the low absolute value of EPS, <0.01)
  • Over the last 6 quarters the stock showed a gradually rising EPS trend, with r_future7 always higher than r_future1, which means that the MSFT stock was a good investment opportunity in every quarter of the last 1.5 years
  • Investment idea: try to find stocks with EPS growing over a 1–2 year period and invest in them around the reporting day
Fig.24 MSFT Stock: EPS Estimate vs. Reported EPS
Fig.25 MSFT Stock: Surprise %
Fig.26 MSFT Stock: Stock price jump 1 and 7 days after the reporting date

Scaled Analysis

In this section, we'll examine the aggregated statistics across many years of data and different dimensions to group by.

The first example shows the 1- and 7-day returns for all stocks, grouped by year:

Py 35. Aggregated 1 and 7 days returns comparison

import matplotlib.pyplot as plt

print('Count observations: ',merged_df.groupby(by='Year').count()['r_future1'])
# Select the numeric columns before mean(): pandas 2.0 no longer silently drops text columns
ax = merged_df.groupby(by='Year')[['r_future1','r_future7']].mean().plot.line(figsize=(20,6))
vals = ax.get_yticks()
ax.set_yticklabels(['{:,.1%}'.format(x) for x in vals])
plt.axhline(y=0, color='r', linestyle='-')
plt.title("1 and 7 days returns of stocks after the quarterly earnings results announcement")
Fig.27 Aggregated r_future1 and r_future7

Result 4

For many years (but not always!) the expected return after 7 days is higher than after 1 day from the reporting date. This means that the individual trends we've seen earlier for MSFT and GE tend to hold for the larger dataset as well, but only within successful "bullish" years (when r_future1>0 on average).

Parametrise the function

(with filtering and groupby conditions)
Here we create a parametrised version of the previous graph, in which you can select a feature for groupby, and condition to filter.
Py 36. Parametrised returns function and visualisation

def draw_returns(groupby_factor, filter): 
  filter_year = merged_df.Year>=2000
  print('Count observations: ',merged_df[filter & filter_year].groupby(by=groupby_factor).count()['r_future1'])
  ax = merged_df[filter & filter_year].groupby(by=groupby_factor)[['r_future1','r_future7']].mean().plot.line(figsize=(20,6))
  vals = ax.get_yticks()
  ax.set_yticklabels(['{:,.0%}'.format(x) for x in vals])
  plt.axhline(y=0, color='r', linestyle='-')
  plt.title("1 and 7 days returns of stocks after the quarterly earnings results announcement")
  if groupby_factor=='Year':
    pass  # the body of this branch is cut off in the source; left as a no-op

Result 5

We try the parametrised approach, getting the returns r1 and r7 for different classes of stocks (EPS<-1, EPS<0, EPS>0, EPS>1, EPS>2, etc.). We hoped to find that one line (r7) is always higher than the other (r1) but, unfortunately, the data doesn't support that.
Py 37. Calling new function with the different options

draw_returns('Year', merged_df["Reported EPS"]<0)
draw_returns('Year', merged_df["Reported EPS"]<-1)
draw_returns('Year', merged_df["Reported EPS"]>0)
draw_returns('Year', merged_df["Reported EPS"]>1)
draw_returns('Year', merged_df["Reported EPS"]>2)
Fig.28 EPS less than 0
Fig.29 EPS less than -1
Fig.30 EPS greater than 0
Fig.31 EPS greater than 1
Fig.32 EPS greater than 2

Slicing the Data on the Volume of Trade

You might also want to slice the data by the volume of trade / size of a company and see whether any subgroup shows spectacular behaviour.
Py 38. Histogram of ln(volume_of_trade)

merged_df.ln_volume.replace([np.inf, -np.inf], np.nan).hist(bins=100)
Fig.33 Ln daily volume of trade distribution
Py 39. Getting 10 equal size bins on a daily volume of trade

merged_df["ln_volume_binned"] = pd.qcut(merged_df["ln_volume"],10)

# (17.674, 21.613] 1226 
# (6.396, 14.344] 1226 
# (17.111, 17.674] 1225 
# (16.74, 17.111] 1225 
# (16.436, 16.74] 1225 
# (16.157, 16.436] 1225 
# (15.85, 16.157] 1225 
# (15.49, 15.85] 1225 
# (15.037, 15.49] 1225 
# (14.344, 15.037] 1225
# Name: ln_volume_binned, dtype: int64
Now we can use the same parametrised approach to slice the data on ln_volume_binned with different filters: merged_df["Year"]>2000 (Fig.34), merged_df["Year"]==2020 (Fig.35), and (merged_df["Year"]==2020) & (merged_df["Reported EPS"]<0) (Fig.36):
Py 40. Calling the returns function for ln_volume_binned and different filters by year

draw_returns('ln_volume_binned', merged_df["Year"]>2000)
draw_returns('ln_volume_binned', merged_df["Year"]==2020)
draw_returns('ln_volume_binned', (merged_df["Year"]==2020) & (merged_df["Reported EPS"]<0))
Fig 34. r1_future and r7_future returns split by volume of trade for year>2000
Fig 35. R1 and R7 returns split by volume of trade for year==2020
Fig 36. R1 and R7 returns split by volume of trade, year==2020, and reported_EPS<0
The first graph shows that stocks with a large volume of trade had, on average, negative returns over the period 2000–2020. In 2020 the trend reversed: the high-volume stocks have positive returns and r7 is higher than r1. If we look deeper into 2020 and add the condition EPS<0, we see a larger positive difference between the r7 and r1 returns (which is good if you buy at day 1 and sell at day 7).

Result 6

We tried to split the stocks by the volume of trade, year, and EPS. There are no universal trends (where the r7 line always lies above the r1 line), but a few weaker observations persist. In 2020, stocks with a high volume of trade tend to be positive in short-term returns (contrary to the years before that), and stocks with a negative EPS tend to return to the previous (pre-reporting) prices more quickly and grow beyond them.

Analysing the Surprise % Value

The Surprise is largely concentrated around 0, as many companies aim to report a value very close to the estimate:
Py 41. Getting 10 equal size bins by the level of surprise

merged_df["surprise_%_binned"] = pd.qcut(merged_df["Surprise(%)"],10)

# (-31360.231, -17.356] 1226
# (-17.356, -3.39] 1226
# (-3.39, 0.27] 1229 
# (0.27, 1.87] 1221 
# (1.87, 3.73] 1226 
# (3.73, 6.446] 1223 
# (6.446, 10.677] 1225
# (10.677, 17.52] 1226 
# (17.52, 36.469] 1224
# (36.469, 6900.0] 1226 
# Name: surprise_%_binned, dtype: int64
Let's draw the returns:
Py 42. Returns for the different levels of surprise

draw_returns('surprise_%_binned', True)
Fig.37 R1 and R7 returns split by the Surprise%

Result 7

The general rule is that if the Surprise(%) is negative, then r1 and r7 are negative too; if the surprise is positive, r1 and r7 are positive too. It is hard to use the Surprise(%) factor as an investment idea, as the r1 and r7 lines lie close to each other. They seem to move apart only for a high positive Surprise(%)>17. Such cases make up about 20% of the reports (the two highest bins out of ten), which means that an investor needs to monitor a lot of earnings report dates to catch the highest positive cases.

We've shown how to scrape the financial predictions from a website and how to connect them together with the stock returns. Q2'20 seems to be a very successful quarter for the top 50 (on the volume trade) stocks — most of them showing the positive surprise over the expected earnings-per-share (EPS) and high short-term returns. The result remains strong even after the corresponding S&P500 index returns are deducted (i.e. the top 50 stocks had higher positive growth than average index dynamics). When scaled to top-200 stocks — the result is not that simple — the average returns are smaller, and there is more variation in EPS and the returns.
