Part 4

Scraping Earnings Per Share (EPS)

This article continues our series of event-driven analytical stories, which started with news sentiment analysis (Part 3), continued with one quarter of EPS vs. returns (this Part 4), and concluded with a long-term EPS analysis over the last 20 years (Part 5) for a selected set of stocks.

Introduction

I know the exact day when I started trading: it was 28th April 2020, when Google announced its Q1 earnings and rose more than 7% in one day. I figured that Facebook had a similar business and was releasing its report the next day, so I invested my first $1000 in Facebook stock. It was a quick win: the stock rose more than 10% in one day after showing "stability in ad revenue after the fall in March". The same approach worked for Facebook on its Q2 earnings date as well: the stock soared 8% after reporting an earnings-per-share of $1.80 instead of the expected $1.39. After the second success I finally decided to do a proper analysis at scale and write an article about it, which you can read below.

The hypothesis is: "Financial results become extremely important during hard times (e.g. COVID-19 in 2020): some resistant verticals gain disproportionately high trade volume and price rises".

I wish I could find at least one strong idea that works most of the time, just like the Metal Man of Sligo in the picture, who has been pointing at the entrance to Sligo harbour for a couple of centuries now.
Executive Summary
In this chapter we've shown how to get (scrape) publicly available data from a website when there is no easy way to get it from a library or API. In particular, we collected the Earnings-Per-Share (EPS) pages for one quarter for the top companies from the Yahoo Finance website and saved the numbers to one dataframe. Then we joined it with the historical stock prices, and finally we made a set of scatterplots trying to find the segments of high returns (above the S&P500 Index) after the reporting dates by going over the Reported EPS, EPS Estimate, and Surprise(%) dimensions.

The Approach

Stock ownership gives you the right to share in the profits of the company. In an ideal world, the price of a stock should depend strongly on the company's earnings, since it reflects discounted future profits. If, at some point, a company earns more than before, it might mean that the company's growth is accelerating and the stock should be priced higher. That's why, among the other financial indicators to follow, earnings per share (EPS) is one of the most important ones. Every quarter, analysts make predictions on the company's profits (or losses) and then check those predictions against the actual reported EPS. If the company does better than predicted, the stock price should increase, and vice versa.

In this article, we aim to test this at scale: for hundreds of stocks that reported earnings in 2020 Q2, we will check how a stock's price change depends on the actual EPS, the predicted EPS, and the Surprise (= actual_EPS / predicted_EPS - 1). Below is a quick overview of the sections and topics covered in the article:
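As a tiny sanity check, the Surprise metric from the formula above can be computed directly. A minimal sketch (the helper name `surprise_pct` is ours; note the absolute value in the denominator, a common variant of the formula that keeps the sign sensible when the estimate is negative, as for loss-making companies):

```python
def surprise_pct(reported_eps: float, estimated_eps: float) -> float:
    """Surprise(%): how far the reported EPS landed from the estimate.

    Uses (reported - estimate) / |estimate|, which equals
    reported/estimate - 1 for positive estimates but keeps the
    correct sign for negative ones.
    """
    return (reported_eps - estimated_eps) / abs(estimated_eps) * 100

# Facebook Q2'20 from the introduction: reported $1.80 vs. expected $1.39
print(round(surprise_pct(1.80, 1.39), 1))  # 29.5
```

This matches the ~29.59% Surprise that Yahoo Finance shows for FB later in the article (the small difference comes from Yahoo using unrounded EPS values).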

  • In the Scraping Yahoo Finance: Earnings-Per-Share section, you'll learn how to obtain the earnings-per-share information for a wide range of companies for a specified period of time (starting with one day) by scraping the Yahoo Finance website.
  • In the Packing Everything in One Scrape Function section, you'll combine everything from the previous section into a single function that returns weekly stats on the dates and EPS.
  • In the Getting Stock Prices for a Company section, you'll look at how to get data on stock returns and volume for a certain company.
  • In the Getting S&P 500 Stats section, you'll obtain S&P 500 data to evaluate how a certain symbol is doing against the index.
  • In the Getting Stock Returns and Volume from Yahoo Finance section, you'll obtain data on stock returns and volume for all tickers found on Yahoo Finance.
  • In the Merging All the Pieces Together section, you'll build the combined dataframe from the stats of all previous parts.
  • Finally, in the Analysis and Visualisation section, you'll see examples of graphs built on the dataset.

This article is the fourth part in the series that covers how to take advantage of computer technologies to make informed decisions in stock trading. Refer to part 1 to be guided through the process of setting up the working environment needed to follow along with the examples provided in the rest of the series. Part 2 covered several well-known finance APIs, allowing you to obtain and analyse stock data programmatically. In part 3, you explored whether the stock market is influenced by the news.

An Overview on YouTube

Scraping Yahoo Finance: Earnings-Per-Share

You might be interested in the earnings-per-share information for a wide range of companies over a certain period of time. You can scrape this information from the Yahoo Finance website at https://finance.yahoo.com/calendar/earnings:
To do the scraping, you'll need to install the BeautifulSoup library in your Colab:
Py1. Install libs
!pip install beautifulsoup4
Then, make sure to import the following libraries:
Py2. Importing the libs
import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
Suppose you want to obtain data for 2020-07-27. You'll need to specify the following URL:
Py3. URL address with parameters for scraping
url = "https://finance.yahoo.com/calendar/earnings?from=2020-07-26&to=2020-08-01&day=2020-07-27"
Then, send the following request:
Py4. Send the request to the URL
r = requests.get(url)
To make sure it has worked as expected, check the status of the request:
Py5. OK status from the requests lib
r.ok

# Output: True
Now you can move on to the content:
Py6. URL response content
r.content
Your task, however, is to find the table data within it. This can be easily accomplished with the help of BeautifulSoup as follows:
Py7. Find all tables in the content
soup = BeautifulSoup(r.text, 'html.parser')  # specify the parser explicitly to avoid a warning
table = soup.find_all('table')
len(table)

# output:
# 1
Just one table has been found, which is good. Let's now get all the column names of the table:
Py8. Get columns list from the <span> tags
spans = soup.table.thead.find_all('span')
columns = []
for span in spans:
  print(span.text)
  columns.append(span.text)
Here are the columns (refer back to the screenshot in Figure 1 to make sure that all the columns have been found):
# Output:
Symbol
Company
Earnings Call Time
EPS Estimate
Reported EPS
Surprise(%)
Now you can move on to the rows:
Py9. Get rows list
rows = soup.table.tbody.find_all('tr')
len(rows)
As you can see, you have 100 rows in the table scraped from the page. In the next code snippet, you load the rows to a pandas dataframe, reading row by row:
Py10. Get the content from the table
stocks_df = pd.DataFrame(columns=columns)
for row in rows:
  elems = row.find_all('td')
  dict_to_add = {}
  for i,elem in enumerate(elems):
    dict_to_add[columns[i]] = elem.text
  stocks_df = stocks_df.append(dict_to_add, ignore_index=True)
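Note that `DataFrame.append` was removed in pandas 2.0. On recent pandas versions the same row-by-row load can be done by collecting plain dicts in a list and building the dataframe once, which is also faster. A sketch with toy data standing in for the scraped `<td>` values:

```python
import pandas as pd

columns = ['Symbol', 'Company', 'EPS Estimate']

# in the real script, each dict would be built from one scraped <tr> row
scraped_rows = [
    {'Symbol': 'FB', 'Company': 'Facebook, Inc.', 'EPS Estimate': '1.39'},
    {'Symbol': 'GOOGL', 'Company': 'Alphabet Inc.', 'EPS Estimate': '8.21'},
]

# build the dataframe once instead of appending row by row
stocks_df = pd.DataFrame(scraped_rows, columns=columns)
print(stocks_df.shape)  # (2, 3)
```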
As a result, the data in the dataframe should look as follows:
Py11. Visualise the content
stocks_df
Figure-2: Scraped table from the Earnings Calendar on Yahoo Finance
Figure-2: Scraped table from the Earnings Calendar on Yahoo Finance
You should have 100 rows scraped. We will use all columns, but note that Earnings Call Time values are not supplied in many cases. Some other problems in the dataset are:

  • missing values: some values for EPS Estimate, Reported EPS, and Surprise are unknown
  • the numeric values in these columns are stored as strings: you need to convert them to float
To get rid of missing values, you can apply the following filters to the dataset:
Py12. Filtering the dataframe by excluding the rows with missing values
filter1 = stocks_df['Surprise(%)']!='-'
filter2 = stocks_df['EPS Estimate']!='-'
filter3 = stocks_df['Reported EPS']!='-'
stocks_df_noMissing = stocks_df[filter1 & filter2 & filter3].copy()  # .copy() avoids SettingWithCopyWarning on later assignments
You should see that the number of rows has reduced after that:
Py13. 21 out of 100 records were filtered out
len(stocks_df_noMissing)

# Output:
#  79
In the next step, you solve another problem and convert all the values in the EPS Estimate, Reported EPS, and Surprise columns to float:
Py14. Type conversion for numerical values from string to float
stocks_df_noMissing['EPS Estimate'] = stocks_df_noMissing['EPS Estimate'].astype(float)
stocks_df_noMissing['Reported EPS'] = stocks_df_noMissing['Reported EPS'].astype(float)
stocks_df_noMissing['Surprise(%)'] = stocks_df_noMissing['Surprise(%)'].astype(float)
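An alternative to the manual '-' filters plus `astype(float)` is `pd.to_numeric` with `errors='coerce'`, which converts the column and turns any non-numeric string into NaN in one pass; the NaN rows can then be dropped. A sketch on toy data:

```python
import pandas as pd

# toy column mixing numeric strings with Yahoo's '-' placeholder
s = pd.Series(['1.39', '-', '8.21'])

# non-numeric entries become NaN instead of raising an error
converted = pd.to_numeric(s, errors='coerce')
print(int(converted.isna().sum()))  # 1
```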
As a result, you should have the following dataframe:
Py15. The resulting scraped table with the correct datatypes
stocks_df_noMissing.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 79 entries, 0 to 99
Data columns (total 6 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   Symbol              79 non-null     object
 1   Company             79 non-null     object
 2   Earnings Call Time  79 non-null     object
 3   EPS Estimate        79 non-null     float64
 4   Reported EPS        79 non-null     float64
 5   Surprise(%)         79 non-null     float64

Packing Everything in One Scrape Function

Now you are ready to pack everything into one scrape function that returns stocks_df for one week. You can then call the function over a period of several weeks:
Py16. Packing the scraping code to one function
# Need to supply weekly stats as you see on the website
# from_dt = '2020-07-26'
# to_dt = '2020-08-01'

from datetime import datetime, timedelta  # the function below relies on these

def get_scrapped_week(from_dt, to_dt):

  # initially look at the first 100 stocks with earnings at the first day of the week (from_dt)  
  # FULL URL with PARAMS example: url = "https://finance.yahoo.com/calendar/earnings?from=2020-07-26&to=2020-08-01&day=2020-07-27" 
  url = "https://finance.yahoo.com/calendar/earnings"
  offset = 0
  size = 100
  fst = 1

  # scrape every date in the submitted interval 
  for day_date in (datetime.strptime(from_dt, '%Y-%m-%d')  + timedelta(n) for n in range(6)):
    day_dt = datetime.strftime(day_date, '%Y-%m-%d')
    print(day_dt)

    # inner cycle for iteration with offset, if more than 100 stocks' earnings happened on that date
    while True:
      # make URL request with the params
      params = {'from': from_dt, 'to': to_dt,'day': day_dt, 'offset':offset, 'size': size} 
      r = requests.get(url, params=params) 
      soup = BeautifulSoup(r.text, 'html.parser')

      # scrape table column names when going first time to create a correct dataframe
      if fst == 1:
        spans = soup.table.thead.find_all('span')
        columns = []
        for span in spans:
          print(span.text)
          columns.append(span.text)
        stocks_df = pd.DataFrame(columns=columns)
        fst = 0

      # scrape body with row values
      rows = soup.table.tbody.find_all('tr') 
      for row in rows:
        elems = row.find_all('td')
        dict_to_add = {}
        dict_to_add['Date'] = day_dt
        for i,elem in enumerate(elems):
          dict_to_add[columns[i]]=elem.text
        stocks_df = stocks_df.append(dict_to_add, ignore_index=True)  
      if len(rows) != 100:
        print(len(rows)+offset)
        offset = 0
        break
      else: 
        offset = offset + 100
  
  return stocks_df

# stocks_df.to_csv('stocks.csv', index = False)
Let's try it for a certain week:
Py17. Calling the scraping function for one week
stocks_df = get_scrapped_week('2020-07-05', '2020-07-11')
Here is the output:
Py18. The output for one scraped week
2020-07-05
Symbol
Company
Earnings Call Time
EPS Estimate
Reported EPS
Surprise(%)
8
2020-07-06
29
2020-07-07
23
2020-07-08
23
2020-07-09
23
2020-07-10
4
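To cover a longer period, the weekly from/to boundaries can be generated with `timedelta` and fed to the scrape function in a loop. A sketch (the helper name `week_windows` is ours; the actual `get_scrapped_week` call is commented out since it hits the network):

```python
from datetime import datetime, timedelta

def week_windows(first_sunday: str, n_weeks: int):
    """Return (from_dt, to_dt) string pairs for consecutive weeks."""
    start = datetime.strptime(first_sunday, '%Y-%m-%d')
    windows = []
    for w in range(n_weeks):
        from_d = start + timedelta(days=7 * w)
        to_d = from_d + timedelta(days=6)  # Sunday .. Saturday
        windows.append((from_d.strftime('%Y-%m-%d'), to_d.strftime('%Y-%m-%d')))
    return windows

for from_dt, to_dt in week_windows('2020-07-05', 3):
    print(from_dt, to_dt)
    # weekly_df = get_scrapped_week(from_dt, to_dt)
    # stocks_df = pd.concat([stocks_df, weekly_df], ignore_index=True)
```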
After scraping as many weeks as you need and appending the results to stocks_df, clean the combined dataset the same way as before:
Py20. Cleaning the final dataset
filter1 = stocks_df['Surprise(%)']!='-'
filter2 = stocks_df['EPS Estimate']!='-'
filter3 = stocks_df['Reported EPS']!='-'
stocks_df_noMissing = stocks_df[filter1 & filter2 & filter3].copy()  # .copy() avoids SettingWithCopyWarning
stocks_df_noMissing['EPS Estimate'] = stocks_df_noMissing['EPS Estimate'].astype(float)
stocks_df_noMissing['Reported EPS'] = stocks_df_noMissing['Reported EPS'].astype(float)
stocks_df_noMissing['Surprise(%)'] = stocks_df_noMissing['Surprise(%)'].astype(float)
To simplify search operations within the dataset, you might want to set the Symbol column as the index:
Py21. Setting the index for the dataframe
stocks_df_noMissing.set_index('Symbol')  # returns a new dataframe for display; the original keeps Symbol as a column
Here is how you can take advantage of it:
Py22. One row of the resulting table
# Output (one row):
# Symbol  Company        Earnings Call Time  EPS Estimate  Reported EPS  Surprise(%)  Date
# GOOGL   Alphabet Inc.  Time Not Supplied   8.21          10.13         23.42        2020-07-29

Getting Stock Prices for a Company

In this section, we'll look at how you can get data on stock returns and volume for a certain company. To start with, you'll need to install the yfinance library needed to obtain stock data:
Py23. Installing yfinance
!pip install yfinance
Let's start with an attempt to get one stock for one date, calculating the returns and the volume jump.
Py24. Doing the imports
import yfinance as yf
import numpy as np
import pandas as pd
from datetime import datetime
from datetime import timedelta
This is what we can see for the FB symbol:
Py24. Checking one row of the data
row = stocks_df_noMissing[stocks_df_noMissing['Symbol'] == 'FB']
print(row)

# Output:
#     Symbol  Company         Earnings Call Time  EPS Estimate  Reported EPS  Surprise(%)  Date
# 776 FB      Facebook, Inc.  Time Not Supplied   1.39          1.8           29.59        2020-07-29
You can easily extract the date from it:
Py25. The next reporting date for FB
date = row['Date'].values[0]
print(date)

# Output:
# 2020-07-29
Let's now obtain stock data for this date and the dates near it, using the yfinance library:
Py26. Download stock prices for one ticker (FB)
date = datetime.strptime(row['Date'].values[0], '%Y-%m-%d')
print(date + timedelta(days=3))
print(date - timedelta(days=1))
ticker = yf.Ticker('FB')
hist = yf.download('FB', start=date - timedelta(days=1), end=date + timedelta(days=3))
The output should look as follows:
Py27. Output message after the download is finished
# Output:
# 2020-08-01 00:00:00
# 2020-07-28 00:00:00
# [*********************100%***********************] 1 of 1 completed
If you check out the hist variable, it should contain the following data:
Figure-3: Yfinance data for FB around the quarterly reporting date
In the next step, you determine the stock price and volume rises for the last two days of observation.
Py28. 2-days returns (r2) for the stock's open prices and volume rise
hist['r2'] = np.log(hist['Open'] / hist['Open'].shift(2))
hist['volume_rise'] = np.log(hist['Volume'] / hist['Volume'].shift(2))
So the updated dataset shows you the volume and price rises for Facebook:
Figure-4: Yfinance data for FB with added r2 and volume_rise fields
If you want to look at the last value of returns (r2 = 2-days return on the next morning after the event) in the dataset, you can extract it as follows:
Py29. 10% growth in 2 days after the reporting date (compared vs. the open prices for the day before)
hist.r2.values[-1]

# Output:
# 0.10145051589492579
And the volume of trade rising can be viewed as follows:
Py30. FB had a ~4x volume rise after the quarterly report was issued (e^1.36 ≈ 3.9)
hist.volume_rise.values[-1]

# Output:
# 1.361648662790037
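Both r2 and volume_rise are log values, so `np.exp` (or the stdlib `math.exp`) converts them back to plain growth ratios. A quick check with the two FB numbers above:

```python
import math

r2 = 0.10145051589492579         # 2-day log return for FB
volume_rise = 1.361648662790037  # 2-day log volume rise for FB

# exp() turns a log return back into a growth factor
print(f'price ratio:  {math.exp(r2):.3f}')           # ~1.107, i.e. roughly +10%
print(f'volume ratio: {math.exp(volume_rise):.2f}')  # ~3.90, i.e. roughly 4x
```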

Getting S&P 500 Stats

As mentioned in part 2, it's a common practice to compare stock performance with the S&P 500 index. To begin with, let's obtain the S&P 500 index data for a specified period of time:
Py 30. Get S&P500 stats and draw the plot
import pandas_datareader.data as pdr
from datetime import date

start = datetime(2020,7,1)
end = datetime(2020,8,10)

print(f'Period: {start} to {end}')

spx_index = pdr.get_data_stooq('^SPX', start, end)

# S&P500 index was growing almost all July 2020 → need to adjust stock growth after the reporting date

spx_index['Open'].plot.line()
Figure-5: S&P 500 index in July - August 2020
You can apply the same technique to the index that you used in the previous section to determine the stock price rise, calculating 2-day returns daily for July-August 2020:
Py 31. Get S&P500 2-days returns
spx_index['r2'] = np.log(np.divide(spx_index['Open'] , spx_index['Open'].shift(2)))
spx_index['r2'].plot.line()
Figure-6: 2-days returns for S&P 500 index in July - August 2020
In the tabular form, the same data would look as follows:
Py 32. S&P 500 stats and returns represented in a table
spx_index.head(30)

# Output:
#             Open     High     Low      Close    Volume      r2
# Date
# 2020-08-10  3356.04  3363.29  3335.44  3360.47  2565981272  NaN
# 2020-08-07  3340.05  3352.54  3328.72  3351.28  2279160879  NaN
# 2020-08-06  3323.17  3351.03  3318.14  3349.16  2414075395  -0.009843
# 2020-08-05  3317.37  3330.77  3317.37  3327.77  2452040105  -0.006813
# 2020-08-04  3289.92  3306.84  3286.37  3306.51  2403695283  -0.010056
# 2020-08-03  3288.26  3302.73  3284.53  3294.61  2379546705  -0.008814
# ...
In the next code snippet, you fill an array of S&P returns for a corresponding stock. As an important note, if there is a "gap" for a particular date, we take the closest previous value:
Py 33. Generating a corresponding array of S&P500 returns for the reporting dates of the stocks
array_returns_snp500 = []
for index, row in stocks_df_noMissing.iterrows():
  start_dt = datetime.strptime(row['Date'], '%Y-%m-%d') - timedelta(days = 1)
  end_dt = datetime.strptime(row['Date'], '%Y-%m-%d') + timedelta(days = 3)
  # we don't have gaps of more than 4 days -> try to find the closest value of S&P500 returns in the dataframe:
  cur_dt = end_dt
  while cur_dt >= start_dt:
    rez_df = spx_index[spx_index.index == cur_dt.strftime('%Y-%m-%d')]
    if len(rez_df) > 0:
      array_returns_snp500.append(rez_df.r2.values[0])
      break
    else:
      cur_dt = cur_dt - timedelta(days = 1)
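If you prefer to avoid the explicit loop, pandas offers `Series.asof`, which returns the last available value at or before a given timestamp (the index must be sorted ascending, so the stooq data, which arrives newest-first, would need a `sort_index()` call first). A minimal sketch on toy data with a weekend gap:

```python
import pandas as pd

# r2 values indexed by trading days only (no weekend rows), ascending order
r2 = pd.Series(
    [0.01, -0.007],
    index=pd.to_datetime(['2020-08-06', '2020-08-07']),
)

# Sunday 2020-08-09 has no row -> asof falls back to Friday's value
print(r2.asof(pd.Timestamp('2020-08-09')))  # -0.007
```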
To make sure that it has worked as expected, you can check the length of both datasets: the newly created array_returns_snp500 and stocks_df_noMissing introduced earlier in this article:
Py 34. Checking the size of arrays of stocks and S&P500 returns
len(array_returns_snp500)
# Output: 1698

len(stocks_df_noMissing)
# Output: 1698
In both cases, you should have the same number.

Getting Stock Returns and Volume from Yahoo Finance

In this section, we'll look at how you can get data on stock returns and volume for all tickers found on Yahoo Finance. In the following script, for each ticker you calculate the 2-day return on the open price after earnings, relative to the price 2 days earlier:
Py 35. Getting stock returns and volume from Yahoo Finance
array_tickers = []
array_returns = []
array_volume_rise = []
array_volume_usd = []
array_snp500 = []

for index,row in stocks_df_noMissing.iterrows():
  start_dt = datetime.strptime(row['Date'], '%Y-%m-%d') - timedelta(days = 1)
  end_dt = datetime.strptime(row['Date'], '%Y-%m-%d') + timedelta(days = 3)
  hist = yf.download(row['Symbol'], start = start_dt, end = end_dt)
  
  # We need full data (volume and price for all dates) to calculate the returns and volume rise
  # ALSO: if end_dt is a non-trading day (Sat, Sun) -> we can't directly calc the returns stats
  if len(hist)<4:
    continue
  
  hist['r2'] = np.log(np.divide(hist['Open'] , hist['Open'].shift(2)))
  
  hist['volume_rise'] = np.log(np.divide(hist['Volume'], hist['Volume'].shift(2)))
  
  hist['volume_usd'] = hist['Volume'] * hist['Open']
  print(row)
  print(index)
  print('---------------')
  
  array_tickers.append(row['Symbol'])
  array_returns.append(hist.r2.values[-1])
  array_volume_rise.append(hist.volume_rise.values[-1])
  array_volume_usd.append(hist.volume_usd.values[-1])
  
  # We only append values S&P for the stocks that have all the data
  array_snp500.append(array_returns_snp500[index])
The script generates huge output (about 1000 entries if you recall). Below is just a fragment:
Py 36. The fragment of an output from the previous code snippet
[*********************100%***********************] 1 of 1 completed
Symbol AEOJF
Company AEON Financial Service Co., Ltd.
Earnings Call Time Time Not Supplied
EPS Estimate 14.03
Reported EPS -5
Surprise(%) -135.67
Date 2020-07-07
Name: 37, dtype: object
37
---------------
[*********************100%***********************] 1 of 1 completed
Symbol BBBY
Company Bed Bath & Beyond Inc.
Earnings Call Time Time Not Supplied
EPS Estimate -1.22
Reported EPS -1.96
Surprise(%) -60.39
Date 2020-07-07
Name: 43, dtype: object
43
---------------
…
To sum up, it would be interesting to learn how many stocks have all these financials: volume of trade, volume rise (in number of shares), and returns:
Py 37. Length check for the array of tickers (that provided a reporting date)
len(array_tickers)

# Output:
# 1003

Merging All the Pieces Together

Finally, let's merge all the financials we have obtained so far to see the entire "picture" for each of the traded stocks. For that, we'll create a dataframe:
Py 38. Create an empty dataframe for the merged stats
returns_df = pd.DataFrame(columns=['Ticker', 'Returns','Volume Rise','Volume Trade USD','Returns S&P500'])
And load it with data from the datasets we have created so far:
Py 39. Forming a dataframe from the set of arrays of standalone metrics
returns_df = pd.DataFrame([array_tickers,array_returns,array_volume_rise,array_volume_usd, array_snp500]).transpose()

returns_df.columns=['Ticker','Returns','Volume Rise','Volume Trade USD', 'Returns S&P500']
returns_df.set_index('Ticker', inplace=True)
returns_df.dropna(inplace=True)
returns_df['Returns'] = returns_df['Returns'].astype(float)
returns_df['Volume Rise'] = returns_df['Volume Rise'].astype(float)
returns_df['Volume Trade USD'] = returns_df['Volume Trade USD'].astype(float)
returns_df['Returns S&P500'] = returns_df['Returns S&P500'].astype(float)
returns_df['Returns in %'] = np.exp(returns_df['Returns'])
returns_df['Volume Rise in %'] = np.exp(returns_df['Volume Rise'])
Also, it would be interesting to learn what companies have had returns above S&P500 and add this information to the dataset:
Py 40. Generating the adjusted returns (returns above S&P500)
# Returns above S&P500
returns_df['Adj. Returns'] = returns_df['Returns'] - returns_df['Returns S&P500']
returns_df['Adj. Returns in %'] = np.exp(returns_df['Adj. Returns'])
The Adj. Returns metric serves as an indicator of growth relative to the overall S&P500 index.
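Since both series are log returns, the subtraction above is equivalent to dividing the plain growth ratios. A quick numeric check with made-up values:

```python
import math

stock_r = 0.10  # a stock's 2-day log return (made-up value)
spx_r = 0.02    # S&P500 2-day log return over the same window (made-up value)

adj = stock_r - spx_r
# exp of the difference equals the ratio of the two growth factors
assert abs(math.exp(adj) - math.exp(stock_r) / math.exp(spx_r)) < 1e-12
print(f'adjusted growth factor: {math.exp(adj):.4f}')  # 1.0833
```

This is why the article can compute 'Adj. Returns in %' as a single `np.exp` call on the difference.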

You might want to create a set of histograms for each column in the returns_df dataframe to see the distribution of the data. To get histograms without inf values, replace them in the dataframe first:
Py 41. Replacing inf numbers with nan and looking at the histograms distribution
returns_df = returns_df.replace([np.inf, -np.inf], np.nan)

returns_df.hist(figsize=(20,10), bins=100)
Figure-7: Histograms for Adjusted returns, raw returns, volume rise
In the next step, you join the returns_df dataframe with the stocks_df_noMissing dataframe:
Py 42. Joining the dataframes on the index field ('Symbol')
stocks_and_returns = stocks_df_noMissing.set_index('Symbol').join(returns_df)
stocks_and_returns.head()
You might want to remove the INF values from the final dataset:
Py 43. Replacing inf values with nan and dropping those records
stocks_and_returns_no_missing = stocks_and_returns.replace([np.inf, -np.inf], np.nan).dropna()
That is what you should finally have:
Py 44. Output for the resulting dataframe
stocks_and_returns_no_missing.info()
<class 'pandas.core.frame.DataFrame'>
Index: 997 entries, AA to ZEN
Data columns (total 14 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   Company             997 non-null    object
 1   Earnings Call Time  997 non-null    object
 2   EPS Estimate        997 non-null    float64
 3   Reported EPS        997 non-null    float64
 4   Surprise(%)         997 non-null    float64
 5   Date                997 non-null    object
 6   Returns             997 non-null    float64
 7   Volume Rise         997 non-null    float64
 8   Volume Trade USD    997 non-null    float64
 9   Returns S&P500      997 non-null    float64
 10  Returns in %        997 non-null    float64
 11  Volume Rise in %    997 non-null    float64
 12  Adj. Returns        997 non-null    float64
 13  Adj. Returns in %   997 non-null    float64
dtypes: float64(11), object(3)
memory usage: 116.8+ KB
Time to get results from our analysis. What are the TOP 50 most traded stocks around their quarterly reporting dates?
Py 45. Top-50 stocks sorted by the volume of trade in USD
top50_volume = stocks_and_returns_no_missing.sort_values(by='Volume Trade USD', ascending=False).head(50)
print(top50_volume)
You might also want to try to slice the data on the volume of trade / size of a company and see if there is some spectacular behaviour in any of the subgroups.
Figure-8: Resulting dataset: top stocks returns and volume of trade rise around the quarterly reporting date
Now, what are the TOP 200 most traded stocks (a larger sample for more robust statistics in the next section)? Run this code to find out:
Py 46. Top-200 stocks sorted by the volume of trade in USD
top200_volume = stocks_and_returns_no_missing.sort_values(by='Volume Trade USD', ascending=False).head(200)
print(top200_volume)

Analysis and Visualisation

Let's play with our dataset, trying to extract interesting information from it. In particular, it would be interesting to look at the distribution of returns. To see a visual summary of information, we'll build several plots:

In the following plot, you can see the Surprise column plotted against the Returns column for the top50_volume dataframe. This information can be quite useful if you want to start investing in the most lucrative stocks.
Py 47. Top-50 stocks: Surprise(%) vs. Returns in %
top50_volume[['Surprise(%)','Returns in %']].plot.scatter(x='Surprise(%)', y='Returns in %')
Figure-9: A plot of Surprise vs. Returns for TOP 50
There are several immediate results:

  • most of the stocks show results around the expected EPS
  • two Surprise outliers reported roughly 60x and 80x the predicted EPS and still showed only slightly positive returns
In the next plot, you can see the same plot but for the top200_volume dataframe.
Py 48. Top-200 stocks: Surprise(%) vs. Returns in %
top200_volume[['Surprise(%)','Returns in %']].plot.scatter(x='Surprise(%)', y='Returns in %')
Figure-10: A plot of Surprise vs. Returns for TOP 200
A few things to add on this graph:

  • there are some new outliers, with Surprise(%) between 100% and 2000%, which showed very impressive returns of 10%-30% just 2 days after the quarterly results announcement
  • a strongly negative Surprise doesn't automatically mean a price decrease: five of the left-most points kept 0.9-1.1 of their value in 2 days, i.e. -10% to +10% Returns
What if 'Returns in %' depends not only on the relative Surprise(%), but also on the absolute 'Reported EPS' value?

In the next plot, we plot Surprise and Reported EPS vs. Returns in % (as color) for TOP 50, drawing on an explicit matplotlib axis:
Py 49. Top-50 stocks: Surprise(%) vs. Reported EPS vs. Returns in % (color)
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
top50_volume[['Surprise(%)','Reported EPS','Returns in %']].plot.scatter(x='Reported EPS', y='Surprise(%)', c='Returns in %', colormap='RdYlGn', ax=ax)
Figure-11: A plot of Surprise and Reported EPS vs. Returns for TOP 50
In general, we see many of the stocks reporting an EPS between $0 and $2.5, with a small positive surprise and moderate growth (light green: <5% returns).
In the next plot, you can see the same plot but for TOP 200:
Py 50. Top-200 stocks: Surprise(%) vs. Reported EPS vs. Returns in % (color)
fig, ax = plt.subplots()
top200_volume[['Surprise(%)','Reported EPS','Returns in %']].plot.scatter(x='Reported EPS', y='Surprise(%)', c='Returns in %', colormap='RdYlGn', ax=ax)
Figure-12: A plot of Surprise and Reported EPS vs. Returns for TOP 200
In the next plot, we plot Surprise and Reported EPS vs. Adj. Returns in % for TOP 50.
Py 51. A plotting Surprise and Reported EPS vs. Adj. Returns for TOP 50
fig, ax = plt.subplots()
top50_volume[['Surprise(%)','Reported EPS','Adj. Returns in %']].plot.scatter(x='Reported EPS', y='Surprise(%)', c='Adj. Returns in %', colormap='RdYlGn', ax=ax)
Figure-13: A plot of Surprise and Reported EPS vs. Adj. Returns for TOP 50
As you can see, Figure 11 and Figure 13 are not very different: individual stock shocks are much larger than the average S&P500 moves, so Adj. Returns and Returns are very close.

In the next plot, you can see the same plot but for TOP 200.
Figure-14: A plot of Surprise and Reported EPS vs. Adj. Returns for TOP 200
Again, Figure 14 is very similar to Figure 12 (adjusted vs. non-adjusted returns).

In the next plot, we plot Reported EPS and EPS Estimate vs. Returns in % for TOP 50. The result is similar for the TOP 200 stocks.
Py 53. A plot of Reported EPS vs. EPS Estimate vs. Returns for TOP 50
fig, ax = plt.subplots()
top50_volume.plot.scatter(x='Reported EPS', y='EPS Estimate', c='Returns in %', colormap='RdYlGn', ax=ax)
Figure-15: A plot of Reported EPS and EPS Estimate vs. Returns for TOP 50
In the following histogram, we compare the returns of the top 50 and top 200 volume-traded stocks. You may notice that the top 200 distribution (blue) is more bell-shaped around zero to slightly positive returns than the top 50 one (orange):
Py 54. Histogram for Top50 vs. Top200 stocks Adj.returns
top200_volume['Adj. Returns in %'].hist(bins=50, alpha=0.5)
top50_volume['Adj. Returns in %'].hist(bins=50, alpha = 0.5)
Figure-16: A histogram comparing the returns of the top 50 and top 200 volume-traded stocks
Conclusion

We've shown how to scrape financial predictions from a website and how to connect them with stock returns. Q2'20 seems to have been a very successful quarter for the top 50 stocks (by trade volume): most of them showed a positive surprise over the expected earnings-per-share (EPS) and high short-term returns. The result remains strong even after the corresponding S&P500 index returns are deducted (i.e. the top 50 stocks grew faster than the average index dynamics). When scaled to the top 200 stocks, the result is not that simple: the average returns are smaller, and there is more variation in EPS and returns.
