Stock Volatility Analysis and Forecasting Using GARCH and EGARCH Models: A Comparative Study of Microsoft and Apple

By Imonikhe Ayeni

Abstract

This project provides a comprehensive analysis of stock price volatility, focusing on two technology giants: Microsoft Corporation (MSFT) and Apple Inc. (AAPL). Initially, a comparative study of their historical closing prices and daily returns is conducted to understand and contrast their inherent "swinginess" or risk profiles. Following this comparative assessment, the project delves into advanced time series modeling, specifically applying Generalized Autoregressive Conditional Heteroskedasticity (GARCH) and Exponential GARCH (EGARCH) models to forecast Microsoft's stock volatility. The EGARCH model, by accounting for asymmetric responses to market shocks (the "leverage effect" where negative news has a disproportionately larger impact on volatility than positive news), is identified as the superior forecasting tool. A robust walk-forward validation strategy is employed to generate realistic, out-of-sample volatility predictions. The findings highlight the presence of volatility clustering and persistence in both stocks, with EGARCH effectively capturing the critical leverage effect in Microsoft's volatility. This research offers valuable insights for investors seeking to better understand and manage risk in dynamic financial markets.

Introduction

In today's fast-paced and interconnected financial markets, understanding and anticipating stock price movements are paramount. Beyond simply knowing if a stock goes up or down, investors, analysts, and policymakers are keenly interested in its volatility – a measure of how much its price is expected to fluctuate or "swing" over a given period. High volatility signifies greater uncertainty and risk, while low volatility suggests more stable price movements. Accurate volatility forecasting is therefore crucial for effective risk management, portfolio optimization, and derivatives pricing.

This project embarks on a detailed exploration of stock volatility by focusing on two of the world's most influential technology companies: Microsoft Corporation (MSFT) and Apple Inc. (AAPL). Both companies hold significant weight in global markets, and understanding their individual and comparative risk characteristics provides valuable insights into broader market dynamics.

We begin by conducting a comparative analysis of Microsoft's and Apple's historical stock performance to identify differences in their inherent volatility. Following this, the project transitions to a more in-depth investigation of Microsoft's stock volatility specifically. Here, we employ sophisticated econometric models, namely GARCH (Generalized Autoregressive Conditional Heteroskedasticity) and EGARCH (Exponential GARCH). These models are cornerstones of modern financial econometrics, designed to capture key empirical features of financial returns, such as volatility clustering (where large price changes tend to be followed by large price changes, and vice-versa) and the leverage effect (the observation that negative news often triggers a larger increase in volatility than positive news of the same magnitude).

Through this comprehensive analysis and forecasting endeavor, our aim is to provide clear, actionable insights into the risk profiles of these tech giants and to demonstrate, for a non-technical audience, the power of advanced quantitative techniques in navigating the complexities of stock market volatility.

Objectives

The specific objectives of this study are to:

Analyze Historical Stock Performance and Volatility (MSFT vs. AAPL):

  • Examine and visualize the historical daily closing prices and percentage returns for both Microsoft and Apple stock.
  • Compute and compare key descriptive statistics, including measures of central tendency and dispersion, to understand their basic return characteristics and overall "swinginess."

Compare Volatility Characteristics (MSFT vs. AAPL):

  • Conduct a direct quantitative comparison of the historical volatility levels between Microsoft and Apple stock, determining which company's stock has exhibited higher overall "swinginess" over the observed period.

Model Microsoft's Volatility Dynamics using GARCH and EGARCH:

  • Apply the GARCH(1,1) model to Microsoft's daily returns to capture volatility clustering and persistence, understanding how past squared returns and past volatility influence current volatility.
  • Apply the EGARCH(1,1) model to Microsoft's daily returns to explicitly test for the "leverage effect," assessing whether negative price shocks induce a greater volatility response than positive shocks of similar magnitude.
  • Compare the GARCH and EGARCH models using statistical criteria (Log-Likelihood, AIC, BIC) to determine the superior model for Microsoft's volatility dynamics.

Forecast Microsoft's Future Volatility:

  • Utilize the best-fitting model (EGARCH) to generate out-of-sample forecasts for Microsoft's stock volatility over a defined future horizon.
  • Employ a walk-forward validation strategy to simulate realistic forecasting scenarios, continually updating the model with new data to produce robust one-step-ahead predictions.

Communicate Insights to a Non-Technical Audience:

  • Translate complex statistical findings and model interpretations into clear, accessible language, using visual aids (e.g., plots of returns versus predicted volatility bounds) to effectively convey practical implications for investors and market participants.

Methodology

This project employs a systematic, data-driven methodology to analyze and forecast stock volatility. The approach integrates data handling, exploratory analysis, econometric modeling, and robust forecasting techniques to achieve the stated objectives.

1. Data Collection and Preprocessing

Data Source: Historical daily stock price data for Microsoft Corporation (MSFT) and Apple Inc. (AAPL) were obtained from Alpha Vantage, a public financial data provider, by querying its REST API with the Python requests library.

Data Scope: The dataset covers roughly 25 years of daily observations per ticker, from November 1999 through June 2025 (about 6,440 trading days), ensuring sufficient observations for robust statistical modeling and analysis.

Data Cleaning:

  • The raw data was cleaned to handle missing values, enforce consistent (float) data types, and convert the date index into a proper DatetimeIndex.
  • Column names were standardized for ease of access and readability (e.g., '1. open' to 'open', '4. close' to 'close').

Return Calculation: Daily logarithmic returns were calculated for both MSFT and AAPL from their daily closing prices. Logarithmic returns are preferred in financial modeling due to their desirable statistical properties, such as time additivity.

Formula: $R_t = \ln(P_t / P_{t-1})$, where $R_t$ is the return at time t, and $P_t$ is the adjusted closing price at time t.
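A minimal sketch of this step, assuming the cleaned DataFrames produced later in the notebook (cleaned_microsoft_df, cleaned_apple_df); scaling to percentage returns is an additional assumption that tends to help the GARCH-family estimators converge.

import numpy as np

# Alpha Vantage delivers prices newest-first, so sort ascending before differencing
msft_returns = 100 * np.log(cleaned_microsoft_df["close"].sort_index()).diff().dropna()
aapl_returns = 100 * np.log(cleaned_apple_df["close"].sort_index()).diff().dropna()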

2. Exploratory Data Analysis (EDA) and Comparative Analysis

Descriptive Statistics: Key descriptive statistics were computed for both MSFT and AAPL daily returns, including mean, standard deviation (as a measure of volatility), skewness, and kurtosis. This provided initial insights into their average performance and risk characteristics.
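As a concrete illustration, a short sketch of these statistics using pandas, assuming the msft_returns and aapl_returns series defined in the earlier sketch:

summary = pd.DataFrame({
    "MSFT": [msft_returns.mean(), msft_returns.std(),
             msft_returns.skew(), msft_returns.kurt()],
    "AAPL": [aapl_returns.mean(), aapl_returns.std(),
             aapl_returns.skew(), aapl_returns.kurt()],
}, index=["mean (%)", "std. dev. (%)", "skewness", "excess kurtosis"])
print(summary.round(3))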

Time Series Plots:

  • Plots of historical adjusted closing prices for both companies were generated to observe general trends and identify periods of significant price movements.
  • Time series plots of daily returns were created to visually inspect for volatility clustering (periods of high volatility followed by more high volatility) and other stylized facts of financial time series.

Comparative Volatility Visualization: Visual tools (e.g., side-by-side plots of rolling volatility, comparative plots of returns) were used to directly compare the "swinginess" of Apple and Microsoft stock over the entire historical period.
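One possible version of such a comparison, sketched with a 21-day rolling window (the window length is an assumption, roughly one trading month), using the return series defined earlier:

rolling_vol = pd.DataFrame({
    "MSFT": msft_returns.rolling(window=21).std(),
    "AAPL": aapl_returns.rolling(window=21).std(),
}).dropna()

rolling_vol.plot(figsize=(12, 5), title="21-Day Rolling Volatility: MSFT vs. AAPL")
plt.ylabel("Rolling std. dev. of daily returns (%)")
plt.show()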

3. Volatility Modeling with GARCH and EGARCH (Focus on Microsoft)

Model Framework: The arch Python library was utilized for implementing the GARCH family models.

GARCH(1,1) Model Specification:

  • A standard GARCH(1,1) model was fitted to Microsoft's daily returns. This model estimates the conditional variance ($h_t$) as a function of a constant ($\omega$), the previous period's squared residual ($\epsilon_{t-1}^2$, the ARCH term, weighted by $\alpha_1$), and the previous period's conditional variance ($h_{t-1}$, the GARCH term, weighted by $\beta_1$).
  • Equation: $h_t = \omega + \alpha_1 \epsilon_{t-1}^2 + \beta_1 h_{t-1}$

EGARCH(1,1) Model Specification:

  • An EGARCH(1,1) model was fitted to Microsoft's daily returns, specifically including the asymmetric (leverage) term. This model captures the dynamics of the logarithm of the conditional variance ($\ln(h_t)$), allowing for a differential impact of positive and negative shocks.
  • Equation: $\ln(h_t) = \omega + \alpha_1 \left(\frac{|\epsilon_{t-1}|}{\sqrt{h_{t-1}}} - E\left[\frac{|\epsilon_{t-1}|}{\sqrt{h_{t-1}}}\right]\right) + \gamma_1 \frac{\epsilon_{t-1}}{\sqrt{h_{t-1}}} + \beta_1 \ln(h_{t-1})$
  • The parameter $\gamma_1$ (the leverage effect coefficient) was of particular interest, as its significance (and sign) indicates the presence and direction of asymmetry.

Model Estimation: Both models were estimated using Maximum Likelihood Estimation (MLE), a standard method for fitting ARCH-type models.

Model Evaluation and Selection: The statistical significance (p-values) of the estimated parameters was examined. Model fit was compared using the Log-Likelihood, the Akaike Information Criterion (AIC), and the Bayesian Information Criterion (BIC). The model with the highest Log-Likelihood and the lowest AIC/BIC was selected as the preferred model for forecasting Microsoft's volatility.
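To make the estimation and comparison steps concrete, here is a minimal sketch using the arch library's arch_model interface on the MSFT return series from the earlier sketch (the constant mean and normal error distribution are assumptions):

# GARCH(1,1): symmetric response to past shocks
garch_res = arch_model(msft_returns, mean="Constant", vol="GARCH", p=1, q=1).fit(disp="off")

# EGARCH(1,1): o=1 adds the asymmetric (leverage) term gamma[1]
egarch_res = arch_model(msft_returns, mean="Constant", vol="EGARCH", p=1, o=1, q=1).fit(disp="off")

comparison = pd.DataFrame({
    "Log-Likelihood": [garch_res.loglikelihood, egarch_res.loglikelihood],
    "AIC": [garch_res.aic, egarch_res.aic],
    "BIC": [garch_res.bic, egarch_res.bic],
}, index=["GARCH(1,1)", "EGARCH(1,1)"])
print(comparison.round(2))
print(egarch_res.summary())  # gamma[1] is the leverage coefficient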

4. Volatility Forecasting (for Microsoft)

Walk-Forward Validation: To evaluate the out-of-sample forecasting performance realistically, a walk-forward validation strategy was implemented over a designated test set (e.g., the last 20% of the data).

  • In each step, the model was trained on all available data up to that point.
  • A one-day-ahead forecast of volatility was generated.
  • The training window then "walked forward" by one day, incorporating the new observation, and the process was repeated.

Forecast Extraction: Predicted conditional volatilities (standard deviations) were extracted from the model's forecasts.
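A simplified sketch of this walk-forward loop under the stated assumptions (EGARCH(1,1), last 20% of observations held out, one-day-ahead forecasts); refitting at every step is slow, but it mirrors the expanding-window logic described above:

n_test = int(len(msft_returns) * 0.2)
wf_predictions = []

for i in range(n_test):
    # Expanding training window: all observations up to the current step
    train = msft_returns.iloc[: len(msft_returns) - n_test + i]
    res = arch_model(train, vol="EGARCH", p=1, o=1, q=1).fit(disp="off")
    # One-day-ahead conditional variance -> volatility (standard deviation)
    one_step = res.forecast(horizon=1)
    wf_predictions.append(float(one_step.variance.iloc[-1, 0]) ** 0.5)

wf_volatility = pd.Series(wf_predictions, index=msft_returns.index[-n_test:],
                          name="predicted_volatility")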

Visualization of Forecasts:

  • The generated walk-forward volatility forecasts were plotted against the actual historical returns for the test period.
  • Predicted ±2 standard deviation bands were overlaid on the returns, providing a visual assessment of how well the model's forecasted "swinginess" contained the actual daily price movements (a plotting sketch follows below).
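A plotting sketch of this overlay, assuming the wf_volatility series from the walk-forward sketch and the aligned test-period returns:

test_returns = msft_returns.loc[wf_volatility.index]

plt.figure(figsize=(12, 5))
plt.plot(test_returns.index, test_returns, color="grey", alpha=0.6, label="MSFT daily returns (%)")
plt.plot(wf_volatility.index, 2 * wf_volatility, color="red", label="+2 predicted std. dev.")
plt.plot(wf_volatility.index, -2 * wf_volatility, color="red", label="-2 predicted std. dev.")
plt.title("MSFT Returns vs. Walk-Forward Volatility Bands")
plt.legend()
plt.show()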

Output Formatting: A custom function was developed to reformat the model's numerical forecasts into a clean, easy-to-read dictionary (JSON-like) structure, mapping future dates to their corresponding predicted volatility values.
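A hypothetical version of such a helper (the notebook's actual function may differ):

def format_forecast(vol_series):
    """Map forecast dates to predicted volatility values in a JSON-like dict."""
    return {date.strftime("%Y-%m-%d"): round(float(vol), 4)
            for date, vol in vol_series.items()}

# Example: format_forecast(wf_volatility.tail(5))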

In [123]:
!pip install arch
Requirement already satisfied: arch in c:\users\user\anaconda3\lib\site-packages (7.2.0)
Requirement already satisfied: numpy>=1.22.3 in c:\users\user\anaconda3\lib\site-packages (from arch) (1.26.4)
Requirement already satisfied: scipy>=1.8 in c:\users\user\anaconda3\lib\site-packages (from arch) (1.13.1)
Requirement already satisfied: pandas>=1.4 in c:\users\user\anaconda3\lib\site-packages (from arch) (2.2.2)
Requirement already satisfied: statsmodels>=0.12 in c:\users\user\anaconda3\lib\site-packages (from arch) (0.14.2)
Requirement already satisfied: python-dateutil>=2.8.2 in c:\users\user\anaconda3\lib\site-packages (from pandas>=1.4->arch) (2.9.0.post0)
Requirement already satisfied: pytz>=2020.1 in c:\users\user\anaconda3\lib\site-packages (from pandas>=1.4->arch) (2024.1)
Requirement already satisfied: tzdata>=2022.7 in c:\users\user\anaconda3\lib\site-packages (from pandas>=1.4->arch) (2023.3)
Requirement already satisfied: patsy>=0.5.6 in c:\users\user\anaconda3\lib\site-packages (from statsmodels>=0.12->arch) (0.5.6)
Requirement already satisfied: packaging>=21.3 in c:\users\user\anaconda3\lib\site-packages (from statsmodels>=0.12->arch) (23.2)
Requirement already satisfied: six in c:\users\user\anaconda3\lib\site-packages (from patsy>=0.5.6->statsmodels>=0.12->arch) (1.16.0)
In [3]:
import pandas as pd
import requests
import plotly.express as px
import numpy as np
import matplotlib.pyplot as plt
from arch import arch_model
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
In [4]:
import requests

API_KEY = "7DX1UVYVHI*****"  # Masked my API Key, get yours from https://www.alphavantage.co

def get_stock_data(symbol):
    # Changed API function to TIME_SERIES_DAILY
    url = f"https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol={symbol}&outputsize=full&apikey={API_KEY}"
    response = requests.get(url)
    return response.json()

apple_data = get_stock_data("AAPL")
microsoft_data = get_stock_data("MSFT")

# Access data using the correct key - 'Time Series (Daily)'
#print(apple_data["Time Series (Daily)"])  # Example output
In [5]:
print(apple_data["Meta Data"])
{'1. Information': 'Daily Prices (open, high, low, close) and Volumes', '2. Symbol': 'AAPL', '3. Last Refreshed': '2025-06-09', '4. Output Size': 'Full size', '5. Time Zone': 'US/Eastern'}
In [6]:
# Assuming apple_data is your dictionary with stock data
df_apple = pd.DataFrame.from_dict(apple_data['Time Series (Daily)'], orient="index", dtype=float)

print("df_apple shape:", df_apple.shape)
print()
print(df_apple.info())
df_apple.head(10)
df_apple shape: (6440, 5)

<class 'pandas.core.frame.DataFrame'>
Index: 6440 entries, 2025-06-09 to 1999-11-01
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   1. open    6440 non-null   float64
 1   2. high    6440 non-null   float64
 2   3. low     6440 non-null   float64
 3   4. close   6440 non-null   float64
 4   5. volume  6440 non-null   float64
dtypes: float64(5)
memory usage: 301.9+ KB
None
Out[6]:
1. open 2. high 3. low 4. close 5. volume
2025-06-09 204.390 206.00 200.020 201.45 72862557.0
2025-06-06 203.000 205.70 202.050 203.92 46607693.0
2025-06-05 203.500 204.75 200.150 200.63 55221235.0
2025-06-04 202.910 206.24 202.100 202.82 43603985.0
2025-06-03 201.350 203.77 200.955 203.27 46381567.0
2025-06-02 200.280 202.13 200.120 201.70 35423294.0
2025-05-30 199.370 201.96 196.780 200.85 70819942.0
2025-05-29 203.575 203.81 198.510 199.95 51477938.0
2025-05-28 200.590 202.73 199.900 200.42 45339678.0
2025-05-27 198.300 200.74 197.430 200.21 56288475.0
In [7]:
df_microsoft = pd.DataFrame.from_dict(microsoft_data['Time Series (Daily)'], orient="index", dtype=float)

print("df_microsoft:", df_microsoft.shape)
print()
print(df_microsoft.info())
df_microsoft.head(10)
df_microsoft: (6440, 5)

<class 'pandas.core.frame.DataFrame'>
Index: 6440 entries, 2025-06-09 to 1999-11-01
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   1. open    6440 non-null   float64
 1   2. high    6440 non-null   float64
 2   3. low     6440 non-null   float64
 3   4. close   6440 non-null   float64
 4   5. volume  6440 non-null   float64
dtypes: float64(5)
memory usage: 301.9+ KB
None
Out[7]:
1. open 2. high 3. low 4. close 5. volume
2025-06-09 469.700 473.430 468.6200 472.75 16469932.0
2025-06-06 470.085 473.335 468.7800 470.38 15285624.0
2025-06-05 464.955 469.650 464.0300 467.68 20154460.0
2025-06-04 464.000 465.690 463.0201 463.87 14162688.0
2025-06-03 461.470 464.140 460.8622 462.97 15743760.0
2025-06-02 457.140 462.110 456.8900 461.97 16626495.0
2025-05-30 459.715 461.680 455.5400 460.36 34770475.0
2025-05-29 461.550 461.720 455.3105 458.68 13982211.0
2025-05-28 461.220 462.520 456.9300 457.36 17086261.0
2025-05-27 456.480 460.950 456.1150 460.69 20974293.0

Data Cleaning and Preprocessing

I am going to create a function that cleans our stock data. The function:

  1. Converts the index to a DatetimeIndex named 'date'
  2. Removes the numbering from column names
  3. Converts all values to float
In [10]:
def clean_stock_data(df):
    # Rename index to "date" and convert to DatetimeIndex
    df.index = pd.to_datetime(df.index, format="%Y-%m-%d")
    df.index.name = "date"

    # Remove numbering from column names (e.g., "1. open" → "open")
    df.columns = [col.split(" ")[1] for col in df.columns]

    # Convert data to float
    df = df.astype(float)

    return df
In [11]:
cleaned_apple_df = clean_stock_data(df_apple)
cleaned_microsoft_df = clean_stock_data(df_microsoft)
In [12]:
print(cleaned_apple_df.info())
cleaned_apple_df.head()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 6440 entries, 2025-06-09 to 1999-11-01
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   open    6440 non-null   float64
 1   high    6440 non-null   float64
 2   low     6440 non-null   float64
 3   close   6440 non-null   float64
 4   volume  6440 non-null   float64
dtypes: float64(5)
memory usage: 301.9 KB
None
Out[12]:
open high low close volume
date
2025-06-09 204.39 206.00 200.020 201.45 72862557.0
2025-06-06 203.00 205.70 202.050 203.92 46607693.0
2025-06-05 203.50 204.75 200.150 200.63 55221235.0
2025-06-04 202.91 206.24 202.100 202.82 43603985.0
2025-06-03 201.35 203.77 200.955 203.27 46381567.0
In [13]:
print(cleaned_microsoft_df.info())
cleaned_microsoft_df.head()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 6440 entries, 2025-06-09 to 1999-11-01
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   open    6440 non-null   float64
 1   high    6440 non-null   float64
 2   low     6440 non-null   float64
 3   close   6440 non-null   float64
 4   volume  6440 non-null   float64
dtypes: float64(5)
memory usage: 301.9 KB
None
Out[13]:
open high low close volume
date
2025-06-09 469.700 473.430 468.6200 472.75 16469932.0
2025-06-06 470.085 473.335 468.7800 470.38 15285624.0
2025-06-05 464.955 469.650 464.0300 467.68 20154460.0
2025-06-04 464.000 465.690 463.0201 463.87 14162688.0
2025-06-03 461.470 464.140 460.8622 462.97 15743760.0

Visualizing Closing Prices: Apple and Microsoft

In [15]:
fig = px.line(cleaned_apple_df, x=cleaned_apple_df.index, y="close",
              title="Apple Stock Closing Price Over Time",
              labels={"close": "Closing Price", "index": "Date"},
              line_shape="linear")

# Customize line color
fig.update_traces(line=dict(color="orange"))

# Show the interactive chart
fig.show()
In [16]:
fig = px.line(cleaned_microsoft_df,
              x=cleaned_microsoft_df.index,
              y="close",
              title="Microsoft Stock Closing Price Over Time",
              labels={"close": "Closing Price", "index": "Date"},
              line_shape="linear")

# Customize line color
fig.update_traces(line=dict(color="blue"), name="Microsoft Closing Price")

# Show the interactive chart
fig.show()