import numpy as np
from arch import arch_model

In this post we discuss a fundamental model in financial time series modelling: the ARCH model.
The model was introduced by Engle in the 1980s and earned him the Nobel Prize in Economics. Nowadays ARCH is still the go-to model, or at least a starting point, for time series with time-varying volatility, a characteristic shared by many financial return series.
ARCH model
Throughout, we denote a time series by \((Y_t)\). The ARCH model (here with one lag, i.e. ARCH(1)) specifies the conditional variance of \(Y_t\) given its history \(\mathcal F_{t-1}\) by the following equation: \[\mathrm{Var}[Y_t \mid \mathcal F_{t-1}] = \omega + \alpha Y_{t-1}^2.\]
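To build intuition, here is a minimal sketch (with illustrative parameter values) that simulates an ARCH(1) path under the full specification \(Y_t = \sigma_t W_t\) with iid standard Gaussian \(W_t\), which is spelled out later in the post:

```python
import numpy as np

rng = np.random.default_rng(0)
omega, alpha = 0.2, 0.7                 # illustrative parameter values
y = np.zeros(500)
y[0] = np.sqrt(omega / (1 - alpha)) * rng.standard_normal()  # start at the unconditional scale
for t in range(1, 500):
    var_t = omega + alpha * y[t-1]**2   # conditional variance given the past
    y[t] = np.sqrt(var_t) * rng.standard_normal()
```

A large shock inflates the next period's variance, making further large shocks more likely; this produces the volatility clustering that ARCH is designed to capture.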
In Python, it is straightforward to define the ARCH model with the arch library.
arch library
rng = np.random.default_rng(41)
y = rng.standard_normal(size=(500,)) * np.arange(500) * 0.2
mod = arch_model(y, mean='zero', q=0)  # q=0 drops the GARCH term, leaving ARCH(1)
res = mod.fit(disp="off")
fcst = res.forecast(horizon=50, method="bootstrap", simulations=3)

Above we showcased the main classes of the library:
- arch_model is a convenience model constructor with which one can specify the mean process, the volatility process (aka conditional standard deviation), and the noise distribution (in the style of scipy.stats).
- mod.fit() returns a result container: one that holds the residuals, standardised residuals, and conditional volatility of the fitted model.
- res.forecast() returns a forecast container: one that holds the conditional variance forecasts, simulated paths given the current state, etc. See the example after this list.
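For instance, the attributes below (all part of arch's public API, and all used later in this post) pull the fitted conditional volatility from the result container and the forecast outputs from the forecast container:

```python
res.conditional_volatility[:3]     # fitted conditional standard deviations
res.std_resid[:3]                  # residuals scaled by the conditional volatility
fcst.variance                      # variance forecasts, one column per horizon step
fcst.simulations.values.shape      # (1, 3, 50): one forecast origin, 3 paths, horizon 50
```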
type(mod), type(res), type(fcst)

(arch.univariate.mean.ZeroMean,
 arch.univariate.base.ARCHModelResult,
 arch.univariate.base.ARCHModelForecast)
res.summary()

|  |  |  |  |
|---|---|---|---|
| Dep. Variable: | y | R-squared: | 0.000 |
| Mean Model: | Zero Mean | Adj. R-squared: | 0.002 |
| Vol Model: | ARCH | Log-Likelihood: | -2719.79 |
| Distribution: | Normal | AIC: | 5443.58 |
| Method: | Maximum Likelihood | BIC: | 5452.01 |
| No. Observations: | 500 | | |
| Date: | Sat, Oct 19 2024 | Df Residuals: | 500 |
| Time: | 18:28:40 | Df Model: | 0 |

|  | coef | std err | t | P>\|t\| | 95.0% Conf. Int. |
|---|---|---|---|---|---|
| omega | 2082.8286 | 396.923 | 5.247 | 1.542e-07 | [1.305e+03, 2.861e+03] |
| alpha[1] | 0.4962 | 0.174 | 2.850 | 4.377e-03 | [0.155, 0.838] |

Covariance estimator: robust
import polars as pl
import hvplot.polars
hvplot.extension('matplotlib')
pl.DataFrame(fcst.simulations.values.squeeze().T).hvplot()  # 3 forecast paths with horizon 50

Mean model and noise distribution
By default the noise distribution is standard normal. Another popular choice is the Student's t distribution, where the degrees of freedom \(\nu\) is a model parameter.
One can also change the mean to be non-zero. The model spec is as follows \[
\begin{align*}
Y_t &= \mu + \sigma_t W_t \\
\sigma_t^2 &= \omega+ \alpha (Y_{t-1}-\mu)^2
\end{align*}
\] where \((W_t)\) is iid, e.g. with a t distribution.
In arch, it suffices to set mean="constant" and dist="t" in the constructor.
arch_model(y=y+30, mean="constant", q=0, dist='t').fit(disp="off") Constant Mean - ARCH Model Results
====================================================================================
Dep. Variable: y R-squared: 0.000
Mean Model: Constant Mean Adj. R-squared: 0.000
Vol Model: ARCH Log-Likelihood: -2679.58
Distribution: Standardized Student's t AIC: 5367.17
Method: Maximum Likelihood BIC: 5384.03
No. Observations: 500
Date: Sat, Oct 19 2024 Df Residuals: 499
Time: 18:28:43 Df Model: 1
Mean Model
========================================================================
coef std err t P>|t| 95.0% Conf. Int.
------------------------------------------------------------------------
mu 27.9482 1.487 18.797 7.917e-79 [ 25.034, 30.862]
Volatility Model
============================================================================
coef std err t P>|t| 95.0% Conf. Int.
----------------------------------------------------------------------------
omega 1557.7431 401.680 3.878 1.053e-04 [7.705e+02,2.345e+03]
alpha[1] 1.0000 0.263 3.798 1.460e-04 [ 0.484, 1.516]
Distribution
========================================================================
coef std err t P>|t| 95.0% Conf. Int.
------------------------------------------------------------------------
nu 3.3891 0.357 9.480 2.536e-21 [ 2.688, 4.090]
========================================================================
Covariance estimator: robust
ARCHModelResult, id: 0x7978505fd3a0
Why stop here? We can also allow the conditional mean of \(Y_t\) given the history to vary with time; the simplest choice is to assume AR dynamics for the conditional mean \[ \begin{align*} Y_t &= \mu + \phi (Y_{t-1} - \mu) + \sigma_t W_t \\ \sigma_t^2 &= \omega+ \alpha (Y_{t-1}-\mu)^2 \end{align*} \] where as before \(W_t\) is iid normal by default.
In arch, just set mean="AR" and lags=[1] where 1 is as in AR(1) for the conditional mean.
arch_model(y=y+30, mean="AR", q=0, lags=[1]).fit(disp="off") AR - ARCH Model Results
==============================================================================
Dep. Variable: y R-squared: -0.009
Mean Model: AR Adj. R-squared: -0.011
Vol Model: ARCH Log-Likelihood: -2713.60
Distribution: Normal AIC: 5435.21
Method: Maximum Likelihood BIC: 5452.06
No. Observations: 499
Date: Sat, Oct 19 2024 Df Residuals: 497
Time: 18:28:43 Df Model: 2
Mean Model
========================================================================
coef std err t P>|t| 95.0% Conf. Int.
------------------------------------------------------------------------
Const 27.0714 2.670 10.138 3.746e-24 [ 21.838, 32.305]
y[1] -0.0280 8.163e-02 -0.343 0.731 [ -0.188, 0.132]
Volatility Model
============================================================================
coef std err t P>|t| 95.0% Conf. Int.
----------------------------------------------------------------------------
omega 1947.6921 445.220 4.375 1.216e-05 [1.075e+03,2.820e+03]
alpha[1] 0.5807 0.247 2.351 1.873e-02 [9.656e-02, 1.065]
============================================================================
Covariance estimator: robust
ARCHModelResult, id: 0x79785063a780
As before, we can obtain the result container and the forecast container as follows.
res = arch_model(y+30, mean="AR", q=0, lags=[1]).fit(disp="off")
fcst = res.forecast(horizon=5)

Notice the difference between resid and std_resid, as well as between variance and residual_variance.

res.resid[1:].std(), res.std_resid[1:].std()

(np.float64(58.66084093396231), np.float64(0.9985090629605182))
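Indeed, std_resid is just resid divided by the fitted conditional volatility, which a quick check should confirm (the first observation is dropped because of the AR lag):

```python
np.allclose(
    res.std_resid[1:],
    res.resid[1:] / res.conditional_volatility[1:],
)  # True
```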
fcst.variance

|  | h.1 | h.2 | h.3 | h.4 | h.5 |
|---|---|---|---|---|---|
| 499 | 23353.942452 | 15527.507157 | 10965.954024 | 8317.077882 | 6778.893468 |
fcst.residual_variance

|  | h.1 | h.2 | h.3 | h.4 | h.5 |
|---|---|---|---|---|---|
| 499 | 23353.942452 | 15509.168033 | 10953.760756 | 8308.466659 | 6772.362326 |
Generalized ARCH (GARCH)
Another generalization of the ARCH model focuses on the conditional variance specification itself, rather than on the conditional mean or the noise distribution. The most prominent example along this line of thinking is the GARCH family, which in its simplest form (zero-mean model and lag 1) is specified as follows \[ \begin{align*} Y_t &= \sigma_t W_t \\ \sigma_{t}^2 &= \omega + \alpha Y_{t-1}^2 + \beta \sigma_{t-1}^2 \end{align*} \] The difference lies in the additional parameter \(\beta\), which makes the conditional variance depend on the conditional variance at the previous time step.
GARCH generalizes ARCH in much the same way that ARMA generalizes AR.
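In arch, GARCH(1,1) is in fact the default volatility model, so it suffices to drop the q=0 override used earlier:

```python
garch_res = arch_model(y, mean="zero").fit(disp="off")  # GARCH(1,1) is the default vol model
garch_res.params                                        # omega, alpha[1], beta[1]
```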
Introducing archette
archette is a simplified implementation of the zero mean GARCH model with iid Gaussian noise.
The existing arch library is amazing and covers many popular GARCH-type models. archette is not meant to replace arch in any way; after all, it supports only one very particular model (undoubtedly a very important one).
The goal of archette is to strip away the complexity of arch (e.g. the class hierarchy) and highlight only the essence of the GARCH model. It is meant to be super hackable, so that one can build variants of GARCH models by modifying parts of archette's source code.
There is only one class, GARCHETTE, in archette, which implements the methods:
- fit
- nll
- forecast_vs
- simulate
and the properties
- vs
- std_resids
- params
The API is simple: state is maintained in instances of the main class rather than delegated to separate container classes, hopefully reducing some mental overhead for users.
Here is how one can fit a model and inspect the fitted parameters and the corresponding negative log likelihood.
(see how to install: https://xiaochuany.github.io/archette/ )
from archette import GARCHETTE

rng = np.random.default_rng()
y = rng.standard_normal(500) * np.arange(500)
mod = GARCHETTE()
mod.fit(y)

<archette.core.GARCHETTE at 0x79785c680590>

mod.params

array([124.94299377, 0.16898145, 0.83101855])

mod.nll(mod.params)

np.float64(5832.3224873676845)
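The forecasting and simulation methods follow the same pattern and return plain numpy arrays (argument names as in the class definition shown later in this post):

```python
mod.forecast_vs(horizon=5)                             # h-step conditional variance forecasts
paths = mod.simulate(horizon=50, n_rep=3, method="bootstrap")
paths.shape                                            # (3, 50)
```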
Model fitting
Under the hood, fit calls the minimize function from the scipy.optimize module to minimize the negative log-likelihood (nll).
Let \(f\) denote the joint density of the time series \(y\). By successive conditioning, one can write the nll as \[ -\log f(y) = - \log f_\theta(y_0) - \sum_{t=1}^{T} \log f_{\theta}(y_t \mid y_{[0,t-1]}) \] where \(y_{[0,t-1]} := (y_0, \dots, y_{t-1})\) and \(\theta\) is the parameter vector of the model.
Since the marginal distribution at time 0 is unknown, one typically drops the first term and minimizes \(-\log f(y \mid y_0)\) instead, where \[ -\log f(y \mid y_0) = - \sum_{t=1}^{T} \log f_{\theta}(y_t \mid y_{[0,t-1]}) \] Under the Gaussian noise assumption we have, up to an unimportant additive constant, \[ - \log f_{\theta}(y_t \mid y_{[0,t-1]}) = \frac{1}{2}\Big(\log v_t + \frac{y_t^2}{v_t}\Big) \] where \(v_t\) is the conditional variance of \(y_t\) given the history (denoted by \(\sigma_t^2\) before); given an initial value, \(v_t\) is a deterministic function of \(y_{[0,t-1]}\) through the GARCH recursion.
Notice that \(v_0\) is not observable, hence one needs to set a sensible initial value, e.g. the sample variance of \(y\).
We define two helper functions: one computes the sequence \(v\), the other the nll.
def _get_vs(y, params, sig2_init):
    """
    compute the conditional variances by the recursion of the GARCH specification
    """
    om, al, be = params
    vs = np.zeros(y.size)
    vs[0] = sig2_init
    for i in range(1, y.size):
        vs[i] = om + al * y[i-1]**2 + be * vs[i-1]
    return vs

def _nllgauss(y, params, sig2_init):
    """
    compute nll of GARCH(1,1) with zero conditional mean and gaussian noise
    (the 1/2 factor and the 2*pi constant are dropped: they do not affect the minimizer)
    """
    vs = _get_vs(y, params, sig2_init)
    nll = np.log(vs) + y**2 / vs
    return nll.sum()

Now we are in a good place to start writing the main class.
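Before that, a quick smoke test of the helpers on a toy series (the parameter values are illustrative):

```python
toy = rng.standard_normal(20)               # toy series
params = (1.0, 0.2, 0.7)                    # omega, alpha, beta
vs = _get_vs(toy, params, toy.var())
vs[:3], _nllgauss(toy, params, toy.var())
```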
from scipy.optimize import minimize

class GARCHETTE:
    """simple garch model"""
    def __init__(self):
        self._y = None
        self.params = None
        self._is_fit = False
        self._v_init = None

    def fit(self, y: np.ndarray):
        """fit y to garch"""
        self._y = y
        self._v_init = y.var()
        self._is_fit = True
        func = self.nll
        self.params = minimize(
            func,
            x0=(self._v_init * 0.4, 0.3, 0.3),
            bounds=[(0, None), (0, None), (0, None)],
            constraints={
                "type": "ineq",
                "fun": lambda x: 1 - x[1] - x[2]
            }
        ).x
        return self

    def nll(self, params) -> float:
        """negative log likelihood of the series at the given params"""
        assert self._is_fit
        return _nllgauss(self._y, params, self._v_init)

    @property
    def vs(self):
        return _get_vs(self._y, self.params, self._v_init)

We see that the method nll and the property vs are just wrappers around the helper functions we defined above. The fit method, though, needs some attention. For the minimization of the nll, apart from the objective function nll and the initial value x0, it is also necessary to set bounds and constraints on the variables. We require the parameters to be non-negative because the sequence v must stay non-negative (conditional variances are squares). We also need
\[
\alpha + \beta < 1
\]
because this is the necessary and sufficient condition for the (covariance) stationarity of the GARCH(1,1) model. These constraints are essential for the optimization to work. Finally, we made an educated guess for the initial value: we set \(\alpha = \beta = 0.3\), away from the boundary of the constrained parameter space. The initial value for \(\omega\) is based on an important fact about the long-term conditional variance: for any \(t\), as \(T \to \infty\), \[
\mathbb E[Y_{t+T}^2 \mid \mathcal F_t] \to \frac{\omega}{1-\alpha-\beta}.
\] If we believe that the sample variance of y approximates the long-term conditional variance, then initializing om = (1-0.3-0.3) * y.var() is sensible.
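To illustrate this fact with made-up parameter values satisfying \(\alpha + \beta < 1\): iterating the variance forecast recursion (derived next) converges to \(\omega/(1-\alpha-\beta)\) regardless of the starting value.

```python
om, al, be = 0.1, 0.1, 0.8       # illustrative values with al + be < 1
v = 5.0                          # arbitrary starting variance
for _ in range(500):
    v = om + (al + be) * v       # h-step variance forecast recursion
v, om / (1 - al - be)            # both equal 1.0, the long-run variance
```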
Now it remains to forecast conditional variances (or equivalently conditional volatilities) and to simulate paths. For the variance forecast, the recursion for \(h \ge 2\) is simpler than the GARCH specification itself: \[ \mathbb E[Y^2_{t+h} \mid \mathcal F_t] = \omega + \alpha \mathbb E[Y^2_{t+h-1} \mid \mathcal F_{t}] + \beta \mathbb E[\sigma^2_{t+h-1} \mid \mathcal F_{t}] = \omega + (\alpha +\beta) \mathbb E[Y^2_{t+h-1} \mid \mathcal F_{t}] \] where the last equality holds because \(\mathbb E[Y^2_s \mid \mathcal F_t] = \mathbb E[\sigma^2_s \mid \mathcal F_t]\) for \(s > t\), by the tower property and \(\mathbb E[Y^2_s \mid \mathcal F_{s-1}] = \sigma^2_s\). This leads to
def _make_fcst_vs(params, last_v, last_y, horizon):
    om, al, be = params
    fcst_vs = np.empty(horizon)
    fcst_vs[0] = om + al * last_y**2 + be * last_v   # one-step forecast: the GARCH recursion itself
    for i in range(1, horizon):
        fcst_vs[i] = om + (al + be) * fcst_vs[i-1]   # h-step forecasts: the simplified recursion
    return fcst_vs

The simulation helper function is rather straightforward. We leave the possibility of specifying the noises ws to the user.
def _simulate(params, last_y, last_v, horizon, n_rep, seed, ws=None):
    om, al, be = params
    if ws is None:
        np.random.seed(seed)
        ws = np.random.randn(n_rep, horizon)
    y = np.empty((n_rep, horizon+1))
    v = np.empty((n_rep, horizon+1))
    y[:, 0] = last_y
    v[:, 0] = last_v
    for i in range(1, horizon+1):
        v[:, i] = om + al * y[:, i-1]**2 + be * v[:, i-1]
        y[:, i] = np.sqrt(v[:, i]) * ws[:, i-1]
    return y[:, 1:]

Combining everything, here is all there is in the archette library.
from typing import Literal

class GARCHETTE:
    """simple garch model"""
    def __init__(self):
        self._y = None
        self.params = None
        self._is_fit = False
        self._v_init = None

    def fit(self, y: np.ndarray):
        """fit y to garch"""
        self._y = y
        self._v_init = y.var()
        self._is_fit = True
        func = self.nll
        self.params = minimize(
            func,
            x0=(self._v_init * 0.4, 0.3, 0.3),
            bounds=[(0, None), (0, None), (0, None)],
            constraints={
                "type": "ineq",
                "fun": lambda x: 1 - x[1] - x[2]
            }
        ).x
        return self

    def nll(self, params) -> float:
        """negative log likelihood of the series at the given params"""
        assert self._is_fit
        return _nllgauss(self._y, params, self._v_init)

    @property
    def vs(self):
        assert self._is_fit
        return _get_vs(self._y, self.params, self._v_init)

    @property
    def std_resids(self):
        assert self._is_fit
        return self._y / np.sqrt(self.vs)  # note: self._y, not a global y

    def forecast_vs(self,
                    horizon: int
                    ) -> np.ndarray:
        """forecast conditional variance in the horizon"""
        assert self._is_fit
        return _make_fcst_vs(self.params, self.vs[-1], self._y[-1], horizon)

    def simulate(self,
                 horizon: int,  # path length
                 method: Literal["bootstrap", "simulate"] = "simulate",  # "bootstrap" resamples past std_resids; "simulate" draws gaussian noise
                 n_rep=1_000,   # number of repetitions
                 seed=42) -> np.ndarray:
        assert self._is_fit
        if method == "bootstrap":
            np.random.seed(seed)
            ws = np.random.choice(self.std_resids, size=(n_rep, horizon), replace=True)
        else:
            ws = None
        return _simulate(self.params, self._y[-1], self.vs[-1], horizon, n_rep, seed, ws)

Sanity check
We fit a GARCH model with both arch and archette. When the model is mis-specified (the data y were not generated by a GARCH process), the two MLE implementations find different parameters.
mod_arch = arch_model(y, mean='zero', dist='normal', rescale=False).fit(disp='off')
mod_archette = GARCHETTE().fit(y)

mod_arch.params

omega       30.588087
alpha[1]     0.073224
beta[1]      0.926776
Name: params, dtype: float64

mod_archette.params

array([124.94299377, 0.16898145, 0.83101855])
What if the model is well specified?
y_sim = mod_archette.simulate(horizon=500, n_rep=1, seed=89).squeeze()
arch_model(y_sim, mean='zero', dist='normal', rescale=False).fit(disp='off').params

omega       622.399179
alpha[1]      0.172894
beta[1]       0.795693
Name: params, dtype: float64

GARCHETTE().fit(y_sim).params

array([5.98300620e+02, 1.66863034e-01, 8.02092039e-01])
The estimated parameters from arch and archette are quite close. Hooray!