Design an IRB model validation library
What is the status quo
A common workflow for validating risk models is to write a collection of functions for the relevant statistical tests (often delegating to scipy/statsmodels/sklearn/arch), save them in a .py file, and write an orchestration notebook that reads the data, runs the functions, and saves the resulting charts/spreadsheets somewhere.
Possible complaints about this workflow include:
- passing many dataframes from function to function can be painful
- creating many intermediate dataframes in memory feels repetitive and resource-consuming
- it is hard to see at a glance what is being tested
Move from calling functions to building a report object
The core design is a dataframe interface which lets users specify all the tests in one go. To be more precise, the library
- registers a new namespace `irb` on the `polars.LazyFrame` class;
- provides a unified configuration interface for all PD, LGD and CCF models via `polars.LazyFrame.irb.configure(id_col="obligor_id", score_col="score", ...)`, which in turn creates an empty `Report` object;
- offers a fluent builder API for the `Report` class, allowing users to chain `.check_X().check_Y()`;
- keeps the `Report` object as a queue of checks which are not executed until the user calls `.show()` (pretty HTML report for notebooks) or `.run()` (for manual inspection of specific tables/charts in the report).
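The deferred-execution idea can be sketched in a few lines. Everything here is hypothetical and simplified (the names `LazyReport`, `add` and the callable-based checks are my own, not the library's internals); it only illustrates that building the report queues work without running it, and that the builder is immutable:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass(frozen=True)
class LazyReport:
    # each check is a named zero-argument callable; nothing runs at build time
    checks: tuple[tuple[str, Callable[[], object]], ...] = field(default_factory=tuple)

    def add(self, name: str, fn: Callable[[], object]) -> "LazyReport":
        # return a new object so the builder stays immutable
        return LazyReport(self.checks + ((name, fn),))

    def run(self) -> dict[str, object]:
        # execution happens only here
        return {name: fn() for name, fn in self.checks}

base = LazyReport()
report = base.add("row_count", lambda: 3).add("psi", lambda: 0.01)
results = report.run()
# results == {"row_count": 3, "psi": 0.01}; base is untouched
```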
The choice of polars over pandas fits the lazy-execution philosophy here, but this does not prevent pandas users from using the library. Here is an example:
import pandas as pd
import polars as pl

df: pd.DataFrame  # an existing pandas DataFrame
lf = pl.from_pandas(df).lazy()

report = lf.irb.configure(...)
(
    report
    .check_x()
    .check_y()
    .add_samples(...)
    .check_representativeness(versus=SAMPLE, variables=...)
    .show()
)
Comments on the use of AI coding tools
We are in an era where LLMs write most of the code. In this instance, I mostly outlined the API design, reported bugs, and insisted on minimalism. I don't know how fast we are heading into an era where humans are not necessary even for design.
Personally, I am not comfortable generating too many lines of code in one go, because the assumptions the LLM makes along the way can be misaligned with my intent. And since I still read all the code the LLM generates, I prefer to let it generate a small chunk at a time to keep my sanity. This makes it easier to detect early any divergence from intent or design choices that were not spelled out in the initial prompt. Perhaps this is just a matter of personal preference.
A few implementation details
class diagram

register a new namespace
@pl.api.register_lazyframe_namespace("irb")
class IRBAccessor:
    def __init__(self, lf: pl.LazyFrame):
        self._lf = lf

    def configure(self, **kwargs) -> Report:
        return Report(self._lf, IRBConfig(**kwargs))
unified config for PD LGD CCF
@dataclass(frozen=True)
class IRBConfig:
    # Metadata
    id_col: str | None = None  # obligor or facility
    date_col: str | None = None
    # PD
    default_col: str | None = None
    score_col: str | None = None
    grade_col: str | None = None
    pd_col: str | None = None
    # LGD
    ...
The user calls `lf.irb.configure(score_col="score", default_col="default")` to override the default values (`None`).
Report class
The builder always returns a new Report object for immutability. Creating them is cheap because a lazyframe is just a query plan plus a reference to the data source, and the checks are essentially callables. For instance, the RepresentativenessCheck below is a wrapper around a psi function which computes the PSI between two categorical columns.
class Report:
    def __init__(
        self,
        lf: pl.LazyFrame,
        config: IRBConfig,
        checks: list[Check] | None = None,
        samples: dict[str, pl.LazyFrame] | None = None,
    ):
        self._lf = lf
        self._config = config
        self._checks = checks or []
        self._samples = samples or {}

    def add_samples(self, **samples: pl.LazyFrame) -> Self:
        """Return a new Report containing the merged samples."""
        return Report(
            self._lf,
            self._config,
            self._checks,
            {**self._samples, **samples},
        )

    def check_representativeness(self, versus: str, variables: list[str]) -> Self:
        check = RepresentativenessCheck(
            target_lf=self._lf,
            baseline_lf=self._samples[versus],
            baseline_name=versus,
            variables=variables,
        )
        return Report(
            self._lf,
            self._config,
            self._checks + [check],
            self._samples,
        )