Design an IRB model validation library
What is the status quo
A common workflow for validating risk models is to write a collection of functions for the relevant statistical tests (often delegating to scipy/statsmodels/sklearn/arch), save them in a .py file, and write an orchestration notebook that reads the data, runs the functions, and saves the resulting charts/spreadsheets somewhere.
Possible complaints about this workflow include:
- passing many dataframes from function to function can be painful
- creating many intermediate dataframes in memory feels repetitive and resource-consuming
- it is hard to see at a glance what is being tested
Move from calling functions to building a report object
The core design is a dataframe interface which lets users specify all the tests in one go. To be more precise, the library
- registers a new namespace `irb` on the `polars.LazyFrame` class;
- provides a unified configuration interface for all PD, LGD and CCF models via `polars.LazyFrame.irb.configure(id_col="obligor_id", score_col="score", ...)`, which in turn creates an empty `Report` object;
- offers a fluent builder API for the `Report` class, allowing users to chain `.check_X().check_Y()`;
- keeps the `Report` object as a queue of checks which are not executed until the user calls `.show()` (pretty HTML report for notebooks) or `.run()` (for manual inspection of specific tables/charts in the report).
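The deferred-execution idea can be sketched in a few lines. Everything here is hypothetical and simplified (the names `LazyReport`, `add` and the callable-based checks are my own, not the library's internals); it only illustrates that building the report queues work without running it, and that the builder is immutable:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass(frozen=True)
class LazyReport:
    # each check is a named zero-argument callable; nothing runs at build time
    checks: tuple[tuple[str, Callable[[], object]], ...] = field(default_factory=tuple)

    def add(self, name: str, fn: Callable[[], object]) -> "LazyReport":
        # return a new object so the builder stays immutable
        return LazyReport(self.checks + ((name, fn),))

    def run(self) -> dict[str, object]:
        # execution happens only here
        return {name: fn() for name, fn in self.checks}

base = LazyReport()
report = base.add("row_count", lambda: 3).add("psi", lambda: 0.01)
results = report.run()
# results == {"row_count": 3, "psi": 0.01}; base is untouched
```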
The choice of polars over pandas fits the lazy-execution philosophy here, but this does not prevent pandas users from using the library. Here is an example:
import pandas as pd
import polars as pl

df: pd.DataFrame  # an existing pandas DataFrame
lf = pl.from_pandas(df).lazy()

report = lf.irb.configure(...)
(
    report
    .check_x()
    .check_y()
    .add_samples(...)
    .check_representativeness(versus=SAMPLE, variables=...)
    .show()
)
Comments on the use of AI coding tools
We are in an era where LLMs write most of the code. In this instance, I mostly outlined the API design, reported bugs, and insisted on minimalism. I don't know how fast we are heading into an era where humans are not necessary even for design.
Personally, I am not comfortable generating too many lines of code in one go, because the assumptions the LLM makes along the way can be misaligned with my intent. And since I still read all the code the LLM generates, I prefer to let it generate a small chunk at a time to keep my sanity. This makes it easier to detect early any divergence from intent or design choices that were not spelled out in the initial prompt. Perhaps this is just a matter of personal preference.
A few implementation details
class diagram

register a new namespace
@pl.api.register_lazyframe_namespace("irb")
class IRBAccessor:
    def __init__(self, lf: pl.LazyFrame):
        self._lf = lf

    def configure(self, **kwargs) -> Report:
        return Report(self._lf, IRBConfig(**kwargs))
unified config for PD LGD CCF
@dataclass(frozen=True)
class IRBConfig:
    # Metadata
    id_col: str | None = None  # obligor or facility
    date_col: str | None = None
    # PD
    default_col: str | None = None
    score_col: str | None = None
    grade_col: str | None = None
    pd_col: str | None = None
    # LGD
    ...
The user calls `lf.irb.configure(score_col="score", default_col="default")` to override the default values (`None`).
Report class
The builder always returns a new Report object for immutability. Creating them is cheap because a lazyframe is just a query plan plus a reference to the data source, and the checks are essentially callables. For instance, the RepresentativenessCheck below is a wrapper around a psi function which computes the PSI between two categorical columns.
class Report:
    def __init__(
        self,
        lf: pl.LazyFrame,
        config: IRBConfig,
        checks: list[Check] | None = None,
        samples: dict[str, pl.LazyFrame] | None = None,
    ):
        self._lf = lf
        self._config = config
        self._checks = checks or []
        self._samples = samples or {}

    def add_samples(self, **samples: pl.LazyFrame) -> Self:
        """Return a new Report containing the merged samples."""
        return Report(
            self._lf,
            self._config,
            self._checks,
            {**self._samples, **samples},
        )

    def check_representativeness(self, versus: str, variables: list[str]) -> Self:
        check = RepresentativenessCheck(
            target_lf=self._lf,
            baseline_lf=self._samples[versus],
            baseline_name=versus,
            variables=variables,
        )
        return Report(
            self._lf,
            self._config,
            self._checks + [check],
            self._samples,
        )