A tale of convexity in credit rating assignment problem

As I was exploring methodologies in credit rating assignment (say in the PD model), an elegant approach caught my attention. Here is the formulation:

given a collection of continuous scores and corresponding binary statuses (good or bad), find a partition of the scores into bins that maximizes the information value (IV), a.k.a. the symmetrized KL divergence, subject to the constraints

  • monotonicity of the bad rates across the bins (e.g. the default rate is increasing with respect to the score)
  • consecutive bins are statistically different (e.g. a z-test for binomial proportions yields a p-value < 0.05)
  • upper and lower bounds for the final number of bins
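To make the second constraint concrete, here is a minimal sketch of a two-proportion z-test with a pooled variance estimate. The function name and the pooled-statistic choice are my own illustration; the exact test a given library applies may differ.

```python
import math

def two_proportion_z_test(bad1, n1, bad2, n2):
    """Two-sided z-test that two adjacent bins have equal bad rates.

    bad1/n1 and bad2/n2 are the observed bad rates of the two bins.
    Uses the pooled-proportion statistic; returns (z, p_value).
    """
    p1, p2 = bad1 / n1, bad2 / n2
    pooled = (bad1 + bad2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Standard normal CDF via the error function
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Bins with clearly different bad rates (5% vs 30%) should stay separate
z, p = two_proportion_z_test(bad1=5, n1=100, bad2=30, n2=100)
```

A small p-value here means the two bins are statistically different, so the constraint allows them to remain separate; a large p-value suggests they are candidates for merging.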

The IV is computed between the probability vectors \((B_i/\sum_j B_j)\) and \((G_i/\sum_j G_j)\), where \(B_i\) and \(G_i\) are the number of bad and good observations in bin \(i\).
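The IV definition above translates directly into code. The following sketch (the function name is mine) computes the IV from raw bad/good counts per bin, assuming every bin contains at least one bad and one good observation so the logarithm is defined:

```python
import math

def information_value(bad, good):
    """IV between the bad- and good-count distributions over bins.

    bad[i], good[i]: counts of bad/good observations in bin i.
    IV = sum_i (b_i - g_i) * log(b_i / g_i), where b and g are the
    normalized proportions; this is the symmetrized KL divergence.
    """
    B, G = sum(bad), sum(good)
    iv = 0.0
    for bi, gi in zip(bad, good):
        b, g = bi / B, gi / G
        iv += (b - g) * math.log(b / g)
    return iv
```

Each term is non-negative (since \((b-g)\) and \(\log(b/g)\) share the same sign), so the IV is zero exactly when the two distributions coincide.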

A popular library for this task in the Python ecosystem is optbinning, which calls Google's OR-Tools under the hood.

At first sight, this is a very hard combinatorial optimization problem. A first reduction in complexity is to pre-bin the scores at a relatively granular level, e.g. 100 bins by quantiles, then search for the optimal way of merging these pre-bins to achieve the best IV under the constraints. This reduces the search space, but the search problem remains hard.
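The pre-binning step can be sketched in a few lines. This is a minimal quantile split into roughly equal-count bins (the function name is mine; real implementations also handle ties and degenerate bins):

```python
def quantile_prebins(scores, n_prebins=100):
    """Split scores into roughly equal-count pre-bins by quantiles.

    Returns a list of bins, each a sorted list of scores.
    """
    xs = sorted(scores)
    n = len(xs)
    bins = []
    for k in range(n_prebins):
        lo = k * n // n_prebins
        hi = (k + 1) * n // n_prebins
        if hi > lo:  # skip empty slices when n < n_prebins
            bins.append(xs[lo:hi])
    return bins
```

The optimizer then only decides which consecutive pre-bins to merge, rather than where to place arbitrary cut points.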

A remarkable observation is the following

merging two bins can only decrease the IV

This monotonicity makes the search problem tractable: since further merging can only decrease the IV, any branch whose current IV is already below the best IV found so far can be abandoned altogether. The exact implementation is still non-trivial, but luckily Google does it for us.
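The pruning property can be checked numerically on a small example. The sketch below (made-up counts, inline IV helper) compares the IV of four bins before and after merging the middle two:

```python
import math

def iv(bad, good):
    # IV of bin counts: sum of (b_i - g_i) * log(b_i / g_i)
    # over the normalized proportions b and g
    B, G = sum(bad), sum(good)
    return sum((bi / B - gi / G) * math.log((bi / B) / (gi / G))
               for bi, gi in zip(bad, good))

bad, good = [2, 5, 11, 30], [98, 95, 89, 70]
# Merge the middle two bins into one
merged_bad = [2, 5 + 11, 30]
merged_good = [98, 95 + 89, 70]
# The merged partition can only have a lower (or equal) IV
```

Any pair of adjacent bins would exhibit the same inequality, which is exactly the subadditivity statement proved below.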

My main interest here is to convince myself that this remarkable property is true. Mathematically, it is equivalent to the statement that

\[ (x,y) \mapsto (x-y) \log \frac{x}{y}, \quad x, y>0 \]

is subadditive. To this end, we only need to show that

\[ g(x,y) = x \log\frac{x}{y} \]

is subadditive, since \((x-y)\log\frac{x}{y} = g(x,y) + g(y,x)\) and the sum of two subadditive functions is subadditive.

Note that \(g\) is homogeneous of degree 1, hence showing subadditivity of \(g\) is equivalent to showing the convexity of \(g\). Indeed, if \(g\) is convex, then by homogeneity and convexity,

\[ g(u + v) = 2 g((u+v)/2) \le 2 (g(u)/2+g(v)/2) = g(u)+g(v). \]
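Before proving convexity, the subadditivity claim itself can be sanity-checked numerically on random positive points (a quick check, not a proof):

```python
import math
import random

def g(x, y):
    # g(x, y) = x * log(x / y), defined for x, y > 0
    return x * math.log(x / y)

random.seed(0)
# Check g(u + v) <= g(u) + g(v) on random positive pairs
for _ in range(1000):
    x1, y1 = random.uniform(0.01, 10), random.uniform(0.01, 10)
    x2, y2 = random.uniform(0.01, 10), random.uniform(0.01, 10)
    assert g(x1 + x2, y1 + y2) <= g(x1, y1) + g(x2, y2) + 1e-9
```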

Now it remains to show the convexity of \(g\). Here is the key lemma:

Lemma

Let \(f:\mathbb R^d\to\mathbb R\) be a convex function and define \(g:\mathbb R^{d+1}\to\mathbb{R}\) by

\[ g(x,t) = t f(\frac{x}{t}) \quad t>0. \]

Then \(g\) is a convex function (known as the perspective of \(f\)).

We can apply the lemma with \(f(x)=x\log x\), which is convex, to obtain exactly \(g(x,t) = t\,(x/t)\log(x/t) = x\log\frac{x}{t}\).

This lemma is striking. It allows us to deduce convexity of multivariate functions from convexity of functions on a lower-dimensional space.

One can prove this lemma analytically, but that does not offer much insight. A better explanation of why it holds is that the epigraph of \(g\) is the cone generated by the epigraph of \(f\). How? Place the flat epigraph of \(f\) at height \(t=1\) and collect all the rays travelling from the origin through it: these rays form a cone. Note that \(f(x)=g(x,1)\), and sliding \(t\) up and down gives a collection of scaled copies of the flat epigraph of \(f\), whose union is exactly that cone. Since the cone generated by a convex set is convex, the epigraph of \(g\) is convex, and \(g\) is a convex function.