Defensive Forecasting

Defensive forecasting is a technique for competitive on-line prediction based on game-theoretic probability.

Basic principle

A typical law of game-theoretic probability (see, e.g., the strong law of large numbers for bounded observations) says that either some interesting property of the forecasts $p_n$ and outcomes $y_n$ (such as their unbiasedness $\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^N (y_n-p_n)=0$) holds or Sceptic's capital tends to infinity. Typically, the strategy for Sceptic implicit in such a law is continuous. The basic observation (made by Takemura in 2004; first published in Vovk et al. 2005) is that, if Sceptic is known to follow a continuous strategy, Forecaster can play in such a way as to prevent Sceptic's capital from growing. For example, adding signals $x_n$ to the protocol used in the strong law of large numbers for binary observations and making Sceptic announce his strategy for each round at the beginning of that round, we obtain the following protocol:

Players: Skeptic, Forecaster, Reality
Protocol:

Initialize $K_0=1$

FOR $n=1,2,\dots$:

Reality announces $x_n \in X$

Sceptic announces continuous $S_n:[0,1]\to\mathbb{R}$

Forecaster announces $p_n \in [0,1]$

Reality announces $y_n \in \{0,1\}$

$K_n = K_{n-1} + S_n(p_n)(y_n-p_n)$

END.

The usual restriction on Sceptic that his capital should always be nonnegative ($K_n \ge 0$ for all $n$) becomes irrelevant in defensive forecasting.

Basic Lemma. Forecaster has a strategy in the above protocol that ensures $K_0 \ge K_1 \ge \dots$.

The proof is trivial: Forecaster's is chosen as a root of $S_n(p_n)=0$ if one exists, or as $p_n=(1+\text{sign}(S_n))/2$ if $S_n$ has no roots in the interval $ [0,1] $.

The Basic Lemma can be generalized to a wide class of forecasting games. Therefore, for a typical continuous law of game-theoretic probability Forecaster has a strategy that guarantees that the corresponding property will hold for the realized sequence of forecasts and outcomes. The corresponding strategy for Forecaster is said to have been obtained by the method of defensive forecasting.

In particular, one can apply defensive forecasting to the property of calibration-cum-resolution, which leads to the K29 algorithm. In some applications, such as those to competitive on-line prediction described below, the property of calibration-cum-resolution is replaced by the conjunction of calibration and resolution.

Application to competitive on-line prediction

We would like to compete with prediction strategies indexed by $k$. Let $P(\Omega)$ be the set of all probability measures on $\Omega$ and $G:P(\Omega) \to \Gamma$ be a choice function:

$G(p)\in\arg\min_{\gamma \in \Gamma} \lambda(\gamma,p)$,

where $\lambda(\gamma,p):=\int{\lambda(\gamma,\omega)p(d\omega)}$, $p \in P(\Omega)$. If the probability forecasts are chosen by defensive forecasting for suitable strategies for Sceptic, we will have:

$L_N = \sum_{n=1}^N \lambda(G(p_n),\omega_n) = \sum_{n=1}^N \lambda(G(p_n),p_n) + A_N$

${}\le \sum_{n=1}^N \lambda(\gamma_n^k,p_n) + A_N = \sum_{n=1}^N \lambda(\gamma_n^k,\omega_n) + A_N + A_N^k = L_N^k + A_N + A_N^k $,

where $L_N$ is Learner's loss over $N$ steps, $L_N^k$ is the $k$th strategy's loss over $N$ steps, $A_N=\sum_{n=1}^N (\lambda(G(p_n),\omega_n) - \lambda(G(p_n),p_n))$ is small, and $A_N^k=\sum_{n=1}^N (\lambda(\gamma_n^k,p_n) - \lambda(\gamma_n^k, \omega_n))$ is also small. The smallness of $A_N, A_N^k$ (as compared with $N$) will be ensured if defensive forecasting is applied to suitable forms of the law of large numbers.

The chain of equalities and inequalities given above says, in effect, that real losses are close to the (one-step-ahead conditional) expected losses. This part is a defensive forecasting part of the proof. The expected loss of Learner cannot exceed the expected loss of any of the strategies by the definition of a choice function. The choice funtion also tranforms the probability forecasts of defensive forecasting into real forecasts. Instead of existence of the actual choice function Vovk (2007) proves the existence of the approximate choice function.

In each particular case one should estimate the numbers $A_N, A_N^k$. The inherent accuracy given by the law of large numbers is at best $\sqrt{N}$, and so this technique cannot provide a regret term better than $\sqrt{N}$, the difference between the expected and real losses. There is Shortcut Defensive Forecasting for finitely many experts that allows one to replace $\sqrt{N}$ with smaller terms.

The technique of defensive forecasting can also be applied to infinite (and uncountable) classes of prediction rules, in particular reproducing kernel Hilbert spaces and proper functional Banach spaces.

Two approaches to defensive forecasting

The well-known example due to Oakes (simplified by Dawid) imposes severe limitations on the possibility of well-calibrated forecasting. There are two known ways to overcome difficulties pointed out by Oakes and Dawid:

assuming that the strategy for Sceptic is continuous;
using randomization.

This article concentrated on the first approach. Both approaches have also been widely studied in von Mises's version of game-theoretic probability and in measure-theoretic probability (Foster, Vohra, Young, Sandroni, and many other people).

Bibliography

Vladimir Vovk, Akimichi Takemura, and Glenn Shafer. Defensive forecasting. In: Proceedings of the 10th International Workshop on Artificial Intelligence and Statistics (ed. by R. G. Cowell and Z. Ghahramani), pp. 365 � 372. Society for Artificial Intelligence and Statistics, 2005.
Vladimir Vovk. On-line regression competitive with reproducing kernel Hilbert spaces (extended abstract). In: Theory and Applications of Models of Computation. Proceedings of the Third Annual Conference on Computation and Logic (ed. by J.-Y. Cai, S. B. Cooper and A. Li), Lecture Notes in Computer Science, vol. 3959, pp. 452 � 463. Berlin: Springer, 2006.
Vladimir Vovk. Predictions as statements and decisions. arXiv:cs/0606093v1 [cs.LG], 2007