Bayesian Merging Scheme

The Strong Aggregating Algorithm for the logarithmic loss function and $\eta = 1$ coincides with the Bayesian merging scheme, which goes back to DeSantis et al. (1988), in the case of a countable number of experts and a countable number of outcomes. We call it the Bayesian Algorithm (BA) because it is virtually identical to the Bayes rule used in Bayesian learning (the main difference being that the experts are not required to follow any prediction strategies). We consider the case where the set of possible outcomes is $\Omega = \{0,1\}$ and the prediction set is $\Gamma = [0,1]$. A prediction $\gamma \in \Gamma$ is identified with the probabilities $\gamma(1) = \gamma$ and $\gamma(0) = 1 - \gamma$ it assigns to the two outcomes, and the logarithmic loss is $\lambda(\omega,\gamma) = -\ln \gamma(\omega)$.

The Bayesian Algorithm works as follows.

Initialize weights of experts $w_k := 1/K$, $k=1,\ldots,K$.
Initialize cumulative losses $L_0 := 0$ and $L_0^k := 0$, $k=1,\ldots,K$.
FOR $t=1,2,\dots$
  Experts announce predictions $\gamma_t^k \in \Gamma$, $k=1,\ldots,K$.
  Predict $\gamma_t := \sum_{k=1}^K w_k \gamma_t^k$, so that $\gamma_t(\omega) = \sum_{k=1}^K w_k \gamma_t^k(\omega)$ for each $\omega \in \Omega$.
  Reality announces $\omega_t \in \Omega$.
  Update the cumulative losses $L_t := L_{t-1} + \lambda(\omega_t,\gamma_t)$ and $L_t^k := L_{t-1}^k + \lambda(\omega_t,\gamma_t^k)$, $k=1,\ldots,K$.
  Update the weights $w_k := w_k \gamma_t^k(\omega_t)$ and normalize them: $w_k := w_k / \sum_{j=1}^K w_j$, $k=1,\ldots,K$.
END FOR.
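
As an illustration (not part of the original description; the function and variable names are ours), here is a minimal Python sketch of this loop, assuming the experts' predictions lie strictly inside $(0,1)$ so that all logarithmic losses are finite.

  import math

  def prob(gamma, omega):
      """Probability that a prediction gamma in [0, 1] assigns to outcome omega in {0, 1}."""
      return gamma if omega == 1 else 1.0 - gamma

  def bayesian_algorithm(expert_predictions, outcomes):
      """Run the Bayesian Algorithm for binary outcomes and logarithmic loss.

      expert_predictions: T rounds, each a list of K probabilities of outcome 1.
      outcomes: T outcomes, each 0 or 1.
      Returns the learner's cumulative loss L_T and the experts' cumulative losses L_T^k.
      """
      K = len(expert_predictions[0])
      w = [1.0 / K] * K                    # uniform initial weights
      L, L_experts = 0.0, [0.0] * K        # cumulative losses L_t and L_t^k

      for preds, omega in zip(expert_predictions, outcomes):
          # Predict the weighted average of the experts' probabilities.
          gamma = sum(w_k * p for w_k, p in zip(w, preds))

          # Accumulate logarithmic losses lambda(omega, gamma) = -ln gamma(omega).
          L -= math.log(prob(gamma, omega))
          for k, p in enumerate(preds):
              L_experts[k] -= math.log(prob(p, omega))

          # Bayes-rule weight update followed by normalization.
          w = [w_k * prob(p, omega) for w_k, p in zip(w, preds)]
          total = sum(w)
          w = [w_k / total for w_k in w]

      return L, L_experts

For example, with $K = 2$ experts and $T = 3$ rounds one can check the loss bound of the next paragraph numerically:

  preds = [[0.9, 0.4], [0.8, 0.5], [0.7, 0.3]]    # each round: [expert 1, expert 2]
  omegas = [1, 1, 0]
  L, L_experts = bayesian_algorithm(preds, omegas)
  assert L <= min(L_experts) + math.log(2) + 1e-12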

The following theoretical bound on the cumulative logarithmic loss of the Bayesian Algorithm holds in the case of the uniform initial weight distribution $w_k = 1/K$, $k=1,\ldots,K$: \begin{equation*} L_T \le \min_{k=1,\ldots,K} L_T^k + \ln K \end{equation*} for all $T=1,2,\ldots$.
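
The bound follows from a standard telescoping argument, sketched here for completeness: since the BA's weights are exactly the Bayesian posterior weights, the product of the probabilities it assigns to the observed outcomes equals the uniform mixture of the corresponding products for the experts, so
\begin{equation*}
L_T = -\sum_{t=1}^T \ln \gamma_t(\omega_t) = -\ln \sum_{k=1}^K \frac{1}{K} \prod_{t=1}^T \gamma_t^k(\omega_t) = -\ln\left(\frac{1}{K} \sum_{k=1}^K e^{-L_T^k}\right) \le \min_{k=1,\ldots,K} L_T^k + \ln K.
\end{equation*}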

Bibliography

  • Alfredo DeSantis, George Markowsky, and Mark N. Wegman. Learning probabilistic prediction functions. In ''Proceedings of the 29th Annual IEEE Symposium on Foundations of Computer Science'', pages 110–119, Los Alamitos, CA, USA, 1988. IEEE Computer Society.