Mondrian Conformal Predictor

Main.MondrianConformalPredictor History


May 18, 2017, at 01:56 PM by Vovk - added link
Added lines 66-69:

!! Applications

[[ExCAPE project]]
Changed line 68 from:
* Vineeth N. Balasubramanian, Shen-Shyang Ho, and Vladimir Vovk, editors (2014).  ''Conformal Prediction for Reliable Machine Learning: Theory, Adaptations, and Applications.  Morgan Kaufmann, Chennai.
to:
* Vineeth N. Balasubramanian, Shen-Shyang Ho, and Vladimir Vovk, editors (2014).  ''Conformal Prediction for Reliable Machine Learning: Theory, Adaptations, and Applications''.  Morgan Kaufmann, Chennai.
Added lines 43-44:
Another example is where we want to make the analysis conditional on the type of the objects.  For example, if the objects are people, we might want to define $\kappa_i:=\text{male}$ if $x_i$ is male and $\kappa_i:=\text{female}$ if $x_i$ is female.
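
In code, such an attribute-conditional taxonomy could look as follows; this is an illustrative sketch added for concreteness (not part of the original page), assuming each object is a record with a sex field:
[@
# Illustrative sketch: a Mondrian taxonomy conditioning on sex.
# Each object x_i is assumed to be a dict with a 'sex' entry;
# the name sex_conditional_K is hypothetical.
def sex_conditional_K(observations):
    """Map observations (x_i, y_i) to their categories kappa_i."""
    return [x['sex'] for (x, _) in observations]
@]
Since the category of each observation depends only on that observation, the required equivariance with respect to permutations holds automatically.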
Changed line 59 from:
In the next theorem we assume that $\kappa_i$ depends only on $z_i$.
to:
In the next theorem we assume that $\kappa_i$ depends only on $z_i$ (without this assumption, we can still guarantee the right coverage probability but the independence of errors may be lost).
Added line 37:
(Formally, as defined here, an MCP is a [[confidence predictor]] only when all $\epsilon_k$ coincide.)
Changed line 58 from:
'''Theorem''' ''Any smoothed MCP is category-wise exact w.r. to {$K$}'', that is, for all $n$, the conditional probability distribution of {$(p_1,p_2,\ldots)$} given {$\kappa_1,\kappa_2,\ldots$} is uniform on {$[0,1]^{\infty}$}, where the observations $z_1,z_2,\ldots$ are generated from an [[exchangeable probability distribution]] on $Z^{\infty}$ and $p_i:=p_i^{y_i}$ are the p-values for the true labels.
to:
'''Theorem''' ''Any smoothed MCP is category-wise exact w.r. to $K$'', that is, for all $n$, the conditional probability distribution of $(p_1,p_2,\ldots)$ given $\kappa_1,\kappa_2,\ldots$ is uniform on ${[0,1]}^{\infty}$, where the observations $z_1,z_2,\ldots$ are generated from an [[exchangeable probability distribution]] on $Z^{\infty}$ and $p_i:=p_i^{y_i}$ are the p-values for the true labels.
Changed line 58 from:
'''Theorem''' ''Any smoothed MCP is category-wise exact w.r. to $K$'', that is, for all $n$, the conditional probability distribution of $(p_1,p_2,\ldots)$ given $\kappa_1,\kappa_2,\ldots$ is uniform on {$[0,1]^{\infty}$}, where the observations $z_1,z_2,\ldots$ are generated from an [[exchangeable probability distribution]] on $Z^{\infty}$ and $p_i:=p_i^{y_i}$ are the p-values for the true labels.
to:
'''Theorem''' ''Any smoothed MCP is category-wise exact w.r. to {$K$}'', that is, for all $n$, the conditional probability distribution of {$(p_1,p_2,\ldots)$} given {$\kappa_1,\kappa_2,\ldots$} is uniform on {$[0,1]^{\infty}$}, where the observations $z_1,z_2,\ldots$ are generated from an [[exchangeable probability distribution]] on $Z^{\infty}$ and $p_i:=p_i^{y_i}$ are the p-values for the true labels.
Changed line 7 from:
A ''simple Mondrian taxonomy'' is a function $K$ that maps any sequence of observations $z_1,\ldots,z_n$ to a same-length sequence $\kappa_1,\ldots,\kappa_n$ of categories that is equivariant with respect to permutations:
to:
A ''simple Mondrian taxonomy'' is a function $K$ that maps any finite sequence of observations $z_1,\ldots,z_n$ to a same-length sequence $\kappa_1,\ldots,\kappa_n$ of categories that is equivariant with respect to permutations:
Changed line 13 from:
We refer to $\kappa_i$ as the ''category of'' $z_i$.  A ''$K$-conditional conformity measure'' is a measurable function $A$ that maps every data sequence $z_1,\lodts,z_n$ to a same-length sequence $\alpha_1,\ldots,\alpha_n$ and that is equivariant with respect to permutations leaving the categories intact. 
to:
We refer to $\kappa_i$ as the ''category of'' $z_i$.  A ''$K$-conditional conformity measure'' is a measurable function $A$ that maps every data sequence $z_1,\ldots,z_n$ to a same-length sequence $\alpha_1,\ldots,\alpha_n$ and that is equivariant with respect to permutations leaving the categories intact. 
Changed line 58 from:
'''Theorem''' ''Any smoothed MCP is category-wise exact w.r. to $K$'', that is, for all $n$, the conditional probability distribution of $(p_1,p_2,\ldots)$ given $\kappa_1,\kappa_2,\ldots$ is uniform on $[0,1]^{\infty}$, where the observations $z_1, z_2, \ldots$ are generated from an [[exchangeable (probability) distribution]] on $Z^{\infty}$ and $p_i:=p_i^{y_i}$ are the p-values for the true labels.
to:
'''Theorem''' ''Any smoothed MCP is category-wise exact w.r. to $K$'', that is, for all $n$, the conditional probability distribution of $(p_1,p_2,\ldots)$ given $\kappa_1,\kappa_2,\ldots$ is uniform on {$[0,1]^{\infty}$}, where the observations $z_1,z_2,\ldots$ are generated from an [[exchangeable probability distribution]] on $Z^{\infty}$ and $p_i:=p_i^{y_i}$ are the p-values for the true labels.
May 13, 2017, at 04:30 PM by Vovk - finished rewrite
Changed lines 58-60 from:
'''Theorem''' ''Any smoothed MCP based on a simplified Mondrian taxonomy $K$ is category-wise exact w.r. to $K$'', that is, for all $n$, the conditional probability distribution of $p_n$ given $\kappa_1, p_1, \ldots, \kappa_{n-1}, p_{n-1}, \kappa_n$ is uniform on $[0, 1]$, where {$z_1, z_2, \ldots$} are examples generated from an [[exchangeable (probability) distribution]] on {$Z^{\infty}$} and {$p_1, p_2, \ldots$} are the p-values output by {$f$}.

This implies that in the long run the frequency of erroneous predictions given by smoothed MCPs on examples of category {$k$} is equal to {$\epsilon_k$} for each {$k$}. Therefore, the frequency of errors made by MCPs on examples of category {$k$} does not exceed {$\epsilon_k$} for each {$k$}.
to:
'''Theorem''' ''Any smoothed MCP is category-wise exact w.r. to $K$'', that is, for all $n$, the conditional probability distribution of $(p_1,p_2,\ldots)$ given $\kappa_1,\kappa_2,\ldots$ is uniform on $[0,1]^{\infty}$, where the observations $z_1, z_2, \ldots$ are generated from an [[exchangeable (probability) distribution]] on $Z^{\infty}$ and $p_i:=p_i^{y_i}$ are the p-values for the true labels.

This implies that in the long run the frequency of erroneous predictions given by smoothed MCPs on examples of category {$k$} is equal to {$\epsilon_k$} for each {$k$}. Therefore, the frequency of errors made by MCPs on examples of category $k$ does not exceed $\epsilon_k$ for each $k$.

An important feature of MCPs is that the theorem only requires the observations in the same category to be exchangeable: it will remain true if the equality in the definition of exchangeability is only required to hold for permutations $\pi$ leaving the categories intact.  A nice example (due to [[http://daniil.ryabko.net/ | Daniil Ryabko]]) showing the significance of this relaxation is recognizing handwritten text (assuming the text has been divided into separate symbols): the label-conditional conformal predictor will be valid provided that different instances of the same symbol are exchangeable; we do not need to require that the sequence of symbols in the text be exchangeable (this assumption would be patently wrong).
May 13, 2017, at 03:10 PM by Vovk - partially fixed
Changed lines 5-6 from:
A ''Mondrian taxonomy'' is a measurable function $\kappa:N \times Z \to K$, where $Z$ is a set of [[observations -> confidence predictor]] and $K$ is a countable set of elements called ''categories'', that satisfies some natural properties given in Vovk et al. (2005), Section 4.5.  The definition involves partitions into rectangles, reminiscent of [[https://en.wikipedia.org/wiki/Piet_Mondrian | Piet Mondrian's]] paintings (see also [[https://scottlocklin.wordpress.com/2016/12/05/predicting-with-confidence-the-best-machine-learning-idea-you-never-heard-of/ | Scott Locklin's blog]]).  A simple example of MCPs is given by ''label-conditional conformal predictors'', which require the label space $\mathbf{Y}$ to be finite.  In this case the set of categories $K=\mathbf{Y}$ coincides with the set of labels.  The ''Mondrian conformal predictor'' determined by a conformity measure $A$ and a set of significance levels $\epsilon_k$, $k \in K$,
is defined by
to:
A ''Mondrian taxonomy'' is a measurable function $\kappa:N \times Z \to K$, where $Z$ is a set of [[observations -> confidence predictor]] and $K$ is a countable set of elements called ''categories'', that satisfies some natural properties given in Vovk et al. (2005), Section 4.5.  The definition involves partitions into rectangles, reminiscent of [[https://en.wikipedia.org/wiki/Piet_Mondrian | Piet Mondrian's]] paintings (see also [[https://scottlocklin.wordpress.com/2016/12/05/predicting-with-confidence-the-best-machine-learning-idea-you-never-heard-of/ | Scott Locklin's blog]]).  However, in this article we will use a simpler definition given in Balasubramanian et al. (2014), Section 2.2.

A ''simple Mondrian taxonomy'' is a function $K$ that maps any sequence of observations $z_1,\ldots,z_n$ to a same-length sequence $\kappa_1,\ldots,\kappa_n$ of categories that is equivariant with respect to permutations:
Added lines 9-16:
  (\kappa_1,\ldots,\kappa_n) = K(z_1,\ldots,z_n)
  \Longrightarrow
  (\kappa_{\pi(1)},\ldots,\kappa_{\pi(n)}) = K(z_{\pi(1)},\ldots,z_{\pi(n)}).
\]
We refer to $\kappa_i$ as the ''category of'' $z_i$.  A ''$K$-conditional conformity measure'' is a measurable function $A$ that maps every data sequence $z_1,\lodts,z_n$ to a same-length sequence $\alpha_1,\ldots,\alpha_n$ and that is equivariant with respect to permutations leaving the categories intact. 

The ''Mondrian conformal predictor'' determined by a $K$-conditional conformity measure $A$ and a set of significance levels $\epsilon_k$, $k \in K$, is defined by
\[
Changed line 29 from:
  \frac{|\{i = 1, \ldots, n+1: y_i = y_{n+1}  \& \alpha^y_i \le \alpha_{n+1}\}|}{|\{i = 1, \ldots, n+1: y_i = y_{n+1}\}|},
to:
  \frac{|\{i=1,\ldots,n+1 \mid \kappa^y_i = \kappa^y_{n+1}  \;\&\; \alpha^y_i \le \alpha^y_{n+1}\}|}{|\{i=1,\ldots,n+1 \mid \kappa^y_i = \kappa^y_{n+1}\}|},
Changed line 31 from:
where {$\kappa_i := \kappa(i, z_i), z_i := (x_i, y_i)$} and
to:
where $(\kappa^y_1,\ldots,\kappa^y_{n+1}) := K(z_1,\ldots,z_n,(x_{n+1},y))$ and
Changed lines 33-35 from:
  \alpha_i := A_n(\kappa_1, \ldots, \kappa_{n-1}, (k \mapsto \{z_j: j \in \{1, \ldots, i-1, i+1, \ldots, n\} \& \kappa_j = k\}), \kappa_n, z_i)
to:
  (\alpha^y_1,\ldots,\alpha^y_{n+1})
  :=
  A(z_1,\ldots,z_n,(x_{n+1},y)).
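
To make these defining formulas concrete, here is a minimal Python sketch (an illustration added here, not part of the original page) of the resulting prediction set; the taxonomy $K$ and the $K$-conditional conformity measure $A$ are assumed to be given as functions mapping a list of observations to a same-length list of categories or conformity scores, and all names are hypothetical:
[@
def mondrian_prediction_set(K, A, data, x_new, labels, eps):
    """Prediction set of the MCP determined by the taxonomy K, the
    K-conditional conformity measure A and significance levels eps[k]."""
    prediction = set()
    for y in labels:                      # try each hypothetical label y
        extended = data + [(x_new, y)]    # z_1, ..., z_n, (x_{n+1}, y)
        kappa = K(extended)               # kappa^y_1, ..., kappa^y_{n+1}
        alpha = A(extended)               # alpha^y_1, ..., alpha^y_{n+1}
        k_new, a_new = kappa[-1], alpha[-1]
        # conformity scores of the observations in the test category
        same = [a for k, a in zip(kappa, alpha) if k == k_new]
        p = sum(a <= a_new for a in same) / len(same)   # the p-value p^y
        if p > eps[k_new]:
            prediction.add(y)
    return prediction
@]
Note that the denominator is never zero: the new observation always belongs to its own category.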
Changed lines 37-44 from:
for {$i = 1, \ldots, n$} such that {$\kappa_i = \kappa_n$} ({$\{z_j: \ldots \}$} denotes a multiset).

The standard assumption for MCPs is [[randomness assumption]] (also called the i.i.d. assumption).

Important special cases of MCPs:
*[[conformal predictor]]s
*[[inductive conformal predictor]]s

to:

The standard assumption for MCPs is the [[randomness assumption]] (also called the IID assumption).

A simple example is given by ''label-conditional conformal predictors'', which require the label space $\mathbf{Y}$ to be finite. In this case the set of categories $K=\mathbf{Y}$ coincides with the set of labels, and the category of an observation $(x_i,y_i)$ is simply its label, $\kappa_i:=y_i$.
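
Continuing the sketch above, the label-conditional case could look as follows (the conformity measure here is an arbitrary toy choice made for illustration, not one prescribed by this page):
[@
from statistics import mean

def label_conditional_K(observations):
    """Label-conditional taxonomy: the category of (x_i, y_i) is y_i."""
    return [y for (_, y) in observations]

def A(observations):
    """Toy conformity measure: minus the distance of x_i to the mean of
    its own class (closer to the class mean = more conforming)."""
    return [-abs(x - mean(xx for xx, yy in observations if yy == y))
            for (x, y) in observations]

data = [(1.0, 'a'), (1.2, 'a'), (3.0, 'b'), (3.3, 'b')]
print(mondrian_prediction_set(label_conditional_K, A, data,
                              x_new=1.1, labels={'a', 'b'},
                              eps={'a': 0.2, 'b': 0.2}))
# With so few observations per label both p-values exceed 0.2,
# so the prediction set is {'a', 'b'}.
@]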
Changed lines 46-58 from:
''Smoothed MCPs'' are defined analogously to [[smoothed conformal predictors -> conformal predictor]]:

%center% {$ \Gamma^{(\epsilon_k: k \in K)}(x_1, y_1, \ldots, x_{n-1}, y_{n-1}, x_n)$}

is set to the set of all labels {$y \in  Y$} such that

%center% {$p_n > \epsilon_{\kappa(n, (x_n, y))} $}, where
{$p_n = (|\{i = 1, \ldots, n: \kappa_i = \kappa_n  \& \alpha_i > \alpha_n \}| + \eta_n |\{i = 1, \ldots, n: \kappa_i = \kappa_n  \& \alpha_i = \alpha_n \}|  )/|\{i = 1, \ldots, n: \kappa_i = \kappa_n\}|$},

the nonconformity scores {$\alpha_i$} are defined as before and {$\eta_n\in[0,1]$} is a random number.

'''Theorem''' ''Any smoothed MCP based on a Mondrian taxonomy {$\kappa$} is category-wise exact w.r. to {$\kappa$}'', that is, for all {$n$}, the conditional probability distribution of {$p_n$} given {$\kappa(1, z_1), p_1, \ldots, \kappa(n-1, z_{n-1}), p_{n-1}, \kappa(n, z_n)$} is uniform on [0, 1], where {$z_1, z_2, \ldots$} are examples generated from an [[exchangeable (probability) distribution]] on {$Z^{\infty}$} and {$p_1, p_2, \ldots$} are the p-values output by {$f$}.
to:
''Smoothed MCPs'' are defined analogously to [[smoothed conformal predictors -> conformal predictor]]: the definition of $p^y$ is now modified to
\[
  p^y
  :=
  \frac{|\{i=1,\ldots,n+1 \mid \kappa^y_i = \kappa^y_{n+1}  \;\&\; \alpha^y_i < \alpha^y_{n+1}\}|
  + \eta_y |\{i=1,\ldots,n+1 \mid \kappa^y_i = \kappa^y_{n+1}  \;\&\; \alpha^y_i = \alpha^y_{n+1}\}|}
  {|\{i=1,\ldots,n+1 \mid \kappa^y_i = \kappa^y_{n+1}\}|},
\]
and the rest of the definition is the same.  Here $\eta_y\sim U[0,1]$ are random numbers (chosen independently at different trials and independently of the observations).

In the next theorem we assume that $\kappa_i$ depends only on $z_i$.

'''Theorem''' ''Any smoothed MCP based on a simplified Mondrian taxonomy $K$ is category-wise exact w.r. to $K$'', that is, for all $n$, the conditional probability distribution of $p_n$ given $\kappa_1, p_1, \ldots, \kappa_{n-1}, p_{n-1}, \kappa_n$ is uniform on $[0, 1]$, where {$z_1, z_2, \ldots$} are examples generated from an [[exchangeable (probability) distribution]] on {$Z^{\infty}$} and {$p_1, p_2, \ldots$} are the p-values output by {$f$}.
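
The smoothed p-value can be sketched in the same style (again an illustration with hypothetical names, mirroring the displayed formula):
[@
import random

def smoothed_mondrian_p_value(kappa, alpha, rng=random):
    """Smoothed p-value for the last of n+1 observations, given their
    categories kappa[i] and conformity scores alpha[i]."""
    k_new, a_new = kappa[-1], alpha[-1]
    same = [a for k, a in zip(kappa, alpha) if k == k_new]
    eta = rng.random()                    # tie-breaking eta ~ U[0,1]
    smaller = sum(a < a_new for a in same)
    ties = sum(a == a_new for a in same)  # includes the new observation itself
    return (smaller + eta * ties) / len(same)
@]
These are the p-values that the theorem above asserts to be uniformly distributed on $[0,1]$ within each category.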
Added line 63:
* Vineeth N. Balasubramanian, Shen-Shyang Ho, and Vladimir Vovk, editors (2014).  ''Conformal Prediction for Reliable Machine Learning: Theory, Adaptations, and Applications.  Morgan Kaufmann, Chennai.
Changed lines 1-2 from:
''Mondrian conformal predictors (MCPs)'' represent a wide class of [[confidence predictor]]s which is the generalisation of [[conformal predictor]]s and [[inductive conformal predictor]]s with a new property of validity, validity within categories (see section '''Validity'''); this kind of validity is stronger than usual [[conservative validity]] for [[conformal predictor]]s.
to:
''Mondrian conformal predictors (MCPs)'' represent a wide class of [[confidence predictor]]s generalising [[conformal predictor]]s and [[inductive conformal predictor]]s with a new property of validity, validity within categories; this kind of validity is stronger (for large data sets) than the usual [[conservative validity]] for [[conformal predictor]]s.
Changed lines 5-16 from:
A ''Mondrian taxonomy'' is a measurable function $\kappa:N \times Z \to K$, where {$Z$} is a set of [[observations -> confidence predictor]], $K$ is the measurable space (at most countable with the discrete {$\sigma$}-algebra) of elements called ''categories'', with the following property:
the elements {$\kappa^{-1}(k)$} of each category {$k \in K$} form a rectangle {$A \times B$}, for some {$A \subseteq N$} and {$B \subseteq Z$}.
In words, a Mondrian taxonomy defines a division of the Cartesian product {$N \times Z$} into categories.

''Mondrian nonconformity measure'' based on a Mondrian taxonomy {$\kappa$} is a family of measurable functions {$\{A_n:n\in N\}$} of the type

%center% {$A_n: K^{n-1} \times (Z ^ {(*)})^K \times K \times Z \to \bar{\mathbb{R}}$},

where {$(Z ^ {(*)})^K$} is  a set of all functions mapping {$K$} to the set of all bags (multisets) of elements of {$Z$}.

The ''Mondrian conformal predictor'' determined by the Mondrian nonconformity measure {$A_n$} and a set of significance levels {$\epsilon_k, k \in K$} is a [[confidence predictor]] {$\Gamma:\mathbf{Z}^*\times X\times (0,1)^K\to 2^Y$}
({$2^Y$} is a set of all subsets of {$Y$}) such that the [[prediction set -> confidence predictor]] {$ \Gamma^ {(\epsilon_k: k \in K)} (x_1, y_1, \ldots, x_{n-1}, y_{n-1}, x_n)$} is defined as the set of all labels {$y\in Y$} such that
to:
A ''Mondrian taxonomy'' is a measurable function $\kappa:N \times Z \to K$, where $Z$ is a set of [[observations -> confidence predictor]] and $K$ is a countable set of elements called ''categories'', that satisfies some natural properties given in Vovk et al. (2005), Section 4.5.  The definition involves partitions into rectangles, reminiscent of [[https://en.wikipedia.org/wiki/Piet_Mondrian | Piet Mondrian's]] paintings (see also [[https://scottlocklin.wordpress.com/2016/12/05/predicting-with-confidence-the-best-machine-learning-idea-you-never-heard-of/ | Scott Locklin's blog]]).  A simple example of MCPs is given by ''label-conditional conformal predictors'', which require the label space $\mathbf{Y}$ to be finite.  In this case the set of categories $K=\mathbf{Y}$ coincides with the set of labels.  The ''Mondrian conformal predictor'' determined by a conformity measure $A$ and a set of significance levels $\epsilon_k$, $k \in K$,
is defined by
Changed lines 8-10 from:
  p_n > \epsilon_{\kappa(n, (x_n, y))},
  \qquad
  p_n = |\{i = 1, \ldots, n: \kappa_i = \kappa_n  \& \alpha_i \ge \alpha_n \}|  /|\{i = 1, \ldots, n: \kappa_i = \kappa_n\}|
to:
  \Gamma^{(\epsilon_k: k \in K)}(x_1, y_1, \ldots, x_n, y_n, x_{n+1})
  :=
  \left\{
    y \in \mathbf{Y}
    \mid
    p^y > \epsilon_y
  \right\},
Changed line 16 from:
where {$\kappa_i := \kappa(i, z_i), z_i := (x_i, y_i)$} and
to:
where
Changed lines 18-20 from:
  \alpha_i := A_n(\kappa_1, \ldots, \kappa_{n-1}, (k \mapsto \{z_j: j \in \{1, \ldots, i-1, i+1, \ldots, n\} \& \kappa_j = k\}), \kappa_n, z_i)
to:
  p^y
  :=
  \frac{|\{i = 1, \ldots, n+1: y_i = y_{n+1}  \& \alpha^y_i \le \alpha_{n+1}\}|}{|\{i = 1, \ldots, n+1: y_i = y_{n+1}\}|},
Added lines 22-25:
where {$\kappa_i := \kappa(i, z_i), z_i := (x_i, y_i)$} and
\[
  \alpha_i := A_n(\kappa_1, \ldots, \kappa_{n-1}, (k \mapsto \{z_j: j \in \{1, \ldots, i-1, i+1, \ldots, n\} \& \kappa_j = k\}), \kappa_n, z_i)
\]
Changed lines 51-54 from:
This implies that in the long run the frequency of erroneous predictions given by smoothed MCPs on examples of category {$k$} is equal to {$\epsilon_k$} for each {$k$}. Therefore, the frequency of errors made by MCPs on examples of category {$k$} does not exceed {$\epsilon_k$} for each {$k$}.
to:
This implies that in the long run the frequency of erroneous predictions given by smoothed MCPs on examples of category {$k$} is equal to {$\epsilon_k$} for each {$k$}. Therefore, the frequency of errors made by MCPs on examples of category {$k$} does not exceed {$\epsilon_k$} for each {$k$}.

'''Bibliography'''
* Vladimir Vovk, Alexander Gammerman, and Glenn Shafer (2005).  ''[[http://www.alrw.net | Algorithmic Learning in a Random World]]''.  Springer, New York.
Changed line 15 from:
The ''Mondrian conformal predictor'' determined by the Mondrian nonconformity measure {$A_n$} and a set of significance levels {$\epsilon_k, k \in K$} is a [[confidence predictor]] {$\Gamma: Z^*\times X\times (0,1)^K\to 2^Y$}
to:
The ''Mondrian conformal predictor'' determined by the Mondrian nonconformity measure {$A_n$} and a set of significance levels {$\epsilon_k, k \in K$} is a [[confidence predictor]] {$\Gamma:\mathbf{Z}^*\times X\times (0,1)^K\to 2^Y$}
Changed lines 18-20 from:
  p_n > \epsilon_{\kappa(n, (x_n, y))}$},  {$p_n = |\{i = 1, \ldots, n: \kappa_i = \kappa_n  \& \alpha_i \ge \alpha_n \}|  /|\{i = 1, \ldots, n: \kappa_i = \kappa_n\}|
to:
  p_n > \epsilon_{\kappa(n, (x_n, y))},
  \qquad
  p_n = |\{i = 1, \ldots, n: \kappa_i = \kappa_n  \& \alpha_i \ge \alpha_n \}|  /|\{i = 1, \ldots, n: \kappa_i = \kappa_n\}|
May 13, 2017, at 01:27 PM by Vovk - slightly updated
Changed lines 1-3 from:
''Mondrian conformal predictors (MCPs)'' represent a wide class of [[confidence predictor]]s which is the generalisation of [[transductive conformal predictor]]s and [[inductive conformal predictor]]s with a new main property - validity within categories (see section '''Validity'''), a kind of validity that is stronger than usual [[conservative validity]] for TCMs.

to:
''Mondrian conformal predictors (MCPs)'' represent a wide class of [[confidence predictor]]s which is the generalisation of [[conformal predictor]]s and [[inductive conformal predictor]]s with a new property of validity, validity within categories (see section '''Validity'''); this kind of validity is stronger than usual [[conservative validity]] for [[conformal predictor]]s.
Changed line 5 from:
''Mondrian taxonomy'' is a measurable function {$\kappa:N \times Z \rightarrow K$}, where {$Z$} is a set of [[examples -> confidence predictor]], {$K$} is the measurable space (at most countable with the discrete {$\sigma$}-algebra) of elements called ''categories'', with the following property:
to:
A ''Mondrian taxonomy'' is a measurable function $\kappa:N \times Z \to K$, where {$Z$} is a set of [[observations -> confidence predictor]], $K$ is the measurable space (at most countable with the discrete {$\sigma$}-algebra) of elements called ''categories'', with the following property:
Changed lines 9-12 from:
''Mondrian nonconformity measure'' based on a mondrian taxonomy {$\kappa$} is a family of measurable functions {$\{A_n:n\in N\}$} of the type

%center% {$A_n: K^{n-1} \times (Z ^ {(*)})^K \times K \times Z \rightarrow \bar{\mathbb{R}}$},
to:
''Mondrian nonconformity measure'' based on a Mondrian taxonomy {$\kappa$} is a family of measurable functions {$\{A_n:n\in N\}$} of the type

%center% {$A_n: K^{n-1} \times (Z ^ {(*)})^K \times K \times Z \to \bar{\mathbb{R}}$},
Changed lines 15-23 from:
The ''Mondrian conformal predictor'' determined by the Mondrian nonconformity measure {$A_n$} and a set of significance levels {$\epsilon_k, k \in K$} is a [[confidence predictor]] {$\Gamma: Z^*\times X\times (0,1)^K\rightarrow 2^Y$}
({$2^Y$} is a set of all subsets of {$Y$}) such that the [[prediction set -> confidence predictor]] {$ \Gamma^ {(\epsilon_k: k \in K)} (x_1, y_1, \ldots, x_{n-1}, y_{n-1}, x_n)$} is defined as the set of all lables {$y\in Y$} such that

%center% {$p_n > \epsilon_{\kappa(n, (x_n, y))}$},  {$p_n = |\{i = 1, \ldots, n: \kappa_i = \kappa_n  \& \alpha_i \ge \alpha_n \}|  /|\{i = 1, \ldots, n: \kappa_i = \kappa_n\}|$}

where {$\kappa_i := \kappa(i, z_i), z_i := (x_i, y_i)$} and

%center% {$\alpha_i := A_n(\kappa_1, \ldots, \kappa_{n-1}, (k \mapsto \{z_j: j \in \{1, \ldots, i-1, i+1, \ldots, n\} \& \kappa_j = k\}), \kappa_n, z_i)$}
to:
The ''Mondrian conformal predictor'' determined by the Mondrian nonconformity measure {$A_n$} and a set of significance levels {$\epsilon_k, k \in K$} is a [[confidence predictor]] {$\Gamma: Z^*\times X\times (0,1)^K\to 2^Y$}
({$2^Y$} is a set of all subsets of {$Y$}) such that the [[prediction set -> confidence predictor]] {$ \Gamma^ {(\epsilon_k: k \in K)} (x_1, y_1, \ldots, x_{n-1}, y_{n-1}, x_n)$} is defined as the set of all labels {$y\in Y$} such that
\[
  p_n > \epsilon_{\kappa(n, (x_n, y))}$},  {$p_n = |\{i = 1, \ldots, n: \kappa_i = \kappa_n  \& \alpha_i \ge \alpha_n \}|  /|\{i = 1, \ldots, n: \kappa_i = \kappa_n\}|
\]
where {$\kappa_i := \kappa(i, z_i), z_i := (x_i, y_i)$} and
\[
  \alpha_i := A_n(\kappa_1, \ldots, \kappa_{n-1}, (k \mapsto \{z_j: j \in \{1, \ldots, i-1, i+1, \ldots, n\} \& \kappa_j = k\}), \kappa_n, z_i)
\]
Deleted line 25:
Changed line 29 from:
*[[transductive conformal predictor]]s
to:
*[[conformal predictor]]s
Deleted line 31:
Changed lines 36-37 from:
''Smoothed MCPs'' are defined analogously to [[smoothed conformal predictors -> transductive conformal predictor]]:
to:
''Smoothed MCPs'' are defined analogously to [[smoothed conformal predictors -> conformal predictor]]:
Changed lines 47-49 from:

'''Theorem''' ''Any smoothed MCP based on a Mondrian taxonomy {$\kappa$} is category-wise exact w.r. to {$\kappa$}'', that is,
for all {$n$}, the conditional probability distribution of {$p_n$} given {$\kappa(1, z_1), p_1, \ldots, \kappa(n-1, z_{n-1}), p_{n-1}, \kappa(n, z_n)$} is uniform on [0, 1], where {$z_1, z_2, \ldots$} are examples generated from an [[exchangeable (probability) distribution]] on {$Z^{\infty}$} and {$p_1, p_2, \ldots$} are the p-values output by {$f$}.
to:
'''Theorem''' ''Any smoothed MCP based on a Mondrian taxonomy {$\kappa$} is category-wise exact w.r. to {$\kappa$}'', that is, for all {$n$}, the conditional probability distribution of {$p_n$} given {$\kappa(1, z_1), p_1, \ldots, \kappa(n-1, z_{n-1}), p_{n-1}, \kappa(n, z_n)$} is uniform on [0, 1], where {$z_1, z_2, \ldots$} are examples generated from an [[exchangeable (probability) distribution]] on {$Z^{\infty}$} and {$p_1, p_2, \ldots$} are the p-values output by {$f$}.
Changed line 12 from:
%center% {$A_n: K^{n-1} \times (Z ^ {(*)})^K \times K \times Z \rightarrow \bar{R}$},
to:
%center% {$A_n: K^{n-1} \times (Z ^ {(*)})^K \times K \times Z \rightarrow \bar{\mathbb{R}}$},
Deleted line 49:
As in the case of [[TCPs->transductive conformal predictor]], we can formulate analogous statements for ICPs using the same terminology.
Changed line 1 from:
''Mondrian conformal predictors (MCPs)'' are a wide class of [[confidence predictor]]s which is the generalisation of [[transductive conformal predictor]]s and [[inductive conformal predictor]]s. Their main property is validity within categories (see section '''Validity'''), a kind of validity that is stronger than usual [[conservative validity]] for TCMs.
to:
''Mondrian conformal predictors (MCPs)'' represent a wide class of [[confidence predictor]]s which is the generalisation of [[transductive conformal predictor]]s and [[inductive conformal predictor]]s with a new main property - validity within categories (see section '''Validity'''), a kind of validity that is stronger than usual [[conservative validity]] for TCMs.
Added lines 1-55:
''Mondrian conformal predictors (MCPs)'' are a wide class of [[confidence predictor]]s which is the generalisation of [[transductive conformal predictor]]s and [[inductive conformal predictor]]s. Their main property is validity within categories (see section '''Validity'''), a kind of validity that is stronger than usual [[conservative validity]] for TCMs.


!!Definition

''Mondrian taxonomy'' is a measurable function {$\kappa:N \times Z \rightarrow K$}, where {$Z$} is a set of [[examples -> confidence predictor]], {$K$} is the measurable space (at most countable with the discrete {$\sigma$}-algebra) of elements called ''categories'', with the following property:
the elements {$\kappa^{-1}(k)$} of each category {$k \in K$} form a rectangle {$A \times B$}, for some {$A \subseteq N$} and {$B \subseteq Z$}.
In words, a Mondrian taxonomy defines a division of the Cartesian product {$N \times Z$} into categories.

''Mondrian nonconformity measure'' based on a mondrian taxonomy {$\kappa$} is a family of measurable functions {$\{A_n:n\in N\}$} of the type

%center% {$A_n: K^{n-1} \times (Z ^ {(*)})^K \times K \times Z \rightarrow \bar{R}$},

where {$(Z ^ {(*)})^K$} is  a set of all functions mapping {$K$} to the set of all bags (multisets) of elements of {$Z$}.

The ''Mondrian conformal predictor'' determined by the Mondrian nonconformity measure {$A_n$} and a set of significance levels {$\epsilon_k, k \in K$} is a [[confidence predictor]] {$\Gamma: Z^*\times X\times (0,1)^K\rightarrow 2^Y$}
({$2^Y$} is a set of all subsets of {$Y$}) such that the [[prediction set -> confidence predictor]] {$ \Gamma^ {(\epsilon_k: k \in K)} (x_1, y_1, \ldots, x_{n-1}, y_{n-1}, x_n)$} is defined as the set of all lables {$y\in Y$} such that

%center% {$p_n > \epsilon_{\kappa(n, (x_n, y))}$},  {$p_n = |\{i = 1, \ldots, n: \kappa_i = \kappa_n  \& \alpha_i \ge \alpha_n \}|  /|\{i = 1, \ldots, n: \kappa_i = \kappa_n\}|$}

where {$\kappa_i := \kappa(i, z_i), z_i := (x_i, y_i)$} and

%center% {$\alpha_i := A_n(\kappa_1, \ldots, \kappa_{n-1}, (k \mapsto \{z_j: j \in \{1, \ldots, i-1, i+1, \ldots, n\} \& \kappa_j = k\}), \kappa_n, z_i)$}

for {$i = 1, \ldots, n$} such that {$\kappa_i = \kappa_n$} ({$\{z_j: \ldots \}$} denotes a multiset).


The standard assumption for MCPs is [[randomness assumption]] (also called the i.i.d. assumption).

Important special cases of MCPs:
*[[transductive conformal predictor]]s
*[[inductive conformal predictor]]s


!! Validity

All the statements in the section are given under the [[randomness assumption]].

''Smoothed MCPs'' are defined analogously to [[smoothed conformal predictors -> transductive conformal predictor]]:

%center% {$ \Gamma^{(\epsilon_k: k \in K)}(x_1, y_1, \ldots, x_{n-1}, y_{n-1}, x_n)$}

is set to the set of all labels {$y \in  Y$} such that

%center% {$p_n > \epsilon_{\kappa(n, (x_n, y))} $}, where
{$p_n = (|\{i = 1, \ldots, n: \kappa_i = \kappa_n  \& \alpha_i > \alpha_n \}| + \eta_n |\{i = 1, \ldots, n: \kappa_i = \kappa_n  \& \alpha_i = \alpha_n \}|  )/|\{i = 1, \ldots, n: \kappa_i = \kappa_n\}|$},

the nonconformity scores {$\alpha_i$} are defined as before and {$\eta_n\in[0,1]$} is a random number.

As in the case of [[TCPs->transductive conformal predictor]], we can formulate analogous statements for ICPs using the same terminology.

'''Theorem''' ''Any smoothed MCP based on a Mondrian taxonomy {$\kappa$} is category-wise exact w.r. to {$\kappa$}'', that is,
for all {$n$}, the conditional probability distribution of {$p_n$} given {$\kappa(1, z_1), p_1, \ldots, \kappa(n-1, z_{n-1}), p_{n-1}, \kappa(n, z_n)$} is uniform on [0, 1], where {$z_1, z_2, \ldots$} are examples generated from an [[exchangeable (probability) distribution]] on {$Z^{\infty}$} and {$p_1, p_2, \ldots$} are the p-values output by {$f$}.

This implies that in the long run the frequency of erroneous predictions given by smoothed MCPs on examples of category {$k$} is equal to {$\epsilon_k$} for each {$k$}. Therefore, the frequency of errors made by MCPs on examples of category {$k$} does not exceed {$\epsilon_k$} for each {$k$}.