# Mondrian Conformal Predictor

## Main.MondrianConformalPredictor History

!! Applications

[[ExCAPE project]]
Changed line 68 from:
* Vineeth N. Balasubramanian, Shen-Shyang Ho, and Vladimir Vovk, editors (2014).  ''Conformal Prediction for Reliable Machine Learning: Theory, Adaptations, and Applications.  Morgan Kaufmann, Chennai.
to:
* Vineeth N. Balasubramanian, Shen-Shyang Ho, and Vladimir Vovk, editors (2014).  ''Conformal Prediction for Reliable Machine Learning: Theory, Adaptations, and Applications''.  Morgan Kaufmann, Chennai.
Another example is where we want to make the analysis conditional on the type of object.  For example, if the objects are people, we might want to define $\kappa_i:=\text{male}$ if $x_i$ is male and $\kappa_i:=\text{female}$ if $x_i$ is female.
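
Such a taxonomy can be sketched in a few lines (a minimal illustration; the function and attribute names are ours, not the article's).  Because each $\kappa_i$ depends only on $z_i$ itself, the equivariance required of a simple Mondrian taxonomy holds automatically:

```python
# Minimal sketch (attribute names are ours): a simple Mondrian taxonomy that
# maps each observation z_i = (x_i, y_i) to a category depending on the
# object x_i alone, so equivariance under permutations holds automatically.

def taxonomy(observations):
    """Map z_1, ..., z_n to the same-length sequence kappa_1, ..., kappa_n."""
    return [x["sex"] for (x, y) in observations]

z = [({"sex": "male", "age": 34}, 1), ({"sex": "female", "age": 29}, 0)]
print(taxonomy(z))                    # ['male', 'female']
print(taxonomy(list(reversed(z))))    # ['female', 'male']  (equivariance)
```
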
Changed line 59 from:
In the next theorem we assume that $\kappa_i$ depends only on $z_i$.
to:
In the next theorem we assume that $\kappa_i$ depends only on $z_i$ (without this assumption, we can still guarantee the right coverage probability but the independence of errors may be lost).
(Formally, as defined here, an MCP is a [[confidence predictor]] only when all $\epsilon_k$ coincide.)
Changed line 58 from:
'''Theorem''' ''Any smoothed MCP is category-wise exact w.r. to {$K$}'', that is, for all $n$, the conditional probability distribution of {$(p_1,p_2,\ldots)$} given {$\kappa_1,\kappa_2,\ldots$} is uniform on {$[0,1]^{\infty}$}, where the observations $z_1,z_2,\ldots$ are generated from an [[exchangeable probability distribution]] on $Z^{\infty}$ and $p_i:=p_i^{y_i}$ are the p-values for the true labels.
to:
'''Theorem''' ''Any smoothed MCP is category-wise exact w.r. to $K$'', that is, for all $n$, the conditional probability distribution of $(p_1,p_2,\ldots)$ given $\kappa_1,\kappa_2,\ldots$ is uniform on ${[0,1]}^{\infty}$, where the observations $z_1,z_2,\ldots$ are generated from an [[exchangeable probability distribution]] on $Z^{\infty}$ and $p_i:=p_i^{y_i}$ are the p-values for the true labels.
Changed line 58 from:
'''Theorem''' ''Any smoothed MCP is category-wise exact w.r. to $K$'', that is, for all $n$, the conditional probability distribution of $(p_1,p_2,\ldots)$ given $\kappa_1,\kappa_2,\ldots$ is uniform on {$[0,1]^{\infty}$}, where the observations $z_1,z_2,\ldots$ are generated from an [[exchangeable probability distribution]] on $Z^{\infty}$ and $p_i:=p_i^{y_i}$ are the p-values for the true labels.
to:
'''Theorem''' ''Any smoothed MCP is category-wise exact w.r. to {$K$}'', that is, for all $n$, the conditional probability distribution of {$(p_1,p_2,\ldots)$} given {$\kappa_1,\kappa_2,\ldots$} is uniform on {$[0,1]^{\infty}$}, where the observations $z_1,z_2,\ldots$ are generated from an [[exchangeable probability distribution]] on $Z^{\infty}$ and $p_i:=p_i^{y_i}$ are the p-values for the true labels.
Changed line 7 from:
A ''simple Mondrian taxonomy'' is a function $K$ that maps any sequence of observations $z_1,\ldots,z_n$ to a same-length sequence $\kappa_1,\ldots,\kappa_n$ of categories that is equivariant with respect to permutations:
to:
A ''simple Mondrian taxonomy'' is a function $K$ that maps any finite sequence of observations $z_1,\ldots,z_n$ to a same-length sequence $\kappa_1,\ldots,\kappa_n$ of categories that is equivariant with respect to permutations:
Changed line 13 from:
We refer to $\kappa_i$ as the ''category of'' $z_i$.  A ''$K$-conditional conformity measure'' is a measurable function $A$ that maps every data sequence $z_1,\lodts,z_n$ to a same-length sequence $\alpha_1,\ldots,\alpha_n$ and that is equivariant with respect to permutations leaving the categories intact.
to:
We refer to $\kappa_i$ as the ''category of'' $z_i$.  A ''$K$-conditional conformity measure'' is a measurable function $A$ that maps every data sequence $z_1,\ldots,z_n$ to a same-length sequence $\alpha_1,\ldots,\alpha_n$ and that is equivariant with respect to permutations leaving the categories intact.
Changed line 58 from:
'''Theorem''' ''Any smoothed MCP is category-wise exact w.r. to $K$'', that is, for all $n$, the conditional probability distribution of $(p_1,p_2,\ldots)$ given $\kappa_1,\kappa_2,\ldots$ is uniform on $[0,1]^{\infty}$, where the observations $z_1, z_2, \ldots$ are generated from an [[exchangeable (probability) distribution]] on $Z^{\infty}$ and $p_i:=p_i^{y_i}$ are the p-values for the true labels.
to:
'''Theorem''' ''Any smoothed MCP is category-wise exact w.r. to $K$'', that is, for all $n$, the conditional probability distribution of $(p_1,p_2,\ldots)$ given $\kappa_1,\kappa_2,\ldots$ is uniform on {$[0,1]^{\infty}$}, where the observations $z_1,z_2,\ldots$ are generated from an [[exchangeable probability distribution]] on $Z^{\infty}$ and $p_i:=p_i^{y_i}$ are the p-values for the true labels.
May 13, 2017, at 04:30 PM by Vovk - finished rewrite
Changed lines 58-60 from:
'''Theorem''' ''Any smoothed MCP based on a simplified Mondrian taxonomy $K$ is category-wise exact w.r. to $K$'', that is, for all $n$, the conditional probability distribution of $p_n$ given $\kappa_1, p_1, \ldots, \kappa_n, z_{n-1}, p_{n-1}, \kappa_n$ is uniform on $[0, 1]$, where {$z_1, z_2, \ldots$} are examples generated from an [[exchangeable (probability) distribution]] on {$Z^{\infty}$} and {$p_1, p_2, \ldots$} are the p-values output by {$f$}.

This implies that in the long run the frequency of erroneous predictions given by smoothed MCPs on examples of category {$k$} is equal to {$\epsilon_k$} for each {$k$}. Therefore, the frequency of errors made by MCPs on examples of category {$k$} does not exceed {$\epsilon_k$} for each {$k$}.
to:
'''Theorem''' ''Any smoothed MCP is category-wise exact w.r. to $K$'', that is, for all $n$, the conditional probability distribution of $(p_1,p_2,\ldots)$ given $\kappa_1,\kappa_2,\ldots$ is uniform on $[0,1]^{\infty}$, where the observations $z_1, z_2, \ldots$ are generated from an [[exchangeable (probability) distribution]] on $Z^{\infty}$ and $p_i:=p_i^{y_i}$ are the p-values for the true labels.

This implies that in the long run the frequency of erroneous predictions given by smoothed MCPs on examples of category {$k$} is equal to {$\epsilon_k$} for each {$k$}. Therefore, the frequency of errors made by MCPs on examples of category $k$ does not exceed $\epsilon_k$ for each $k$.
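
This long-run claim can be checked numerically.  The following sketch (ours, under simplifying assumptions: a single category and continuous scores, so ties occur only for the test score itself) estimates the error frequency of the smoothed p-value at significance level $\epsilon = 0.1$:

```python
# Numerical sanity check (ours, not from the article): within one category,
# the smoothed p-value of the true label is uniform on [0, 1] for exchangeable
# scores, so errors at level eps occur with frequency about eps.
import random

random.seed(0)
eps = 0.1
errors, trials = 0, 5000
for _ in range(trials):
    scores = [random.gauss(0, 1) for _ in range(21)]  # 20 old + 1 test score
    a_new = scores[-1]
    less = sum(a < a_new for a in scores)
    tied = sum(a == a_new for a in scores)  # includes the test score itself
    p = (less + random.random() * tied) / len(scores)
    errors += p <= eps
print(errors / trials)  # close to 0.1
```
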

An important feature of MCPs is that the theorem only requires the observations in the same category to be exchangeable: it will remain true if the equality in the definition of exchangeability is only required to hold for permutations $\pi$ leaving the categories intact.  A nice example (due to [[http://daniil.ryabko.net/ | Daniil Ryabko]]) showing the significance of this relaxation is recognizing handwritten text (assuming the text has been divided into separate symbols): the label-conditional conformal predictor will be valid provided that different instances of the same symbol are exchangeable; we do not need to require that the sequence of symbols in the text be exchangeable (this assumption would be patently wrong).
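
A label-conditional conformal predictor of the kind described above can be sketched as follows (the function name and the 1-nearest-neighbour nonconformity-style score are ours, not the article's; the p-value for a postulated label is computed only from observations carrying that same label):

```python
# Hedged sketch (names and score are ours) of a label-conditional conformal
# predictor: categories are the labels, and the p-value for a postulated
# label of a new object is computed within that label's category only.
import numpy as np

def label_conditional_pvalue(X, y, x_new, y_candidate):
    Xs = np.vstack([X, x_new])
    ys = np.append(y, y_candidate)
    cat = np.flatnonzero(ys == y_candidate)   # the candidate's category
    alphas = []
    for i in cat:
        others = [j for j in cat if j != i]   # needs >= 2 obs in the category
        # nonconformity-style score: distance to nearest same-label observation
        alphas.append(min(np.linalg.norm(Xs[i] - Xs[j]) for j in others))
    alphas = np.asarray(alphas)
    # p-value: fraction of the category at least as strange as the test
    # observation (which is the last element of cat)
    return float(np.mean(alphas >= alphas[-1]))

X = np.array([[0.0], [0.1], [5.0]])
y = np.array([0, 0, 1])
print(label_conditional_pvalue(X, y, np.array([[0.05]]), 0))  # 1.0
```

Validity of such a predictor only needs exchangeability within each label, exactly as in the handwriting example.
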
May 13, 2017, at 03:10 PM by Vovk - partially fixed
Changed lines 5-6 from:
A ''Mondrian taxonomy'' is a measurable function $\kappa:N \times Z \to K$, where $Z$ is a set of [[observations -> confidence predictor]] and $K$ is a countable set of elements called ''categories'', that satisfies some natural properties given in Vovk et al. (2005), Section 4.5.  The definition involves partitions into rectangles, reminiscent of [[https://en.wikipedia.org/wiki/Piet_Mondrian | Piet Mondrian's]] paintings (see also [[https://scottlocklin.wordpress.com/2016/12/05/predicting-with-confidence-the-best-machine-learning-idea-you-never-heard-of/ | Scott Locklin's blog]]).  A simple example of MCPs is given by ''label-conditional conformal predictors'', which require the label space $\mathbf{Y}$ to be finite.  In this case the set of categories $K=\mathbf{Y}$ coincides with the set of labels.  The ''Mondrian conformal predictor'' determined by a conformity measure $A$ and a set of significance levels $\epsilon_k$, $k \in K$,
is defined by
to:
A ''Mondrian taxonomy'' is a measurable function $\kappa:N \times Z \to K$, where $Z$ is a set of [[observations -> confidence predictor]] and $K$ is a countable set of elements called ''categories'', that satisfies some natural properties given in Vovk et al. (2005), Section 4.5.  The definition involves partitions into rectangles, reminiscent of [[https://en.wikipedia.org/wiki/Piet_Mondrian | Piet Mondrian's]] paintings (see also [[https://scottlocklin.wordpress.com/2016/12/05/predicting-with-confidence-the-best-machine-learning-idea-you-never-heard-of/ | Scott Locklin's blog]]).  However, in this article we will use a simpler definition given in Balasubramanian et al. (2014), Section 2.2.

A ''simple Mondrian taxonomy'' is a function $K$ that maps any sequence of observations $z_1,\ldots,z_n$ to a same-length sequence $\kappa_1,\ldots,\kappa_n$ of categories that is equivariant with respect to permutations:
\[
(\kappa_1,\ldots,\kappa_n) = K(z_1,\ldots,z_n)
\Longrightarrow
(\kappa_{\pi(1)},\ldots,\kappa_{\pi(n)}) = K(z_{\pi(1)},\ldots,z_{\pi(n)}).
\]
We refer to $\kappa_i$ as the ''category of'' $z_i$.  A ''$K$-conditional conformity measure'' is a measurable function $A$ that maps every data sequence $z_1,\lodts,z_n$ to a same-length sequence $\alpha_1,\ldots,\alpha_n$ and that is equivariant with respect to permutations leaving the categories intact.

The ''Mondrian conformal predictor'' determined by a $K$-conditional conformity measure $A$ and a set of significance levels $\epsilon_k$, $k \in K$, is defined by
Changed line 29 from:
\frac{|\{i = 1, \ldots, n+1: y_i = y_{n+1} \& \alpha^y_i \le \alpha_{n+1}\}|}{|\{i = 1, \ldots, n+1: y_i = y_{n+1}\}|},
to:
\frac{|\{i=1,\ldots,n+1 \mid \kappa^y_i = \kappa^y_{n+1} \;\&\; \alpha^y_i \le \alpha^y_{n+1}\}|}{|\{i=1,\ldots,n+1: \kappa^y_i = \kappa^y_{n+1}\}|},
Changed line 31 from:
where {$\kappa_i := \kappa(i, z_i), z_i := (x_i, y_i)$} and
to:
where $(\kappa^y_1,\ldots,\kappa^y_{n+1}) := K(z_1,\ldots,z_{n+1})$ and
Changed lines 33-35 from:
\alpha_i := A_n(\kappa_1, \ldots, \kappa_{n-1}, (k \mapsto \{z_j: j \in \{1, \ldots, i-1, i+1, \ldots, n\} \& \kappa_j = k\}), \kappa_n, z_i)
to:
(\alpha^y_1,\ldots,\alpha^y_{n+1}) := A(z_1,\ldots,z_n,(x_{n+1},y)).
Changed lines 37-44 from:
for {$i = 1, \ldots, n$} such that {$\kappa_i = \kappa_n$} ({$\{z_j: \ldots \}$} denotes a multiset).

The standard assumption for MCPs is [[randomness assumption]] (also called the i.i.d. assumption).

Important special cases of MCPs:
*[[conformal predictor]]s
*[[inductive conformal predictor]]s
to:
The standard assumption for MCPs is the [[randomness assumption]] (also called the IID assumption).

A simple example is given by ''label-conditional conformal predictors'', which require the label space $\mathbf{Y}$ to be finite.  In this case the set of categories $K=\mathbf{Y}$ coincides with the set of labels, and the category of an observation $(x_i,y_i)$ is simply its label, $\kappa_i:=y_i$.
Changed lines 46-58 from:
''Smoothed MCPs'' are defined analogously to [[smoothed conformal predictors -> conformal predictor]]:

%center% {$\Gamma^{(\epsilon_k: k \in K)}(x_1, y_1, \ldots, x_{n-1}, y_{n-1}, x_n)$}

is set to the set of all labels {$y \in Y$} such that

%center% {$p_n > \epsilon{\kappa(n, (x_n, y))}$}, where
{$p_n = (|\{i = 1, \ldots, n: \kappa_i = \kappa_n \& \alpha_i > \alpha_n \}| + \eta_n |\{i = 1, \ldots, n: \kappa_i = \kappa_n \& \alpha_i = \alpha_n \}| )/|\{i = 1, \ldots, n: \kappa_i = \kappa_n\}|$},

the nonconformity scores {$\alpha_i$} are defined as before and {$\eta_n\in[0,1]$} is a random number.
to:
''Smoothed MCPs'' are defined analogously to [[smoothed conformal predictors -> conformal predictor]]: the definition of $p^y$ is now modified to
\[
p^y := \frac{|\{i=1,\ldots,n+1 \mid \kappa^y_i = \kappa^y_{n+1} \;\&\; \alpha^y_i < \alpha^y_{n+1}\}| + \eta_y |\{i=1,\ldots,n+1 \mid \kappa^y_i = \kappa^y_{n+1} \;\&\; \alpha^y_i = \alpha^y_{n+1}\}|}{|\{i=1,\ldots,n+1: \kappa^y_i = \kappa^y_{n+1}\}|},
\]
and the rest of the definition is the same.  Here $\eta_y\sim U[0,1]$ are random numbers (chosen independently at different trials and independently of the observations).
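
The smoothed p-value can be sketched directly from this definition (a minimal illustration; the function name is ours, and `eta` stands for the uniform random number $\eta_y$):

```python
# Sketch (ours) of the smoothed Mondrian p-value: alphas and kappas hold the
# conformity scores and categories of z_1, ..., z_{n+1}, the last entry being
# the test observation with its postulated label; eta is the U[0,1] draw.

def smoothed_mondrian_pvalue(alphas, kappas, eta):
    a_new, k_new = alphas[-1], kappas[-1]
    cat = [a for a, k in zip(alphas, kappas) if k == k_new]
    less = sum(a < a_new for a in cat)
    tied = sum(a == a_new for a in cat)   # counts the test observation too
    return (less + eta * tied) / len(cat)

# category 'a' holds the scores 1, 3 and the test score 2:
print(smoothed_mondrian_pvalue([1, 2, 3, 2], ["a", "b", "a", "a"], 0.5))  # 0.5
```

Only observations in the test observation's own category enter the count, which is what yields category-wise validity.
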

In the next theorem we assume that $\kappa_i$ depends only on $z_i$.

'''Theorem''' ''Any smoothed MCP based on a simplified Mondrian taxonomy $K$ is category-wise exact w.r. to $K$'', that is, for all $n$, the conditional probability distribution of $p_n$ given $\kappa_1, p_1, \ldots, \kappa_n, z_{n-1}, p_{n-1}, \kappa_n$ is uniform on $[0, 1]$, where {$z_1, z_2, \ldots$} are examples generated from an [[exchangeable (probability) distribution]] on {$Z^{\infty}$} and {$p_1, p_2, \ldots$} are the p-values output by {$f$}.
* Vineeth N. Balasubramanian, Shen-Shyang Ho, and Vladimir Vovk, editors (2014).  ''Conformal Prediction for Reliable Machine Learning: Theory, Adaptations, and Applications.  Morgan Kaufmann, Chennai.
Changed lines 1-2 from:
''Mondrian conformal predictors (MCPs)'' represent a wide class of [[confidence predictor]]s which is the generalisation of [[conformal predictor]]s and [[inductive conformal predictor]]s with a new property of validity, validity within categories (see section '''Validity'''); this kind of validity is stronger than usual [[conservative validity]] for [[conformal predictor]]s.
to:
''Mondrian conformal predictors (MCPs)'' represent a wide class of [[confidence predictor]]s that generalises [[conformal predictor]]s and [[inductive conformal predictor]]s with a new property of validity, validity within categories; this kind of validity is stronger (for large data sets) than the usual [[conservative validity]] for [[conformal predictor]]s.
Changed lines 5-16 from:
A ''Mondrian taxonomy'' is a measurable function $\kappa:N \times Z \to K$, where {$Z$} is a set of [[observations -> confidence predictor]], $K$ is the measurable space (at most countable with the discrete {$\sigma$}-algebra) of elements called ''categories'', with the following property:
the elements {$\kappa^{-1}(k)$} of each category {$k \in K$} form a rectangle {$A \times B$}, for some {$A \subseteq N$} and {$B \subseteq Z$}.
In words, a Mondrian taxonomy defines a division of the Cartesian product {$N \times Z$} into categories.

''Mondrian nonconformity measure'' based on a Mondrian taxonomy {$\kappa$} is a family of measurable functions {$\{A_n:n\in N\}$} of the type

%center% {$A_n: K^{n-1} \times (Z ^ {(*)})^K \times K \times Z \to \bar{\mathbb{R}}$},

where {$(Z ^ {(*)})^K$} is  a set of all functions mapping {$K$} to the set of all bags (multisets) of elements of {$Z$}.

The ''Mondrian conformal predictor'' determined by a the Mondrian nonconformity measure {$A_n$} and a set of significance levels {$\epsilon_k, k \in K$} is a [[confidence predictor]] {$\Gamma:\mathbf{Z}^*\times X\times (0,1)^K\to 2^Y$}
({$2^Y$} is a set of all subsets of {$Y$}) such that the [[prediction set -> confidence predictor]] {$\Gamma^ {(\epsilon_k: k \in K)} (x_1, y_1, \ldots, x_{n-1}, y_{n-1}, x_n)$} is defined as the set of all labels {$y\in Y$} such that
to:
A ''Mondrian taxonomy'' is a measurable function $\kappa:N \times Z \to K$, where $Z$ is a set of [[observations -> confidence predictor]] and $K$ is a countable set of elements called ''categories'', that satisfies some natural properties given in Vovk et al. (2005), Section 4.5.  The definition involves partitions into rectangles, reminiscent of [[https://en.wikipedia.org/wiki/Piet_Mondrian | Piet Mondrian's]] paintings (see also [[https://scottlocklin.wordpress.com/2016/12/05/predicting-with-confidence-the-best-machine-learning-idea-you-never-heard-of/ | Scott Locklin's blog]]).  A simple example of MCPs is given by ''label-conditional conformal predictors'', which require the label space $\mathbf{Y}$ to be finite.  In this case the set of categories $K=\mathbf{Y}$ coincides with the set of labels.  The ''Mondrian conformal predictor'' determined by a conformity measure $A$ and a set of significance levels $\epsilon_k$, $k \in K$,
is defined by
Changed lines 8-10 from:
p_n > \epsilon_{\kappa(n, (x_n, y))},
p_n = |\{i = 1, \ldots, n: \kappa_i = \kappa_n  \& \alpha_i \ge \alpha_n \}|  /|\{i = 1, \ldots, n: \kappa_i = \kappa_n\}|
to:
\[
\Gamma^{(\epsilon_k: k \in K)} (x_1, y_1, \ldots, x_{n}, y_{n}, x_{n+1})
:=
\left\{
y \in \mathbf{Y}
\mid
p^y > \epsilon_y
\right\},
\]
Changed line 16 from:
where {$\kappa_i := \kappa(i, z_i), z_i := (x_i, y_i)$} and
to:
where
Changed lines 18-20 from:
\alpha_i := A_n(\kappa_1, \ldots, \kappa_{n-1}, (k \mapsto \{z_j: j \in \{1, \ldots, i-1, i+1, \ldots, n\} \& \kappa_j = k\}), \kappa_n, z_i)
to:
p^y
:=
\frac{|\{i = 1, \ldots, n+1: y_i = y_{n+1} \& \alpha^y_i \le \alpha_{n+1}\}|}{|\{i = 1, \ldots, n+1: y_i = y_{n+1}\}|},
where {$\kappa_i := \kappa(i, z_i), z_i := (x_i, y_i)$} and
$\alpha_i := A_n(\kappa_1, \ldots, \kappa_{n-1}, (k \mapsto \{z_j: j \in \{1, \ldots, i-1, i+1, \ldots, n\} \& \kappa_j = k\}), \kappa_n, z_i)$
Changed lines 51-54 from:
This implies that in the long run the frequency of erroneous predictions given by smoothed MCPs on examples of category {$k$} is equal to {$\epsilon_k$} for each {$k$}. Therefore, the frequency of errors made by MCPs on examples of category {$k$} does not exceed {$\epsilon_k$} for each {$k$}.
to:
This implies that in the long run the frequency of erroneous predictions given by smoothed MCPs on examples of category {$k$} is equal to {$\epsilon_k$} for each {$k$}. Therefore, the frequency of errors made by MCPs on examples of category {$k$} does not exceed {$\epsilon_k$} for each {$k$}.

'''Bibliography'''
* Vladimir Vovk, Alexander Gammerman, and Glenn Shafer (2005). [[http://www.alrw.net | Algorithmic learning in a random world]]. Springer, New York.
Changed line 15 from:
The ''Mondrian conformal predictor'' determined by the Mondrian nonconformity measure {$A_n$} and a set of significance levels {$\epsilon_k, k \in K$} is a [[confidence predictor]] {$\Gamma: Z^*\times X\times (0,1)^K\to 2^Y$}
to:
The ''Mondrian conformal predictor'' determined by the Mondrian nonconformity measure {$A_n$} and a set of significance levels {$\epsilon_k, k \in K$} is a [[confidence predictor]] {$\Gamma:\mathbf{Z}^*\times X\times (0,1)^K\to 2^Y$}
Changed lines 18-20 from:
p_n > \epsilon_{\kappa(n, (x_n, y))}$}{$p_n = |\{i = 1, \ldots, n: \kappa_i = \kappa_n  \& \alpha_i \ge \alpha_n \}|  /|\{i = 1, \ldots, n: \kappa_i = \kappa_n\}|
to:
p_n > \epsilon_{\kappa(n, (x_n, y))},

p_n = |\{i = 1, \ldots, n: \kappa_i = \kappa_n  \& \alpha_i \ge \alpha_n \}|  /|\{i = 1, \ldots, n: \kappa_i = \kappa_n\}|
May 13, 2017, at 01:27 PM by Vovk - slightly updated
Changed lines 1-3 from:
''Mondrian conformal predictors (MCPs)'' represent a wide class of [[confidence predictor]]s which is the generalisation of [[transductive conformal predictor]]s and [[inductive conformal predictor]]s with a new main property - validity within categories (see section '''Validity'''), a kind of validity that is stronger than usual [[conservative validity]] for TCMs.

to:
''Mondrian conformal predictors (MCPs)'' represent a wide class of [[confidence predictor]]s which is the generalisation of [[conformal predictor]]s and [[inductive conformal predictor]]s with a new property of validity, validity within categories (see section '''Validity'''); this kind of validity is stronger than usual [[conservative validity]] for [[conformal predictor]]s.
Changed line 5 from:
''Mondrian taxonomy'' is a measurable function {$\kappa:N \times Z \rightarrow K$}, where {$Z$} is a set of [[examples -> confidence predictor]], {$K$} is the measurable space (at most countable with the discrete {$\sigma$}-algebra) of elements called ''categories'', with the following property:
to:
A ''Mondrian taxonomy'' is a measurable function $\kappa:N \times Z \to K$, where {$Z$} is a set of [[observations -> confidence predictor]], $K$ is the measurable space (at most countable with the discrete {$\sigma$}-algebra) of elements called ''categories'', with the following property:
Changed lines 9-12 from:
''Mondrian nonconformity measure'' based on a mondrian taxonomy {$\kappa$} is a family of measurable functions {$\{A_n:n\in N\}$} of the type

%center% {$A_n: K^{n-1} \times (Z ^ {(*)})^K \times K \times Z \rightarrow \bar{\mathbb{R}}$},
to:
''Mondrian nonconformity measure'' based on a Mondrian taxonomy {$\kappa$} is a family of measurable functions {$\{A_n:n\in N\}$} of the type

%center% {$A_n: K^{n-1} \times (Z ^ {(*)})^K \times K \times Z \to \bar{\mathbb{R}}$},
Changed lines 15-23 from:
The ''Mondrian conformal predictor'' determined by the Mondrian nonconformity measure {$A_n$} and a set of significance levels {$\epsilon_k, k \in K$} is a [[confidence predictor]] {$\Gamma: Z^*\times X\times (0,1)^K\rightarrow 2^Y$}
({$2^Y$} is a set of all subsets of {$Y$}) such that the [[prediction set -> confidence predictor]] {$\Gamma^ {(\epsilon_k: k \in K)} (x_1, y_1, \ldots, x_{n-1}, y_{n-1}, x_n)$} is defined as the set of all lables {$y\in Y$} such that

%center% {$p_n > \epsilon_{\kappa(n, (x_n, y))}$},  {$p_n = |\{i = 1, \ldots, n: \kappa_i = \kappa_n \& \alpha_i \ge \alpha_n \}| /|\{i = 1, \ldots, n: \kappa_i = \kappa_n\}|$}

where {$\kappa_i := \kappa(i, z_i), z_i := (x_i, y_i)$} and

%center% {$\alpha_i := A_n(\kappa_1, \ldots, \kappa_{n-1}, (k \mapsto \{z_j: j \in \{1, \ldots, i-1, i+1, \ldots, n\} \& \kappa_j = k\}), \kappa_n, z_i)$}
to:
The ''Mondrian conformal predictor'' determined by the Mondrian nonconformity measure {$A_n$} and a set of significance levels {$\epsilon_k, k \in K$} is a [[confidence predictor]] {$\Gamma: Z^*\times X\times (0,1)^K\to 2^Y$}
({$2^Y$} is a set of all subsets of {$Y$}) such that the [[prediction set -> confidence predictor]] {$\Gamma^ {(\epsilon_k: k \in K)} (x_1, y_1, \ldots, x_{n-1}, y_{n-1}, x_n)$} is defined as the set of all labels {$y\in Y$} such that
$p_n > \epsilon_{\kappa(n, (x_n, y))}}, {p_n = |\{i = 1, \ldots, n: \kappa_i = \kappa_n \& \alpha_i \ge \alpha_n \}| /|\{i = 1, \ldots, n: \kappa_i = \kappa_n\}|$
where {
$\kappa_i := \kappa(i, z_i), z_i := (x_i, y_i)$} and
$\alpha_i := A_n(\kappa_1, \ldots, \kappa_{n-1}, (k \mapsto \{z_j: j \in \{1, \ldots, i-1, i+1, \ldots, n\} \& \kappa_j = k\}), \kappa_n, z_i)$
Deleted line 25:
Changed line 29 from:
*[[transductive conformal predictor]]s
to:
*[[conformal predictor]]s
Deleted line 31:
Changed lines 36-37 from:
''Smoothed MCPs'' are defined analogously to [[smoothed conformal predictors -> transductive conformal predictor]]:
to:
''Smoothed MCPs'' are defined analogously to [[smoothed conformal predictors -> conformal predictor]]:
Changed lines 47-49 from:

'''Theorem''' ''Any smoothed MCP based on a Mondrian taxonomy {$\kappa$} is category-wise exact w.r. to {$\kappa$}'', that is,
for all {$n$}, the conditional probability distribution of {$p_n$} given {$\kappa(1, z_1), p_1, \ldots, \kappa(n-1, z_{n-1}), p_{n-1}, \kappa(n, z_n)$} is uniform on [0, 1], where {$z_1, z_2, \ldots$} are examples generated from an [[exchangeable (probability) distribution]] on {$Z^{\infty}$} and {$p_1, p_2, \ldots$} are the p-values output by {$f$}.
to:
'''Theorem''' ''Any smoothed MCP based on a Mondrian taxonomy {$\kappa$} is category-wise exact w.r. to {$\kappa$}'', that is, for all {$n$}, the conditional probability distribution of {$p_n$} given {$\kappa(1, z_1), p_1, \ldots, \kappa(n-1, z_{n-1}), p_{n-1}, \kappa(n, z_n)$} is uniform on [0, 1], where {$z_1, z_2, \ldots$} are examples generated from an [[exchangeable (probability) distribution]] on {$Z^{\infty}$} and {$p_1, p_2, \ldots$} are the p-values output by {$f$}.
Changed line 12 from:
%center% {$A_n: K^{n-1} \times (Z ^ {(*)})^K \times K \times Z \rightarrow \bar{R}$},
to:
%center% {$A_n: K^{n-1} \times (Z ^ {(*)})^K \times K \times Z \rightarrow \bar{\mathbb{R}}$},
Deleted line 49:
As in the case of [[TCPs->transductive conformal predictor]], we can formulate analogous statements for ICPs using the same terminology.
Changed line 1 from:
''Mondrian conformal predictors (MCPs)'' are a wide class of [[confidence predictor]]s which is the generalisation of [[transductive conformal predictor]]s and [[inductive conformal predictor]]s. Their main property is validity within categories (see section '''Validity'''), a kind of validity that is stronger than usual [[conservative validity]] for TCMs.
to:
''Mondrian conformal predictors (MCPs)'' represent a wide class of [[confidence predictor]]s which is the generalisation of [[transductive conformal predictor]]s and [[inductive conformal predictor]]s with a new main property - validity within categories (see section '''Validity'''), a kind of validity that is stronger than usual [[conservative validity]] for TCMs.
''Mondrian conformal predictors (MCPs)'' are a wide class of [[confidence predictor]]s which is the generalisation of [[transductive conformal predictor]]s and [[inductive conformal predictor]]s. Their main property is validity within categories (see section '''Validity'''), a kind of validity that is stronger than usual [[conservative validity]] for TCMs.

!!Definition

''Mondrian taxonomy'' is a measurable function {$\kappa:N \times Z \rightarrow K$}, where {$Z$} is a set of [[examples -> confidence predictor]], {$K$} is the measurable space (at most countable with the discrete {$\sigma$}-algebra) of elements called ''categories'', with the following property:
the elements {$\kappa^{-1}(k)$} of each category {$k \in K$} form a rectangle {$A \times B$}, for some {$A \subseteq N$} and {$B \subseteq Z$}.
In words, a Mondrian taxonomy defines a division of the Cartesian product {$N \times Z$} into categories.

''Mondrian nonconformity measure'' based on a mondrian taxonomy {$\kappa$} is a family of measurable functions {$\{A_n:n\in N\}$} of the type

%center% {$A_n: K^{n-1} \times (Z ^ {(*)})^K \times K \times Z \rightarrow \bar{R}$},

where {$(Z ^ {(*)})^K$} is  a set of all functions mapping {$K$} to the set of all bags (multisets) of elements of {$Z$}.

The ''Mondrian conformal predictor'' determined by the Mondrian nonconformity measure {$A_n$} and a set of significance levels {$\epsilon_k, k \in K$} is a [[confidence predictor]] {$\Gamma: Z^*\times X\times (0,1)^K\rightarrow 2^Y$}
({$2^Y$} is a set of all subsets of {$Y$}) such that the [[prediction set -> confidence predictor]] {$\Gamma^ {(\epsilon_k: k \in K)} (x_1, y_1, \ldots, x_{n-1}, y_{n-1}, x_n)$} is defined as the set of all lables {$y\in Y$} such that

%center% {$p_n > \epsilon_{\kappa(n, (x_n, y))}$},  {$p_n = |\{i = 1, \ldots, n: \kappa_i = \kappa_n \& \alpha_i \ge \alpha_n \}| /|\{i = 1, \ldots, n: \kappa_i = \kappa_n\}|$}

where {$\kappa_i := \kappa(i, z_i), z_i := (x_i, y_i)$} and

%center% {$\alpha_i := A_n(\kappa_1, \ldots, \kappa_{n-1}, (k \mapsto \{z_j: j \in \{1, \ldots, i-1, i+1, \ldots, n\} \& \kappa_j = k\}), \kappa_n, z_i)$}

for {$i = 1, \ldots, n$} such that {$\kappa_i = \kappa_n$} ({$\{z_j: \ldots \}$} denotes a multiset).

The standard assumption for MCPs is [[randomness assumption]] (also called the i.i.d. assumption).

Important special cases of MCPs:
*[[transductive conformal predictor]]s
*[[inductive conformal predictor]]s

!! Validity

All the statements in the section are given under the [[randomness assumption]].

''Smoothed MCPs'' are defined analogously to [[smoothed conformal predictors -> transductive conformal predictor]]:

%center% {$\Gamma^{(\epsilon_k: k \in K)}(x_1, y_1, \ldots, x_{n-1}, y_{n-1}, x_n)$}

is set to the set of all labels {$y \in Y$} such that

%center% {$p_n > \epsilon{\kappa(n, (x_n, y))}$}, where
{$p_n = (|\{i = 1, \ldots, n: \kappa_i = \kappa_n \& \alpha_i > \alpha_n \}| + \eta_n |\{i = 1, \ldots, n: \kappa_i = \kappa_n \& \alpha_i = \alpha_n \}| )/|\{i = 1, \ldots, n: \kappa_i = \kappa_n\}|$},

the nonconformity scores {$\alpha_i$} are defined as before and {$\eta_n\in[0,1]$} is a random number.

As in the case of [[TCPs->transductive conformal predictor]], we can formulate analogous statements for ICPs using the same terminology.

'''Theorem''' ''Any smoothed MCP based on a Mondrian taxonomy {$\kappa$} is category-wise exact w.r. to {$\kappa$}'', that is,
for all {$n$}, the conditional probability distribution of {$p_n$} given {$\kappa(1, z_1), p_1, \ldots, \kappa(n-1, z_{n-1}), p_{n-1}, \kappa(n, z_n)$} is uniform on [0, 1], where {$z_1, z_2, \ldots$} are examples generated from an [[exchangeable (probability) distribution]] on {$Z^{\infty}$} and {$p_1, p_2, \ldots$} are the p-values output by {$f$}.

This implies that in the long run the frequency of erroneous predictions given by smoothed MCPs on examples of category {$k$} is equal to {$\epsilon_k$} for each {$k$}. Therefore, the frequency of errors made by MCPs on examples of category {$k$} does not exceed {$\epsilon_k$} for each {$k$}.