# Mondrian Conformal Predictor

## Main.MondrianConformalPredictor History

Changed line 68 from:

* Vineeth N. Balasubramanian, Shen-Shyang Ho, and Vladimir Vovk, editors (2014). ''Conformal Prediction for Reliable Machine Learning: Theory, Adaptations, and Applications. Morgan Kaufmann, Chennai.

to:

* Vineeth N. Balasubramanian, Shen-Shyang Ho, and Vladimir Vovk, editors (2014). ''Conformal Prediction for Reliable Machine Learning: Theory, Adaptations, and Applications''. Morgan Kaufmann, Chennai.

Added lines 43-44:

Another example is where we want to make the analysis conditional on a type of objects. For example, if the objects are people, we might want to define $\kappa_i:=\text{male}$ if $x_i$ is male and $\kappa_i:=\text{female}$ if $x_i$ is female.
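A taxonomy of this kind is easy to express in code. A minimal sketch (the `sex` attribute and the dictionary representation of objects are illustrative assumptions, not part of the definition above):

```python
# A simple Mondrian taxonomy assigning each observation a category that
# depends only on the observation itself (hence trivially equivariant
# with respect to permutations).
def taxonomy(observations):
    """Map a sequence of observations to the same-length sequence of categories."""
    return ["male" if obs["sex"] == "male" else "female" for obs in observations]

people = [{"sex": "male"}, {"sex": "female"}, {"sex": "male"}]
print(taxonomy(people))  # ['male', 'female', 'male']
```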

Changed line 59 from:

In the next theorem we assume that $\kappa_i$ depends only on $z_i$.

to:

In the next theorem we assume that $\kappa_i$ depends only on $z_i$ (without this assumption, we can still guarantee the right coverage probability but the independence of errors may be lost).

Added line 37:

(Formally, as defined here, an MCP is a [[confidence predictor]] only when all $\epsilon_k$ coincide.)

Changed line 58 from:

'''Theorem''' ''Any smoothed MCP is category-wise exact w.r. to ~~{~~$K$~~}~~'', that is, for all $n$, the conditional probability distribution of ~~{~~$(p_1,p_2,\ldots)$~~}~~ given ~~{~~$\kappa_1,\kappa_2,\ldots$~~}~~ is uniform on ~~{~~$[0,1]^{\infty}$~~}~~, where the observations $z_1,z_2,\ldots$ are generated from an [[exchangeable probability distribution]] on $Z^{\infty}$ and $p_i:=p_i^{y_i}$ are the p-values for the true labels.

to:

'''Theorem''' ''Any smoothed MCP is category-wise exact w.r. to $K$'', that is, for all $n$, the conditional probability distribution of $(p_1,p_2,\ldots)$ given $\kappa_1,\kappa_2,\ldots$ is uniform on ${[0,1]}^{\infty}$, where the observations $z_1,z_2,\ldots$ are generated from an [[exchangeable probability distribution]] on $Z^{\infty}$ and $p_i:=p_i^{y_i}$ are the p-values for the true labels.

Changed line 58 from:

'''Theorem''' ''Any smoothed MCP is category-wise exact w.r. to $K$'', that is, for all $n$, the conditional probability distribution of $(p_1,p_2,\ldots)$ given $\kappa_1,\kappa_2,\ldots$ is uniform on {$[0,1]^{\infty}$}, where the observations $z_1,z_2,\ldots$ are generated from an [[exchangeable probability distribution]] on $Z^{\infty}$ and $p_i:=p_i^{y_i}$ are the p-values for the true labels.

to:

'''Theorem''' ''Any smoothed MCP is category-wise exact w.r. to {$K$}'', that is, for all $n$, the conditional probability distribution of {$(p_1,p_2,\ldots)$} given {$\kappa_1,\kappa_2,\ldots$} is uniform on {$[0,1]^{\infty}$}, where the observations $z_1,z_2,\ldots$ are generated from an [[exchangeable probability distribution]] on $Z^{\infty}$ and $p_i:=p_i^{y_i}$ are the p-values for the true labels.

Changed line 7 from:

A ''simple Mondrian taxonomy'' is a function $K$ that maps any sequence of observations $z_1,\ldots,z_n$ to a same-length sequence $\kappa_1,\ldots,\kappa_n$ of categories that is equivariant with respect to permutations:

to:

A ''simple Mondrian taxonomy'' is a function $K$ that maps any finite sequence of observations $z_1,\ldots,z_n$ to a same-length sequence $\kappa_1,\ldots,\kappa_n$ of categories that is equivariant with respect to permutations:

Changed line 13 from:

We refer to $\kappa_i$ as the ''category of'' $z_i$. A ''$K$-conditional conformity measure'' is a measurable function $A$ that maps every data sequence $z_1,\~~lodts~~,z_n$ to a same-length sequence $\alpha_1,\ldots,\alpha_n$ and that is equivariant with respect to permutations leaving the categories intact.

to:

We refer to $\kappa_i$ as the ''category of'' $z_i$. A ''$K$-conditional conformity measure'' is a measurable function $A$ that maps every data sequence $z_1,\ldots,z_n$ to a same-length sequence $\alpha_1,\ldots,\alpha_n$ and that is equivariant with respect to permutations leaving the categories intact.

Changed line 58 from:

'''Theorem''' ''Any smoothed MCP is category-wise exact w.r. to $K$'', that is, for all $n$, the conditional probability distribution of $(p_1,p_2,\ldots)$ given $\kappa_1,\kappa_2,\ldots$ is uniform on $[0,1]^{\infty}$, where the observations $z_1,~~ ~~z_2,~~ ~~\ldots$ are generated from an [[exchangeable ~~(~~probability~~)~~ distribution]] on $Z^{\infty}$ and $p_i:=p_i^{y_i}$ are the p-values for the true labels.

to:

'''Theorem''' ''Any smoothed MCP is category-wise exact w.r. to $K$'', that is, for all $n$, the conditional probability distribution of $(p_1,p_2,\ldots)$ given $\kappa_1,\kappa_2,\ldots$ is uniform on {$[0,1]^{\infty}$}, where the observations $z_1,z_2,\ldots$ are generated from an [[exchangeable probability distribution]] on $Z^{\infty}$ and $p_i:=p_i^{y_i}$ are the p-values for the true labels.

May 13, 2017, at 04:30 PM
by - finished rewrite

Changed lines 58-60 from:

'''Theorem''' ''Any smoothed MCP ~~based on a simplified Mondrian taxonomy ~~$K$ is ~~category-wise exact w.r. to ~~$~~K~~$~~''~~, ~~that is, for all ~~$~~n$~~,~~ the conditional probability distribution of ~~$~~p~~_~~n$ given $\kappa_~~1,~~ p~~_~~1~~,~~ ~~\ldots~~, \kappa_n, z_{n-1}, p_{n-1}, \kappa_n~~$~~ is uniform on $~~[0,~~ ~~1]$, where ~~{~~$z_1, z_2, \ldots$~~}~~ are ~~examples ~~generated from an [[exchangeable (probability) distribution]] on ~~{~~$Z^{\infty}$~~}~~ and ~~{~~$p_~~1, ~~p_~~2, \ldots$} ~~are the p-values ~~output by {$f$}~~.

This implies that in the long run the frequency of erroneous predictions given by smoothed MCPs on examples of category {$k$} is equal to {$\epsilon_k$} for each {$k$}. Therefore, the frequency of errors made by MCPs on examples of category~~{~~$k$~~}~~ does not exceed ~~{~~$\epsilon_k$~~}~~ for each ~~{~~$k$~~}~~.

to:

'''Theorem''' ''Any smoothed MCP is category-wise exact w.r. to $K$'', that is, for all $n$, the conditional probability distribution of $(p_1,p_2,\ldots)$ given $\kappa_1,\kappa_2,\ldots$ is uniform on $[0,1]^{\infty}$, where the observations $z_1, z_2, \ldots$ are generated from an [[exchangeable (probability) distribution]] on $Z^{\infty}$ and $p_i:=p_i^{y_i}$ are the p-values for the true labels.

This implies that in the long run the frequency of erroneous predictions given by smoothed MCPs on examples of category {$k$} is equal to {$\epsilon_k$} for each {$k$}. Therefore, the frequency of errors made by MCPs on examples of category $k$ does not exceed $\epsilon_k$ for each $k$.

An important feature of MCPs is that the theorem only requires the observations in the same category to be exchangeable: it will remain true if the equality in the definition of exchangeability is only required to hold for permutations $\pi$ leaving the categories intact. A nice example (due to [[http://daniil.ryabko.net/ | Daniil Ryabko]]) showing the significance of this relaxation is recognizing handwritten text (assuming the text has been divided into separate symbols): the label-conditional conformal predictor will be valid provided that different instances of the same symbol are exchangeable; we do not need to require that the sequence of symbols in the text be exchangeable (this assumption would be patently wrong).

May 13, 2017, at 03:10 PM
by - partially fixed

Changed lines 5-6 from:

A ''Mondrian taxonomy'' is a measurable function $\kappa:N \times Z \to K$, where $Z$ is a set of [[observations -> confidence predictor]] and $K$ is a countable set of elements called ''categories'', that satisfies some natural properties given in Vovk et al. (2005), Section 4.5. The definition involve partitions into rectangles, reminiscent of [[https://en.wikipedia.org/wiki/Piet_Mondrian | Piet Mondrian's]] paintings (see also [[https://scottlocklin.wordpress.com/2016/12/05/predicting-with-confidence-the-best-machine-learning-idea-you-never-heard-of/ | Scott Locklin's blog]]). ~~A simple example of MCPs is given by ''label-conditional conformal predictors'', which require the label space $\mathbf{Y}$ to be finite~~. ~~In this case the set of categories ~~$K~~=\mathbf{Y}~~$ ~~coincides with the set~~ of ~~labels.~~ ~~The ''Mondrian conformal predictor'' determined by a conformity measure ~~$~~A~~$ ~~and a set of significance levels $\epsilon_k$, $k \in K$, ~~

is defined by

to:

A ''Mondrian taxonomy'' is a measurable function $\kappa:N \times Z \to K$, where $Z$ is a set of [[observations -> confidence predictor]] and $K$ is a countable set of elements called ''categories'', that satisfies some natural properties given in Vovk et al. (2005), Section 4.5. The definition involves partitions into rectangles, reminiscent of [[https://en.wikipedia.org/wiki/Piet_Mondrian | Piet Mondrian's]] paintings (see also [[https://scottlocklin.wordpress.com/2016/12/05/predicting-with-confidence-the-best-machine-learning-idea-you-never-heard-of/ | Scott Locklin's blog]]). However, in this article we will use a simpler definition given in Balasubramanian et al. (2014), Section 2.2.

A ''simple Mondrian taxonomy'' is a function $K$ that maps any sequence of observations $z_1,\ldots,z_n$ to a same-length sequence $\kappa_1,\ldots,\kappa_n$ of categories that is equivariant with respect to permutations:

Added lines 9-16:

\[

(\kappa_1,\ldots,\kappa_n) = K(z_1,\ldots,z_n)

\Longrightarrow

(\kappa_{\pi(1)},\ldots,\kappa_{\pi(n)}) = K(z_{\pi(1)},\ldots,z_{\pi(n)}).

\]

We refer to $\kappa_i$ as the ''category of'' $z_i$. A ''$K$-conditional conformity measure'' is a measurable function $A$ that maps every data sequence $z_1,\lodts,z_n$ to a same-length sequence $\alpha_1,\ldots,\alpha_n$ and that is equivariant with respect to permutations leaving the categories intact.

The ''Mondrian conformal predictor'' determined by a $K$-conditional conformity measure $A$ and a set of significance levels $\epsilon_k$, $k \in K$, is defined by

\[

Changed line 29 from:

\frac{|\{i~~ ~~=~~ ~~1,~~ ~~\ldots,~~ ~~n+1~~: y_i = y~~_~~{n+1}~~ ~~\& \alpha^~~y_~~i \le~~ \~~alpha_{n+1}~~\~~}|}{|~~\~~{i = 1, ~~\~~ldots, n+1:~~ y_i~~ ~~= y_{n+1}\}|},

to:

\frac{|\{i=1,\ldots,n+1 \mid \kappa^y_i = \kappa^y_{n+1} \;\&\; \alpha^y_i \le \alpha^y_{n+1}\}|}{|\{i=1,\ldots,n+1 \mid \kappa^y_i = \kappa^y_{n+1}\}|},
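The p-value restricts attention to the observations sharing the category of the test observation. A minimal Python sketch under the conformity-score convention (smaller $\alpha$ means less conforming; the function name is illustrative):

```python
def mondrian_p_value(categories, scores):
    """p-value of the last observation, computed within its own category.

    categories, scores: equal-length sequences for z_1, ..., z_n, (x_{n+1}, y);
    scores are conformity scores alpha (smaller = less conforming)."""
    kappa_new, alpha_new = categories[-1], scores[-1]
    same_category = [a for k, a in zip(categories, scores) if k == kappa_new]
    return sum(a <= alpha_new for a in same_category) / len(same_category)

print(mondrian_p_value(["a", "b", "a"], [1.0, 5.0, 2.0]))  # 1.0: both "a"-scores are <= 2.0
```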

Changed line 31 from:

where ~~{~~$\kappa_~~i := ~~\kappa~~(i, z~~_~~i~~)~~, z_i ~~:= (~~x~~_~~i~~,~~ y~~_~~i~~)$~~}~~ and

to:

where $(\kappa^y_1,\ldots,\kappa^y_{n+1}) := K(z_1,\ldots,z_{n+1})$ and

Changed lines 33-35 from:

\alpha_~~i := A_n(\kappa_~~1,~~ ~~\ldots,~~ ~~\~~kappa~~_{n~~-~~1}~~, (k \mapsto \{~~z_~~j: j ~~\~~in \{1~~,~~ \ldots~~,~~ i-~~1,~~ i+1, \ldots, n\} \& \kappa_j = k\}~~)~~, \kappa_n, z_i~~)

to:

(\alpha^y_1,\ldots,\alpha^y_{n+1})

:=

A(z_1,\ldots,z_n,(x_{n+1},y)).

Changed lines 37-44 from:

Important special cases of MCPs:

*[[conformal predictor]]s

*[[inductive conformal predictor]]s

to:

The standard assumption for MCPs is the [[randomness assumption]] (also called the IID assumption).

A simple example is given by ''label-conditional conformal predictors'', which require the label space $\mathbf{Y}$ to be finite. In this case the set of categories $K=\mathbf{Y}$ coincides with the set of labels, and the category of an observation $(x_i,y_i)$ is simply its label, $\kappa_i:=y_i$.
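A sketch of how a label-conditional predictor could be assembled (the `conformity_scores` argument is a hypothetical user-supplied conformity measure, not something defined in this article):

```python
def prediction_set(train, x_new, labels, epsilon, conformity_scores):
    """Label-conditional conformal prediction set: all labels y whose
    label-conditional p-value exceeds the significance level epsilon[y]."""
    gamma = set()
    for y in labels:
        extended = train + [(x_new, y)]        # postulate the label y
        alphas = conformity_scores(extended)   # one conformity score per example
        # the category of (x_i, y_i) is its label, so condition on y_i == y
        same = [a for (_, yi), a in zip(extended, alphas) if yi == y]
        p = sum(a <= alphas[-1] for a in same) / len(same)
        if p > epsilon[y]:
            gamma.add(y)
    return gamma

# Toy usage: conformity = closeness of x to its postulated label.
scores = lambda examples: [-abs(x - y) for x, y in examples]
train = [(0.1, 0), (0.0, 0), (1.0, 1), (0.9, 1)]
print(prediction_set(train, 0.05, [0, 1], {0: 0.4, 1: 0.4}, scores))  # {0}
```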

Changed lines 46-58 from:

''Smoothed MCPs'' are defined analogously to [[smoothed conformal predictors -> conformal predictor]]:

%center% {$ \Gamma^~~{(\epsilon_k: k \in K)}(x_1, y_1, ~~\~~ldots, x_{n-1},~~ y~~_{n-1}, x_n)$}~~

is set to the set of all labels{~~$y ~~\~~in~~ ~~Y$} such that~~

%center%{~~$p_~~n ~~>~~ \~~epsilon{~~\~~kappa(n, (x~~_~~n,~~ y~~))~~}~~ $~~}~~, where~~

~~{$p_n = (|~~\~~{i~~ =~~ ~~1,~~ ~~\ldots,~~ ~~n~~:~~ \kappa_i = ~~\kappa~~_n \&~~ ~~\alpha_i ~~>~~ \alpha_n~~ \~~}| ~~+~~ \~~eta_n |\~~{i~~ ~~=~~ ~~1,~~ ~~\ldots,~~ ~~n: \kappa_i = \kappa_n~~ ~~\~~& ~~\~~alpha_i = \alpha_n \}| )/|\{i = 1, \ldots, n: \kappa_i = ~~\~~kappa~~_~~n~~\~~}|$},~~

the nonconformity scores {$\alpha_i$} are defined as before and {$\eta_n\in[0,1]$} is a random number.

~~'''Theorem''' ''Any smoothed MCP based on a Mondrian taxonomy {~~$\kappa$~~} is category-wise exact w~~.~~r. to {$\kappa$}'', that is, for all {~~$~~n~~$~~}, the conditional probability distribution of {~~$~~p_n~~$~~} given {$\kappa(1~~, ~~z_1), p_1, \ldots~~, ~~\kappa(n-1, z~~_~~{~~n~~-1}), p~~_~~{n-~~1~~}~~, \~~kappa(n~~, ~~z~~_n~~)$} is uniform on [0,~~ 1], where {$z_1, z_2, \ldots$} are examples generated from an [[exchangeable (probability) distribution]] on {$Z^{\infty}$} and {$p_1, p_2, \ldots$} are the p-values output by {$f$}.

to:

''Smoothed MCPs'' are defined analogously to [[smoothed conformal predictors -> conformal predictor]]: the definition of $p^y$ is now modified to

\[

p^y

:=

\frac{|\{i=1,\ldots,n+1 \mid \kappa^y_i = \kappa^y_{n+1} \;\&\; \alpha^y_i < \alpha^y_{n+1}\}|

+ \eta_y |\{i=1,\ldots,n+1 \mid \kappa^y_i = \kappa^y_{n+1} \;\&\; \alpha^y_i = \alpha^y_{n+1}\}|}

{|\{i=1,\ldots,n+1 \mid \kappa^y_i = \kappa^y_{n+1}\}|},

\]

and the rest of the definition is the same. Here $\eta_y\sim U[0,1]$ are random numbers (chosen independently at different trials and independently of the observations).

In the next theorem we assume that $\kappa_i$ depends only on $z_i$.

'''Theorem''' ''Any smoothed MCP based on a simplified Mondrian taxonomy $K$ is category-wise exact w.r. to $K$'', that is, for all $n$, the conditional probability distribution of $p_n$ given $\kappa_1, p_1, \ldots, \kappa_n, z_{n-1}, p_{n-1}, \kappa_n$ is uniform on $[0, 1]$, where {$z_1, z_2, \ldots$} are examples generated from an [[exchangeable (probability) distribution]] on {$Z^{\infty}$} and {$p_1, p_2, \ldots$} are the p-values output by {$f$}.

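The smoothed p-value can be sketched as follows (function names are illustrative; `eta` plays the role of $\eta_y\sim U[0,1]$, and the scores are conformity scores as in the definition of MCPs):

```python
import random

def smoothed_p_value(categories, scores, eta=None):
    """Smoothed Mondrian p-value of the last observation within its category;
    ties at the test score are broken by the random number eta ~ U[0, 1]."""
    if eta is None:
        eta = random.random()
    kappa_new, alpha_new = categories[-1], scores[-1]
    same = [a for k, a in zip(categories, scores) if k == kappa_new]
    below = sum(a < alpha_new for a in same)   # strictly less conforming
    ties = sum(a == alpha_new for a in same)   # tied with the test score
    return (below + eta * ties) / len(same)

print(smoothed_p_value(["a", "a", "a"], [1.0, 2.0, 2.0], eta=0.5))  # (1 + 0.5*2)/3
```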
Added line 63:

* Vineeth N. Balasubramanian, Shen-Shyang Ho, and Vladimir Vovk, editors (2014). ''Conformal Prediction for Reliable Machine Learning: Theory, Adaptations, and Applications. Morgan Kaufmann, Chennai.

Changed lines 1-2 from:

''Mondrian conformal predictors (MCPs)'' represent a wide class of [[confidence predictor]]s which is the generalisation of [[conformal predictor]]s and [[inductive conformal predictor]]s with a new property of validity, validity within categories ~~(see section '''Validity'''); this kind of validity is stronger~~ than usual [[conservative validity]] for [[conformal predictor]]s.

to:

''Mondrian conformal predictors (MCPs)'' represent a wide class of [[confidence predictor]]s which is the generalisation of [[conformal predictor]]s and [[inductive conformal predictor]]s with a new property of validity, validity within categories; this kind of validity is stronger (for large data sets) than the usual [[conservative validity]] for [[conformal predictor]]s.

Changed lines 5-16 from:

A ''Mondrian taxonomy'' is a measurable function $\kappa:N \times Z \to K$, where ~~{~~$Z$~~}~~ is a set of [[observations -> confidence predictor]]~~, ~~$K$ is ~~the measurable space (at most countable with the discrete {$\sigma$}-algebra) of elements called ''categories'', with the following property:~~

the elements {$\kappa^{-1}(k)$} of each category {$k \in K$} form a rectangle {$A \times B$}, for some {$A \subseteq N$} and {$B \subseteq Z$}.

In words, a Mondrian taxonomy defines a division of ~~the Cartesian product {$N \times Z$} into categories.~~

''~~Mondrian nonconformity measure'' based on a Mondrian taxonomy {~~$\~~kappa~~$~~} is a family of measurable functions {~~$\{~~A_n:n\in N\~~}$~~} of~~ the ~~type~~

%center% {$A_n: K^{n-1} \times (Z ^ {(*)})^K \times K \times Z \to \bar{\mathbb{R}}$~~}, ~~

where {$(Z ^ {(*)})^K$} is a set of all functions mapping {$K$} to the set of all bags (multisets) of elements of {$Z$}.

The ''Mondrian conformal predictor'' determined by a the Mondrian nonconformity measure {$A_n$} and a set of significance levels {$\epsilon_k, k \in K$} is a [[confidence predictor]] {$\Gamma:\mathbf{Z}^*\times X\times (0,1)^K\to 2^Y$}

({$2^Y$} is a set of all subsets of {$Y$}) such that the [[prediction set -> confidence predictor]] {$ \Gamma^ {(\epsilon_k: k \in K)} (x_1, y_1, \ldots, x_{n-1}, y_{n-1}, x_n)$} is defined as the set of all labels {$y\in Y$} such that

to:

A ''Mondrian taxonomy'' is a measurable function $\kappa:N \times Z \to K$, where $Z$ is a set of [[observations -> confidence predictor]] and $K$ is a countable set of elements called ''categories'', that satisfies some natural properties given in Vovk et al. (2005), Section 4.5. The definition involves partitions into rectangles, reminiscent of [[https://en.wikipedia.org/wiki/Piet_Mondrian | Piet Mondrian's]] paintings (see also [[https://scottlocklin.wordpress.com/2016/12/05/predicting-with-confidence-the-best-machine-learning-idea-you-never-heard-of/ | Scott Locklin's blog]]). A simple example of MCPs is given by ''label-conditional conformal predictors'', which require the label space $\mathbf{Y}$ to be finite. In this case the set of categories $K=\mathbf{Y}$ coincides with the set of labels. The ''Mondrian conformal predictor'' determined by a conformity measure $A$ and a set of significance levels $\epsilon_k$, $k \in K$,

is defined by

Changed lines 8-10 from:

p

to:

\Gamma^ {(\epsilon_k: k \in K)} (x_1, y_1, \ldots, x_{n}, y_{n}, x_{n+1})

:=

\left\{

y \in \mathbf{Y}

\mid

p^y > \epsilon_y

\right\},

Changed line 16 from:

to:

where

Changed lines 18-20 from:

to:

p^y

:=

\frac{|\{i = 1, \ldots, n+1: y_i = y_{n+1} \& \alpha^y_i \le \alpha_{n+1}\}|}{|\{i = 1, \ldots, n+1: y_i = y_{n+1}\}|},

Added lines 22-25:

where {$\kappa_i := \kappa(i, z_i), z_i := (x_i, y_i)$} and

\[

\alpha_i := A_n(\kappa_1, \ldots, \kappa_{n-1}, (k \mapsto \{z_j: j \in \{1, \ldots, i-1, i+1, \ldots, n\} \& \kappa_j = k\}), \kappa_n, z_i)

\]

Changed lines 51-54 from:

This implies that in the long run the frequency of erroneous predictions given by smoothed MCPs on examples of category {$k$} is equal to {$\epsilon_k$} for each {$k$}. Therefore, the frequency of errors made by MCPs on examples of category {$k$} does not exceed {$\epsilon_k$} for each {$k$}.

to:

This implies that in the long run the frequency of erroneous predictions given by smoothed MCPs on examples of category {$k$} is equal to {$\epsilon_k$} for each {$k$}. Therefore, the frequency of errors made by MCPs on examples of category {$k$} does not exceed {$\epsilon_k$} for each {$k$}.

'''Bibliography'''

* Vladimir Vovk, Alexander Gammerman, and Glenn Shafer (2005). [[http://www.alrw.net | Algorithmic learning in a random world]]. Springer, New York.

Changed line 15 from:

The ''Mondrian conformal predictor'' determined by a the Mondrian nonconformity measure {$A_n$} and a set of significance levels {$\epsilon_k, k \in K$} is a [[confidence predictor]] {$\Gamma:~~ ~~Z^*\times X\times (0,1)^K\to 2^Y$}

to:

The ''Mondrian conformal predictor'' determined by a the Mondrian nonconformity measure {$A_n$} and a set of significance levels {$\epsilon_k, k \in K$} is a [[confidence predictor]] {$\Gamma:\mathbf{Z}^*\times X\times (0,1)^K\to 2^Y$}

Changed lines 18-20 from:

p_n > \epsilon_{\kappa(n, (x_n, y))}~~$}~~, ~~{$~~p_n = |\{i = 1, \ldots, n: \kappa_i = \kappa_n \& \alpha_i \ge \alpha_n \}| /|\{i = 1, \ldots, n: \kappa_i = \kappa_n\}|

to:

p_n > \epsilon_{\kappa(n, (x_n, y))},

\qquad

p_n = |\{i = 1, \ldots, n: \kappa_i = \kappa_n \& \alpha_i \ge \alpha_n \}| /|\{i = 1, \ldots, n: \kappa_i = \kappa_n\}|

May 13, 2017, at 01:27 PM
by - slightly updated

Changed lines 1-3 from:

''Mondrian conformal predictors (MCPs)'' represent a wide class of [[confidence predictor]]s which is the generalisation of [[~~transductive ~~conformal predictor]]s and [[inductive conformal predictor]]s with a new ~~main ~~property ~~-~~ validity within categories (see section '''Validity''')~~, a~~ kind of validity ~~that ~~is stronger than usual [[conservative validity]] for ~~TCMs~~.

to:

''Mondrian conformal predictors (MCPs)'' represent a wide class of [[confidence predictor]]s which is the generalisation of [[conformal predictor]]s and [[inductive conformal predictor]]s with a new property of validity, validity within categories (see section '''Validity'''); this kind of validity is stronger than usual [[conservative validity]] for [[conformal predictor]]s.

Changed line 5 from:

to:

A ''Mondrian taxonomy'' is a measurable function $\kappa:N \times Z \to K$, where {$Z$} is a set of [[observations -> confidence predictor]], $K$ is the measurable space (at most countable with the discrete {$\sigma$}-algebra) of elements called ''categories'', with the following property:

Changed lines 9-12 from:

''Mondrian nonconformity measure'' based on a ~~mondrian~~ taxonomy {$\kappa$} is a family of measurable functions {$\{A_n:n\in N\}$} of the type

%center% {$A_n: K^{n-1} \times (Z ^ {(*)})^K \times K \times Z \~~rightarrow~~ \bar{\mathbb{R}}$},

to:

''Mondrian nonconformity measure'' based on a Mondrian taxonomy {$\kappa$} is a family of measurable functions {$\{A_n:n\in N\}$} of the type

%center% {$A_n: K^{n-1} \times (Z ^ {(*)})^K \times K \times Z \to \bar{\mathbb{R}}$},

Changed lines 15-23 from:

The ''Mondrian conformal predictor'' determined by a the Mondrian nonconformity measure {$A_n$} and a set of significance levels {$\epsilon_k, k \in K$} is a [[confidence predictor]] {$\Gamma: Z^*\times X\times (0,1)^K\~~rightarrow~~ 2^Y$}

({$2^Y$} is a set of all subsets of {$Y$}) such that the [[prediction set -> confidence predictor]] {$ \Gamma^ {(\epsilon_k: k \in K)} (x_1, y_1, \ldots, x_{n-1}, y_{n-1}, x_n)$} is defined as the set of all~~lables~~ {$y\in Y$} such that

~~%center% {$~~p_n > \epsilon_{\kappa(n, (x_n, y))}$}, {$p_n = |\{i = 1, \ldots, n: \kappa_i = \kappa_n \& \alpha_i \ge \alpha_n \}| /|\{i = 1, \ldots, n: \kappa_i = \kappa_n\}|$~~}~~

where {$\kappa_i:= \kappa(i, z_i), z_i := (x_i, y_i)$} and

%center% {$\alpha_i := A_n(\kappa_1, \ldots, \kappa_{n-1}, (k \mapsto \{z_j: j \in \{1, \ldots, i-1, i+1, \ldots, n\} \& \kappa_j = k\}), \kappa_n, z_i)~~$}~~

to:

The ''Mondrian conformal predictor'' determined by a the Mondrian nonconformity measure {$A_n$} and a set of significance levels {$\epsilon_k, k \in K$} is a [[confidence predictor]] {$\Gamma: Z^*\times X\times (0,1)^K\to 2^Y$}

({$2^Y$} is a set of all subsets of {$Y$}) such that the [[prediction set -> confidence predictor]] {$ \Gamma^ {(\epsilon_k: k \in K)} (x_1, y_1, \ldots, x_{n-1}, y_{n-1}, x_n)$} is defined as the set of all labels {$y\in Y$} such that

\[

p_n > \epsilon_{\kappa(n, (x_n, y))}$}, {$p_n = |\{i = 1, \ldots, n: \kappa_i = \kappa_n \& \alpha_i \ge \alpha_n \}| /|\{i = 1, \ldots, n: \kappa_i = \kappa_n\}|

\]

where {$\kappa_i := \kappa(i, z_i), z_i := (x_i, y_i)$} and

\[

\alpha_i := A_n(\kappa_1, \ldots, \kappa_{n-1}, (k \mapsto \{z_j: j \in \{1, \ldots, i-1, i+1, \ldots, n\} \& \kappa_j = k\}), \kappa_n, z_i)

\]

Deleted line 25:

Changed line 29 from:

*[[~~transductive ~~conformal predictor]]s

to:

*[[conformal predictor]]s

Deleted line 31:

Changed lines 36-37 from:

''Smoothed MCPs'' are defined analogously to [[smoothed conformal predictors ->~~ transductive~~ conformal predictor]]:

to:

''Smoothed MCPs'' are defined analogously to [[smoothed conformal predictors -> conformal predictor]]:

Changed lines 47-49 from:

for all {$n$}, the conditional probability distribution of {$p_n$} given {$\kappa(1, z_1), p_1, \ldots, \kappa(n-1, z_{n-1}), p_{n-1}, \kappa(n, z_n)$} is uniform on [0, 1], where {$z_1, z_2, \ldots$} are examples generated from an [[exchangeable (probability) distribution]] on {$Z^{\infty}$} and {$p_1, p_2, \ldots$} are the p-values output by {$f$}.

to:

'''Theorem''' ''Any smoothed MCP based on a Mondrian taxonomy {$\kappa$} is category-wise exact w.r. to {$\kappa$}'', that is, for all {$n$}, the conditional probability distribution of {$p_n$} given {$\kappa(1, z_1), p_1, \ldots, \kappa(n-1, z_{n-1}), p_{n-1}, \kappa(n, z_n)$} is uniform on [0, 1], where {$z_1, z_2, \ldots$} are examples generated from an [[exchangeable (probability) distribution]] on {$Z^{\infty}$} and {$p_1, p_2, \ldots$} are the p-values output by {$f$}.

Changed line 12 from:

%center% {$A_n: K^{n-1} \times (Z ^ {(*)})^K \times K \times Z \rightarrow \bar{R}$},

to:

%center% {$A_n: K^{n-1} \times (Z ^ {(*)})^K \times K \times Z \rightarrow \bar{\mathbb{R}}$},

Deleted line 49:

Changed line 1 from:

''Mondrian conformal predictors (MCPs)'' ~~are~~ a wide class of [[confidence predictor]]s which is the generalisation of [[transductive conformal predictor]]s and [[inductive conformal predictor]]s~~. Their main property is~~ validity within categories (see section '''Validity'''), a kind of validity that is stronger than usual [[conservative validity]] for TCMs.

to:

''Mondrian conformal predictors (MCPs)'' represent a wide class of [[confidence predictor]]s which is the generalisation of [[transductive conformal predictor]]s and [[inductive conformal predictor]]s with a new main property - validity within categories (see section '''Validity'''), a kind of validity that is stronger than usual [[conservative validity]] for TCMs.

Added lines 1-55:

''Mondrian conformal predictors (MCPs)'' are a wide class of [[confidence predictor]]s which is the generalisation of [[transductive conformal predictor]]s and [[inductive conformal predictor]]s. Their main property is validity within categories (see section '''Validity'''), a kind of validity that is stronger than usual [[conservative validity]] for TCMs.

!!Definition

''Mondrian taxonomy'' is a measurable function {$\kappa:N \times Z \rightarrow K$}, where {$Z$} is a set of [[examples -> confidence predictor]], {$K$} is the measurable space (at most countable with the discrete {$\sigma$}-algebra) of elements called ''categories'', with the following property:

the elements {$\kappa^{-1}(k)$} of each category {$k \in K$} form a rectangle {$A \times B$}, for some {$A \subseteq N$} and {$B \subseteq Z$}.

In words, a Mondrian taxonomy defines a division of the Cartesian product {$N \times Z$} into categories.

A ''Mondrian nonconformity measure'' based on a Mondrian taxonomy {$\kappa$} is a family of measurable functions {$\{A_n:n\in N\}$} of the type

%center% {$A_n: K^{n-1} \times (Z ^ {(*)})^K \times K \times Z \rightarrow \bar{\mathbb{R}}$},

where {$(Z ^ {(*)})^K$} is the set of all functions mapping {$K$} to the set of all bags (multisets) of elements of {$Z$}.

The ''Mondrian conformal predictor'' determined by the Mondrian nonconformity measure {$A_n$} and a set of significance levels {$\epsilon_k, k \in K$} is a [[confidence predictor]] {$\Gamma: Z^*\times X\times (0,1)^K\rightarrow 2^Y$}

({$2^Y$} is the set of all subsets of {$Y$}) such that the [[prediction set -> confidence predictor]] {$ \Gamma^ {(\epsilon_k: k \in K)} (x_1, y_1, \ldots, x_{n-1}, y_{n-1}, x_n)$} is defined as the set of all labels {$y\in Y$} such that

%center% {$p_n > \epsilon_{\kappa(n, (x_n, y))}$}, {$p_n = |\{i = 1, \ldots, n: \kappa_i = \kappa_n \& \alpha_i \ge \alpha_n \}| /|\{i = 1, \ldots, n: \kappa_i = \kappa_n\}|$}

where {$\kappa_i := \kappa(i, z_i), z_i := (x_i, y_i)$} and

%center% {$\alpha_i := A_n(\kappa_1, \ldots, \kappa_{n-1}, (k \mapsto \{z_j: j \in \{1, \ldots, i-1, i+1, \ldots, n\} \& \kappa_j = k\}), \kappa_n, z_i)$}

for {$i = 1, \ldots, n$} such that {$\kappa_i = \kappa_n$} ({$\{z_j: \ldots \}$} denotes a multiset).
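The p-value computation above can be sketched in Python (a minimal illustration assuming the nonconformity scores {$\alpha_i$} have already been computed; `mondrian_p_value` is a hypothetical helper name):

```python
def mondrian_p_value(kappas, alphas):
    """p-value of the last example: among examples in the same category as
    the last one, the fraction whose nonconformity score is >= its score.

    kappas[i] is the category kappa_i and alphas[i] the score alpha_i; the
    last entry corresponds to the test object with the postulated label y.
    """
    k_n, a_n = kappas[-1], alphas[-1]
    same = [a for k, a in zip(kappas, alphas) if k == k_n]  # category of x_n
    return sum(a >= a_n for a in same) / len(same)

# Label y stays in the prediction set iff this p-value exceeds eps_{k_n}:
p = mondrian_p_value([0, 0, 1, 0], [1.0, 2.0, 5.0, 3.0])
print(p)  # 1/3: one of the three category-0 scores is >= 3.0
```

Note that only examples of the test example's own category enter the comparison, which is exactly what yields validity within categories.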

The standard assumption for MCPs is the [[randomness assumption]] (also called the i.i.d. assumption).

Important special cases of MCPs:

*[[transductive conformal predictor]]s

*[[inductive conformal predictor]]s

!! Validity

All the statements in this section are given under the [[randomness assumption]].

''Smoothed MCPs'' are defined analogously to [[smoothed conformal predictors -> transductive conformal predictor]]:

%center% {$ \Gamma^{(\epsilon_k: k \in K)}(x_1, y_1, \ldots, x_{n-1}, y_{n-1}, x_n)$}

is set to the set of all labels {$y \in Y$} such that

%center% {$p_n > \epsilon_{\kappa(n, (x_n, y))}$}, where

{$p_n = (|\{i = 1, \ldots, n: \kappa_i = \kappa_n \& \alpha_i > \alpha_n \}| + \eta_n |\{i = 1, \ldots, n: \kappa_i = \kappa_n \& \alpha_i = \alpha_n \}| )/|\{i = 1, \ldots, n: \kappa_i = \kappa_n\}|$},

the nonconformity scores {$\alpha_i$} are defined as before and {$\eta_n\in[0,1]$} is a random number.
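The smoothed p-value can be sketched analogously (a hedged illustration; the tie-breaking number {$\eta_n$} is passed explicitly so the function stays deterministic, and `smoothed_mondrian_p_value` is a hypothetical name):

```python
def smoothed_mondrian_p_value(kappas, alphas, eta):
    """Smoothed Mondrian p-value: ties alpha_i = alpha_n are counted with
    the weight eta in [0, 1] instead of fully."""
    k_n, a_n = kappas[-1], alphas[-1]
    same = [a for k, a in zip(kappas, alphas) if k == k_n]
    greater = sum(a > a_n for a in same)
    ties = sum(a == a_n for a in same)
    return (greater + eta * ties) / len(same)

print(smoothed_mondrian_p_value([0, 0, 0], [1.0, 2.0, 2.0], eta=0.5))
# (0 + 0.5 * 2) / 3 = 1/3
```

With {$\eta_n = 1$} this reduces to the ordinary Mondrian p-value.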

As in the case of [[TCPs->transductive conformal predictor]], we can formulate analogous statements for ICPs using the same terminology.

'''Theorem''' ''Any smoothed MCP based on a Mondrian taxonomy {$\kappa$} is category-wise exact w.r. to {$\kappa$}'', that is,

for all {$n$}, the conditional probability distribution of {$p_n$} given {$\kappa(1, z_1), p_1, \ldots, \kappa(n-1, z_{n-1}), p_{n-1}, \kappa(n, z_n)$} is uniform on {$[0, 1]$}, where {$z_1, z_2, \ldots$} are examples generated from an [[exchangeable (probability) distribution]] on {$Z^{\infty}$} and {$p_1, p_2, \ldots$} are the p-values output by the smoothed MCP for the true labels.

This implies that in the long run the frequency of erroneous predictions given by smoothed MCPs on examples of category {$k$} is equal to {$\epsilon_k$} for each {$k$}. Therefore, the frequency of errors made by MCPs on examples of category {$k$} does not exceed {$\epsilon_k$} for each {$k$}.
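This exactness can be checked empirically. The following Monte Carlo sketch (under assumed synthetic i.i.d. data, with the smoothed p-value computed inline) estimates how often the smoothed p-value of the true label falls at or below {$\epsilon$}; by the theorem this frequency should be close to {$\epsilon$}:

```python
import random

random.seed(0)
eps, trials, n = 0.2, 20000, 21
errors = 0
for _ in range(trials):
    # synthetic exchangeable data: binary categories, continuous scores
    kappas = [random.randint(0, 1) for _ in range(n)]
    alphas = [random.random() for _ in range(n)]
    k_n, a_n = kappas[-1], alphas[-1]
    same = [a for k, a in zip(kappas, alphas) if k == k_n]
    eta = random.random()  # tie-breaking eta_n
    p = (sum(a > a_n for a in same) + eta * sum(a == a_n for a in same)) / len(same)
    errors += p <= eps  # an error: the true label would be excluded at level eps
error_rate = errors / trials
print(error_rate)  # close to eps = 0.2
```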
