Anomaly Detection

One possible definition of an anomaly is an observation, or several observations, that make the observed sequence $z_1,z_2,\ldots$ non-IID; before the anomaly the sequence appears to satisfy the randomness assumption, but the anomaly stands out as violating it.

Known approaches to anomaly detection using conformal prediction include:

  • using conformal martingales for gambling against the randomness assumption;
  • interpreting the objects for which a conformal predictor always outputs a small p-value, regardless of the postulated label, as anomalies.

This article discusses the second approach.
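As an illustration, the sketch below (in Python) shows how the second approach could be implemented with a simple nearest-neighbour conformity measure; the function names and the threshold are illustrative assumptions, not taken from the works discussed below. A test object is declared an anomaly if its conformal p-value is small for every postulated label.

  import numpy as np

  def conformity(bag_X, bag_y, x, y):
      """Conformity of (x, y) to the bag: minus the distance from x
      to the nearest object in the bag carrying the same label y."""
      same = bag_X[bag_y == y]
      if len(same) == 0:
          return -np.inf
      return -np.min(np.linalg.norm(same - x, axis=1))

  def conformal_p_value(train_X, train_y, x, y):
      """Conformal p-value of the test object x with postulated label y."""
      aug_X = np.vstack([train_X, x])   # append (x, y) to the training sequence
      aug_y = np.append(train_y, y)
      scores = np.array([
          conformity(np.delete(aug_X, i, axis=0), np.delete(aug_y, i),
                     aug_X[i], aug_y[i])
          for i in range(len(aug_y))
      ])
      # Fraction of conformity scores that do not exceed the test score.
      return np.mean(scores <= scores[-1])

  def is_anomaly(train_X, train_y, x, labels, epsilon=0.05):
      """Flag x as an anomaly if every postulated label gets a small p-value."""
      return max(conformal_p_value(train_X, train_y, x, y) for y in labels) < epsilon

With this conformity measure, an object that is far from all training objects receives small p-values for every postulated label, which is exactly what makes it stand out as an anomaly.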

The idea was explained in the talk Toward Explainable Uncertainty (slides) by Alan Fern and Tom Dietterich. They consider "closed worlds" and "open worlds", allowing test observations with novel classes in the latter. They show that some standard conformity measures (based on random forests and neural networks) fail to detect novel classes. Implicitly, the chosen conformity measures try to achieve object conditionality, which is the wrong goal in an open world. The conformity score $A((z_1,\ldots,z_n),(x,y))$ should say not only how well the postulated label $y$ conforms to the labels in the set $z_1,\ldots,z_n$ given $x$, but also (and this is even more important) how well the object $x$ conforms to the objects in $z_1,\ldots,z_n$ given $y$.
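To make the contrast concrete, the following hedged sketch (again in Python, using scikit-learn) compares a purely object-conditional, closed-world score, the estimated probability of $y$ given $x$, with a score that also measures how typical $x$ is among the training objects labelled $y$. The choice of logistic regression, per-class kernel density estimates, and the weight parameter are assumptions made for illustration; these are not the conformity measures considered by Fern and Dietterich.

  import numpy as np
  from sklearn.linear_model import LogisticRegression
  from sklearn.neighbors import KernelDensity

  def closed_world_score(model, x, y):
      # How well does the label y conform to the labels, given x?
      probs = model.predict_proba(x.reshape(1, -1))[0]
      return probs[list(model.classes_).index(y)]

  def open_world_score(model, densities, x, y, weight=1.0):
      # Also ask how well the object x conforms to the objects labelled y.
      log_p_y_given_x = np.log(closed_world_score(model, x, y) + 1e-12)
      log_p_x_given_y = densities[y].score_samples(x.reshape(1, -1))[0]
      return log_p_y_given_x + weight * log_p_x_given_y

  # Fitting the two components (illustrative):
  # model = LogisticRegression(max_iter=1000).fit(train_X, train_y)
  # densities = {label: KernelDensity(bandwidth=1.0).fit(train_X[train_y == label])
  #              for label in np.unique(train_y)}

The closed-world score can be high for an object far from the training data as long as the classifier is confident about its label, whereas the open-world score is pulled down by the density term, so all postulated labels receive small conformity scores, and hence small p-values, for such an object.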

A similar idea is explored in a series of articles by Lorenzo Cavallaro and his colleagues.

Bibliography

  • A. Deo, S. K. Dash, G. Suarez-Tangil, V. Vovk, and L. Cavallaro (2016). Prescience: Probabilistic guidance on the retraining conundrum for malware detection. In Proceedings of the 2016 ACM Workshop on Artificial Intelligence and Security (AISec'16), 71–82. ACM, New York.