Out-of-distribution (OOD) detection aims to identify OOD inputs from unknown classes, which is important for the reliable deployment of machine learning models in the open world. Various scoring functions have been proposed to distinguish OOD inputs from in-distribution (ID) data. However, existing methods generally focus on extracting discriminative information from a single input, which implicitly limits the representation dimension.
In this work, we introduce a novel perspective, i.e., applying different common corruptions in the input space, to expand that dimension. We reveal an interesting phenomenon termed confidence mutation, where the confidence of OOD data can decrease significantly under the corruptions, while ID data retains a higher expected confidence because its semantic features are more resistant to such perturbations.
Based on this observation, we formalize a new scoring method, namely Confidence aVerage (CoVer), which captures these dynamic differences by simply averaging the scores obtained from the corrupted inputs and the original ones, making the OOD and ID distributions more separable in detection tasks. Extensive experiments and analyses have been conducted to understand and verify the effectiveness of CoVer.
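To make the scoring concrete, below is a minimal PyTorch sketch of a CoVer-style score, assuming a pretrained classifier `model`, torchvision-style corruptions, and maximum softmax probability as the base confidence; the specific corruptions and hyperparameters are illustrative and not the paper's exact configuration.

```python
# Minimal sketch of a CoVer-style score (illustrative assumptions, not the
# paper's exact setup): average the confidence over original + corrupted views.
import torch
import torch.nn.functional as F
import torchvision.transforms as T

corruptions = [
    T.GaussianBlur(kernel_size=5, sigma=2.0),                    # blur as an example corruption
    lambda x: (x + 0.1 * torch.randn_like(x)).clamp(0.0, 1.0),   # Gaussian noise as another
]

@torch.no_grad()
def msp_score(model, x):
    """Maximum softmax probability as a per-input confidence score."""
    logits = model(x)
    return F.softmax(logits, dim=-1).max(dim=-1).values

@torch.no_grad()
def cover_score(model, x):
    """Average the confidence over the original and corrupted views of x.

    ID inputs tend to stay confident under corruption, while OOD confidence
    drops, so the averaged score separates the two distributions better.
    """
    views = [x] + [c(x) for c in corruptions]
    scores = torch.stack([msp_score(model, v) for v in views], dim=0)
    return scores.mean(dim=0)  # higher => more likely ID
```

In practice, `cover_score(model, batch)` can simply replace a single-input score (e.g., MSP or energy) in an existing OOD detection pipeline, with thresholding applied to the averaged value.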
Alongside this paper, we are releasing two calls to encourage and broaden the reach of scientific interaction and collaboration. The two calls invite fellow researchers to address two questions that are not yet sufficiently answered by this work:
For each call we provide possible directions for exploring an answer; however, we encourage novel directions beyond what is suggested below.
Call for explanation and theoretical understanding. In our paper, we provide a possible explanation of the effectiveness of CoVer from an empirical perspective. We suggest that common corruptions might act as perturbations of high-frequency features within the input representation. For OOD samples, which inherently lack ID semantic features, altering high-frequency features could lead to notable changes in model confidence, while ID data shows relatively better resistance to such perturbations. However, a rigorous theoretical understanding of this empirical observation is still lacking.
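As one way to probe this hypothesis empirically, the following sketch low-pass filters an image batch in the Fourier domain (i.e., suppresses high-frequency components) and measures the resulting confidence drop; the cutoff, circular filter shape, and confidence score here are our own illustrative assumptions, not part of the paper.

```python
# Rough probe of the high-frequency hypothesis (illustrative only): remove
# high spatial frequencies and compare confidence before and after.
import torch
import torch.nn.functional as F

def low_pass(x, cutoff=0.25):
    """Keep only low spatial frequencies of images shaped (B, C, H, W)."""
    freq = torch.fft.fftshift(torch.fft.fft2(x), dim=(-2, -1))
    _, _, H, W = x.shape
    yy, xx = torch.meshgrid(
        torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij"
    )
    mask = ((yy ** 2 + xx ** 2).sqrt() <= cutoff).to(x)  # circular low-pass mask
    filtered = torch.fft.ifft2(torch.fft.ifftshift(freq * mask, dim=(-2, -1)))
    return filtered.real

@torch.no_grad()
def confidence_drop(model, x):
    """Confidence on the original batch minus confidence after low-pass filtering."""
    conf = lambda z: F.softmax(model(z), dim=-1).max(dim=-1).values
    return conf(x) - conf(low_pass(x))

# Under the hypothesis, confidence_drop should be larger on OOD batches than on ID batches.
```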
Call for validation in other fields. We believe any domain that uses a deep neural network (or a similar intelligent system) to learn data representations while optimizing for a training task could be fertile ground for validating CoVer. A clear candidate is natural language processing: can combining original and corrupted inputs, as CoVer does, effectively enhance the OOD detection capability of language models such as LLMs? Another possible domain is time-series modeling, where temporal disruptions in the input space could play a role analogous to the corruptions in our method; a rough sketch of this direction is given below. Would such input variations help enhance OOD detection in time-series prediction? We invite exploration of CoVer's potential in these and other fields to assess whether utilizing both original and corrupted inputs can provide new perspectives for improving model performance and robustness.
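As a speculative illustration of the time-series direction, the sketch below uses simple temporal corruptions (Gaussian jitter and segment masking, both chosen arbitrarily here) in place of the image corruptions used in CoVer; `model` is assumed to be any sequence classifier over batches shaped (B, T, D).

```python
# Speculative CoVer-style score for time series (all corruption choices are
# illustrative assumptions): average confidence over original + corrupted series.
import torch
import torch.nn.functional as F

def jitter(x, sigma=0.05):
    """Add small Gaussian noise to a batch of series shaped (B, T, D)."""
    return x + sigma * torch.randn_like(x)

def mask_segment(x, frac=0.2):
    """Zero out a random contiguous window along the time axis."""
    _, T, _ = x.shape
    length = max(1, int(frac * T))
    start = torch.randint(0, T - length + 1, (1,)).item()
    out = x.clone()
    out[:, start:start + length, :] = 0.0
    return out

@torch.no_grad()
def cover_score_ts(model, x, corruptions=(jitter, mask_segment)):
    """CoVer-style averaged confidence for a time-series classifier."""
    views = [x] + [c(x) for c in corruptions]
    scores = [F.softmax(model(v), dim=-1).max(dim=-1).values for v in views]
    return torch.stack(scores, dim=0).mean(dim=0)
```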
We are still working to set up a proper portal for submitting, reviewing and discussing answers to both calls. In the meantime, feel free to email zhangboxuan1005@gmail.com to start a conversation.
@inproceedings{zhang2024what,
title = {What If the Input is Expanded in OOD Detection?},
author = {Zhang, Boxuan and Zhu, Jianing and Wang, Zengmao and Liu, Tongliang and Du, Bo and Han, Bo},
booktitle = {The Thirty-Eighth Annual Conference on Neural Information Processing Systems},
year = {2024},
}