Revealing distribution discrepancy by sampling transfer in unlabeled data

Zhilin Zhao, Longbing Cao, Xuhui Fan, Wei Shi Zheng*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference proceeding contribution > peer-review

Abstract

There are increasing cases where the class labels of test samples are unavailable, creating a significant need and challenge in measuring the discrepancy between training and test distributions. This distribution discrepancy complicates the assessment of whether the hypothesis selected by an algorithm on training samples remains applicable to test samples. We present a novel approach called Importance Divergence (I-Div) to address the challenge of test label unavailability, enabling distribution discrepancy evaluation using only training samples. I-Div transfers the sampling patterns from the test distribution to the training distribution by estimating density and likelihood ratios. Specifically, the density ratio, informed by the selected hypothesis, is obtained by minimizing the Kullback-Leibler divergence between the actual and estimated input distributions. Simultaneously, the likelihood ratio is adjusted according to the density ratio by reducing the generalization error of the distribution discrepancy as transformed through the two ratios. Experimentally, I-Div accurately quantifies the distribution discrepancy, as evidenced by a wide range of complex data scenarios and tasks.
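
As a rough illustration of the density-ratio step mentioned in the abstract, the sketch below estimates w(x) = p_test(x) / p_train(x) by minimizing a Kullback-Leibler objective, in the style of the classical KLIEP estimator. It is not the I-Div procedure itself: it ignores the selected hypothesis and the likelihood ratio, and the kernel model, the function names (`gaussian_kernel`, `fit_density_ratio`), the toy 1-D Gaussian data, and all hyperparameters are assumptions made only for this example.

```python
import numpy as np


def gaussian_kernel(x, centers, sigma):
    # Gaussian kernel values between 1-D samples x and the kernel centers.
    d2 = (x[:, None] - centers[None, :]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))


def fit_density_ratio(x_train, x_test, sigma=1.0, n_centers=50,
                      lr=1e-2, n_iter=500, seed=0):
    """KLIEP-style estimate of w(x) = p_test(x) / p_train(x) (illustrative only).

    Model: w(x) = sum_l alpha_l * K(x, c_l) with alpha >= 0.
    Objective: maximize mean(log w(x_test)), which is equivalent to minimizing
    KL(p_test || w * p_train), subject to mean(w(x_train)) == 1.
    """
    rng = np.random.default_rng(seed)
    centers = rng.choice(x_test, size=min(n_centers, len(x_test)), replace=False)
    K_te = gaussian_kernel(x_test, centers, sigma)
    K_tr = gaussian_kernel(x_train, centers, sigma)
    b = K_tr.mean(axis=0)            # the constraint reads b @ alpha == 1

    alpha = np.ones(len(centers))
    alpha /= b @ alpha
    for _ in range(n_iter):
        # Gradient of mean(log(K_te @ alpha)) with respect to alpha.
        grad = (K_te / (K_te @ alpha)[:, None]).mean(axis=0)
        alpha = np.maximum(alpha + lr * grad, 0.0)   # ascent step, keep alpha >= 0
        alpha /= b @ alpha                           # rescale to satisfy the constraint
    return lambda x: gaussian_kernel(np.atleast_1d(np.asarray(x, dtype=float)),
                                     centers, sigma) @ alpha


# Toy data (an assumption): training inputs ~ N(0, 1), unlabeled test inputs ~ N(1, 1).
rng = np.random.default_rng(0)
x_train = rng.normal(0.0, 1.0, size=1000)
x_test = rng.normal(1.0, 1.0, size=1000)

w = fit_density_ratio(x_train, x_test)

# Reweighted training samples should mimic test-distribution statistics.
weighted_mean = float(np.mean(w(x_train) * x_train))
print("unweighted training mean:", x_train.mean())
print("reweighted training mean:", weighted_mean)
print("test mean:", x_test.mean())
```

Reweighting training samples by the estimated ratio is what lets a quantity defined under the test distribution be approximated from training data; the printed reweighted mean gives a quick sanity check on how well the ratio was estimated.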

Original language: English
Title of host publication: NeurIPS 2024
Subtitle of host publication: 38th Conference on Neural Information Processing Systems: proceedings
Editors: A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, C. Zhang
Place of Publication: Sydney, NSW
Publisher: Curran Associates
Pages: 1-28
Number of pages: 28
ISBN (Print): 9798331314385
Publication status: Published - 2024
Event: Conference on Neural Information Processing Systems (38th : 2024) - Vancouver, Canada
Duration: 10 Dec 2024 to 15 Dec 2024

Publication series

Name: Advances in Neural Information Processing Systems
Volume: 37
ISSN (Print): 1049-5258

Conference

Conference: Conference on Neural Information Processing Systems (38th : 2024)
Abbreviated title: NeurIPS 2024
Country/Territory: Canada
City: Vancouver
Period: 10/12/24 to 15/12/24
