CoSENT: consistent sentence embedding via similarity ranking

Xiang Huang, Hao Peng, Dongcheng Zou, Zhiwei Liu, Jianxin Li, Kay Liu, Jia Wu, Jianlin Su, Philip S. Yu

Research output: Contribution to journal › Article › peer-review

3 Citations (Scopus)

Abstract

Learning sentence representations is a fundamental task in Natural Language Processing. Although BERT-like transformers have achieved new state-of-the-art results for sentence embedding in many tasks, they have been shown to struggle to capture semantic similarity without proper fine-tuning. A common approach to measuring Semantic Textual Similarity (STS) is to compute the distance between two text embeddings with the dot product or cosine function. However, the semantic embedding spaces induced by pretrained transformers are generally non-smooth and tend to deviate from a normal distribution, which makes traditional distance metrics imprecise. In this paper, we first empirically explain the failure of cosine similarity in measuring semantic textual similarity, and then present CoSENT, a novel Consistent SENTence embedding framework. Concretely, a supervised objective function is designed to optimize a Siamese BERT network by exploiting ranked similarity labels of sample pairs. The loss function applies the same cosine-similarity-based optimization in both the training and prediction phases, improving the consistency of the learned semantic space. Additionally, the unified objective function can be adaptively applied to datasets with various types of annotations and different comparison schemes of STS tasks simply by using sortable labels. Empirical evaluations on 14 common textual similarity benchmarks demonstrate that the proposed CoSENT excels in performance and reduces training time cost.
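To illustrate the idea described in the abstract, the following is a minimal PyTorch sketch of a pairwise ranking objective over cosine similarities for a Siamese encoder. The function name, the scale value, and the exact formulation are illustrative assumptions based on the abstract's description, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def cosent_style_loss(emb_a: torch.Tensor,
                      emb_b: torch.Tensor,
                      labels: torch.Tensor,
                      scale: float = 20.0) -> torch.Tensor:
    """Sketch of a ranking loss over cosine similarities (assumed formulation).

    emb_a, emb_b: (batch, dim) sentence embeddings from a Siamese encoder.
    labels:       (batch,) sortable similarity labels for each sentence pair.

    For every two pairs i, j with labels[i] > labels[j], the loss pushes
    cos(a_i, b_i) above cos(a_j, b_j):
        loss = log(1 + sum_{labels[i] > labels[j]} exp(scale * (cos_j - cos_i)))
    """
    # Cosine similarity of each sentence pair, scaled (scale is a hyperparameter).
    cos = F.cosine_similarity(emb_a, emb_b, dim=-1) * scale          # (batch,)
    # diff[i, j] = cos_j - cos_i for all pairs of examples.
    diff = cos[None, :] - cos[:, None]                               # (batch, batch)
    # Keep only entries where example i should be ranked above example j.
    mask = labels[:, None] > labels[None, :]
    diff = diff[mask]
    # Prepend 0 so the result equals log(1 + sum(exp(diff))).
    diff = torch.cat([torch.zeros(1, device=diff.device), diff])
    return torch.logsumexp(diff, dim=0)
```

Because the same cosine similarity is used both inside this training objective and at prediction time, the learned embedding space stays consistent with the metric used for inference, which is the consistency property the abstract emphasizes.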
Original language: English
Pages (from-to): 2800-2813
Number of pages: 14
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume: 32
Publication status: Published - 2024
