This paper studies a link-text algorithm that models scientific documents through citation influences and applies it to document clustering and influence prediction. Most existing link-text algorithms ignore the fact that the documents a paper cites influence it to different degrees. Yet citation influences reveal the latent structure of a citation network, which describes knowledge flow more accurately than the raw citation structure. In this study, a citation influence is modeled as a weight in a linear combination that approximates the text of a document by the content of the documents it cites. We then present a novel matrix factorization algorithm, Citation-Influences-Text Nonnegative Matrix Factorization (CIT-NMF), which incorporates both text and citations and learns the influence weights to obtain better document representations. In addition, we derive an efficient method for solving the resulting optimization problem. Experimental results on several real datasets show improvements over the baseline models.
- Citation influence
- Citation networks
- Document clustering
- Nonnegative matrix factorization
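
The linear-combination view of citation influence described in the abstract can be illustrated with a small sketch. The Python code below is not the CIT-NMF algorithm itself, whose formulation is not given here. It only assumes that documents are nonnegative term-count vectors, and it fits nonnegative weights with a standard multiplicative update for nonnegative least squares. The function names `combine` and `influence_weights` and the toy vectors are hypothetical, chosen for illustration.

```python
# Illustrative sketch only: approximate a citing document's term vector as a
# nonnegative weighted sum of the term vectors of the documents it cites.
# This is NOT the paper's CIT-NMF; the update rule is an assumed stand-in.

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def combine(weights, cited):
    """Weighted sum of the cited-document vectors."""
    n_terms = len(cited[0])
    return [
        sum(w * doc[t] for w, doc in zip(weights, cited))
        for t in range(n_terms)
    ]


def influence_weights(citing, cited, n_iter=200, eps=1e-12):
    """Fit nonnegative weights w minimizing ||citing - sum_j w_j * cited_j||^2.

    Uses a multiplicative update, which keeps every weight nonnegative as
    long as all input vectors are nonnegative.
    """
    weights = [1.0 / len(cited)] * len(cited)  # uniform starting point
    for _ in range(n_iter):
        approx = combine(weights, cited)
        weights = [
            w * dot(doc, citing) / (dot(doc, approx) + eps)
            for w, doc in zip(weights, cited)
        ]
    return weights


# Toy data (hypothetical): two cited documents over a four-term vocabulary.
cited = [
    [2.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 2.0],
]
# The citing document is built as 0.75 * cited[0] + 0.25 * cited[1].
citing = [1.5, 1.0, 1.0, 0.5]

weights = influence_weights(citing, cited)
print(weights)  # close to [0.75, 0.25]
```

In this toy case the larger weight marks the first cited document as the stronger influence on the citing document. A full link-text model would learn such weights jointly with the document representations rather than one document at a time.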