In recent years, mashups have become a dominant approach for building data-centric applications, especially mobile applications. Because mashups are predominantly built on public data sources and existing APIs, developing mashup applications does not require sophisticated programming knowledge. The recent prevalence of open APIs and open data sources in the Big Data era has created new opportunities for mashup development, but it has also made it more difficult to select the right services for a given mashup task. API recommendation for mashups differs from traditional service recommendation in that it lacks specific QoS information and formal semantic specifications of the APIs, which limits the applicability of many existing methods. Although a significant number of service recommendation approaches exist, most of them focus on improving recommendation accuracy, and little work pays attention to the diversity of the recommendation results. Another challenge comes from the existence of both explicit and implicit correlations among different APIs, which existing recommendation methods generally neglect. In this paper, we address these deficiencies by exploring API recommendation for mashups in the reusable composition context, with the goal of helping developers identify the most appropriate APIs for their composition tasks. In particular, we propose a probabilistic matrix factorization approach with implicit correlation regularization to solve the recommendation problem and enhance recommendation diversity. We conjecture that the co-invocation of APIs in real-world mashups is driven both by the explicit textual similarity of APIs and by implicit correlations between them, such as similarity or complementary relationships. We develop a latent variable model to uncover these latent correlations between APIs by analyzing their co-invocation patterns.
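The idea of matrix factorization with a correlation regularizer can be illustrated with a small sketch. It is not the authors' implementation. It assumes a binary mashup-by-API invocation matrix `R` and factorizes it as R ≈ U·Vᵀ by stochastic gradient descent on the squared error with L2 penalties, which is the MAP estimate of probabilistic matrix factorization under Gaussian priors. It then adds a graph-style penalty that pulls the latent vectors of correlated APIs together. The correlation matrix `C` is a hypothetical stand-in (plain Jaccard similarity of co-invocation sets), not the paper's latent variable model. All function names and hyperparameters (`rank`, `lr`, `lam`, `beta`) are illustrative assumptions.

```python
import random


def co_invocation_correlation(R):
    """Hypothetical stand-in for API correlations: Jaccard similarity
    between the sets of mashups that invoke each pair of APIs."""
    n_apis = len(R[0])
    invokers = [{i for i, row in enumerate(R) if row[j]} for j in range(n_apis)]
    C = [[0.0] * n_apis for _ in range(n_apis)]
    for j in range(n_apis):
        for k in range(n_apis):
            if j != k:
                union = invokers[j] | invokers[k]
                if union:
                    C[j][k] = len(invokers[j] & invokers[k]) / len(union)
    return C


def pmf_with_correlation(R, C, rank=2, lr=0.05, lam=0.01, beta=0.1,
                         epochs=500, seed=0):
    """Factorize R ~ U V^T with L2 penalties (lam) and a correlation
    penalty (beta) that weights ||V_j - V_k||^2 by C[j][k]."""
    rng = random.Random(seed)
    m, n = len(R), len(R[0])
    U = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(m)]
    V = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n)]
    for _ in range(epochs):
        # Stochastic gradient steps on the squared reconstruction error;
        # every entry counts, so 0 means "mashup i does not invoke API j".
        for i in range(m):
            for j in range(n):
                err = R[i][j] - sum(U[i][f] * V[j][f] for f in range(rank))
                for f in range(rank):
                    u, v = U[i][f], V[j][f]
                    U[i][f] += lr * (err * v - lam * u)
                    V[j][f] += lr * (err * u - lam * v)
        # Gradient step on the correlation penalty (constant factors folded
        # into beta), computed from a snapshot so update order is irrelevant.
        snapshot = [row[:] for row in V]
        for j in range(n):
            for f in range(rank):
                pull = sum(C[j][k] * (snapshot[j][f] - snapshot[k][f])
                           for k in range(n))
                V[j][f] -= lr * beta * pull
    return U, V


def recommend(U, V, R, i, k=2):
    """Rank the APIs that mashup i does not use yet by predicted score U_i . V_j."""
    scored = [(sum(a * b for a, b in zip(U[i], V[j])), j)
              for j in range(len(V)) if not R[i][j]]
    scored.sort(reverse=True)
    return [j for _, j in scored[:k]]


if __name__ == "__main__":
    # Toy invocation matrix: 4 mashups (rows) x 4 APIs (columns).
    R = [[1, 1, 0, 0],
         [1, 1, 1, 0],
         [0, 0, 1, 1],
         [0, 1, 1, 1]]
    C = co_invocation_correlation(R)
    U, V = pmf_with_correlation(R, C)
    print(recommend(U, V, R, i=0))
```

In this sketch, a larger `beta` makes the latent vectors of strongly co-invoked APIs more alike. This is only one way to inject correlation information into the factorization.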
We further explore how API topics/categories relate to the proposed approach. We demonstrate the effectiveness of our approach through extensive experiments on a real dataset crawled from ProgrammableWeb.
- matrix factorization
- latent variable model