Software clustering using automated feature subset selection

Zubair Shah, Rashid Naseem, Mehmet A. Orgun, Abdun Mahmood, Sara Shahzad

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution › peer-review

11 Citations (Scopus)


This paper proposes a feature selection technique for software clustering that can be used in the architecture recovery of software systems. The recovered architecture can then be used in the subsequent phases of software maintenance, reuse and re-engineering. A number of diverse features can be extracted from the source code of software systems; however, some of the extracted features may carry little information for comparing entities, which lowers the quality of the resulting software clusters. Therefore, further research is required to select those features that are highly relevant for finding associations between entities. In this article, we first propose a supervised feature selection technique for unlabeled data and then apply it to software clustering. A number of feature subset selection techniques have been proposed for software architecture recovery. However, none of them focuses on automated feature selection in this domain. Experimental results on three software test systems reveal that our proposed approach produces results that are closer to the decompositions prepared by human experts than those discovered by the well-known K-Means algorithm.
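As a rough illustration of why feature selection matters when comparing software entities, the sketch below filters out features that are either near-universal or unique to a single entity and then recomputes pairwise Jaccard similarity. This is not the technique proposed in the paper: the toy data, the function names (`select_features`, `jaccard`, `pairwise_similarity`), and the frequency thresholds are all assumptions made for the example.

```python
from itertools import combinations

# Hypothetical toy data: each entity (source file) maps to the set of
# features (called functions, used types, ...) extracted from its code.
ENTITIES = {
    "parser.c": {"malloc", "log", "next_token", "ast_node"},
    "lexer.c": {"malloc", "log", "next_token"},
    "net.c": {"malloc", "log", "socket", "send"},
    "http.c": {"malloc", "log", "socket", "send", "unique_helper"},
}


def select_features(entities, min_count=2, max_fraction=0.75):
    """Keep features shared by at least `min_count` entities but by no more
    than `max_fraction` of all entities.

    Illustrative frequency filter only; it is not the paper's method.
    """
    counts = {}
    for feats in entities.values():
        for feature in feats:
            counts[feature] = counts.get(feature, 0) + 1
    limit = max_fraction * len(entities)
    return {f for f, c in counts.items() if min_count <= c <= limit}


def jaccard(a, b):
    """Jaccard similarity of two feature sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_similarity(entities, features=None):
    """Similarity of every entity pair, optionally restricted to `features`."""
    result = {}
    for x, y in combinations(sorted(entities), 2):
        fx, fy = entities[x], entities[y]
        if features is not None:
            fx, fy = fx & features, fy & features
        result[(x, y)] = jaccard(fx, fy)
    return result


if __name__ == "__main__":
    selected = select_features(ENTITIES)
    raw = pairwise_similarity(ENTITIES)
    filtered = pairwise_similarity(ENTITIES, selected)
    for pair in raw:
        print(pair, round(raw[pair], 2), "->", round(filtered[pair], 2))
```

In this toy setting, ubiquitous features such as `malloc` and `log` make unrelated files look similar; once they are dropped, the parser/lexer pair and the net/http pair stand out, while cross-group similarities fall to zero.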

Original language: English
Title of host publication: Advanced Data Mining and Applications
Subtitle of host publication: 9th International Conference, ADMA 2013, Hangzhou, China, December 14-16, 2013, Proceedings, Part 2
Editors: Hiroshi Motoda, Zhaohui Wu, Longbing Cao, Osmar Zaiane, Min Yao, Wei Wang
Place of Publication: Heidelberg
Publisher: Springer, Springer Nature
Number of pages: 12
ISBN (Electronic): 9783642539176
ISBN (Print): 9783642539169
Publication status: Published - 2013
Event: 9th International Conference on Advanced Data Mining and Applications, ADMA 2013 - Hangzhou, China
Duration: 14 Dec 2013 to 16 Dec 2013

Publication series

Name: Lecture Notes in Computer Science
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Other: 9th International Conference on Advanced Data Mining and Applications, ADMA 2013

