Stacked convolutional denoising auto-encoders for feature representation

Bo Du, Wei Xiong, Jia Wu, Lefei Zhang*, Liangpei Zhang, Dacheng Tao

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

331 Citations (Scopus)


Deep networks have achieved excellent performance in learning representations from visual data. However, supervised deep models such as convolutional neural networks require large quantities of labeled data, which are very expensive to obtain. To solve this problem, this paper proposes an unsupervised deep network, called the stacked convolutional denoising auto-encoders, which can map images to hierarchical representations without any label information. The network, optimized by layer-wise training, is constructed by stacking layers of denoising auto-encoders in a convolutional way. In each layer, high-dimensional feature maps are generated by convolving the features of the lower layer with kernels learned by a denoising auto-encoder. The auto-encoder is trained on patches extracted from the feature maps of the lower layer to learn robust feature detectors. To better train the large network, a layer-wise whitening technique is introduced into the model: before each convolutional layer, a whitening layer is embedded to sphere the input data. Through successive layers of mapping, raw images are transformed into high-level feature representations that boost the performance of the subsequent support vector machine classifier. The proposed algorithm is evaluated in extensive experiments and demonstrates classification performance superior to that of state-of-the-art unsupervised networks.
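The layer-wise procedure summarized in the abstract (sample patches from the lower layer, whiten them, train a denoising auto-encoder on them, then apply the learned kernels at every position) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not specify the details, so the ZCA whitening, masking noise, tied weights, sigmoid encoder, squared-error loss, and all hyperparameters below are assumptions chosen for illustration, and the function names are hypothetical. Any pooling step and the final SVM classifier are omitted.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def extract_patches(maps, size, n, rng):
    """Sample n random size x size patches from maps of shape (N, H, W, C)."""
    N, H, W, C = maps.shape
    idx = rng.integers(0, N, n)
    ys = rng.integers(0, H - size + 1, n)
    xs = rng.integers(0, W - size + 1, n)
    return np.stack([maps[i, y:y + size, x:x + size, :].ravel()
                     for i, y, x in zip(idx, ys, xs)])

def zca_fit(X, eps=1e-2):
    """Return the mean and ZCA whitening matrix of X (assumed whitening variant)."""
    mean = X.mean(axis=0)
    vals, vecs = np.linalg.eigh(np.cov(X - mean, rowvar=False))
    vals = np.maximum(vals, 0.0)  # guard against tiny negative eigenvalues
    return mean, vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T

def train_dae(X, n_hidden, rng, noise=0.3, lr=0.05, epochs=30):
    """Tied-weight denoising auto-encoder trained by full-batch gradient descent."""
    d = X.shape[1]
    W = rng.normal(0.0, 0.01, (d, n_hidden))
    b, c = np.zeros(n_hidden), np.zeros(d)
    for _ in range(epochs):
        Xn = X * (rng.random(X.shape) > noise)   # masking noise (assumption)
        h = sigmoid(Xn @ W + b)                  # encode the corrupted input
        dR = (h @ W.T + c - X) / len(X)          # gradient of the squared error
        da = (dR @ W) * h * (1.0 - h)            # backprop through the sigmoid
        W -= lr * (Xn.T @ da + dR.T @ h)         # encoder + decoder terms (tied)
        b -= lr * da.sum(axis=0)
        c -= lr * dR.sum(axis=0)
    return W, b

def build_layer(maps, size, n_hidden, rng, n_patches=2000):
    """Learn whitening and kernels from patches, then apply them at every position."""
    P = extract_patches(maps, size, n_patches, rng)
    mean, Wz = zca_fit(P)
    W, b = train_dae((P - mean) @ Wz, n_hidden, rng)
    win = np.lib.stride_tricks.sliding_window_view(maps, (size, size), axis=(1, 2))
    win = win.transpose(0, 1, 2, 4, 5, 3)        # match the (size, size, C) patch layout
    flat = win.reshape(win.shape[0], win.shape[1], win.shape[2], -1)
    return sigmoid(((flat - mean) @ Wz) @ W + b)

rng = np.random.default_rng(0)
images = rng.random((4, 16, 16, 1))              # synthetic stand-in for real images
f1 = build_layer(images, size=5, n_hidden=8, rng=rng)
f2 = build_layer(f1, size=3, n_hidden=16, rng=rng)
print(f1.shape, f2.shape)
```

Stacking amounts to feeding the feature maps of one layer into the next call of `build_layer`, so each layer is trained without labels and without revisiting the layers below it. In a full pipeline the final feature maps would be flattened and passed to a classifier such as an SVM.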

Original language: English
Pages (from-to): 1017-1027
Number of pages: 11
Journal: IEEE Transactions on Cybernetics
Issue number: 4
Publication status: Published - 1 Apr 2017
Externally published: Yes


  • Convolution
  • Deep learning
  • Denoising auto-encoders
  • Unsupervised learning

