Abstract
Nonlinear feature extraction of speech signals has been the focus of much research in recent years. In this paper, feature extraction of phonemes using the NPC (neural predictive coding) model is generalized to a combination of the time and DCT domains. Two main ideas are proposed and evaluated. First, a frame-wise DCT-based NPC feature extractor is proposed to reduce the computational complexity of the system. This approach applies a DCT pre-feature extractor to remove unwanted additional data, and the extracted features are the outputs of the hidden layer. It is shown that this pre-processing stage can improve both computational efficiency and accuracy. In the second approach, we propose a complementary role for DCT-domain features in classic NPC modeling: this approach uses the residual of the predicted signal in the DCT domain. The experiments were conducted on the voiced plosive phonemes of the TIMIT database. Simulations showed that the combined method performs well on plosive phonemes. The proposed method achieved a 70.3% recognition rate on the /b/, /d/, /g/ phonemes, which is higher than the results of traditional NPC approaches.
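The abstract does not state frame lengths, how many DCT coefficients are retained, or the network size, so the following is only a minimal sketch of the two ideas under explicit assumptions. It frames a signal, applies an orthonormal DCT-II to each frame, keeps the first few coefficients as the "pre-features", and passes them through a tanh hidden layer whose outputs serve as the feature vector. It also computes the DCT of a prediction residual. The names `dct_pre_features`, `hidden_layer`, and `dct_residual`, the frame length of 16 with a hop of 8, keeping 4 coefficients, and the hand-set weights are all hypothetical. In the paper the network weights come from training the predictive model, and the predictor that produces the predicted frame is not sketched here.

```python
import math
from typing import List, Sequence


def dct_ii(x: Sequence[float]) -> List[float]:
    """Orthonormal DCT-II of a 1-D sequence (direct O(N^2) form)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def frames(signal: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Split a signal into overlapping frames; a trailing partial frame is dropped."""
    return [list(signal[start:start + frame_len])
            for start in range(0, len(signal) - frame_len + 1, hop)]


def dct_pre_features(frame: Sequence[float], keep: int) -> List[float]:
    """Hypothetical pre-feature step: keep only the first `keep` DCT coefficients."""
    return dct_ii(frame)[:keep]


def hidden_layer(x: Sequence[float],
                 weights: Sequence[Sequence[float]],
                 biases: Sequence[float]) -> List[float]:
    """tanh hidden layer; its outputs are used as the feature vector."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def dct_residual(frame: Sequence[float], predicted: Sequence[float]) -> List[float]:
    """DCT of the prediction residual (actual frame minus predicted frame)."""
    return dct_ii([a - p for a, p in zip(frame, predicted)])


if __name__ == "__main__":
    # Toy input: a sampled sinusoid standing in for a speech segment.
    signal = [math.sin(2 * math.pi * 0.05 * t) for t in range(64)]
    keep, n_hidden = 4, 3
    # Hypothetical fixed weights; in NPC these would be learned by training.
    weights = [[0.1 * (h + 1) * (j + 1) / keep for j in range(keep)]
               for h in range(n_hidden)]
    biases = [0.0] * n_hidden
    for f in frames(signal, frame_len=16, hop=8):
        feats = hidden_layer(dct_pre_features(f, keep), weights, biases)
        print([round(v, 3) for v in feats])
```

Truncating the DCT shrinks the hidden layer's input from the frame length to `keep` values, which is one plausible reading of how a DCT pre-feature stage lowers computational cost. Because the DCT is linear, the DCT of the residual equals the difference between the DCTs of the actual and predicted frames.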
Field | Value |
---|---|
Original language | English |
Pages (from-to) | 565-574 |
Number of pages | 10 |
Journal | Neural Computing and Applications |
Volume | 21 |
Issue number | 3 |
Publication status | Published - 1 Apr 2012 |
Externally published | Yes |
Keywords
- Automatic feature extraction
- Discrete cosine transform
- Neural network
- Nonlinear predictive coding