The Use of Sparse Representation and Machine Learning Algorithms in Voice Activity Detection

Document Type : Original Article

Author

Department of Electrical Engineering, Nowshahr Branch, Islamic Azad University, Nowshahr, Iran

Abstract

This paper proposes a Voice Activity Detection (VAD) method that operates in the two-dimensional STRF (Spectro-Temporal Response Field) space and is based on sparse representation and a learning algorithm. The spectro-temporal components span two dimensions: time and frequency. In recent years, sparse representation has gained a prominent place in speech processing, including in improved methods for separating speech from noise. Its basic idea is to reconstruct each speech signal from a limited number of basis atoms. In the proposed algorithm, dictionaries with different atom sizes were constructed from the auditory spectrogram using sparse representation and the KSVD and NMF learning methods. The performance of the VAD was evaluated on Persian and English speech. For example, at SNRs above 0 dB, the proposed VAD achieved a performance of more than 92.71 percent and 91.82 percent on English speech in white noise and car noise, respectively, and more than 90 percent on Persian speech. These results show the good performance of the proposed VAD compared with other methods.
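The core idea of reconstructing a signal from a few dictionary atoms can be illustrated with a short sketch. This is not the paper's implementation. It assumes NumPy and uses Orthogonal Matching Pursuit (OMP) for the sparse-coding step. Random unit-norm matrices stand in for the dictionaries that the paper learns with KSVD or NMF from auditory-spectrogram features. The separate noise dictionary and the decision rule in the hypothetical `is_speech` function are also illustrative assumptions. That rule labels a frame as speech when the speech dictionary reconstructs it with lower error than the noise dictionary does.

```python
import numpy as np


def omp(D, x, k, tol=1e-10):
    """Orthogonal Matching Pursuit: approximate x using at most k columns
    (atoms) of the dictionary D. Assumes the columns of D have unit L2 norm."""
    residual = x.astype(float)
    support = []
    coef = np.zeros(D.shape[1])
    for _ in range(k):
        if np.linalg.norm(residual) < tol:
            break  # x is already reconstructed (numerically) exactly
        # Select the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:
            break  # no new atom was selected; stop early
        support.append(j)
        # Re-fit x on all selected atoms by least squares.
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
        coef[:] = 0.0
        coef[support] = sol
    return coef


def reconstruction_error(D, x, k):
    """L2 error of the k-sparse reconstruction of x over dictionary D."""
    return float(np.linalg.norm(x - D @ omp(D, x, k)))


def is_speech(frame, D_speech, D_noise, k=3):
    """Hypothetical decision rule (not from the paper): label a frame as
    speech if the speech dictionary explains it better than the noise one."""
    return reconstruction_error(D_speech, frame, k) < reconstruction_error(D_noise, frame, k)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random unit-norm atoms stand in for learned (e.g. KSVD/NMF) dictionaries.
    D_speech = rng.standard_normal((64, 16))
    D_speech /= np.linalg.norm(D_speech, axis=0)
    D_noise = rng.standard_normal((64, 16))
    D_noise /= np.linalg.norm(D_noise, axis=0)
    frame = 2.0 * D_speech[:, 0] + D_speech[:, 1]  # synthetic "speech" frame
    print(is_speech(frame, D_speech, D_noise))
```

In a real system, the columns of `D_speech` and `D_noise` would come from a dictionary-learning step, and each `frame` would be a vectorized spectro-temporal patch. The sparsity level `k` and any decision threshold would need tuning on labeled data.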

Keywords

