Content Details | NUML Online Research Repository

Title Machine Learning for Identification of Regional Languages of Pakistan From Short Utterance

Author(s) Ammara Imtiaz

Abstract Automatic identification of languages from short utterances is an interesting problem in speech analysis, and research on Language Identification (LID) is gaining importance. LID helps overcome communication barriers between speakers who share information in their native languages. It has a wide range of applications, such as language-to-language translation, language understanding, telephone-based systems, voice dialling, tourism, e-health, and distance learning. The thesis focuses on the application of LID to classifying the major languages of Pakistan: Urdu, Balochi, Punjabi, Pashto, and Sindhi. Urdu is the national language of Pakistan, whereas the other four are regional and provincial languages. The thesis proposes a new method for LID, referred to as the Nearest Neighbour Feature Matching (NNFM) strategy, to efficiently classify the languages of Pakistan in recordings. NNFM identifies languages in a three-step process. In the first step, the Mel Frequency Cepstral Coefficients (MFCC) algorithm is applied to the speech samples of the training and test sets to extract speech features. The extracted features are then normalized so that each feature vector has unit magnitude. In the second step, the normalized features of a test speech sample are matched against the features of every speech sample in the training set using the dot product. The dot product is maximal where a test feature perfectly matches its Nearest Neighbour (NN) feature in a training sample. For each test feature, the maximum dot product value is taken, and these maxima are averaged over all features of the test sample. The average quantifies the similarity between the test sample and a given training sample.
The training sample that gives the maximum average value is selected, and its features, referred to as NN features, replace the features of the test sample. In the third step, a Gaussian Mixture Model-Universal Background Model (GMM-UBM) is trained on the training samples. The GMM-UBM computes a general language model and a language-specific model. The NN features are then provided to the GMM-UBM to predict the language of the test sample. Based on the two models, the GMM-UBM computes a log-likelihood, and the language category of the training set that gives the maximum log-likelihood is selected as the predicted language for the test sample. Experiments are performed on the Corpus of Regional Languages (CRL) of Pakistan. The results show that the GMM-UBM classifier with the proposed NNFM method outperforms GMM-UBM without NNFM. Without NNFM, GMM-UBM achieves average accuracies of 48%, 50%, 52%, and 53.3% on test utterances of three, five, ten, and fifteen seconds, respectively. With NNFM, GMM-UBM achieves average accuracies of 56.7%, 60.7%, 63.3%, and 65.3% on the same durations. The proposed NNFM therefore improves the accuracy of GMM-UBM by 8.7 to 12 percentage points. Experiments are also performed on the CallFriend corpus, which consists of six different international languages; the results show that NNFM significantly improves the performance of GMM-UBM there as well. Keywords: Language Identification, Nearest Neighbour Feature Matching, Speech Signal, Speech Features, Gaussian Mixture Model
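
The matching step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names (`normalize`, `similarity`, `nn_feature_matching`), the toy two-dimensional vectors standing in for MFCC frames, and the sample labels are all hypothetical. It also assumes that "NN features" means each test frame is replaced by its best-matching frame from the selected training sample, which is one possible reading of the abstract.

```python
import math

def normalize(frames):
    """Scale each feature vector to unit Euclidean length."""
    out = []
    for f in frames:
        norm = math.sqrt(sum(x * x for x in f))
        out.append([x / norm for x in f] if norm > 0 else list(f))
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def similarity(test_frames, train_frames):
    """Average, over test frames, of the maximum dot product
    with any frame of one training sample."""
    best = [max(dot(t, r) for r in train_frames) for t in test_frames]
    return sum(best) / len(best)

def nn_feature_matching(test_frames, training_set):
    """Pick the training sample with the highest average similarity and
    return (sample name, score, NN features).

    Assumption: NN features are the per-frame nearest neighbours taken
    from the selected training sample.
    """
    test = normalize(test_frames)
    best_name, best_score, best_train = None, -math.inf, None
    for name, frames in training_set.items():
        train = normalize(frames)
        score = similarity(test, train)
        if score > best_score:
            best_name, best_score, best_train = name, score, train
    nn_features = [max(best_train, key=lambda r: dot(t, r)) for t in test]
    return best_name, best_score, nn_features

# Toy data: hypothetical 2-D vectors in place of real MFCC frames.
training_set = {
    "urdu_01": [[1.0, 0.0], [0.9, 0.1]],
    "sindhi_01": [[0.0, 1.0], [0.1, 0.9]],
}
test_sample = [[2.0, 0.1], [3.0, 0.0]]

name, score, nn_features = nn_feature_matching(test_sample, training_set)
print(name, round(score, 3), nn_features)
```

In this reading, the returned `nn_features` would then be scored by the trained GMM-UBM in place of the original test features. Because all vectors have unit length, each dot product is a cosine similarity bounded by 1.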

Type Thesis/Dissertation MS

Faculty Engineering and Computer Science

Department Computer Science

Language English

Publication Date 2020-01-07

Subject Computer Science, Machine Learning, Accent Recognition

Publisher NUML

Contributor(s) Ammara Imtiaz and Dr. Sajid Saleem (Co-Supervisor)

Format PDF

Rights NUML

Category MSCS Thesis by Ammara Imtiaz
