Content Details
Title
Machine Learning for Identification of Regional Languages of Pakistan From Short Utterance
Author(s)
Ammara Imtiaz
Abstract
Automatic identification of languages from short utterances is an interesting problem in speech analysis, and research on Language Identification (LID) is gaining importance. LID aims to overcome the communication barrier between speakers who share information in their native languages. It has a wide range of applications in spoken-language systems, such as language-to-language translation, language understanding, telephone-based systems, voice dialling, tourism, e-health, and distance learning. This thesis focuses on the application of LID to classifying the major regional languages of Pakistan: Urdu, Balochi, Punjabi, Pashto, and Sindhi. Urdu is the national language of Pakistan, whereas the other four are regional and provincial languages. The thesis proposes a new method for LID, referred to as the Nearest Neighbour Feature Matching (NNFM) strategy, to efficiently classify the languages of Pakistan in recordings. To identify languages with NNFM, a three-step process is implemented. In the first step, the Mel Frequency Cepstral Coefficients (MFCC) algorithm is applied to the speech samples of the training and test sets to extract speech features. The extracted features are then normalized so that the magnitude of each feature equals unity. In the second step, the normalized features of a test speech sample are matched against the features of every speech sample in the training set using the dot product. The dot product is maximal where a test feature perfectly matches its Nearest Neighbour (NN) feature in a speech sample of the training set. The maximum dot product value is obtained for each test feature, and these maxima are averaged over all the features of the test speech sample. The average value quantifies the similarity of the test sample to a sample of the training set.
The training sample that gives the maximum average value is selected, and its features, referred to as NN features, are used to replace the features of the test sample. In the third step, a Gaussian Mixture Model-Universal Background Model (GMM-UBM) is trained on the training samples. The GMM-UBM computes a general language model and a language-specific model. The NN features are then provided to the GMM-UBM to predict the language of the test sample. Based on the two models, the GMM-UBM computes a log-likelihood, and the language category of the training set that gives the maximum log-likelihood is selected as the predicted language for the test sample. Experiments are performed on the Corpus of Regional Languages (CRL) of Pakistan. The results show that the GMM-UBM classifier with the proposed NNFM method outperforms GMM-UBM without it. Without NNFM, GMM-UBM achieves average accuracies of 48%, 50%, 52%, and 53.3% on test utterances of three, five, ten, and fifteen seconds, respectively. With NNFM, it achieves average accuracies of 56.7%, 60.7%, 63.3%, and 65.3% on the same durations. The proposed NNFM thus improves the accuracy of GMM-UBM by about 8.7 to 12 percentage points. Experiments are also performed on a CallFriend corpus consisting of six different international languages; the results show that NNFM also significantly improves the performance of GMM-UBM there.
Keywords: Language Identification, Nearest Neighbour Feature Matching, Speech Signal, Speech Features, Gaussian Mixture Model
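The matching step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names (unit_normalize, match_sample, nnfm), the toy two-dimensional vectors standing in for MFCC features, and the reading that the NN features are the per-feature nearest neighbours taken from the selected training sample are all assumptions made for illustration. MFCC extraction and the GMM-UBM stage are omitted.

```python
import math

def unit_normalize(frames):
    """Scale every feature vector to unit Euclidean length."""
    normalized = []
    for frame in frames:
        norm = math.sqrt(sum(x * x for x in frame))
        # Leave all-zero vectors untouched to avoid dividing by zero.
        normalized.append([x / norm for x in frame] if norm > 0 else list(frame))
    return normalized

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def match_sample(test_frames, train_frames):
    """Match one test sample against one training sample.

    For each test feature, find the training feature with the largest
    dot product. Return the average of those maxima together with the
    matched (nearest-neighbour) training features.
    """
    nn_frames = []
    total = 0.0
    for t in test_frames:
        best = max(train_frames, key=lambda r: dot(t, r))
        nn_frames.append(best)
        total += dot(t, best)
    return total / len(test_frames), nn_frames

def nnfm(test_frames, training_set):
    """Select the training sample with the highest average similarity.

    training_set maps a (hypothetical) sample name to its feature vectors.
    Returns the selected name, its similarity score, and the NN features
    that would replace the test features before classification.
    """
    test = unit_normalize(test_frames)
    best_name, best_score, best_nn = None, -math.inf, None
    for name, frames in training_set.items():
        score, nn = match_sample(test, unit_normalize(frames))
        if score > best_score:
            best_name, best_score, best_nn = name, score, nn
    return best_name, best_score, best_nn

# Toy data: 2-D vectors stand in for real MFCC feature vectors.
test_frames = [[1.0, 0.1], [0.9, 0.2]]
training_set = {
    "sample_a": [[2.0, 0.0], [1.0, 1.0]],
    "sample_b": [[0.0, 3.0], [-1.0, 1.0]],
}
name, score, nn_features = nnfm(test_frames, training_set)
print(name, round(score, 3))
```

Because every vector has unit length, each dot product is a cosine similarity with a maximum of 1 for a perfect match, which is why the averaged maxima can serve as a similarity score between samples. In this sketch the returned nn_features would be the input handed to the classifier in the third step.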
Type
Thesis/Dissertation MS
Faculty
Engineering and Computer Science
Department
Computer Science
Language
English
Publication Date
2020-01-07
Subject
Computer Science, Machine Learning, Accent Recognition
Publisher
NUML
Contributor(s)
Ammara Imtiaz and Dr. Sajid Saleem (Co-Supervisor)
Format
PDF
Identifier
Source
Relation
Coverage
Rights
NUML
Category
MSCS Thesis by Ammara Imtiaz
Description
Attachment
Name
807ed0ce70.pdf
Timestamp
2020-02-12 15:52:55