Cross Project Model For Churn Prediction In Telecom Sector
Customer churn is a critical issue in the telecommunication sector. Because acquiring new customers is costly, customer churn prediction is one of the most important activities for a project manager and an indispensable part of the industry's strategic decision-making and planning process. Cross-project just-in-time prediction is a relatively new and more practical alternative to traditional churn prediction techniques, providing immediate feedback while design decisions are still fresh in the minds of the project managers. Such a model requires a large amount of training data, which is usually not available when a company is at an early stage. To address this challenge, prior studies have proposed cross-project models (CPM), which learn from previous projects of the same nature that have sufficient history. However, only a few studies have focused on transferring prediction models from one project to another. This research makes an early attempt to apply the just-in-time approach to customer churn prediction with a cross-project model. CPM also suffers from limited accuracy, which is addressed here by embedding an ensemble technique. Applying the ensemble markedly increased the accuracy of churn prediction. With the ensemble technique, the genetic algorithm outperforms the other classifiers, achieving an optimized accuracy of 68%, which is 11% more than the previous cross-project technique without the ensemble.
Enhancing Buffer Management In Delay Tolerant Networks Via Novel Message Drop Policy
A Delay Tolerant Network (DTN) is a network in which end-to-end connectivity rarely exists. Delay tolerant networking is an approach that seeks to address the problems that degrade communication in disrupted networks. DTN works on a store-carry-and-forward mechanism: a node may store a message for a comparatively long time and carry it until a suitable forwarding opportunity appears. To store messages over long delays, a buffer management scheme is required to select a message to drop upon buffer overflow. Every dropped message wastes the valuable resources it has already consumed. The proposed solution is a size-based policy that determines an inception (threshold) size for selecting the message to delete when the buffer overflows. The basic idea is that, by determining the exact buffer space required, one can select a message of an appropriate size to discard. Doing so avoids unnecessary message drops and avoids bias in selecting a message of a specific size. The proposed scheme, Spontaneous Size Drop (SS-Drop), uses a simple but intelligent mechanism to determine the inception size for dropping a message upon buffer overflow. In simulations with the ONE (Opportunistic Network Environment) simulator, SS-Drop outperforms competing drop policies, achieving a delivery probability of 66.3% and minimizing the overhead ratio up to 41.25%. SS-Drop also showed a prominent reduction in the number of dropped messages and in average buffer time.
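The size-based idea can be illustrated with a minimal Python sketch. It is not the SS-Drop algorithm itself, whose details are not given in this abstract. It assumes that the inception size equals the space still missing for the incoming message, and that the smallest buffered message at least that large is dropped, falling back to the largest message when none is large enough. The function name `select_drop` and the `(message_id, size)` tuples are hypothetical.

```python
def select_drop(buffered, free_space, incoming_size):
    """Pick one buffered message to drop so an incoming message can fit.

    buffered: list of (message_id, size) tuples (hypothetical representation).
    Returns the chosen (message_id, size), or None if no drop is needed
    or the buffer holds no messages.
    """
    # Assumed threshold: the space still missing for the new message.
    needed = incoming_size - free_space
    if needed <= 0 or not buffered:
        return None
    # Messages whose removal alone frees enough space.
    big_enough = [m for m in buffered if m[1] >= needed]
    if big_enough:
        # Smallest sufficient message: enough space with the least waste.
        return min(big_enough, key=lambda m: m[1])
    # No single message suffices: fall back to the largest one.
    return max(buffered, key=lambda m: m[1])
```

Choosing the smallest sufficient message is one way to avoid dropping more data than the overflow requires; a real policy may need to drop several messages or weigh other criteria.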
FOG-ASSISTED CONGESTION AVOIDANCE SCHEME FOR INTERNET OF VEHICLES
The Internet of Vehicles (IoV) is an emerging research area with wide-ranging applications such as traffic management, vehicle security and inter-vehicle communication. Most of these applications require vehicles to continuously update their information to a centralized repository or server in order to obtain various services. IoV message dissemination schemes suffer from congestion because of the large number of messages generated by vehicles in an area. Frequent transmission of messages by a large number of vehicles may not only overwhelm a centralized server but also cause congestion, which may be dangerous in emergency situations. The aim of this research is to minimize congestion for smooth communication.
This work presents a fog-assisted congestion avoidance scheme for IoV named Energy Efficient Message Dissemination (E2MD). To capitalize on the merits of fog computing and minimize delay, E2MD uses a distributed approach, employing a fog server to balance services in the IoV. In E2MD, vehicles continuously update their status to a fog server, either directly or through intermediate nodes. In case of an emergency, the fog server informs upcoming traffic to slow down, dispatches rescue teams to provide necessary services, and coordinates patrolling missions to clear the road. The proposed scheme considers a realistic model with intercity highways as well as urban roads. Each road consists of three lanes, where the leftmost lane is the slowest and vehicles in the right lane move at high speed.
The performance of the proposed scheme is validated through NS 2.35 simulations. Simulation results confirm that E2MD outperforms contemporary schemes in terms of delay, message overhead and packet delivery ratio. E2MD incurs a delay of 5 microseconds, while contemporary schemes cause delays in the order of milliseconds. E2MD improves message delivery cost by 108% and decreases message overhead cost by 73% and 98%, respectively, compared with the other schemes. Future work should address the scenario in which an AV is destroyed in a blast and is unable to inform nearby vehicles.
SECURE AND DE-DUPLICATION BASED DATA AGGREGATION IN WIRELESS BODY AREA NETWORKS
Wireless Body Area Networks (WBANs) are helpful for monitoring, diagnostic and therapeutic purposes. These networks gather real-time medical information using various sensors over secure communication links. They enable doctors to observe a patient's health by monitoring the patient's vital signs away from the hospital. Sensors sense the data and forward it to the head node, and much of this data is redundant. The collector node consumes power to process the redundant information and wastes further power by repeatedly sending the same kind of data to the next level. During data aggregation, the collector node receives input data packets, processes them and transmits them as a single packet, which causes communication, energy and storage overhead.
Data de-duplication has been proposed to remove redundancy and ensure a single instance of each piece of data. In this work, we propose a de-duplication based data aggregation mechanism that includes an adaptive chunking algorithm (ACA). ACA identifies a cut-point between two windows: a fixed-size window and a variable-size window, the latter determined by a minimum threshold on the window size. The algorithm locates a second-level variable-length chunk based on the delimiter to improve the size of the variable-length window. The algorithms have been simulated using NS-2.35 on Ubuntu, where TCL code is used for deploying sensing devices and initiating messages, and C is used for implementing the algorithms and the sending and receiving of messages among sensors, head nodes and sink nodes. Test results show that the increase in the variable-size window is 65.6%, 68% and 71.2% for RAM, AE and the proposed ACA, respectively, which results in better identification of duplicates. In this case, collector nodes consume 64% more energy than sensor nodes. Results show better performance of the proposed scheme over its counterparts in terms of cut-point identification failure, fixed and variable-length chunk size, average chunk size, number of chunks and energy consumption.
PROBLEMS, CONSEQUENCES AND THEIR SOLUTIONS FOR EMOTION BASED REQUIREMENT ENGINEERING IN GLOBAL SOFTWARE DEVELOPMENT – A GUIDELINE
Software Requirement Engineering (SRE) is a valued domain of software
engineering. The success of a software project is mainly dependent on good
requirement engineering practices. Emotion based requirement engineering is
said to increase the credibility of requirement engineering. When requirement
engineering is carried out in the larger setting of global software development (GSD),
it becomes trickier and more difficult to handle. There is a lack of studies focusing
on emotion based requirement engineering in GSD. As a result,
academicians, researchers and practitioners are unaware of the problems and their
consequences for successful software development.
The proposed study identifies the problems caused by the lack of emotion based
requirement visualization, the consequences of these problems, and strategies
and solutions for overcoming them. A systematic literature review (SLR),
expert evaluation and a survey are used as the methodological instruments.
Twenty-three (23) problems were identified through the SLR. The consequences and
solutions of the identified problems were also found through the SLR and evaluated
by experts. In conducting the SLR, 60 papers were collected at first, which
were reduced to 30 after assessing their quality. Grounded theory was applied
to extract potential problems, their consequences and their solutions from the
literature. Furthermore, a survey was conducted to evaluate the practicality of the
identified problems, consequences and solution strategies in a real
working environment. The study provides a comprehensive guideline for
practitioners, academicians and researchers for better visualization of
emotion based requirement engineering in a GSD environment, which increases
the success ratio. The visualization support of requirements may best be achieved
in this way.
POSITION BASED ROUTING IN VEHICULAR AD HOC NETWORKS
The Internet of Things (IoT) involves a large number of smart gadgets with sensing capabilities that exchange information across multiple networks. An IoT-enabled Vehicular Ad hoc Network (I-VANET) comprises a large number of vehicles that are connected with neighboring vehicles to exchange data with central repositories. In this scenario, the network is dynamic because of the high mobility of vehicles (nodes) in a smart city environment. Present routing protocols do not meet the challenging requirements of this scenario, and position based routing protocols are considered a suitable solution. However, position based routing protocols also encounter problems in city environments, where obstacles such as buildings and trees block line-of-sight communication among vehicles within a small area.
In this research work, we propose a Dynamic Position Based Routing (D-PBR) scheme. It considers a vehicle's position coordinates along with its direction of movement to decide on the next node towards the destination. We consider road junctions, where vehicles can join or leave and thereby change the set of neighboring vehicles. We present a Dynamic Next-hop Identification (DNI) algorithm that selects the most suitable next-hop vehicle available at the junction to forward the packet towards the destination vehicle. It calculates the distance and direction of neighboring nodes and then identifies the vehicles that can transmit the message in the direction of the destination vehicle. It maintains array lists of the expected next-hop vehicles and then selects one of them, preferring the least distance and the most accurate direction relative to the current position of the vehicle that holds the packet to be forwarded.
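The next-hop selection described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the DNI algorithm itself: positions are 2-D coordinates, each neighbor reports a heading in radians, a neighbor qualifies only if it is closer to the destination than the current vehicle and is heading within 90 degrees of the bearing to the destination, and ties on distance are broken by the smaller angular deviation. The names `select_next_hop` and `angle_diff` and the dictionary keys are hypothetical.

```python
import math

def angle_diff(a, b):
    """Smallest absolute difference between two angles, in radians."""
    d = (a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def select_next_hop(current, destination, neighbors):
    """Return the id of the preferred next-hop neighbor, or None.

    current, destination: (x, y) positions.
    neighbors: list of dicts with keys "id", "pos" (x, y) and
    "heading" (radians), a hypothetical representation.
    """
    own_dist = math.hypot(destination[0] - current[0],
                          destination[1] - current[1])
    candidates = []  # list of expected next-hop vehicles
    for n in neighbors:
        nx, ny = n["pos"]
        dist = math.hypot(destination[0] - nx, destination[1] - ny)
        bearing = math.atan2(destination[1] - ny, destination[0] - nx)
        deviation = angle_diff(n["heading"], bearing)
        # Assumed filter: makes progress and moves roughly towards the destination.
        if dist < own_dist and deviation < math.pi / 2:
            candidates.append((dist, deviation, n["id"]))
    if not candidates:
        return None
    # Least distance first, then the most accurate direction.
    return min(candidates)[2]
```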
The work has been validated by simulations using NS 2.35 with TCL scripts and C code, along with AWK scripts to extract results from trace files. Results show an improvement over the existing RIDE protocol in end-to-end delay, residual energy, mean hop count, average throughput and average number of vehicles. The average number of vehicles for different densities decreases by 42.86%, and the mean hop count used for message exchange decreases by 60%, compared to RIDE.
ENERGY EFFICIENCY IN LINEAR WIRELESS SENSOR NETWORK FOR AUTONOMOUS MONITORING AND MAINTENANCE OF LIFELINE INFRASTRUCTURES
Recently, linear wireless sensor networks (LWSNs) have been attracting increasing attention because of their suitability for applications such as the protection of critical infrastructures. Most of these applications require the LWSN to remain operational for long periods. However, the limited, non-replenishable battery power of sensor nodes does not allow them to meet these expectations. A short network lifetime is therefore one of the most prominent barriers to large-scale deployment of LWSNs. Unlike most existing studies, this study analyzes the impact of node placement and clustering on LWSN network lifetime. This research work has implemented and analyzed conventional clustering protocols such as Distributed Energy-efficient Clustering (DEEC), Developed Distributed Energy-Efficient Clustering (DDEEC), and Energy Efficient Scheme for Clustering Protocol Prolonging the Lifetime of Heterogeneous Wireless Sensor Networks (TDEEC) in the context of LWSN.
First, existing node placement and clustering schemes are categorized and classified for LWSN, and various node placement schemes are introduced for disparate applications. Then, we highlight the peculiarities of LWSN applications and discuss their unique characteristics. The research work has implemented and analyzed different node placement schemes for linear wireless sensor networks. Simulation results obtained with MATLAB clearly indicate that the Grid-Triangular node placement scheme enhances network lifetime compared to the linear sequential and linear parallel node placement schemes. The performance metric used in all node placement schemes is similar to that of the DEEC, DDEEC and TDEEC based conventional clustering schemes. The Grid-Triangular node placement scheme improves network lifetime by 51% compared to the linear sequential and linear parallel node placement schemes. It has also been observed that node placement and clustering schemes significantly affect LWSN lifetime.
Keywords: Linear wireless sensor networks, node placement, clustering, network lifetime, energy efficiency, performance analysis.
Hybrid Indoor Position Estimation Technique using Fingerprinting and MinMax Approach
Position estimation means locating an object with reference to some coordinate system, e.g. two-dimensional (x, y), or with reference to some known landmark. This thesis focuses on indoor position estimation using Bluetooth, a low-cost, readily available Radio Frequency (RF) based wireless technology. Most of the latest indoor positioning systems use Bluetooth because of its low cost and widespread use in wireless gadgets, including smartphones, digital watches and other handheld devices.
Accuracy is one of the most challenging issues in position estimation. It depends on accurate signal transmission and reception, on the conversion of the received signal to distance estimates, and on the modeling of distance estimates to localize the object. Position estimation consists of two main steps: signal measurement and position estimation based on the signal. In this thesis, we focus on both steps, i.e. signal modeling and localization. In step one, we perform real-time experiments to collect Bluetooth signal measurements, i.e. the Received Signal Strength Indicator (RSSI), a parameter widely used for distance and position estimation. Experimental and simulation results show a 10 dBm variation in RSSI due to additive noise, multipath, shadowing and interference from physical objects; this variation affects distance estimation and ultimately results in position estimation error. Real-time experimental results validate this variation and show that, if optimized radio propagation constants are chosen, a position estimation accuracy of up to 1.32 m can be achieved in the presence of a 10 dBm variation in the radio signal. In step two, we propose a new hybrid position estimation approach that integrates fingerprinting based K-Nearest Neighbors (K-NN) with the lateration based MinMax position estimation technique. The novel idea in the proposed hybrid approach is the use of a Euclidean distance formulation, instead of an indoor radio propagation model, to convert the signal to distance. We tested the proposed hybrid technique in MATLAB 7.1 using real-time experimental data and compared its results with fingerprinting and lateration based position estimation techniques. Simulation results show that the proposed hybrid approach achieves a lower mean error than Trilateration, MinMax, K-NN and the existing hybrid approach.
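For readers unfamiliar with MinMax, the following Python sketch shows the standard bounding-box form of the technique only. It is not the hybrid approach proposed in the thesis, and it assumes that distance estimates to the anchors are already available. The function name `minmax_position` and the `(x, y, d)` tuple layout are hypothetical.

```python
def minmax_position(anchors):
    """Estimate a 2-D position with the MinMax (bounding-box) method.

    anchors: list of (x, y, d) tuples, where (x, y) is a known anchor
    position and d is the estimated distance to it.
    Each anchor defines a square of half-width d around itself; the
    estimate is the center of the intersection of all squares.
    """
    x_lo = max(x - d for x, y, d in anchors)
    x_hi = min(x + d for x, y, d in anchors)
    y_lo = max(y - d for x, y, d in anchors)
    y_hi = min(y + d for x, y, d in anchors)
    return ((x_lo + x_hi) / 2, (y_lo + y_hi) / 2)
```

With anchors `(0, 0, 6)`, `(10, 0, 6)` and `(0, 10, 8)`, the squares intersect in the box from x = 4 to 6 and y = 2 to 6, so the estimate is `(5.0, 4.0)`.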
Keywords: Localization, Distance Estimation, Fingerprinting, K-NN, MinMax, Trilateration
Extraction of Accent Information from Urdu Speech for Forensic Speaker Recognition
This thesis presents a new method for extraction of accent information from Urdu
speech signals. Accent is used in speaker recognition systems, especially in forensic
cases, and plays a vital role in identifying people of different groups, communities
and origins by their different speaking styles. Other applications of accent include
telephone banking, voice dialing, e-health and biometric authentication. This thesis
focuses only on the forensic applications of accent. Forensic detection through
accent helps in criminal investigation and provides additional information such as
territorial origins of the suspects.
The proposed method is based on Gaussian Mixture Model-Universal Background
Model (GMM-UBM) and a new Feature Mapping (FM) process. The proposed
method is named GMM-FM. The FM process maps Mel-Frequency Cepstral
Coefficients (MFCC) features to a higher dimensional space and improves the accent
extraction and forensic speaker recognition performance of GMM-UBM.
In the proposed method, GMM-UBM is used to obtain an accent independent model.
For this purpose, the MFCC features of the training set are processed with the
proposed FM method. The processed features of all the accent categories of the
training set are combined and different GMM components are computed with
GMM-UBM. Each GMM component is parameterized by a mean vector, mixture weight
and covariance matrix.
In the second step, the GMM components estimated for the accent independent model
are used in a Bayesian process to adapt GMM components for each accent category
of the training set. Such GMM components are referred to as accent dependent
GMMs. To classify the accent in a speech sample, the log-likelihood is computed using
the GMMs of both the accent dependent and accent independent models. The accent is
then predicted for the test sample by maximizing the log-likelihood values.
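The scoring step can be sketched in Python as follows. This is a minimal illustration under stated assumptions and not the thesis implementation: covariances are taken to be diagonal, the score for each accent is the difference between the accent dependent and accent independent average log-likelihoods (the log-likelihood ratio commonly used with GMM-UBM), and training, Bayesian adaptation and the FM process are omitted. The names `gmm_log_likelihood` and `classify_accent` are hypothetical.

```python
import math

def gmm_log_likelihood(frames, gmm):
    """Average per-frame log-likelihood under a diagonal-covariance GMM.

    frames: list of feature vectors.
    gmm: list of (weight, means, variances) components.
    """
    total = 0.0
    for x in frames:
        comp_logs = []
        for w, mu, var in gmm:
            ll = math.log(w)
            for xi, mi, vi in zip(x, mu, var):
                ll += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
            comp_logs.append(ll)
        # Log-sum-exp over components for numerical stability.
        m = max(comp_logs)
        total += m + math.log(sum(math.exp(c - m) for c in comp_logs))
    return total / len(frames)

def classify_accent(frames, accent_models, ubm):
    """Return (best_accent, scores) using log-likelihood ratios against the UBM."""
    base = gmm_log_likelihood(frames, ubm)
    scores = {name: gmm_log_likelihood(frames, g) - base
              for name, g in accent_models.items()}
    return max(scores, key=scores.get), scores
```

Because the UBM term is the same for every accent, it does not change which accent wins; it only normalizes the scores, which matters when they are compared against a threshold.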
Experiments are performed on Urdu and Kaggle accent corpora. The experimental
results show that the proposed GMM-FM obtains on average 2.5% and 3.5% better
equal error rate and accuracy than GMM-UBM, respectively.
Keywords: Accent, Urdu corpus, speech signals, Gaussian components, speech
features, recognition and forensic.
Feature Point based Image Registration between Satellite and Aerial Images of Agricultural Land
Image registration is the process of geometrically aligning two images (reference and target) of the same scene taken from different viewpoints, at different times or by different sensors. It is used in a wide range of remote sensing applications. Rapid advances in remote sensing sensors have drastically increased the use of Satellite Imagery (SI) and Unmanned Aerial Vehicle (UAV) images in applications such as traffic monitoring, agricultural land analysis, early warning systems and damage assessment. This thesis focuses on an agricultural application of SI and UAV images. SI are low-resolution images because they are captured from very high altitude, whereas UAV images are taken from a low-flying platform and have high resolution and relatively good quality. However, UAV images lack geo-referencing and cannot be used directly in remote sensing applications. The literature deals with this problem through feature point based geo-registration between SI-UAV images. For agricultural SI-UAV images, registration is challenging because the temporal nature of agricultural crops results in large textural and intensity differences between the SI-UAV images. Existing feature points such as Scale Invariant Feature Transform (SIFT), Speeded-Up Robust Features (SURF), and Oriented FAST and Rotated BRIEF (ORB) are not invariant to temporal, textural and intensity differences and underperform in the image registration task. This thesis proposes a new method, named NN-BF, that combines the strengths of the Nearest Neighbor (NN) and Brute Force (BF) descriptor matching strategies to register SI-UAV images. In the proposed NN-BF method, the corresponding feature point matches are first identified between SI-UAV images of the training set with overlap error. Then the corresponding feature point matches are used with the NN and BF based descriptor matching strategies to register the SI-UAV images of the test set. Experiments are performed on an SI-UAV image dataset of agricultural land. The experimental results show that NN-BF improves the matching and precision scores of SIFT by 20.4% and 32%, respectively. For SURF, the NN-BF method improves the matching and precision scores by 19.5% and 21.8%, respectively.
Keywords: Agriculture land, Feature point detectors, Feature point descriptors, Image registration, Satellite imagery, UAV images.
Real Time Dynamic Indoor Positioning using Machine Learning Techniques
Position estimation is the process of finding the actual location of an object with reference to some coordinate system or known landmark. This thesis focuses on position estimation of an object moving dynamically in an indoor environment. Previous studies focused more on static position estimation and used traditional position estimation techniques. In these techniques, Received Signal Strength Indicator (RSSI) measurements are used for distance estimation, which requires modelling of radio propagation. Modelling radio propagation in an indoor environment is challenging because of multipath fading, reflection, refraction of light, temperature, the presence of humans and other factors. All these factors affect the received signal and produce variations in RSSI, which in turn cause distance and position estimation errors. To address the issue, this thesis presents fingerprinting based position estimation with the help of machine learning. The proposed machine learning based indoor position estimation system consists of two steps. In step one, we perform real-time experiments using Bluetooth Low Energy (BLE) beacons and develop a radio fingerprint map. In step two, we investigate five machine learning techniques, namely Naive Bayes, K-Nearest Neighbours (KNN), Linear Discriminant Analysis (LDA), Support Vector Machine (SVM) and Decision Tree, in order to enhance position estimation accuracy, especially for mobile objects.
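As an illustration of fingerprinting based classification, the following Python sketch implements a plain K-NN lookup over a radio fingerprint map. It covers only one of the five techniques named above and is not the thesis implementation. It assumes that each fingerprint is a vector of RSSI readings (one per beacon) paired with a location label, and that similarity is measured by Euclidean distance in signal space. The name `knn_locate` and the data layout are hypothetical.

```python
import math
from collections import Counter

def knn_locate(fingerprints, rssi, k=3):
    """Predict a location label for an RSSI vector by majority vote.

    fingerprints: list of (rssi_vector, location_label) pairs.
    rssi: the measured RSSI vector, same length as the stored vectors.
    """
    # Rank stored fingerprints by Euclidean distance in signal space.
    ranked = sorted(fingerprints, key=lambda fp: math.dist(fp[0], rssi))
    # Majority vote among the k nearest fingerprints.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```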
Real-time experiments are performed to evaluate the performance of the proposed real-time dynamic object tracking system, using five different trajectories in a 10 m x 10 m indoor setup. These trajectories represent real-time dynamic movement in different directions and at different speeds. Experimental results show that LDA achieved the highest mean accuracy of 79.34%, followed by SVM with 78.38%, while K-NN achieved 70.04%.
Keywords: Position Estimation, Localization, Bluetooth, RSSI, Machine Learning.
Machine Learning for Identification of Regional Languages of Pakistan From Short Utterance
An interesting problem in speech analysis is the automatic identification of languages from short utterances. Research on Language Identification (LID) is gaining importance. LID tries to overcome the communication barrier among speakers sharing information with each other in their native languages. It has a wide range of applications involving spoken languages, such as language-to-language translation, language understanding, telephone based systems, voice dialling, tourism, e-health and distance learning.
The thesis focuses on the application of LID to classifying the major regional languages of Pakistan: Urdu, Balochi, Punjabi, Pashto and Sindhi. Urdu is the national language of Pakistan, whereas the other four are regional and provincial languages. The thesis proposes a new LID method, referred to as the Nearest Neighbour Feature Matching (NNFM) strategy, to efficiently classify the languages of Pakistan in recordings.
To identify languages with NNFM, a three-step process is implemented. In the first step, the Mel Frequency Cepstral Coefficients (MFCC) algorithm is applied to the speech samples of the training and test sets to extract speech features. The extracted features are then normalized so that the magnitude of each feature equals unity. In the second step, the normalized features of a test speech sample are matched with the features of every speech sample in the training set using the dot product. The dot product is maximal where a test feature perfectly matches its Nearest Neighbour (NN) feature in a training speech sample. The maximum dot product value is taken for each test feature, and these maxima are averaged over all the features of the test speech sample. The average quantifies the similarity of the test sample to a sample of the training set. The training sample that gives the maximum average value is selected, and its features, referred to as NN features, replace the features of the test sample.
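The matching in the second step can be sketched in Python as follows. This is a minimal sketch of the steps as described above, not the thesis implementation. It assumes that each speech sample is a list of feature vectors (for example MFCC frames) and leaves out feature extraction. The names `normalize`, `similarity` and `nn_sample` are hypothetical.

```python
import math

def normalize(features):
    """Scale each feature vector to unit magnitude (zero vectors are kept)."""
    out = []
    for f in features:
        norm = math.sqrt(sum(v * v for v in f))
        out.append([v / norm for v in f] if norm > 0 else list(f))
    return out

def similarity(test_feats, train_feats):
    """Average, over test features, of the maximum dot product with any training feature."""
    total = 0.0
    for t in test_feats:
        total += max(sum(a * b for a, b in zip(t, r)) for r in train_feats)
    return total / len(test_feats)

def nn_sample(test_sample, training_samples):
    """Return (index, score) of the training sample most similar to the test sample."""
    test = normalize(test_sample)
    best_idx, best_score = None, -math.inf
    for i, sample in enumerate(training_samples):
        score = similarity(test, normalize(sample))
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx, best_score
```

The features of `training_samples[best_idx]` would then play the role of the NN features that replace the features of the test sample. Because all vectors have unit magnitude, each dot product is a cosine similarity, so a score of 1 means every test feature has an exact directional match.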
In the third step, a Gaussian Mixture Model-Universal Background Model (GMM-UBM) is trained on the training samples. The GMM-UBM computes a general language model and a specific language model. The NN features are then provided to the GMM-UBM to predict the language of the test sample. Based on the two models, the GMM-UBM computes the log-likelihood. The language category of the training set that gives the maximum log-likelihood is selected as the predicted language for the test sample.
Experiments are performed on the Corpus of Regional Languages (CRL) of Pakistan. The experimental results show that the GMM-UBM classifier with the proposed NNFM method gives better results than GMM-UBM without it. GMM-UBM without NNFM achieves average accuracies of 48%, 50%, 52% and 53.3% on test utterances of three, five, ten and fifteen seconds, respectively, whereas with NNFM it achieves average accuracies of 56.7%, 60.7%, 63.3% and 65.3%. The proposed NNFM thus improves the accuracy of GMM-UBM by about 8.7 to 12 percentage points. Experiments are also performed on the CallFriend corpus, consisting of six different international languages, and the results show that NNFM also significantly improves the performance of GMM-UBM.
Keywords: Language Identification, Nearest Neighbour Feature Matching, Speech Signal, Speech Features, Gaussian Mixture Model