PROJECT TOPIC ON STUDY ON INFERENCES AND APPLICATIONS OF ODD GENERALIZED EXPONENTIAL-RAYLEIGH DISTRIBUTION
ABSTRACT
The Rayleigh distribution has a wide range of applications in the applied sciences, including the study of sea waves, harbors, coastal engineering and wind-wave heights, among others. In this work, we propose a new lifetime model called the Odd Generalized Exponential-Rayleigh (OGE-R) distribution and present some of its statistical properties, comprising moments, moment generating function, quantile function, median, reliability analysis and order statistics. The plot of the pdf shows that the OGE-R distribution is positively skewed, which makes it a good candidate for fitting positively skewed datasets. Also, the plot of the hazard function indicates that the proposed model can adequately fit datasets with a J-shaped hazard rate. The method of maximum likelihood was used to estimate the parameters of the new distribution. The applicability of the new distribution was illustrated using a real dataset, and its performance was compared with that of other distributions. The comparison shows that the distribution is competitive in modeling positively skewed datasets because it has smaller values of AIC (18.1925), AICC (18.5454) and BIC (17.7644) than the other three distributions considered.
CHAPTER ONE
INTRODUCTION
1.1 Background of the study
The Rayleigh distribution is named after Lord Rayleigh (1842-1919), born John William Strutt, a British physicist and mathematician. In 1895 he discovered the inert gas argon (Ar), research that earned him the 1904 Nobel Prize in Physics (Venkatesh and Manikandan, 2016).
The Rayleigh distribution has a wide range of applications in the applied sciences, especially in modeling the lifetime of an object or a service time. Battjes (1969) listed further areas where the distribution can be applied, including sea waves, harbor, coastal and ocean engineering, and the heights and periods of wind waves. Despite its applicability, the distribution suffers from the same problem as other classical distributions: a lack of flexibility, because it has only one parameter. For instance, Tahir and Cordeiro (2016) stated that well-known classical (baseline) distributions such as the exponential, Rayleigh, Weibull and gamma are limited in their characteristics and unable to show wide flexibility. For this and other reasons, several researchers have generalized these classical distributions, and others are still working to do so, in order to obtain more flexible compound distributions.
According to Eugene et al. (2002), the generalization of distributions started in 1925. Ahuja and Nash (1967) introduced the generalized Gompertz-Verhulst family of distributions to study growth curve mortality. Gupta et al. (1998) added one parameter to the cumulative distribution function of a baseline distribution to define the exponentiated-G class of distributions, and several others followed.
Gupta and Kundu (1999) pioneered the study of two-parameter generalizations with the two-parameter Generalized Exponential (GE) distribution, also called the Exponentiated Exponential (EE) distribution. Because of its attractive features, several other authors subsequently worked on the GE distribution, among them Kundu et al. (2005) and Nadarajah (2006).
1.2 Statement of the problem
The importance of the probability distribution assumed for a random variable cannot be overemphasized if the statistical analysis of that variable is to lead to meaningful conclusions. Because most lifetime datasets are unique, choosing an appropriate distribution for the data is of great importance for the validity of the statistical analysis. In recent times, several generalizations of probability distributions have been introduced for modeling lifetime data. However, not all datasets can be adequately modeled by the classical distributions, owing to the uniqueness of the data.
Hence, developing new distributions that adequately address this problem is of paramount importance. With this in mind, we propose a new distribution called the Odd Generalized Exponential-Rayleigh (OGE-R) distribution.
1.3 Generalized Exponential Distribution and Rayleigh Distribution
The Generalized Exponential (GE) distribution is a continuous probability distribution belonging to the Exponentiated Family (EF) of distributions. Tahir and Nadarajah (2015) said “The origin of this family can be traced back to the early part of the nineteenth century during which Gompertz (1825) and Verhulst (1838, 1845, 1847) used the cumulative distribution function (CDF) $G(t) = (1 - \rho e^{-\lambda t})^{\alpha}$ for $t > \lambda^{-1}\log\rho$ where $\rho$, $\lambda$ and $\alpha$ are greater than zero”. The generalized Gompertz-Verhulst family of distributions was introduced by Ahuja and Nash (1967) to study growth curve mortality. The Gompertz-Verhulst CDF was the first member of the EF of distributions. The GE distribution is the particular case of this family with $\rho = 1$. The CDF and the pdf of the GE distribution are given by equations (1.3.1) and (1.3.2) respectively.
$$G(x) = (1 - e^{-\lambda x})^{\alpha} \tag{1.3.1}$$
$$g(x) = \alpha\lambda e^{-\lambda x}(1 - e^{-\lambda x})^{\alpha - 1} \tag{1.3.2}$$
where $x > 0$ and $\alpha, \lambda > 0$.
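As an illustration only, equations (1.3.1) and (1.3.2) can be sketched in Python. The function names `ge_cdf` and `ge_pdf` are hypothetical, and the parameters are written `alpha` (shape) and `lam` (rate), which assumes the usual GE notation. The last lines check numerically that the pdf is the derivative of the CDF and that `alpha = 1` recovers the exponential CDF.

```python
import math


def ge_cdf(x, alpha, lam):
    """GE CDF, equation (1.3.1): (1 - exp(-lam*x))**alpha for x > 0."""
    if x <= 0:
        return 0.0
    return (1.0 - math.exp(-lam * x)) ** alpha


def ge_pdf(x, alpha, lam):
    """GE pdf, equation (1.3.2): derivative of the CDF with respect to x."""
    if x <= 0:
        return 0.0
    base = 1.0 - math.exp(-lam * x)
    return alpha * lam * math.exp(-lam * x) * base ** (alpha - 1.0)


# Illustrative parameter values (assumptions, not taken from the study).
x, alpha, lam, h = 1.3, 2.5, 0.8, 1e-6

# A central finite difference of the CDF should agree with the pdf.
slope = (ge_cdf(x + h, alpha, lam) - ge_cdf(x - h, alpha, lam)) / (2 * h)
print(abs(slope - ge_pdf(x, alpha, lam)) < 1e-6)

# With alpha = 1 the GE CDF reduces to the exponential CDF 1 - exp(-lam*x).
print(abs(ge_cdf(x, 1.0, lam) - (1.0 - math.exp(-lam * x))) < 1e-12)
```

Both checks are numerical sanity tests of the formulas, not part of the derivations in this study.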
The Rayleigh distribution arises naturally when wind velocity is resolved into two orthogonal vector components that are independent and normally distributed with zero mean and equal variances; the wind speed, which is the magnitude of the vector, then follows a Rayleigh distribution. The distribution also arises as the modulus of a random complex number whose real and imaginary components are independent and identically distributed normal random variables with zero mean. Other areas where the distribution can be applied include health, agriculture and biology, among others (Gomes et al., 2014).
A random variable X is said to have a Rayleigh distribution if the CDF and the pdf are respectively given as:
$$G(x; \sigma) = 1 - e^{-\frac{x^{2}}{2\sigma^{2}}} \tag{1.3.3}$$
$$g(x; \sigma) = \frac{x}{\sigma^{2}} e^{-\frac{x^{2}}{2\sigma^{2}}} \tag{1.3.4}$$
where $x > 0$ and $\sigma > 0$ is the scale parameter.
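The way the Rayleigh distribution arises from two orthogonal normal components can be checked by simulation. The sketch below is illustrative only: the function names are hypothetical, the scale parameter is named `sigma` as an assumption, and the data are synthetic. It compares the empirical CDF of simulated vector magnitudes with the CDF in equation (1.3.3).

```python
import math
import random


def rayleigh_cdf(x, sigma):
    """Rayleigh CDF, equation (1.3.3)."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-x * x / (2.0 * sigma * sigma))


def rayleigh_pdf(x, sigma):
    """Rayleigh pdf, equation (1.3.4)."""
    if x <= 0:
        return 0.0
    return (x / sigma ** 2) * math.exp(-x * x / (2.0 * sigma * sigma))


def simulate_magnitudes(n, sigma, seed=0):
    """Magnitudes of n vectors with independent N(0, sigma^2) components."""
    rng = random.Random(seed)
    return [math.hypot(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
            for _ in range(n)]


sigma = 2.0
sample = simulate_magnitudes(20000, sigma)
for x in (1.0, 2.0, 4.0):
    empirical = sum(1 for s in sample if s <= x) / len(sample)
    # Empirical and theoretical CDF values should be close for large samples.
    print(x, round(empirical, 3), round(rayleigh_cdf(x, sigma), 3))
```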
1.4 Aim and objectives
The aim of this research is to develop an Odd Generalized Exponential-Rayleigh distribution, study some of its properties and evaluate its performance using real datasets. This aim is to be achieved through the following objectives:
i.) defining and expressing the pdf of the proposed distribution.
ii.) deriving some statistical properties of the proposed distribution, comprising the moments, moment generating function, quantile function, median, reliability analysis and the distribution of order statistics.
iii.) estimating the parameters of the proposed distribution using the method of maximum likelihood.
iv.) evaluating the performance of the proposed distribution compared to other generalizations of the Rayleigh distribution.
1.5 Scope of the study
The study focuses only on extending the Rayleigh distribution, deriving mathematical expressions for selected properties of the proposed distribution (moments, moment generating function, quantile function, median, and the density functions of the minimum and maximum order statistics), and estimating the model parameters by the method of maximum likelihood only.
1.6 Significance of the study
The study of the proposed distribution, its properties and its parameter estimation will increase the flexibility of the Rayleigh distribution and make it possible to model datasets that cannot be properly fitted by the existing generalizations of the Rayleigh distribution. In addition, it will make the kurtosis more flexible (compared to the baseline distribution) and make it possible to construct heavy-tailed distributions for modeling real data. This study will compare the proposed distribution with some existing generalizations of the Rayleigh distribution, using a real dataset, to identify the distribution that provides the better fit.
1.7 Motivation
Tahir et al. (2015) highlighted some special distributions that can be obtained using their generator, including the Weibull, Frechet and normal distributions, and gave the CDF and the pdf of these special distributions, which can be useful in applied survival analysis. Their study also estimated the parameters and compared the fit of these models using a real dataset.
Johnson et al. (1994) stated that the use of four-parameter distributions should be sufficient for most practical purposes. According to them, at least three parameters are needed, but they doubted that any noticeable improvement arises from including a fifth or sixth parameter.
The Rayleigh distribution is a popular lifetime distribution and one of the most important distributions for problems in the applied sciences and reliability engineering. However, there has been little or no study of the Odd Generalized Exponential-Rayleigh distribution since Tahir et al. (2015) proposed their generator. Hence, we intend to study the Odd Generalized Exponential-Rayleigh distribution, some of its properties and its application to a real dataset.
1.8 Definition of Terms
1.8.1 Continuous Random Variable
A nondiscrete random variable X is said to be absolutely continuous, or simply continuous, if its distribution function may be represented as
$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(u)\,du \tag{1.8.1}$$
for every real number x, where the integral is evaluated from −∞ to x. The probability density function (pdf) is defined as
$$f(x) = \frac{dF(x)}{dx} \tag{1.8.2}$$
and it has the following properties:
- f(x) ≥ 0; ∀x
- $\int_{-\infty}^{\infty} f(x)\,dx = 1$
- $P(a < X < b) = \int_{a}^{b} f(x)\,dx$
1.8.2 Moments
Moments are a collection of descriptive constants of a probability distribution that are used to study its most important features and characteristics, such as the mean (a measure of central tendency), the variance (a measure of dispersion), the skewness (SK) and the kurtosis (KU). Mathematically, they are defined as follows: let X be a continuous random variable; then the r^{th} moment of X about the origin is given in equation (1.8.3)
$$\mu'_{r} = E(X^{r}) = \int_{-\infty}^{\infty} x^{r} f(x)\,dx \tag{1.8.3}$$
where f(x) is the pdf of X.
The r^{th} central moment of X, say $\mu_{r}$, also known as the r^{th} moment about the mean, is obtained using equation (1.8.4)
$$\mu_{r} = E[(X - \mu'_{1})^{r}] \tag{1.8.4}$$
where $\mu'_{1}$ is the 1^{st} moment about the origin (the mean), while the variance ($\sigma^{2}$) is the 2^{nd} central moment.
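As a minimal numerical sketch of equations (1.8.3) and (1.8.4), the moments of a density can be approximated by quadrature. The example below assumes the Rayleigh pdf with scale 1 as the density, uses a simple trapezoidal rule on a truncated range, and relies on the standard closed forms for the Rayleigh mean and variance only as a check. All function names are hypothetical.

```python
import math


def rayleigh_pdf(x, sigma=1.0):
    """Rayleigh pdf with an assumed scale sigma (example density only)."""
    if x <= 0:
        return 0.0
    return (x / sigma ** 2) * math.exp(-x * x / (2.0 * sigma ** 2))


def integrate(func, lower, upper, steps=20000):
    """Composite trapezoidal rule on [lower, upper]."""
    width = (upper - lower) / steps
    total = 0.5 * (func(lower) + func(upper))
    for k in range(1, steps):
        total += func(lower + k * width)
    return total * width


def raw_moment(pdf, r, lower, upper):
    """r-th moment about the origin, equation (1.8.3), on a truncated range."""
    return integrate(lambda x: x ** r * pdf(x), lower, upper)


def central_moment(pdf, r, lower, upper):
    """r-th moment about the mean, equation (1.8.4)."""
    mean = raw_moment(pdf, 1, lower, upper)
    return integrate(lambda x: (x - mean) ** r * pdf(x), lower, upper)


# The Rayleigh(1) density is negligible beyond x = 12, so [0, 12] suffices here.
mean = raw_moment(rayleigh_pdf, 1, 0.0, 12.0)
variance = central_moment(rayleigh_pdf, 2, 0.0, 12.0)
print(mean, math.sqrt(math.pi / 2.0))    # numerical vs closed-form mean
print(variance, (4.0 - math.pi) / 2.0)   # numerical vs closed-form variance
```

The truncation point and the number of steps are arbitrary choices for this illustration; a heavier-tailed density would need a wider range.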
Quantile Function
The quantile function is essentially the inverse of the cumulative distribution function. It is used for calculating the median and other measures of location, and for simulating random numbers. Mathematically, the q^{th} quantile $x_{q}$ is defined as the solution of
$$F(x_{q}) = q \tag{1.8.5}$$
where F is the cumulative distribution function and 0 < q < 1; the median is obtained by setting q = 0.5.
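To illustrate both uses of the quantile function, the sketch below solves F(x_q) = q for the Rayleigh baseline, where the solution has the closed form x_q = sigma * sqrt(-2 ln(1 - q)), and then simulates random numbers by inverse-transform sampling. The function names and the scale value are assumptions made for the example; this is not the quantile function of the proposed OGE-R distribution.

```python
import math
import random


def rayleigh_cdf(x, sigma):
    """Rayleigh CDF, used here only to verify the quantile function."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-x * x / (2.0 * sigma ** 2))


def rayleigh_quantile(q, sigma):
    """Solve F(x_q) = q for the Rayleigh CDF."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    return sigma * math.sqrt(-2.0 * math.log(1.0 - q))


def rayleigh_sample(n, sigma, seed=0):
    """Inverse-transform sampling: apply the quantile function to uniforms."""
    rng = random.Random(seed)
    draws = []
    while len(draws) < n:
        u = rng.random()
        if u > 0.0:  # random() can return exactly 0.0, which is outside (0, 1)
            draws.append(rayleigh_quantile(u, sigma))
    return draws


sigma = 1.5
median = rayleigh_quantile(0.5, sigma)
print(median, rayleigh_cdf(median, sigma))   # the CDF at the median is 0.5
sample = sorted(rayleigh_sample(10001, sigma))
print(sample[5000])                          # sample median, close to `median`
```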
Skewness
Skewness is a measure of the degree of asymmetry, or lack of symmetry, of a distribution. The coefficient of skewness is the standardized third central moment of X and can be obtained using the expression in equation (1.8.6):
$$SK = \frac{E[(X - \mu'_{1})^{3}]}{\sigma^{3}} = \frac{\mu_{3}}{\sigma^{3}} \tag{1.8.6}$$
where $\mu_{3}$ is the third moment about the mean and $\sigma^{3}$ is the cube of the standard deviation.
Kurtosis
Kurtosis is defined as a measure of the degree of peakedness or flatness of a density near its center. It can also be defined as the standardized fourth population moment about the mean, and is given by
$$KU = \frac{E[(X - \mu'_{1})^{4}]}{\sigma^{4}} = \frac{\mu_{4}}{\sigma^{4}} \tag{1.8.7}$$
where $\mu_{4}$ is the fourth moment about the mean and $\sigma^{4}$ is the fourth power of the standard deviation.
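The coefficients in equations (1.8.6) and (1.8.7) can be computed from the first four raw moments. The sketch below does this for the Rayleigh baseline, assuming the standard closed form for its raw moments, sigma^r * 2^(r/2) * Gamma(1 + r/2), which is not derived in this chapter. The function names are hypothetical.

```python
import math


def rayleigh_raw_moment(r, sigma=1.0):
    """r-th raw moment of the Rayleigh distribution (standard closed form)."""
    return sigma ** r * 2.0 ** (r / 2.0) * math.gamma(1.0 + r / 2.0)


def skewness_kurtosis(m1, m2, m3, m4):
    """SK (1.8.6) and KU (1.8.7) computed from the first four raw moments."""
    var = m2 - m1 ** 2
    mu3 = m3 - 3.0 * m1 * m2 + 2.0 * m1 ** 3
    mu4 = m4 - 4.0 * m1 * m3 + 6.0 * m1 ** 2 * m2 - 3.0 * m1 ** 4
    return mu3 / var ** 1.5, mu4 / var ** 2


raw = [rayleigh_raw_moment(r, sigma=2.0) for r in (1, 2, 3, 4)]
sk, ku = skewness_kurtosis(*raw)
# Both coefficients are free of the scale parameter, so any sigma gives the
# same values; the positive SK reflects the right tail of the baseline.
print(round(sk, 4), round(ku, 4))
```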
1.8.3 Moment Generating Function
The Moment Generating Function (MGF) provides an alternative route to analytical results, rather than working directly with the pdf. The MGF of a random variable X can be obtained using equation (1.8.8)
$$M_{X}(t) = E(e^{tX}) = \int_{-\infty}^{\infty} e^{tx} f(x)\,dx \tag{1.8.8}$$
In other words, the MGF generates the moments of X by differentiation; that is, for any positive integer j, the j^{th} derivative of $M_{X}(t)$ evaluated at t = 0 is the j^{th} moment $\mu'_{j}$ of X.
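The moment-generating property can be illustrated numerically. The sketch below evaluates the integral in equation (1.8.8) by the trapezoidal rule for an assumed Rayleigh density with scale 1, and then approximates the first and second derivatives at t = 0 by finite differences. The function names, step sizes and truncation point are choices made for the example.

```python
import math


def rayleigh_pdf(x, sigma=1.0):
    """Rayleigh pdf with an assumed scale sigma (example density only)."""
    if x <= 0:
        return 0.0
    return (x / sigma ** 2) * math.exp(-x * x / (2.0 * sigma ** 2))


def mgf(t, pdf, lower=0.0, upper=12.0, steps=20000):
    """M_X(t) from equation (1.8.8), by the trapezoidal rule on [lower, upper]."""
    width = (upper - lower) / steps
    total = 0.5 * (math.exp(t * lower) * pdf(lower)
                   + math.exp(t * upper) * pdf(upper))
    for k in range(1, steps):
        x = lower + k * width
        total += math.exp(t * x) * pdf(x)
    return total * width


h = 1e-3
m_minus, m_zero, m_plus = (mgf(t, rayleigh_pdf) for t in (-h, 0.0, h))
first_moment = (m_plus - m_minus) / (2.0 * h)               # approximates M'(0)
second_moment = (m_plus - 2.0 * m_zero + m_minus) / h ** 2  # approximates M''(0)
print(first_moment, math.sqrt(math.pi / 2.0))  # mean of the Rayleigh(1) density
print(second_moment, 2.0)                      # E(X^2) = 2 * sigma^2 with sigma = 1
```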
1.8.4 Reliability Analysis
Reliability (survival) analysis is a statistical technique used to describe and quantify time-to-event data. It was originally introduced to evaluate the efficacy of treatments for fatal conditions such as cancer. Its applicability extends to numerous areas of human endeavor, including engineering, medicine, the sciences and industry.
Survival Function
This is simply defined as the probability that a system will survive beyond a specified time. Mathematically, it is defined as
$$S(x) = P(X > x) = \int_{x}^{\infty} f(u)\,du = 1 - F(x) \tag{1.8.9}$$
where F(x) is the CDF of any distribution.
Hazard Function
This is defined as the probability per unit time that a case which has survived to the beginning of an interval will fail in that interval. Mathematically, it is defined as
$$h(x) = \frac{f(x)}{S(x)} = \frac{f(x)}{1 - F(x)} \tag{1.8.10}$$
where f(x) is the pdf of any distribution.
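The survival and hazard functions in equations (1.8.9) and (1.8.10) can be written as generic helpers that accept any pdf and CDF. The sketch below applies them to the Rayleigh baseline with an assumed scale, for which the hazard simplifies to x / sigma^2. The helper names are hypothetical.

```python
import math


def rayleigh_pdf(x, sigma):
    """Rayleigh pdf (baseline example)."""
    if x <= 0:
        return 0.0
    return (x / sigma ** 2) * math.exp(-x * x / (2.0 * sigma ** 2))


def rayleigh_cdf(x, sigma):
    """Rayleigh CDF (baseline example)."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-x * x / (2.0 * sigma ** 2))


def survival(x, cdf):
    """S(x) = 1 - F(x), equation (1.8.9)."""
    return 1.0 - cdf(x)


def hazard(x, pdf, cdf):
    """h(x) = f(x) / (1 - F(x)), equation (1.8.10)."""
    return pdf(x) / survival(x, cdf)


sigma = 2.0
pdf = lambda x: rayleigh_pdf(x, sigma)
cdf = lambda x: rayleigh_cdf(x, sigma)
for x in (0.5, 1.0, 2.0, 4.0):
    # For the Rayleigh baseline the hazard is the straight line x / sigma**2,
    # that is, a failure rate that increases with time.
    print(x, survival(x, cdf), hazard(x, pdf, cdf), x / sigma ** 2)
```

Because the baseline hazard can only increase linearly, it cannot by itself describe other hazard shapes, which is one practical sense in which a one-parameter model lacks flexibility.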
1.8.5 Order Statistics
Order statistics are widely used in many areas of statistical theory and practice, for instance in the detection of outliers in statistical quality control processes.
Suppose $X_{1}, X_{2}, \ldots, X_{n}$ is a random sample from a distribution with pdf f(x) and CDF F(x), and let $X_{1:n}, X_{2:n}, \ldots, X_{n:n}$ denote the corresponding order statistics obtained from this sample. The pdf $f_{i:n}(x)$ of the i^{th} order statistic can be expressed as
$$f_{i:n}(x) = \frac{n!}{(i-1)!\,(n-i)!}\, f(x)\, F(x)^{i-1}\, [1 - F(x)]^{n-i} \tag{1.8.11}$$
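Equation (1.8.11) translates directly into code. The sketch below evaluates the density of the minimum (i = 1) and maximum (i = n) order statistics for a Rayleigh sample with an assumed scale. As a check, it uses the fact that the minimum of n independent Rayleigh variables with scale sigma is again Rayleigh, with scale sigma / sqrt(n). The function names are hypothetical.

```python
import math


def order_stat_pdf(x, i, n, pdf, cdf):
    """pdf of the i-th order statistic in a sample of size n, equation (1.8.11)."""
    if not 1 <= i <= n:
        raise ValueError("i must satisfy 1 <= i <= n")
    coeff = math.factorial(n) / (math.factorial(i - 1) * math.factorial(n - i))
    return coeff * pdf(x) * cdf(x) ** (i - 1) * (1.0 - cdf(x)) ** (n - i)


def rayleigh_pdf(x, sigma):
    """Rayleigh pdf (baseline example)."""
    if x <= 0:
        return 0.0
    return (x / sigma ** 2) * math.exp(-x * x / (2.0 * sigma ** 2))


def rayleigh_cdf(x, sigma):
    """Rayleigh CDF (baseline example)."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-x * x / (2.0 * sigma ** 2))


sigma, n = 2.0, 5
pdf = lambda x: rayleigh_pdf(x, sigma)
cdf = lambda x: rayleigh_cdf(x, sigma)
for x in (0.5, 1.0, 2.0):
    minimum = order_stat_pdf(x, 1, n, pdf, cdf)   # i = 1: sample minimum
    maximum = order_stat_pdf(x, n, n, pdf, cdf)   # i = n: sample maximum
    # The second and third printed values should coincide.
    print(x, minimum, rayleigh_pdf(x, sigma / math.sqrt(n)), maximum)
```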
1.8.6 Maximum Likelihood Method
From a statistical perspective, the method of maximum likelihood estimation is, with some exceptions, considered to be the most robust of the parameter estimation techniques. Suppose $X_{1}, X_{2}, \ldots, X_{n}$ is a random sample from a population with pdf $f(x; \theta)$, where $\theta$ is an unknown parameter to be estimated. The likelihood function, $L(\theta)$, is defined to be the joint density of the random variables $X_{1}, X_{2}, \ldots, X_{n}$, regarded as a function of $\theta$. That is,
$$L(\theta) = \prod_{i=1}^{n} f(x_{i}; \theta) \tag{1.8.12}$$
The sample statistic that maximizes the likelihood function $L(\theta)$ is known as the maximum likelihood estimator of $\theta$ and is denoted by $\hat{\theta}$.
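As a small worked illustration of equation (1.8.12), consider the Rayleigh baseline, for which the log-likelihood has a closed-form maximizer: setting the derivative of log L(sigma) to zero gives sigma_hat = sqrt(sum(x_i^2) / (2n)). The sketch below applies this to synthetic data generated with a known scale. The data, seed and function names are assumptions made for the example, and this is not the estimation procedure for the proposed OGE-R distribution.

```python
import math
import random


def rayleigh_log_likelihood(sigma, data):
    """log L(sigma): the sum of log f(x_i; sigma) for the Rayleigh pdf."""
    n = len(data)
    return (sum(math.log(x) for x in data)
            - 2.0 * n * math.log(sigma)
            - sum(x * x for x in data) / (2.0 * sigma ** 2))


def rayleigh_mle(data):
    """Closed-form maximizer: sigma_hat = sqrt(sum(x_i^2) / (2n))."""
    return math.sqrt(sum(x * x for x in data) / (2.0 * len(data)))


# Synthetic data only: inverse-transform draws with a known scale of 2.0.
rng = random.Random(1)
true_sigma = 2.0
data = [true_sigma * math.sqrt(-2.0 * math.log(1.0 - rng.random()))
        for _ in range(5000)]

sigma_hat = rayleigh_mle(data)
print(sigma_hat)  # should be close to true_sigma for a sample this large
# The log-likelihood at sigma_hat is larger than at nearby candidate values.
for candidate in (0.9 * sigma_hat, sigma_hat, 1.1 * sigma_hat):
    print(candidate, rayleigh_log_likelihood(candidate, data))
```

For distributions with several parameters, the likelihood equations generally have no closed-form solution and the maximization is carried out numerically.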