Jurnal Karya Asli Lorekan Ahli Matematik Vol. 3 No.1 (2010) Page 39 - 50
© 2010 Jurnal Karya Asli Lorekan Ahli Matematik
Published by Pustaka Aman Press Sdn. Bhd.
Selection between logistic and normal distributions
S. A. Al-Subh$^{a}$, K. Ibrahim$^{b}$ & A. A. Jemain$^{c}$
$^{a,b,c}$School of Mathematical Sciences, Universiti Kebangsaan Malaysia, Selangor, Malaysia
Abstract: Selecting the correct distribution for a particular set of data is an important issue. When two distributions have similar characteristics, it is often difficult to discriminate between them. In this paper, we use the ratio of maximized likelihoods to select between the logistic and normal distributions for describing a set of data generated using simple random sampling. The parameters of the logistic and normal distributions are estimated by the maximum likelihood, moments, order statistics and L-moment methods. Using Monte Carlo simulations, discrimination between the two distributions is investigated in terms of the probability of correct selection, and the results obtained with the different methods of estimation are compared. In general, the method of maximum likelihood is found to outperform all the other methods.
Abstrak: Selecting a distribution to describe a particular set of data is an important issue. When two distributions have the same characteristics, it is difficult to discriminate between them. In this paper, we use the likelihood ratio to choose between the logistic and normal distributions for describing a set of data generated using simple random sampling. The parameters of the logistic and normal distributions are estimated using the maximum likelihood, moment, order statistics and L-moment methods. Through Monte Carlo simulation, discrimination between the two distributions is investigated using the probability of correct selection, and comparisons are made between the results obtained with the different estimation methods. In general, the maximum likelihood method is found to be better than the other methods.
Keywords: Logistic, Normal Distributions.
1. Introduction
The logistic distribution has interesting applications in many different fields, such as public
health, graduation of mortality statistics, survival data, income distributions, human populations and
biology (see Balakrishnan 1992). Compared with other distributions, including the logistic, the normal
distribution is the most widely used family of distributions in statistics, and many statistical tests are
based on the assumption of normality. The logistic and normal distributions are quite similar, since
their density functions have similar shapes for certain ranges of the parameters.
The problem of testing whether given data come from one of two probability distributions is
quite old. Many authors have developed selection statistics and decision rules based on the maximized
likelihood function for discriminating between two distributions. Atkinson (1969, 1970), Chen (1980),
Chambers and Cox (1967), Cox (1961, 1962) and Dyer (1973) are examples of works that have
considered the problem of discriminating between two models. The effect of choosing the wrong one of
two models was originally discussed by Cox (1961). Special attention has been given to discriminating
between the lognormal and Weibull distributions by Dumonceaux and Antle (1973), between the
lognormal and gamma distributions by Jackson (1969), among some parametric models by Prentice
(1975), and between the gamma and Weibull distributions by Bain and Engelhardt (1980).
Recently, many authors have studied discrimination between two skewed distributions. Gupta
and Kundu (1999, 2001a, 2001b, 2002) introduced the generalized exponential distribution and studied
it quite extensively in a series of papers. Gupta and Kundu (2003a) discussed the closeness of the
gamma and generalized exponential distributions, while Gupta and Kundu (2003b) introduced a
method to discriminate between the Weibull and generalized exponential distributions. Gupta and
Kundu (2004) discriminated between the gamma and generalized exponential distributions. Readers
may refer to Kundu and Manglick (2004), Kundu et al. (2005), Strupczewski et al. (2006), Kundu and
Raqab (2007) and Kim and Yum (2008) for more studies on discrimination between distributions.
Alzaid and Sultan (2009) discussed the use of the coefficient of skewness as a goodness-of-fit test to
distinguish between the gamma and lognormal distributions.
In this paper, the problem of discriminating between the logistic and normal distributions for data
generated using simple random sampling is considered. The ratio of maximized likelihoods, as
suggested by Gupta and Kundu in a series of papers, is used to discriminate between the two
distributions. The methods of estimation considered are maximum likelihood, moments, order
statistics and L-moments. Monte Carlo simulation experiments are conducted for various sample
sizes, and the performance of the ratio of maximized likelihoods procedure is investigated in terms of
the probability of correct selection (PCS), which is the ratio of the number of simulation experiments in
which the procedure selects the true distribution to the total number of simulation runs.
This paper is organized as follows. In Section 2, two tests of hypotheses are given and the ratio
of maximized likelihoods method is described. In Section 3, the estimators based on maximum
likelihood (mle), moments (moe), order statistics (ose) and L-moments (lme) are found for the logistic
and normal distributions. The algorithm to calculate the PCS of the tests is defined in Section 4. The
results obtained from the simulation study are compared in Section 5 and, finally, conclusions are
stated in Section 6.
2. Ratio of the maximized likelihoods procedures
In this section, the ratio of maximized likelihoods is derived for the logistic and normal
distributions. The two tests of hypotheses are

$$H_0: \text{Logistic} \quad \text{vs} \quad H_1: \text{Normal}, \qquad (1)$$

and

$$H_0: \text{Normal} \quad \text{vs} \quad H_1: \text{Logistic}. \qquad (2)$$
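Suppose $X_1, X_2, \ldots, X_n$ is a random sample of size n from one of the two distributions. The
probability density function of a logistic random variable, denoted by $Lo(\mu,\sigma)$, is given by

$$f_L(x;\mu,\sigma) = \frac{1}{\sigma}\exp\left(-\frac{x-\mu}{\sigma}\right)\left[1+\exp\left(-\frac{x-\mu}{\sigma}\right)\right]^{-2}, \quad -\infty < x < \infty, \qquad (3)$$

where $-\infty < \mu < \infty$ and $\sigma > 0$ are the location and scale parameters, respectively. The
probability density function of a normal random variable, denoted by $N(\mu,\sigma^2)$, is given by

$$f_N(x;\mu,\sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right], \quad -\infty < x < \infty, \qquad (4)$$

where $-\infty < \mu < \infty$ and $\sigma > 0$ are the location and scale parameters, respectively.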
Both the logistic and normal distributions may be effective in analyzing the same set of data,
since the shapes of their density functions are quite close. The shapes of the probability density
function (pdf) and cumulative distribution function (cdf) of these two distributions are similar for
certain ranges of the parameters. For example, Figures 1 and 2 show the pdf and cdf of $Lo(0, 0.55)$
and $N(0, 0.88)$; these distributions are nearly identical.
Figure 1 Probability density functions of a random variable X that follows either $Lo(0, 0.55)$ or $N(0, 0.88)$
Figure 2 Distribution functions of a random variable X that follows either $Lo(0, 0.55)$ or $N(0, 0.88)$
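The closeness shown in the two figures can be checked numerically. The following Python sketch (standard library only) evaluates both densities and distribution functions on a grid from -6 to 6 and reports the largest absolute differences. It assumes that $N(0, 0.88)$ means mean 0 and variance 0.88, following the $N(\mu,\sigma^2)$ notation; the helper names `logistic_pdf` and `logistic_cdf` are illustrative.

```python
import math
from statistics import NormalDist


def logistic_pdf(x, mu, sigma):
    """Density (3) of Lo(mu, sigma)."""
    z = abs((x - mu) / sigma)  # the density is symmetric in z, so |z| avoids overflow
    e = math.exp(-z)
    return e / (sigma * (1.0 + e) ** 2)


def logistic_cdf(x, mu, sigma):
    """Distribution function 1 / (1 + exp(-(x - mu)/sigma)), computed stably."""
    z = (x - mu) / sigma
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Assumption: N(0, 0.88) has variance 0.88, i.e. standard deviation sqrt(0.88).
normal = NormalDist(mu=0.0, sigma=math.sqrt(0.88))
grid = [i / 100 for i in range(-600, 601)]  # x from -6 to 6 in steps of 0.01

max_pdf_gap = max(abs(logistic_pdf(x, 0.0, 0.55) - normal.pdf(x)) for x in grid)
max_cdf_gap = max(abs(logistic_cdf(x, 0.0, 0.55) - normal.cdf(x)) for x in grid)
print(f"largest |pdf difference| = {max_pdf_gap:.4f}")
print(f"largest |cdf difference| = {max_cdf_gap:.4f}")
```

If 0.88 is instead meant as the standard deviation, only the `sigma=` argument changes.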
Assuming that the data come from $Lo(\mu,\sigma)$ or $N(\mu,\sigma^2)$, the likelihood functions are given
by
$$l_L(\mu,\sigma) = \prod_{i=1}^{n} f_L(x_i;\mu,\sigma), \qquad (5)$$

and

$$l_N(\mu,\sigma) = \prod_{i=1}^{n} f_N(x_i;\mu,\sigma), \qquad (6)$$
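respectively. The logarithms of the likelihood functions in (5) and (6) for $Lo(\mu,\sigma)$ or $N(\mu,\sigma^2)$ are
given by

$$L_L(\mu,\sigma) = -n\log\sigma - \sum_{i=1}^{n}\frac{x_i-\mu}{\sigma} - 2\sum_{i=1}^{n}\log\left[1+\exp\left(-\frac{x_i-\mu}{\sigma}\right)\right], \qquad (7)$$

and

$$L_N(\mu,\sigma) = -n\log\left(\sqrt{2\pi}\,\sigma\right) - \frac{1}{2}\sum_{i=1}^{n}\left(\frac{x_i-\mu}{\sigma}\right)^2, \qquad (8)$$

respectively.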
The test statistic applied for discriminating between the two distributions is the ratio of
maximized likelihoods, based, for example, on the work of Gupta and Kundu (2004). An introduction to
this approach, with several examples, was given by Cox (1961, 1962), where the statistic considered
was the logarithm of the ratio of the maximized likelihoods of two separate families of distributions.
If the method of parameter estimation used is maximum likelihood, then the test statistic is
given by

$$T_1 = \log\frac{l_L(\hat\mu_{mle},\hat\sigma_{mle})}{l_N(\tilde\mu_{mle},\tilde\sigma_{mle})}, \qquad (9)$$

where $(\hat\mu_{mle},\hat\sigma_{mle})$ and $(\tilde\mu_{mle},\tilde\sigma_{mle})$ are the maximum likelihood estimators for the logistic and
normal distributions, respectively. Accordingly, the other test statistics, denoted $T_2$, $T_3$ and $T_4$, are
obtained by substituting the estimators found by moe, ose and lme in equation (9). For the logistic
distribution, the estimators are denoted $(\hat\mu_{mle},\hat\sigma_{mle})$, $(\hat\mu_{moe},\hat\sigma_{moe})$, $(\hat\mu_{ose},\hat\sigma_{ose})$ and $(\hat\mu_{lme},\hat\sigma_{lme})$,
respectively. For the normal parameters, the respective estimators are denoted $(\tilde\mu_{mle},\tilde\sigma_{mle})$,
$(\tilde\mu_{moe},\tilde\sigma_{moe})$, $(\tilde\mu_{ose},\tilde\sigma_{ose})$ and $(\tilde\mu_{lme},\tilde\sigma_{lme})$.
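As a minimal sketch, the log-likelihoods (7) and (8) and the log-ratio in (9) can be coded directly; any pair of logistic and normal estimates can be plugged in, which is all that distinguishes $T_1, \ldots, T_4$. The rule "choose the logistic model when T > 0" mirrors step (d) of the algorithm in Section 4. The function names (`loglik_logistic`, `ratio_statistic`, `select_model`) are illustrative and not from the paper.

```python
import math


def softplus(t):
    """Numerically stable log(1 + exp(t))."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def loglik_logistic(x, mu, sigma):
    """Equation (7): -n log(sigma) - sum z_i - 2 sum log(1 + exp(-z_i)), z_i = (x_i - mu)/sigma."""
    z = [(xi - mu) / sigma for xi in x]
    return -len(x) * math.log(sigma) - sum(z) - 2.0 * sum(softplus(-zi) for zi in z)


def loglik_normal(x, mu, sigma):
    """Equation (8): -n log(sqrt(2 pi) sigma) - (1/2) sum z_i^2."""
    z = [(xi - mu) / sigma for xi in x]
    return -len(x) * math.log(math.sqrt(2.0 * math.pi) * sigma) - 0.5 * sum(zi * zi for zi in z)


def ratio_statistic(x, logistic_params, normal_params):
    """Log of the likelihood ratio in (9): log l_L - log l_N at the supplied estimates."""
    return loglik_logistic(x, *logistic_params) - loglik_normal(x, *normal_params)


def select_model(x, logistic_params, normal_params):
    """Pick the logistic model when T > 0, otherwise the normal model."""
    t = ratio_statistic(x, logistic_params, normal_params)
    return "logistic" if t > 0 else "normal"


# Usage: a single extreme observation favours the heavier-tailed logistic model.
print(select_model([0.0], (0.0, 1.0), (0.0, 1.0)))   # normal
print(select_model([10.0], (0.0, 1.0), (0.0, 1.0)))  # logistic
```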
3. Methods of estimation
3.1 Maximum likelihood method
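By taking the partial derivatives of the log-likelihood function (7) of the logistic
distribution with respect to $\mu$ and $\sigma$, equating the resulting quantities to zero and carrying out
some algebraic manipulation, we obtain the following equations:

$$\frac{\partial L_L(\mu,\sigma)}{\partial\mu} = \frac{n}{\sigma} - \frac{2}{\sigma}\sum_{i=1}^{n}\frac{\exp\left(-\frac{X_i-\mu}{\sigma}\right)}{1+\exp\left(-\frac{X_i-\mu}{\sigma}\right)} = 0,$$

and

$$\frac{\partial L_L(\mu,\sigma)}{\partial\sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma^2}\sum_{i=1}^{n}(X_i-\mu) - \frac{2}{\sigma^2}\sum_{i=1}^{n}\frac{(X_i-\mu)\exp\left(-\frac{X_i-\mu}{\sigma}\right)}{1+\exp\left(-\frac{X_i-\mu}{\sigma}\right)} = 0. \qquad (10)$$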
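There is no explicit solution to these simultaneous equations, so no closed form exists for the
mles of the logistic parameters, and the equations must be solved numerically. For the normal
distribution, the mles, denoted $\tilde\mu_{mle}$ and $\tilde\sigma_{mle}$, are given by

$$\tilde\mu_{mle} = \bar X \quad\text{and}\quad \tilde\sigma^2_{mle} = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar X\right)^2. \qquad (11)$$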
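Because (10) has no closed-form solution, the logistic mles must be found numerically. The sketch below is one possible approach, not necessarily the one used for the simulations reported here. For fixed $\sigma$, the first equation in (10) reduces to $\sum_{i=1}^{n} 1/[1+\exp((X_i-\mu)/\sigma)] = n/2$, whose left-hand side increases with $\mu$, so it is solved by bisection; the resulting profile log-likelihood is then maximized over $\log\sigma$ by golden-section search. This works because the logistic log-likelihood is concave in $(\mu/\sigma, 1/\sigma)$, so the profile is unimodal. The search bracket, tolerances and function names are assumptions.

```python
import math
import random
import statistics


def sigmoid(t):
    """Numerically stable 1 / (1 + exp(-t))."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def loglik_logistic(x, mu, sigma):
    """Log-likelihood (7); log(1 + exp(-z)) is evaluated stably."""
    total = -len(x) * math.log(sigma)
    for xi in x:
        z = (xi - mu) / sigma
        total += -z - 2.0 * (max(-z, 0.0) + math.log1p(math.exp(-abs(z))))
    return total


def mu_given_sigma(x, sigma, tol=1e-9):
    """First equation of (10) as sum 1/(1 + exp((x_i - mu)/sigma)) = n/2, solved by bisection."""
    lo, hi = min(x), max(x)  # the root always lies between the sample extremes
    target = len(x) / 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # The left-hand side increases with mu, so a value below n/2 puts the root above mid.
        if sum(sigmoid((mid - xi) / sigma) for xi in x) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def logistic_mle(x, tol=1e-6):
    """Golden-section search over log(sigma) on the profile log-likelihood."""
    def profile(log_s):
        s = math.exp(log_s)
        return loglik_logistic(x, mu_given_sigma(x, s), s)

    centre = math.log(statistics.stdev(x))
    a, b = centre - 4.0, centre + 2.0  # assumed bracket around the optimum
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = profile(c), profile(d)
    while b - a > tol:
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = profile(c)
        else:        # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = profile(d)
    sigma_hat = math.exp(0.5 * (a + b))
    return mu_given_sigma(x, sigma_hat), sigma_hat


def normal_mle(x):
    """Closed form (11): sample mean and square root of the divisor-n variance."""
    m = statistics.fmean(x)
    return m, math.sqrt(sum((xi - m) ** 2 for xi in x) / len(x))


# Usage on a simulated Lo(0, 1) sample (inverse-cdf method).
rng = random.Random(2010)
sample = []
while len(sample) < 1000:
    u = rng.random()
    if u > 0.0:  # avoid log(0)
        sample.append(math.log(u / (1.0 - u)))
mu_hat, sigma_hat = logistic_mle(sample)
print("logistic mle:", (mu_hat, sigma_hat))
print("normal mle:  ", normal_mle(sample))
```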
3.2 Method of moments
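The moment estimators for the logistic and normal distributions are given by

$$\hat\mu_{moe} = \bar X \quad\text{and}\quad \hat\sigma_{moe} = \frac{\sqrt{3}}{\pi}S, \qquad (12)$$

and

$$\tilde\mu_{moe} = \bar X \quad\text{and}\quad \tilde\sigma^2_{moe} = S^2, \qquad (13)$$

respectively, where $\bar X$ and $S$ are the mean and standard deviation of the random sample of size n. The
factor $\sqrt{3}/\pi$ in (12) arises because the variance of $Lo(\mu,\sigma)$ is $\pi^2\sigma^2/3$.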
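A minimal sketch of (12) and (13) in Python. It assumes that S is the usual sample standard deviation with divisor n - 1, which the text does not state explicitly; the function name is illustrative.

```python
import math
import statistics


def moment_estimates(x):
    """Moment estimators (12) and (13).

    Both models use the sample mean for mu. Under Lo(mu, sigma) the variance is
    pi^2 sigma^2 / 3, so matching it to S^2 gives sigma = sqrt(3) S / pi.
    Assumption: S is the sample standard deviation with divisor n - 1.
    """
    xbar = statistics.fmean(x)
    s = statistics.stdev(x)
    logistic = (xbar, math.sqrt(3.0) * s / math.pi)
    normal = (xbar, s)
    return logistic, normal


print(moment_estimates([1.0, 2.0, 3.0]))
```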
3.3 Order statistic method
The pth quantile of the logistic distribution is given by

$$Q(p;\mu,\sigma) = F^{-1}(p) = \mu + \sigma\log\left(\frac{p}{1-p}\right), \quad 0 < p < 1. \qquad (14)$$

The lower quartile, median and upper quartile, denoted $F^{-1}(0.25)$, $F^{-1}(0.5)$ and $F^{-1}(0.75)$,
respectively, together with the distributional limits $F^{-1}(0)$ and $F^{-1}(1)$, give a feel for the spread of
the distribution over the axis. Since the interquartile range

$$IQR = F^{-1}(0.75) - F^{-1}(0.25) \qquad (15)$$

is independent of the location parameter $\mu$, the scale parameter $\sigma$ can be estimated using the IQR.
Substituting $p = 0.25$ and $p = 0.75$ in (14), corresponding to the lower and upper quartiles, gives

$$F^{-1}(0.25) = \mu + \sigma\log\left(\frac{0.25}{1-0.25}\right) = \mu - \sigma\log 3$$

and

$$F^{-1}(0.75) = \mu + \sigma\log\left(\frac{0.75}{1-0.75}\right) = \mu + \sigma\log 3,$$

respectively. Thus, the two parameters $\mu$ and $\sigma$ can be expressed as

$$\mu = \frac{1}{2}\left[F^{-1}(0.75) + F^{-1}(0.25)\right], \qquad (16)$$

and

$$\sigma = \frac{1}{2\log 3}\left[F^{-1}(0.75) - F^{-1}(0.25)\right], \qquad (17)$$

respectively. Accordingly, the estimators of $\mu$ and $\sigma$, denoted $\hat\mu_{ose}$ and $\hat\sigma_{ose}$, are

$$\hat\mu_{ose} = \frac{1}{2}\left[\hat F^{-1}(0.75) + \hat F^{-1}(0.25)\right], \qquad (18)$$

and

$$\hat\sigma_{ose} = \frac{1}{2\log 3}\left[\hat F^{-1}(0.75) - \hat F^{-1}(0.25)\right], \qquad (19)$$

where $\hat F^{-1}(p)$ denotes the sample pth quantile.
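The pth quantile of the normal distribution is given by

$$Q(p;\mu,\sigma) = F^{-1}(p) = \mu + \sigma\Phi^{-1}(p), \qquad (20)$$

where $\Phi^{-1}(\cdot)$ is the inverse cdf of the standard normal distribution. Substituting $p = 0.25$ and
$p = 0.75$ in (20), corresponding to the first and third quartiles of the normal distribution, gives

$$F^{-1}(0.25) = \mu + \sigma\Phi^{-1}(0.25) \approx \mu - 0.674\sigma$$

and

$$F^{-1}(0.75) = \mu + \sigma\Phi^{-1}(0.75) \approx \mu + 0.674\sigma,$$

respectively. Since $F^{-1}(0.75) - F^{-1}(0.25) \approx 1.348\sigma$ and $F^{-1}(0.75) + F^{-1}(0.25) = 2\mu$, the two
parameters $\mu$ and $\sigma$ can be expressed as

$$\mu = \frac{1}{2}\left[F^{-1}(0.75) + F^{-1}(0.25)\right], \qquad (21)$$

and

$$\sigma = \frac{1}{1.348}\left[F^{-1}(0.75) - F^{-1}(0.25)\right], \qquad (22)$$

respectively. Accordingly, the estimators of $\mu$ and $\sigma$, denoted $\tilde\mu_{ose}$ and $\tilde\sigma_{ose}$, are
given by

$$\tilde\mu_{ose} = \frac{1}{2}\left[\hat F^{-1}(0.75) + \hat F^{-1}(0.25)\right], \qquad (23)$$

and

$$\tilde\sigma_{ose} = \frac{1}{1.348}\left[\hat F^{-1}(0.75) - \hat F^{-1}(0.25)\right]. \qquad (24)$$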
These estimators can be easily shown to be unbiased.
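The quartile-based estimators (18)-(19) and (23)-(24) need a rule for the sample quartiles $\hat F^{-1}(0.25)$ and $\hat F^{-1}(0.75)$, which the text does not specify. This Python sketch assumes `statistics.quantiles` with its default ("exclusive") method; other quantile rules give slightly different values for small n. The function names are illustrative.

```python
import math
import statistics


def sample_quartiles(x):
    """Sample lower and upper quartiles (assumption: the default 'exclusive' rule)."""
    q1, _median, q3 = statistics.quantiles(x, n=4)
    return q1, q3


def ose_logistic(x):
    """Estimators (18)-(19): mid-quartile and IQR / (2 log 3)."""
    q1, q3 = sample_quartiles(x)
    return 0.5 * (q1 + q3), (q3 - q1) / (2.0 * math.log(3.0))


def ose_normal(x):
    """Estimators (23)-(24): mid-quartile and IQR / 1.348 (1.348 is about 2 * 0.674)."""
    q1, q3 = sample_quartiles(x)
    return 0.5 * (q1 + q3), (q3 - q1) / 1.348


data = [1, 2, 3, 4, 5, 6, 7]
print(sample_quartiles(data))  # (2.0, 6.0)
print(ose_logistic(data), ose_normal(data))
```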
3.4 L-moment method
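The mth L-moment of a probability distribution, denoted $\lambda_m$, as explained in Hosking
(1990), is given by

$$\lambda_m = \frac{1}{m}\sum_{j=0}^{m-1}(-1)^j\binom{m-1}{j}\lambda_{m-j:m}, \quad m = 1, 2, \ldots, \qquad (25)$$

where $\lambda_{i:n}$ is defined as

$$\lambda_{i:n} = E(X_{i:n}) = \int_{-\infty}^{\infty} x\,f_{i:n}(x)\,dx, \qquad (26)$$

where

$$f_{i:n}(x) = \frac{1}{B(i,n-i+1)}\left[F(x)\right]^{i-1}\left[1-F(x)\right]^{n-i}f(x). \qquad (27)$$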
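Using (25), the first and second L-moments of the logistic distribution are, respectively,

$$\lambda_1 = \lambda_{1:1} = \mu \quad\text{and}\quad \lambda_2 = \frac{1}{2}\left(\lambda_{2:2} - \lambda_{1:2}\right),$$

where

$$\lambda_{2:2} = \frac{1}{B(2,1)}\int_0^1\left[\mu + \sigma\log\left(\frac{u}{1-u}\right)\right]u\,du = \mu + \sigma$$

and

$$\lambda_{1:2} = \frac{1}{B(1,2)}\int_0^1\left[\mu + \sigma\log\left(\frac{u}{1-u}\right)\right](1-u)\,du = \mu - \sigma.$$

Hence, $\lambda_{2:2} - \lambda_{1:2} = 2\sigma$, which implies that $\sigma = \frac{1}{2}\left(\lambda_{2:2} - \lambda_{1:2}\right)$.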
We now estimate $\lambda_{1:2}$ and $\lambda_{2:2}$ by using

$$M_{s:d;n} = \sum_{i=s}^{n-d+s} a_i(s,d,n)\,X_{i:n}, \quad 1 \le s \le d \le n, \qquad (28)$$

where

$$a_i(s,d,n) = \binom{i-1}{s-1}\binom{n-i}{d-s}\bigg/\binom{n}{d}.$$
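Here, $a_i(s,d,n)$ is the probability that $X_{i:n}$ is the sth smallest observation in a subsample of size d
drawn without replacement from $X_{1:n},\ldots,X_{n:n}$, as given in David and Nagaraja (2003). For $d = 2$,
(28) gives the estimates of $\lambda_{1:2}$ and $\lambda_{2:2}$, denoted $\hat\lambda_{1:2} = M_{1:2;n}$ and $\hat\lambda_{2:2} = M_{2:2;n}$, respectively:

$$\hat\lambda_{1:2} = \sum_{i=1}^{n-1}\frac{\binom{n-i}{1}}{\binom{n}{2}}X_{i:n} = \frac{2}{n(n-1)}\sum_{i=1}^{n-1}(n-i)X_{i:n},$$

and

$$\hat\lambda_{2:2} = \sum_{i=2}^{n}\frac{\binom{i-1}{1}}{\binom{n}{2}}X_{i:n} = \frac{2}{n(n-1)}\sum_{i=1}^{n-1} i\,X_{i+1:n}.$$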
Thus,

$$\hat\lambda_{2:2} - \hat\lambda_{1:2} = \frac{2}{n(n-1)}\left[\sum_{i=1}^{n-1} i\,X_{i+1:n} - \sum_{i=1}^{n-1}(n-i)X_{i:n}\right]. \qquad (29)$$
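Equation (28) can be checked against its defining property: $M_{s:d;n}$ equals the average, over all $\binom{n}{d}$ subsamples of size d, of the sth smallest value in each subsample. The Python sketch below codes (28) with `math.comb`, the closed form (29), and a brute-force average over all pairs; the names `m_statistic` and `gap_closed_form` are illustrative.

```python
import math
from itertools import combinations


def m_statistic(x, s, d):
    """Equation (28): sum over i = s..n-d+s of a_i(s, d, n) X_{i:n}."""
    xs = sorted(x)
    n = len(xs)
    total = 0.0
    for i in range(s, n - d + s + 1):  # 1-based rank i
        a_i = math.comb(i - 1, s - 1) * math.comb(n - i, d - s) / math.comb(n, d)
        total += a_i * xs[i - 1]
    return total


def gap_closed_form(x):
    """Right-hand side of (29): 2/(n(n-1)) [sum i X_{i+1:n} - sum (n-i) X_{i:n}], i = 1..n-1."""
    xs = sorted(x)
    n = len(xs)
    upper = sum(i * xs[i] for i in range(1, n))            # xs[i] is X_{i+1:n}
    lower = sum((n - i) * xs[i - 1] for i in range(1, n))  # xs[i-1] is X_{i:n}
    return 2.0 * (upper - lower) / (n * (n - 1))


data = [3.1, -0.4, 2.2, 5.0, 1.7, -2.3]
pairs = list(combinations(data, 2))
# Brute force: average subsample minimum / maximum over all C(n, 2) pairs.
print(m_statistic(data, 1, 2), sum(min(p) for p in pairs) / len(pairs))
print(m_statistic(data, 2, 2), sum(max(p) for p in pairs) / len(pairs))
print(m_statistic(data, 2, 2) - m_statistic(data, 1, 2), gap_closed_form(data))
```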
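The L-moment estimators of the logistic parameters, denoted $\hat\mu_{lme}$ and $\hat\sigma_{lme}$, respectively, are given by

$$\hat\mu_{lme} = \bar X,$$

and

$$\hat\sigma_{lme} = \frac{1}{2}\left(\hat\lambda_{2:2} - \hat\lambda_{1:2}\right) = \frac{1}{n(n-1)}\left[\sum_{i=1}^{n-1} i\,X_{i+1:n} - \sum_{i=1}^{n-1}(n-i)X_{i:n}\right]. \qquad (30)$$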
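The first and second L-moments of the normal distribution are, respectively,

$$\lambda_1 = \lambda_{1:1} = \mu \quad\text{and}\quad \lambda_2 = \frac{1}{2}\left(\lambda_{2:2} - \lambda_{1:2}\right),$$

where

$$\lambda_{2:2} = 2\int_0^1\left[\mu + \sigma\Phi^{-1}(u)\right]u\,du = \mu + \frac{\sigma}{\sqrt{\pi}}$$

and

$$\lambda_{1:2} = 2\int_0^1\left[\mu + \sigma\Phi^{-1}(u)\right](1-u)\,du = \mu - \frac{\sigma}{\sqrt{\pi}},$$

using $\int_0^1 \Phi^{-1}(u)\,du = 0$ and $\int_0^1 u\,\Phi^{-1}(u)\,du = 1/(2\sqrt{\pi})$. Hence, $\lambda_{2:2} - \lambda_{1:2} = 2\sigma/\sqrt{\pi}$, which implies that
$\sigma = \frac{\sqrt{\pi}}{2}\left(\lambda_{2:2} - \lambda_{1:2}\right)$.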
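By using (29), the L-moment estimators of the normal parameters, denoted $\tilde\mu_{lme}$ and $\tilde\sigma_{lme}$, respectively,
are given by

$$\tilde\mu_{lme} = \bar X,$$

and

$$\tilde\sigma_{lme} = \frac{\sqrt{\pi}}{2}\left(\hat\lambda_{2:2} - \hat\lambda_{1:2}\right) = \frac{\sqrt{\pi}}{n(n-1)}\left[\sum_{i=1}^{n-1} i\,X_{i+1:n} - \sum_{i=1}^{n-1}(n-i)X_{i:n}\right]. \qquad (31)$$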
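The Python sketch below computes (30) and (31) through an algebraically equivalent form: inside the brackets of (29)-(31), $X_{k:n}$ has coefficient $(k-1)-(n-k) = 2k-n-1$, so $\frac{1}{2}(\hat\lambda_{2:2}-\hat\lambda_{1:2}) = \frac{1}{n(n-1)}\sum_{k=1}^{n}(2k-n-1)X_{k:n}$. It also checks the constants $\lambda_{2:2}-\lambda_{1:2} = 2\sigma$ (logistic) and $2\sigma/\sqrt{\pi}$ (normal) for $\mu = 0$, $\sigma = 1$ by midpoint-rule integration of $\int_0^1 Q(u)(4u-2)\,du$. The grid size and function names are arbitrary choices.

```python
import math
from statistics import NormalDist, fmean


def l2_hat(x):
    """(lambda_hat_{2:2} - lambda_hat_{1:2}) / 2 = sum_k (2k - n - 1) X_{k:n} / (n(n-1))."""
    xs = sorted(x)
    n = len(xs)
    return sum((2 * k - n - 1) * xs[k - 1] for k in range(1, n + 1)) / (n * (n - 1))


def lme_logistic(x):
    """Estimators (30): lambda_2 = sigma for Lo(mu, sigma)."""
    return fmean(x), l2_hat(x)


def lme_normal(x):
    """Estimators (31): lambda_2 = sigma / sqrt(pi) for N(mu, sigma^2)."""
    return fmean(x), math.sqrt(math.pi) * l2_hat(x)


def gap_constant(quantile, m=20_000):
    """Midpoint rule for lambda_{2:2} - lambda_{1:2} = integral_0^1 Q(u) (4u - 2) du."""
    h = 1.0 / m
    return h * sum(quantile((k + 0.5) * h) * (4.0 * (k + 0.5) * h - 2.0) for k in range(m))


# Standard members (mu = 0, sigma = 1): expect about 2 and 2 / sqrt(pi) respectively.
print(gap_constant(lambda u: math.log(u / (1.0 - u))))
print(gap_constant(NormalDist().inv_cdf), 2.0 / math.sqrt(math.pi))
print(lme_logistic([1.0, 2.0, 3.0]), lme_normal([1.0, 2.0, 3.0]))
```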
The estimators for the logistic and normal distributions are summarized in Tables 1 and 2,
respectively.
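Table 1 Estimators for the two parameters of the logistic distribution

| Method | Estimator of $\mu$ | Estimator of $\sigma$ |
|---|---|---|
| mle | No closed form | No closed form |
| moe | $\bar X$ | $\sqrt{3}\,S/\pi$ |
| ose | $\frac{1}{2}\left[\hat F^{-1}(0.75) + \hat F^{-1}(0.25)\right]$ | $\frac{1}{2\log 3}\left[\hat F^{-1}(0.75) - \hat F^{-1}(0.25)\right]$ |
| lme | $\bar X$ | $\frac{1}{n(n-1)}\left[\sum_{i=1}^{n-1} i\,X_{i+1:n} - \sum_{i=1}^{n-1}(n-i)X_{i:n}\right]$ |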
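Table 2 Estimators for the two parameters of the normal distribution

| Method | Estimator of $\mu$ | Estimator of $\sigma$ |
|---|---|---|
| mle | $\bar X$ | $S$ |
| moe | $\bar X$ | $S$ |
| ose | $\frac{1}{2}\left[\hat F^{-1}(0.75) + \hat F^{-1}(0.25)\right]$ | $\frac{1}{1.348}\left[\hat F^{-1}(0.75) - \hat F^{-1}(0.25)\right]$ |
| lme | $\bar X$ | $\frac{\sqrt{\pi}}{n(n-1)}\left[\sum_{i=1}^{n-1} i\,X_{i+1:n} - \sum_{i=1}^{n-1}(n-i)X_{i:n}\right]$ |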
4. Algorithm for PCS comparison
In this simulation, without any loss of generality, random samples are generated from $Lo(0, 1)$ and
$N(0, 1)$ to assess the relative performance of the selection procedure.
To calculate the PCS when $H_0$: Logistic, the following algorithm is used:
a) Generate a sample $X_i$, $i = 1,\ldots,n$, of size n from $Lo(0, 1)$.
b) Estimate the logistic parameters $(\mu,\sigma)$ and the normal parameters $(\mu,\sigma)$ by mle, moe, ose or lme.
c) Calculate the test statistic $T_1$.
d) Check whether $T_1 > 0$.
e) Repeat steps (a)-(d) 10,000 times to obtain $T_1^{(t)}$, $t = 1,\ldots,10{,}000$.
f) Calculate the PCS as

$$T_1^{*} = \frac{1}{10{,}000}\sum_{t=1}^{10{,}000} I\left(T_1^{(t)} > 0\right),$$

where $I(\cdot)$ denotes the indicator function.
The same procedure can be repeated for the other test statistics.
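The algorithm above can be sketched in Python as follows. To keep the sketch short and fast, it plugs in the moment estimators (12)-(13), so it estimates the PCS of $T_2$ rather than $T_1$; any other estimator function returning a logistic and a normal parameter pair can be passed instead. It also assumes that, under $H_0$: Normal, a correct selection means $T < 0$. Its output is a Monte Carlo estimate and is not meant to reproduce Tables 3 and 4 exactly; the function names are illustrative.

```python
import math
import random
import statistics


def softplus(t):
    """Numerically stable log(1 + exp(t))."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def loglik_logistic(x, mu, sigma):
    """Log-likelihood (7)."""
    z = [(xi - mu) / sigma for xi in x]
    return -len(x) * math.log(sigma) - sum(z) - 2.0 * sum(softplus(-zi) for zi in z)


def loglik_normal(x, mu, sigma):
    """Log-likelihood (8)."""
    z = [(xi - mu) / sigma for xi in x]
    return -len(x) * math.log(math.sqrt(2.0 * math.pi) * sigma) - 0.5 * sum(zi * zi for zi in z)


def moe(x):
    """Moment estimators (12)-(13); swap in mle, ose or lme estimators for T1, T3 or T4."""
    xbar, s = statistics.fmean(x), statistics.stdev(x)
    return (xbar, math.sqrt(3.0) * s / math.pi), (xbar, s)


def draw(rng, null):
    """One observation from Lo(0, 1) (inverse cdf) or N(0, 1)."""
    if null == "logistic":
        u = rng.random()
        while u == 0.0:  # avoid log(0)
            u = rng.random()
        return math.log(u / (1.0 - u))
    return rng.gauss(0.0, 1.0)


def pcs(null, n, reps=10_000, estimator=moe, seed=0):
    """Steps (a)-(f): share of replications in which T points to the true model."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(reps):
        x = [draw(rng, null) for _ in range(n)]                         # (a)
        logistic_params, normal_params = estimator(x)                   # (b)
        t = loglik_logistic(x, *logistic_params) - loglik_normal(x, *normal_params)  # (c)
        correct += (t > 0) if null == "logistic" else (t < 0)          # (d): T > 0 favours logistic
    return correct / reps                                                # (e)-(f)


for null in ("logistic", "normal"):
    print(null, [round(pcs(null, n, reps=500), 3) for n in (15, 60, 300)])
```

Using a fixed `seed` per call makes each PCS estimate reproducible; with 10,000 replications the Monte Carlo standard error of a PCS near 0.8 is about 0.004.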
5. Simulation results
Numerical experiments were performed, and their results are presented in this section. To assess
the performance of these procedures, the PCS is estimated by Monte Carlo simulation with 10,000 runs
for each case, following the algorithm of Section 4. The PCS under both null hypotheses is calculated
for the sample sizes n = 6, 15, 36, 60, 90, 120, 150, 180, 210, 240, 270 and 300. First, the case in which
the null distribution is logistic and the alternative is normal is considered, and the results are
summarized in Table 3. Second, the case in which the null distribution is normal and the alternative is
logistic is studied, and the results are presented in Table 4.
Table 3 The values of PCS using Monte Carlo simulation for various sample sizes with 10,000
replications when the null distribution is logistic and the alternative is normal, based on
estimators found using mle, moe, ose and lme

$H_0$: Logistic (0, 1)

| n | mle | moe | ose | lme |
|---|---|---|---|---|
| 6 | .263 | .193 | .310 | .171 |
| 15 | .381 | .355 | .340 | .220 |
| 36 | .522 | .497 | .517 | .350 |
| 60 | .605 | .595 | .550 | .471 |
| 90 | .679 | .662 | .588 | .506 |
| 120 | .721 | .708 | .605 | .554 |
| 150 | .764 | .743 | .640 | .604 |
| 180 | .792 | .760 | .680 | .658 |
| 210 | .817 | .786 | .716 | .770 |
| 240 | .835 | .801 | .750 | .792 |
| 270 | .857 | .814 | .772 | .809 |
| 300 | .875 | .826 | .791 | .820 |
Table 4 The values of PCS using Monte Carlo simulation for various sample sizes with 10,000
replications when the null distribution is normal and the alternative is logistic, based on
estimators found using mle, moe, ose and lme

$H_0$: Normal (0, 1)

| n | mle | moe | ose | lme |
|---|---|---|---|---|
| 6 | .581 | .613 | .605 | .620 |
| 15 | .663 | .637 | .620 | .642 |
| 36 | .701 | .670 | .643 | .665 |
| 60 | .730 | .707 | .666 | .687 |
| 90 | .755 | .727 | .690 | .710 |
| 120 | .785 | .758 | .710 | .734 |
| 150 | .805 | .794 | .730 | .765 |
| 180 | .822 | .818 | .759 | .788 |
| 210 | .840 | .841 | .775 | .810 |
| 240 | .867 | .865 | .793 | .829 |
| 270 | .888 | .877 | .810 | .845 |
| 300 | .901 | .885 | .824 | .855 |
Based on the simulations with 10,000 iterations reported in Tables 3 and 4, the following
remarks can be made:
(i) The two densities and distributions are quite close and hard to distinguish, especially for small samples.
(ii) When testing logistic against normal, the PCS is small for small n and increases as n increases for
each of mle, moe, ose and lme; for n ≤ 180, lme gives the smallest PCS of the four methods.
(iii) When testing normal against logistic, the PCS is relatively high and increases as n increases for
each of mle, moe, ose and lme.
6. Conclusions
Determining the correct distribution for a given set of data is an important issue. When describing
a data set, it is often difficult to choose between two distributions that have similar characteristics.
In this paper, the problem of discriminating between two symmetric distributions, namely the
logistic and normal distributions, is considered. Statistics based on the logarithm of the ratio of the
maximized likelihoods are used. The performance of the ratio of maximized likelihoods procedure
under simple random sampling is investigated in terms of the PCS, which is the ratio of the number of
simulation experiments in which the procedure selects the true distribution to the total number of
simulation runs. The probabilities of correct selection obtained with the different estimators, namely
mle, moe, ose and lme, are compared using Monte Carlo simulations.
When the null distribution is logistic, the PCS is small for small samples whichever of mle, moe, ose
or lme is used, but it increases as the sample size increases. However, when the null distribution is
normal, the PCS is relatively high, even for small sample sizes.
Acknowledgements
The authors would like to thank the Ministry of Higher Education of Malaysia for supporting
this work under the project UKM-ST-06-FRGS 0096-2009.
References
1. Alzaid, A. and Sultan, K. S. 2009. Discriminating between gamma and lognormal distributions with applications.
Journal of King Saud University (Science) 21: 99–108.
2. Atkinson, A. 1969. A test of discriminating between models. Biometrika 56: 337–341.
3. Atkinson, A. 1970. A method for discriminating between models. Journal of the Royal Statistical Society, Ser. B 32:
323–353 (with discussion).
4. Bain, L. J. 1978. Statistical Analysis of Reliability and Life-testing Models. Marcel Dekker, New York.
5. Bain, L. J. and Engelhardt, M. 1980. Probability of correct selection of Weibull versus gamma based on likelihood
ratio. Communications in Statistics, Ser. A 9: 375–381.
6. Balakrishnan, N. 1992. Handbook of the Logistic Distribution. Marcel Dekker, New York.
7. Chambers, E. A. and Cox, D. R. 1967. Discriminating between alternative binary response models. Biometrika 54:
573–578.
8. Chandra, M., Singpurwalla, N. D. and Stephens, M. A. 1981. Kolmogorov statistics for tests of fit for the extreme
value and Weibull distributions. Journal of the American Statistical Association 76(375): 729–731.
9. Chen, W. W. 1980. On the tests of separate families of hypotheses with small sample size. Journal of Statistical
Computation and Simulation 2: 183–187.
10. Cox, D. R. 1961. Tests of separate families of hypotheses. Proceedings of the Fourth Berkeley Symposium on
Mathematical Statistics and Probability, University of California Press, Berkeley: 105–123.
11. Cox, D. R. 1962. Further results on tests of separate families of hypotheses. Journal of the Royal Statistical Society,
Ser. B 24: 406–424.
12. D'Agostino, R. and Stephens, M. 1986. Goodness-of-Fit Techniques. Marcel Dekker, New York.
13. Dumonceaux, R. and Antle, C. E. 1973. Discriminating between the log-normal and Weibull distribution.
Technometrics 15: 923–926.
14. Dyer, A. R. 1973. Discrimination procedure for separate families of hypotheses. Journal of the American Statistical
Association 68: 970–974.
15. Gupta, R. D. and Kundu, D. 1999. Generalized exponential distributions. Australian and New Zealand Journal of
Statistics 41: 173–188.
16. Gupta, R. D. and Kundu, D. 2001a. Generalized exponential distributions: different method of estimations. Journal of
Statistical Computation and Simulation 69: 315–338.
17. Gupta, R. D. and Kundu, D. 2001b. Exponentiated exponential distribution: an alternative to gamma or Weibull
distribution. Biometrical Journal 43: 117–130.
18. Gupta, R. D. and Kundu, D. 2002. Generalized exponential distributions: statistical inferences. Journal of Statistical
Theory and Applications 1: 101–118.
19. Gupta, R. D. and Kundu, D. 2003a. Closeness of gamma and generalized exponential distributions. Communications
in Statistics - Theory and Methods 32: 705–721.
20. Gupta, R. D. and Kundu, D. 2003b. Discriminating between Weibull and generalized exponential distributions.
Computational Statistics and Data Analysis 43: 179–196.
21. Gupta, R. D. and Kundu, D. 2004. Discriminating between gamma and generalized exponential distributions. Journal of
Statistical Computation and Simulation 74: 107–121.
22. Jackson, O. A. Y. 1969. Fitting a gamma or lognormal distribution to fibre-diameter measurements on wool tops.
Journal of Applied Statistics 8: 70–75.
23. Kim, J. S. and Yum, B. J. 2008. Selection between Weibull and lognormal distributions: a comparative simulation
study. Computational Statistics and Data Analysis 53: 477–485.
24. Kundu, D. and Raqab, M. Z. 2007. Discriminating between the generalized Rayleigh and log-normal distribution.
Statistics 41(6): 505–515.
25. Kundu, D. and Manglick, A. 2004. Discriminating between the Weibull and log-normal distributions. Naval Research
Logistics 51(6): 893–905.
26. Kundu, D., Gupta, R. D. and Manglick, A. 2005. Discriminating between log-normal and generalized exponential
distributions. Journal of Statistical Planning and Inference 127: 213–227.
27. Mudholkar, G. S. and Hutson, A. D. 1998. LQ-moments: analogs of L-moments. Journal of Statistical Planning and
Inference 71: 191–208.
28. Prentice, R. L. 1975. Discrimination among some parametric models. Biometrika 62(3): 607–614.
29. Strupczewski, W. G., Mitosek, H. T., Kochanek, K., Singh, V. P. and Weglarczyk, S. 2006. Probability of correct
selection from lognormal and convective diffusion models based on the likelihood ratio. Stochastic Environmental
Research and Risk Assessment 20(3): 152–163.