
Jurnal Karya Asli Lorekan Ahli Matematik Vol. 3 No.1 (2010) Page 39 - 50

© 2010 Jurnal Karya Asli Lorekan Ahli Matematik

Published by Pustaka Aman Press Sdn. Bhd.

Selection between logistic and normal distributions

S. A. Al-Subh$^{a}$, K. Ibrahim$^{b}$ & A. A. Jemain$^{c}$

$^{a,b,c}$ School of Mathematical Sciences, Universiti Kebangsaan Malaysia, Selangor, Malaysia


Abstract : Selecting the correct distribution for a given data set is an important issue. When two distributions have similar characteristics, it is often difficult to discriminate between them. In this paper, we use the ratio of likelihoods to select between the logistic and normal distributions for describing a set of data generated by simple random sampling. The parameters of the logistic and normal distributions are estimated by the maximum likelihood, moment, order statistic and L-moment methods. Using Monte Carlo simulations, discrimination between the two distributions is investigated in terms of the probability of correct selection, and the results based on the different methods of estimation are compared. In general, it is found that the method of maximum likelihood outperforms all the other methods.

Abstrak : Selecting a distribution to describe a particular data set is an important issue. When two distributions have the same characteristics, it is difficult to discriminate between them. In this paper, we use the likelihood ratio to choose between the logistic and normal distributions for describing a data set generated by simple random sampling. The parameters of the logistic and normal distributions are estimated using the maximum likelihood, moment, order statistic and L-moment methods. Through Monte Carlo simulation, discrimination between the two distributions is investigated using the probability of correct selection, and the results obtained under the different estimation methods are compared. In general, the maximum likelihood method is found to be better than the other methods.

Keywords: Logistic, Normal Distributions.

1. Introduction

The logistic distribution has interesting applications in many different fields, such as public health, graduation of mortality statistics, survival data, income distributions, human population and biology (see Balakrishnan 1992). The normal distribution is the most widely used family of distributions in statistics, and many statistical tests are based on the assumption of normality. The logistic and normal distributions are quite similar, since the shapes of their density functions are close for certain ranges of the parameters.

The problem of testing whether some given data come from one of two probability distributions is quite old. Many authors have developed selection statistics and decision rules based on the maximized likelihood function for the purpose of discriminating between two distributions. Atkinson (1969, 1970), Chen (1980), Chambers and Cox (1967), Cox (1961, 1962) and Dyer (1973) are examples of works that have considered the problem of discriminating between two models. The effect of choosing the wrong model between two models was originally discussed by Cox (1961). Special attention has been given to discriminating between the lognormal and Weibull distributions by Dumonceaux and Antle (1973), between the lognormal and gamma distributions by Jackson (1969), among some parametric models by Prentice (1975), and between the gamma and Weibull distributions by Bain and Engelhardt (1980).

Recently, many authors have studied discrimination between two skewed distributions. Gupta and Kundu (1999, 2001a, 2001b, 2002) introduced the generalized exponential distribution and studied it quite extensively in a series of papers. Gupta and Kundu (2003a) discussed the closeness of the gamma and generalized exponential distributions, while Gupta and Kundu (2003b) introduced a method to discriminate between the Weibull and generalized exponential distributions. Gupta and Kundu (2004) discriminated between the gamma and generalized exponential distributions. Readers may refer to Kundu and Manglick (2004), Kundu et al. (2005), Strupczewski et al. (2006), Kundu and Raqab (2007) and Kim and Yum (2008) for more studies on discrimination between distributions. Alzaid and Sultan (2009) discussed the use of the coefficient of skewness as a goodness-of-fit test to distinguish between the gamma and lognormal distributions.

In this paper, the problem of discriminating between the logistic and normal distributions for data generated by simple random sampling is considered. The ratio of maximized likelihoods, as suggested by Gupta and Kundu in a series of papers, is used for discriminating between the two distributions. The methods of estimation considered are maximum likelihood, moment, order statistic and L-moment. Monte Carlo simulation experiments are conducted for various sample sizes, and the performance of the ratio of maximized likelihoods procedure is investigated in terms of the probability of correct selection (PCS), which is the proportion of simulation runs in which the procedure selects the true distribution.

This paper is organized as follows. In Section 2, the two tests of hypotheses are given and the ratio of likelihoods method is described. In Section 3, the estimators based on maximum likelihood (mle), moment (moe), order statistic (ose) and L-moment (lme) are derived for the logistic and normal distributions. The algorithm for calculating the PCS of the tests is defined in Section 4. The results of the simulation study are compared in Section 5 and, finally, conclusions are stated in Section 6.

2. Ratio of the maximized likelihoods procedures

In this section, the ratio of likelihoods is determined for logistic and normal distributions. The

two tests of hypotheses are given by

$$H_0: \text{Logistic} \quad \text{vs} \quad H_1: \text{Normal}, \tag{1}$$

and

$$H_0: \text{Normal} \quad \text{vs} \quad H_1: \text{Logistic}. \tag{2}$$

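Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from one of the two distributions. The probability density function of a logistic random variable, denoted by $Lo(\alpha, \beta)$, is given by

$$f_L(x; \alpha, \beta) = \frac{1}{\beta} \exp\left(-\frac{x-\alpha}{\beta}\right) \left[1 + \exp\left(-\frac{x-\alpha}{\beta}\right)\right]^{-2}, \quad -\infty < x < \infty, \tag{3}$$

where $-\infty < \alpha < \infty$ and $\beta > 0$ are the location and scale parameters respectively. The probability density function of a normal random variable, denoted by $N(\mu, \sigma^2)$, is given by

$$f_N(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right], \quad -\infty < x < \infty, \tag{4}$$

where $-\infty < \mu < \infty$ and $\sigma > 0$ are the location and scale parameters respectively.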

Both the logistic and normal distributions can be effective in analyzing the same set of data, since the shapes of their density functions are quite close. The shapes of the probability density function (pdf) and cumulative distribution function (cdf) of these two distributions are similar for certain ranges of the parameters. For example, Figures 1 and 2 show that the pdfs and cdfs of $Lo(0, 0.55)$ and $N(0, 0.88)$ are nearly identical.

Figure 1 Probability density function of a random variable X that follows either $Lo(0, 0.55)$ or $N(0, 0.88)$

Figure 2 Distribution function of a random variable X that follows either $Lo(0, 0.55)$ or $N(0, 0.88)$
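
The closeness shown in the two figures can be checked numerically. The sketch below is only an illustration: it assumes that the second argument of N(0, 0.88) is a variance and that 0.55 is the logistic scale, and the function names are chosen here rather than taken from the paper. It reports the largest gap between the two pdfs and between the two cdfs on a grid.

```python
import math
from statistics import NormalDist


def logistic_pdf(x, alpha, beta):
    """Logistic density with location alpha and scale beta."""
    # The density is symmetric about alpha, so |x - alpha| avoids overflow.
    e = math.exp(-abs(x - alpha) / beta)
    return e / (beta * (1.0 + e) ** 2)


def logistic_cdf(x, alpha, beta):
    """Logistic distribution function, written with tanh for stability."""
    return 0.5 * (1.0 + math.tanh((x - alpha) / (2.0 * beta)))


# Assumption: 0.88 is the variance of the normal distribution.
normal = NormalDist(mu=0.0, sigma=math.sqrt(0.88))

grid = [i / 100.0 for i in range(-600, 601)]  # x from -6 to 6, step 0.01
max_pdf_gap = max(abs(logistic_pdf(x, 0.0, 0.55) - normal.pdf(x)) for x in grid)
max_cdf_gap = max(abs(logistic_cdf(x, 0.0, 0.55) - normal.cdf(x)) for x in grid)

print(f"largest pdf difference: {max_pdf_gap:.4f}")
print(f"largest cdf difference: {max_cdf_gap:.4f}")
```

Under these assumptions the two distribution functions stay within a few hundredths of each other everywhere, which is why a formal selection rule is needed.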

Assuming that the data come from $Lo(\alpha, \beta)$ or $N(\mu, \sigma^2)$, the likelihood functions are given by

$$l_L(\alpha, \beta) = \prod_{i=1}^{n} f_L(x_i; \alpha, \beta), \tag{5}$$

and

$$l_N(\mu, \sigma) = \prod_{i=1}^{n} f_N(x_i; \mu, \sigma), \tag{6}$$

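respectively. The logarithms of the likelihood functions in (5) and (6) for $Lo(\alpha, \beta)$ and $N(\mu, \sigma^2)$ are given by

$$L_L(\alpha, \beta) = -n \log \beta - \sum_{i=1}^{n} \frac{x_i - \alpha}{\beta} - 2 \sum_{i=1}^{n} \log\left[1 + \exp\left(-\frac{x_i - \alpha}{\beta}\right)\right], \tag{7}$$

and

$$L_N(\mu, \sigma) = -n \log\left(\sqrt{2\pi}\,\sigma\right) - \frac{1}{2} \sum_{i=1}^{n} \left(\frac{x_i - \mu}{\sigma}\right)^2, \tag{8}$$

respectively.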

The test statistic applied for discriminating between the two distributions is the ratio of maximized likelihoods, following, for example, Gupta and Kundu (2004). An introduction to this approach with several examples was given in Cox (1961, 1962), where the statistic considered was the logarithm of the ratio of the maximized likelihoods of two separate families of distributions.

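If the method of parameter estimation used is maximum likelihood, then the test statistic is given by

$$T_1 = \log\left[\frac{l_L(\hat{\alpha}_{mle}, \hat{\beta}_{mle})}{l_N(\hat{\mu}_{mle}, \hat{\sigma}_{mle})}\right], \tag{9}$$

where $(\hat{\alpha}_{mle}, \hat{\beta}_{mle})$ and $(\hat{\mu}_{mle}, \hat{\sigma}_{mle})$ are the estimators found by mle for the logistic and normal distributions respectively. A positive value of the statistic favours the logistic distribution and a negative value favours the normal distribution. The other test statistics, denoted by $T_2$, $T_3$ and $T_4$, are obtained by substituting the estimators found by moe, ose and lme respectively into equation (9). For the logistic distribution, the estimators are denoted by $(\hat{\alpha}_{mle}, \hat{\beta}_{mle})$, $(\hat{\alpha}_{moe}, \hat{\beta}_{moe})$, $(\hat{\alpha}_{ose}, \hat{\beta}_{ose})$ and $(\hat{\alpha}_{lme}, \hat{\beta}_{lme})$ respectively. For the normal distribution, the respective estimators are denoted by $(\hat{\mu}_{mle}, \hat{\sigma}_{mle})$, $(\hat{\mu}_{moe}, \hat{\sigma}_{moe})$, $(\hat{\mu}_{ose}, \hat{\sigma}_{ose})$ and $(\hat{\mu}_{lme}, \hat{\sigma}_{lme})$.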
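
To make the statistic concrete, the following sketch computes it for one simulated sample. It is an illustration under stated assumptions, not the authors' implementation: the symbols alpha, beta (logistic) and mu, sigma (normal) are labels chosen here, the logistic mle is found by alternating one-dimensional bisection on the two score equations (any numerical optimizer could be substituted), and the normal scale uses the sample standard deviation with divisor n - 1 (`statistics.pstdev` gives the divisor-n version).

```python
import math
import random
import statistics


def logistic_loglik(xs, alpha, beta):
    """Log-likelihood of Lo(alpha, beta), as in equation (7)."""
    total = -len(xs) * math.log(beta)
    for x in xs:
        z = (x - alpha) / beta
        # log(1 + exp(-z)) evaluated without overflow
        total -= z + 2.0 * (max(-z, 0.0) + math.log1p(math.exp(-abs(z))))
    return total


def normal_loglik(xs, mu, sigma):
    """Log-likelihood of N(mu, sigma^2), as in equation (8)."""
    ss = sum(((x - mu) / sigma) ** 2 for x in xs)
    return -len(xs) * math.log(math.sqrt(2.0 * math.pi) * sigma) - 0.5 * ss


def logistic_scores(xs, alpha, beta):
    """Score equations multiplied by beta; both entries vanish at the mle."""
    zs = [(x - alpha) / beta for x in xs]
    s_alpha = sum(math.tanh(z / 2.0) for z in zs)
    s_beta = sum(z * math.tanh(z / 2.0) for z in zs) - len(xs)
    return s_alpha, s_beta


def bisect_decreasing(f, lo, hi, tol=1e-12):
    """Root of a decreasing function f, assuming f(lo) > 0 > f(hi)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo <= tol * max(1.0, abs(mid)):
            break
    return 0.5 * (lo + hi)


def fit_logistic_mle(xs, max_iter=200, tol=1e-10):
    """Alternate root searches in alpha and beta, starting from moment values."""
    alpha = statistics.fmean(xs)
    beta = math.sqrt(3.0) * statistics.stdev(xs) / math.pi
    for _ in range(max_iter):
        new_alpha = bisect_decreasing(
            lambda a: logistic_scores(xs, a, beta)[0], min(xs), max(xs))
        rms = math.sqrt(sum((x - new_alpha) ** 2 for x in xs) / len(xs))
        new_beta = bisect_decreasing(
            lambda b: logistic_scores(xs, new_alpha, b)[1], 1e-6 * rms, rms)
        converged = abs(new_alpha - alpha) < tol and abs(new_beta - beta) < tol
        alpha, beta = new_alpha, new_beta
        if converged:
            break
    return alpha, beta


rng = random.Random(2010)
sample = [rng.gauss(0.0, 1.0) for _ in range(60)]  # data drawn from N(0, 1)

alpha_hat, beta_hat = fit_logistic_mle(sample)
mu_hat = statistics.fmean(sample)
sigma_hat = statistics.stdev(sample)  # divisor n - 1 (an assumption, see above)

t1 = (logistic_loglik(sample, alpha_hat, beta_hat)
      - normal_loglik(sample, mu_hat, sigma_hat))
print("logistic estimates:", alpha_hat, beta_hat)
print("T1 =", t1, "-> select", "logistic" if t1 > 0 else "normal")
```

The bisection scheme was chosen only because each score equation is monotone in its own parameter, which makes the bracketing intervals easy to justify.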

3. Methods of estimation

3.1 Maximum likelihood method

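By taking the partial derivatives of the log-likelihood function of the logistic distribution with respect to $\alpha$ and $\beta$, equating the resulting quantities to zero and after some algebraic manipulation, we obtain the following equations:

$$\frac{\partial L_L(\alpha, \beta)}{\partial \alpha} = \frac{n}{\beta} - \frac{2}{\beta} \sum_{i=1}^{n} \frac{\exp\left(-\frac{X_i - \alpha}{\beta}\right)}{1 + \exp\left(-\frac{X_i - \alpha}{\beta}\right)} = 0,$$

and

$$\frac{\partial L_L(\alpha, \beta)}{\partial \beta} = -\frac{n}{\beta} + \frac{1}{\beta^2} \sum_{i=1}^{n} (X_i - \alpha) - \frac{2}{\beta^2} \sum_{i=1}^{n} \frac{(X_i - \alpha) \exp\left(-\frac{X_i - \alpha}{\beta}\right)}{1 + \exp\left(-\frac{X_i - \alpha}{\beta}\right)} = 0. \tag{10}$$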

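There is no explicit solution to these simultaneous equations, so the mles of the logistic parameters have no closed form and must be found numerically. For the normal distribution, the mles are given by

$$\hat{\mu}_{mle} = \bar{X} \quad \text{and} \quad \hat{\sigma}^2_{mle} = \frac{1}{n-1} \sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2. \tag{11}$$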

3.2 Method of moment

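The moment estimators for the logistic and normal distributions are given by

$$\hat{\alpha}_{moe} = \bar{X} \quad \text{and} \quad \hat{\beta}_{moe} = \frac{\sqrt{3}}{\pi} S, \tag{12}$$

and

$$\hat{\mu}_{moe} = \bar{X} \quad \text{and} \quad \hat{\sigma}^2_{moe} = S^2, \tag{13}$$

where $\bar{X}$ and $S$ are the mean and standard deviation of the random sample of size $n$ respectively.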
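
The constant in the logistic moment estimator comes from matching the first two moments. As a short check, writing the logistic location and scale as $\alpha$ and $\beta$ (labels assumed here):

```latex
E(X) = \alpha, \qquad \operatorname{Var}(X) = \frac{\pi^2 \beta^2}{3}
\quad\Longrightarrow\quad
\beta = \frac{\sqrt{3}}{\pi}\sqrt{\operatorname{Var}(X)} \approx 0.5513\,\sqrt{\operatorname{Var}(X)} .
```

Replacing the population mean and variance by the sample mean and the sample variance gives the moment estimators of the logistic parameters.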

3.3 Order statistic method

The $p$th quantile of the logistic distribution is given by

$$Q(p; \alpha, \beta) = F^{-1}(p) = \alpha + \beta \log\left(\frac{p}{1-p}\right). \tag{14}$$

The lower quartile, median and upper quartile, denoted by $F^{-1}(0.25)$, $F^{-1}(0.5)$ and $F^{-1}(0.75)$ respectively, together with the distributional limits $F^{-1}(0)$ and $F^{-1}(1)$, give a feel for the spread of the distribution over the axis. Since the interquartile range, given by

$$IQR = F^{-1}(0.75) - F^{-1}(0.25), \tag{15}$$

is independent of the location parameter $\alpha$, the scale parameter $\beta$ can be estimated using the IQR.

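Based on (14), substituting $p = 0.25$ and $p = 0.75$, corresponding to the first and third quartiles, namely the lower and upper quartiles, gives

$$F^{-1}(0.25) = \alpha + \beta \log\left(\frac{0.25}{1 - 0.25}\right) = \alpha - \beta \log 3,$$

and

$$F^{-1}(0.75) = \alpha + \beta \log\left(\frac{0.75}{1 - 0.75}\right) = \alpha + \beta \log 3,$$

respectively. Thus, the two parameters $\alpha$ and $\beta$ can be expressed as

$$\alpha = \frac{1}{2}\left[F^{-1}(0.75) + F^{-1}(0.25)\right], \tag{16}$$

and

$$\beta = \frac{1}{2 \log 3}\left[F^{-1}(0.75) - F^{-1}(0.25)\right], \tag{17}$$

respectively. Accordingly, the two estimators of $\alpha$ and $\beta$, denoted by $\hat{\alpha}_{ose}$ and $\hat{\beta}_{ose}$, are

$$\hat{\alpha}_{ose} = \frac{1}{2}\left[\hat{F}^{-1}(0.75) + \hat{F}^{-1}(0.25)\right], \tag{18}$$

and

$$\hat{\beta}_{ose} = \frac{1}{2 \log 3}\left[\hat{F}^{-1}(0.75) - \hat{F}^{-1}(0.25)\right]. \tag{19}$$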

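It is known that the $p$th quantile of the normal distribution is given by

$$Q(p; \mu, \sigma) = F^{-1}(p) = \mu + \sigma \Phi^{-1}(p), \tag{20}$$

where $\Phi^{-1}(\cdot)$ is the inverse cdf of the standard normal distribution. Based on (20), substituting $p = 0.25$ and $p = 0.75$, corresponding to the first and third quartiles of the normal distribution, gives

$$F^{-1}(0.25) = \mu + \sigma \Phi^{-1}(0.25) = \mu - 0.674\,\sigma,$$

and

$$F^{-1}(0.75) = \mu + \sigma \Phi^{-1}(0.75) = \mu + 0.674\,\sigma,$$

respectively. Since $F^{-1}(0.75) - F^{-1}(0.25) = 1.348\,\sigma$ and $F^{-1}(0.75) + F^{-1}(0.25) = 2\mu$, the two parameters $\mu$ and $\sigma$ can be expressed as

$$\mu = \frac{1}{2}\left[F^{-1}(0.75) + F^{-1}(0.25)\right], \tag{21}$$

and

$$\sigma = \frac{1}{1.348}\left[F^{-1}(0.75) - F^{-1}(0.25)\right], \tag{22}$$

respectively. Accordingly, the two estimators of $\mu$ and $\sigma$, denoted by $\hat{\mu}_{ose}$ and $\hat{\sigma}_{ose}$, are given by

$$\hat{\mu}_{ose} = \frac{1}{2}\left[\hat{F}^{-1}(0.75) + \hat{F}^{-1}(0.25)\right], \tag{23}$$

and

$$\hat{\sigma}_{ose} = \frac{1}{1.348}\left[\hat{F}^{-1}(0.75) - \hat{F}^{-1}(0.25)\right]. \tag{24}$$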

These estimators can be easily shown to be unbiased.
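
The quartile-based estimators are simple to compute. The sketch below is a minimal illustration, assuming that the sample quartiles are taken with the default rule of Python's `statistics.quantiles` (the paper does not say which sample quantile definition it uses); the function name and the test parameters 2.0 and 0.5 are hypothetical.

```python
import math
import random
import statistics


def quartile_estimates(xs):
    """Return (location, logistic scale, normal scale) from sample quartiles."""
    q1, _, q3 = statistics.quantiles(xs, n=4)  # default 'exclusive' rule
    location = 0.5 * (q3 + q1)                          # equations (18) and (23)
    logistic_scale = (q3 - q1) / (2.0 * math.log(3.0))  # equation (19)
    z75 = statistics.NormalDist().inv_cdf(0.75)         # about 0.674
    normal_scale = (q3 - q1) / (2.0 * z75)              # equation (24), unrounded constant
    return location, logistic_scale, normal_scale


rng = random.Random(7)

# Logistic draws by inverting the quantile function (14);
# u is kept strictly inside (0, 1) to avoid log(0).
logistic_sample = []
while len(logistic_sample) < 20000:
    u = rng.random()
    if 0.0 < u < 1.0:
        logistic_sample.append(2.0 + 0.5 * math.log(u / (1.0 - u)))

loc, scale_lo, scale_n = quartile_estimates(logistic_sample)
print("location estimate:", loc)
print("logistic scale estimate:", scale_lo)
print("normal scale estimate:", scale_n)
```

With a large logistic sample the first two estimates should land close to the generating values, while the normal scale estimate is simply the same interquartile range divided by a different constant.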

3.4 L-moment method

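The L-moment of a probability distribution, denoted by $\lambda_m$, as explained in Hosking (1990), is given by

$$\lambda_m = \frac{1}{m} \sum_{j=0}^{m-1} (-1)^j \binom{m-1}{j} \mu_{m-j:m}, \quad m = 1, 2, \ldots \tag{25}$$

where $\mu_{i:n}$ is defined as

$$\mu_{i:n} = E(X_{i:n}) = \int_{-\infty}^{\infty} x f_{i:n}(x)\,dx, \tag{26}$$

where

$$f_{i:n}(x) = \frac{1}{B(i, n-i+1)} \left[F(x)\right]^{i-1} \left[1 - F(x)\right]^{n-i} f(x). \tag{27}$$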

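Using (25), the first and second L-moments, respectively, are given by

$$\lambda_1 = \mu_{1:1} \quad \text{and} \quad \lambda_2 = \frac{1}{2}\left(\mu_{2:2} - \mu_{1:2}\right),$$

where, for the logistic distribution,

$$\mu_{2:2} = \frac{1}{B(2,1)} \int_0^1 u \left[\alpha + \beta \log\left(\frac{u}{1-u}\right)\right] du = \alpha + \beta,$$

and

$$\mu_{1:2} = \frac{1}{B(1,2)} \int_0^1 (1-u) \left[\alpha + \beta \log\left(\frac{u}{1-u}\right)\right] du = \alpha - \beta.$$

Hence, $\mu_{2:2} - \mu_{1:2} = 2\beta$ implies that $\beta = \frac{1}{2}\left(\mu_{2:2} - \mu_{1:2}\right) = \lambda_2$.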

The values of $\mu_{1:2}$ and $\mu_{2:2}$ are now estimated by using

$$M_{s:d;n} = \sum_{i=s}^{n-d+s} a_i(s, d, n)\, X_{i:n}, \quad 1 \le s \le d \le n, \tag{28}$$

where

$$a_i(s, d, n) = \binom{i-1}{s-1} \binom{n-i}{d-s} \bigg/ \binom{n}{d}.$$

Here, $a_i(s, d, n)$ is the probability that $X_{i:n}$ is the $s$th smallest value in a subsample of size $d$ drawn without replacement from $X_{1:n}, \ldots, X_{n:n}$, as given in David and Nagaraja (2003). For $d = 2$ and from (28), the estimates of $\mu_{1:2}$ and $\mu_{2:2}$, denoted by $\hat{\mu}_{1:2} = M_{1:2;n}$ and $\hat{\mu}_{2:2} = M_{2:2;n}$ respectively, are given by

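$$\hat{\mu}_{1:2} = \sum_{i=1}^{n-1} \frac{\binom{i-1}{0}\binom{n-i}{1}}{\binom{n}{2}} X_{i:n} = \frac{2}{n(n-1)} \sum_{i=1}^{n-1} (n-i)\, X_{i:n},$$

and

$$\hat{\mu}_{2:2} = \sum_{i=1}^{n-1} \frac{\binom{i}{1}\binom{n-i-1}{0}}{\binom{n}{2}} X_{i+1:n} = \frac{2}{n(n-1)} \sum_{i=1}^{n-1} i\, X_{i+1:n}.$$

Thus,

$$\hat{\mu}_{2:2} - \hat{\mu}_{1:2} = \frac{2}{n(n-1)} \sum_{i=1}^{n-1} i\, X_{i+1:n} - \frac{2}{n(n-1)} \sum_{i=1}^{n-1} (n-i)\, X_{i:n} = \frac{2}{n(n-1)} \left[\sum_{i=1}^{n-1} i\, X_{i+1:n} - \sum_{i=1}^{n-1} (n-i)\, X_{i:n}\right]. \tag{29}$$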

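The L-moment estimators for the logistic distribution, denoted by $\hat{\alpha}_{lme}$ and $\hat{\beta}_{lme}$ respectively, are given by

$$\hat{\alpha}_{lme} = \bar{X},$$

and

$$\hat{\beta}_{lme} = \frac{1}{2}\left(\hat{\mu}_{2:2} - \hat{\mu}_{1:2}\right) = \frac{1}{n(n-1)} \left[\sum_{i=1}^{n-1} i\, X_{i+1:n} - \sum_{i=1}^{n-1} (n-i)\, X_{i:n}\right]. \tag{30}$$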

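The first and second L-moments, respectively, for the normal distribution are given by

$$\lambda_1 = \mu_{1:1} \quad \text{and} \quad \lambda_2 = \frac{1}{2}\left(\mu_{2:2} - \mu_{1:2}\right),$$

where

$$\mu_{2:2} = 2 \int_0^1 \left[\mu + \sigma \Phi^{-1}(u)\right] u\, du = \mu + \frac{\sigma}{\sqrt{\pi}},$$

and

$$\mu_{1:2} = 2 \int_0^1 \left[\mu + \sigma \Phi^{-1}(u)\right] (1-u)\, du = \mu - \frac{\sigma}{\sqrt{\pi}},$$

where

$$\int_0^1 \Phi^{-1}(u)\, du = 0.$$

Hence, $\mu_{2:2} - \mu_{1:2} = \frac{2\sigma}{\sqrt{\pi}}$ implies that $\sigma = \frac{\sqrt{\pi}}{2}\left(\mu_{2:2} - \mu_{1:2}\right)$.

By using (29), the L-moment estimators, denoted by $\hat{\mu}_{lme}$ and $\hat{\sigma}_{lme}$ respectively, are given by

$$\hat{\mu}_{lme} = \bar{X},$$

and

$$\hat{\sigma}_{lme} = \sqrt{\pi}\, \hat{\lambda}_2 = \frac{\sqrt{\pi}}{2}\left(\hat{\mu}_{2:2} - \hat{\mu}_{1:2}\right) = \frac{\sqrt{\pi}}{n(n-1)} \left[\sum_{i=1}^{n-1} i\, X_{i+1:n} - \sum_{i=1}^{n-1} (n-i)\, X_{i:n}\right]. \tag{31}$$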
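
The L-moment estimators reduce to weighted sums of the ordered sample. The sketch below, with function names chosen here for illustration, computes the two weighted sums for subsamples of size two and checks them against a brute-force average of the minimum and maximum over all pairs, which is what the weights in (28) represent.

```python
import math
from itertools import combinations


def pairwise_extreme_means(xs):
    """Estimates of the expected minimum and maximum of two observations."""
    ordered = sorted(xs)
    n = len(ordered)
    scale = 2.0 / (n * (n - 1))
    # ordered[i - 1] is the i-th order statistic; ordered[i] is the (i+1)-th.
    mu12 = scale * sum((n - i) * ordered[i - 1] for i in range(1, n))
    mu22 = scale * sum(i * ordered[i] for i in range(1, n))
    return mu12, mu22


def lmoment_estimates(xs):
    """Return (location, logistic scale, normal scale) from L-moments."""
    mu12, mu22 = pairwise_extreme_means(xs)
    location = sum(xs) / len(xs)
    logistic_scale = 0.5 * (mu22 - mu12)                     # equation (30)
    normal_scale = 0.5 * math.sqrt(math.pi) * (mu22 - mu12)  # equation (31)
    return location, logistic_scale, normal_scale


data = [3.0, 1.0, 4.0, 2.0]
mu12, mu22 = pairwise_extreme_means(data)

# Brute-force check: average the minimum and maximum over all pairs.
pairs = list(combinations(data, 2))
brute_min = sum(min(p) for p in pairs) / len(pairs)
brute_max = sum(max(p) for p in pairs) / len(pairs)

print("weighted sums:", mu12, mu22)
print("brute force:  ", brute_min, brute_max)
print("estimates:", lmoment_estimates(data))
```

The two scale estimates differ only by the factor $\sqrt{\pi}$, so the logistic and normal fits share the same ordered-sample statistic.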

A summary of all estimators for the logistic and normal distributions is given in Tables 1 and 2 respectively.

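Table 1 Estimators for the two parameters of the logistic distribution

| Method | $\hat{\alpha}$ | $\hat{\beta}$ |
|---|---|---|
| mle | No closed form | No closed form |
| moe | $\bar{X}$ | $\frac{\sqrt{3}}{\pi} S$ |
| ose | $\frac{1}{2}\left[\hat{F}^{-1}(0.75) + \hat{F}^{-1}(0.25)\right]$ | $\frac{1}{2 \log 3}\left[\hat{F}^{-1}(0.75) - \hat{F}^{-1}(0.25)\right]$ |
| lme | $\bar{X}$ | $\frac{1}{n(n-1)}\left[\sum_{i=1}^{n-1} i\, X_{i+1:n} - \sum_{i=1}^{n-1} (n-i)\, X_{i:n}\right]$ |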

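Table 2 Estimators for the two parameters of the normal distribution

| Method | $\hat{\mu}$ | $\hat{\sigma}$ |
|---|---|---|
| mle | $\bar{X}$ | $S$ |
| moe | $\bar{X}$ | $S$ |
| ose | $\frac{1}{2}\left[\hat{F}^{-1}(0.75) + \hat{F}^{-1}(0.25)\right]$ | $\frac{1}{1.348}\left[\hat{F}^{-1}(0.75) - \hat{F}^{-1}(0.25)\right]$ |
| lme | $\bar{X}$ | $\frac{\sqrt{\pi}}{n(n-1)}\left[\sum_{i=1}^{n-1} i\, X_{i+1:n} - \sum_{i=1}^{n-1} (n-i)\, X_{i:n}\right]$ |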

4. Algorithm for PCS comparison

In this simulation, without loss of generality, random samples are generated from $Lo(0, 1)$ and $N(0, 1)$ to assess the relative performance of the selection procedure.

To calculate the PCS when $H_0$: Logistic, the following algorithm is used:

a) Let $X_i$, $i = 1, \ldots, n$, be a sample of size $n$ from $Lo(0, 1)$.

b) The parameters $(\alpha, \beta)$ and $(\mu, \sigma)$ are estimated by mle, moe, ose or lme.

c) The test statistic $T_1$ is calculated.

d) Check whether $T_1 > 0$.

e) Steps (a)-(d) are repeated 10,000 times to get $T_{1t}$, $t = 1, \ldots, 10000$.

f) The PCS is calculated by

$$T_1^{*} = \frac{1}{10{,}000} \sum_{t=1}^{10{,}000} I(T_{1t} > 0),$$

where $I(\cdot)$ stands for the indicator function.

When $H_0$: Normal, the sample in step (a) is drawn from $N(0, 1)$ and a selection is counted as correct when the statistic is negative. The same procedure can be repeated for the other test statistics.
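
The steps above can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' code: it uses the moment estimators because they have closed forms, it runs only 400 replications at n = 200 to stay quick (the paper uses 10,000), and all function names and seeds are hypothetical.

```python
import math
import random
import statistics


def logistic_loglik(xs, alpha, beta):
    """Log-likelihood of a logistic sample with location alpha, scale beta."""
    total = -len(xs) * math.log(beta)
    for x in xs:
        z = (x - alpha) / beta
        total -= z + 2.0 * (max(-z, 0.0) + math.log1p(math.exp(-abs(z))))
    return total


def normal_loglik(xs, mu, sigma):
    """Log-likelihood of a normal sample with mean mu, standard deviation sigma."""
    ss = sum(((x - mu) / sigma) ** 2 for x in xs)
    return -len(xs) * math.log(math.sqrt(2.0 * math.pi) * sigma) - 0.5 * ss


def moment_statistic(xs):
    """Log-likelihood ratio with the moment estimators plugged in."""
    xbar = statistics.fmean(xs)
    s = statistics.stdev(xs)
    beta_hat = math.sqrt(3.0) * s / math.pi
    return logistic_loglik(xs, xbar, beta_hat) - normal_loglik(xs, xbar, s)


def draw_logistic(rng, n):
    """n draws from the standard logistic by inverting its cdf."""
    out = []
    while len(out) < n:
        u = rng.random()
        if 0.0 < u < 1.0:  # guard against log(0)
            out.append(math.log(u / (1.0 - u)))
    return out


def draw_normal(rng, n):
    """n draws from the standard normal."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def estimate_pcs(true_model, n, runs, seed):
    """Share of runs in which the sign of the statistic picks the true model."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(runs):
        if true_model == "logistic":
            if moment_statistic(draw_logistic(rng, n)) > 0:
                correct += 1
        else:
            if moment_statistic(draw_normal(rng, n)) < 0:
                correct += 1
    return correct / runs


pcs_logistic = estimate_pcs("logistic", n=200, runs=400, seed=1)
pcs_normal = estimate_pcs("normal", n=200, runs=400, seed=2)
print("PCS when the data are logistic:", pcs_logistic)
print("PCS when the data are normal:  ", pcs_normal)
```

Swapping `moment_statistic` for a version based on another pair of estimators gives the corresponding PCS for that method.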

5. Simulation results

The results of the numerical experiments are presented in this section. To assess the performance of the procedures, the PCS is estimated by Monte Carlo simulation with 10,000 runs according to the algorithm of Section 4. The PCS is calculated under both null hypotheses for the sample sizes $n = 6, 15, 36, 60, 90, 120, 150, 180, 210, 240, 270, 300$. First, the case in which the null distribution is logistic and the alternative is normal is considered, and the results are summarized in Table 3. Second, the case in which the null distribution is normal and the alternative is logistic is studied, and the results are presented in Table 4.
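
When reading the tables, it helps to keep the Monte Carlo error in mind. Treating each PCS entry as a binomial proportion $p$ estimated from 10,000 independent runs (an assumption about the simulation design), its standard error is bounded as follows:

```latex
\operatorname{se}\left(\widehat{PCS}\right) = \sqrt{\frac{p(1-p)}{10{,}000}}
\le \sqrt{\frac{0.25}{10{,}000}} = 0.005 .
```

Differences of about 0.01 or less between two entries may therefore be within simulation noise.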

Table 3 PCS values from Monte Carlo simulation with 10,000 replications for various sample sizes when the null distribution is logistic, $H_0$: Logistic(0,1), and the alternative is normal, based on estimators found using mle, moe, ose and lme

| n | mle | moe | ose | lme |
|---|---|---|---|---|
| 6 | .263 | .193 | .310 | .171 |
| 15 | .381 | .355 | .340 | .220 |
| 36 | .522 | .497 | .517 | .350 |
| 60 | .605 | .595 | .550 | .471 |
| 90 | .679 | .662 | .588 | .506 |
| 120 | .721 | .708 | .605 | .554 |
| 150 | .764 | .743 | .640 | .604 |
| 180 | .792 | .760 | .680 | .658 |
| 210 | .817 | .786 | .716 | .770 |
| 240 | .835 | .801 | .750 | .792 |
| 270 | .857 | .814 | .772 | .809 |
| 300 | .875 | .826 | .791 | .820 |


Table 4 PCS values from Monte Carlo simulation with 10,000 replications for various sample sizes when the null distribution is normal, $H_0$: Normal(0,1), and the alternative is logistic, based on estimators found using mle, moe, ose and lme

| n | mle | moe | ose | lme |
|---|---|---|---|---|
| 6 | .581 | .613 | .605 | .620 |
| 15 | .663 | .637 | .620 | .642 |
| 36 | .701 | .670 | .643 | .665 |
| 60 | .730 | .707 | .666 | .687 |
| 90 | .755 | .727 | .690 | .710 |
| 120 | .785 | .758 | .710 | .734 |
| 150 | .805 | .794 | .730 | .765 |
| 180 | .822 | .818 | .759 | .788 |
| 210 | .840 | .841 | .775 | .810 |
| 240 | .867 | .865 | .793 | .829 |
| 270 | .888 | .877 | .810 | .845 |
| 300 | .901 | .885 | .824 | .855 |

Based on the simulation with 10,000 iterations reported in Tables 3 and 4, the following remarks can be made:

(i) The two densities and distribution functions are very close, which makes them hard to distinguish, particularly for small samples.

(ii) When testing for logistic against normal, the PCS is small for small n and increases as n increases for all four methods. The mle gives the highest PCS for n ≥ 15, while the lme gives the lowest PCS for n ≤ 180.

(iii) When testing for normal against logistic, the PCS is relatively high (above 0.58 even for n = 6) and increases as n increases when either mle, moe, ose or lme is used.

6. Conclusions

Determining the correct distribution for a given set of data is an important issue. When describing a data set, it is often difficult to choose between two distributions that have similar characteristics. In this paper, the problem of discriminating between two symmetric distributions, namely the logistic and normal distributions, is considered. The statistics considered are based on the logarithm of the ratio of the maximized likelihoods. The performance of the ratio of maximized likelihoods procedures under simple random sampling is investigated in terms of the PCS, which is the proportion of simulation runs in which the procedure selects the true distribution. The probabilities of correct selection based on the different estimators, namely mle, moe, ose and lme, are compared using Monte Carlo simulations.

When the null distribution is logistic, the PCS is small for small samples, whichever of mle, moe, ose or lme is used, but it increases as the sample size increases. When the null distribution is normal, however, the PCS is high even for small sample sizes.

Acknowledgements

The authors would like to thank the Ministry of Higher Education of Malaysia for supporting

this work under the project UKM-ST-06-FRGS 0096-2009.

References

1. Alzaid, A. and Sultan, K.S. 2009. Discriminating between gamma and lognormal distributions with applications.

Journal of King Saud University (Science) 21: 99-108.

2. Atkinson, A. 1969. A test of discriminating between models. Biometrika 56: 337–341.

3. Atkinson, A. 1970. A method for discriminating between models. Journal of the Royal Statistical Society, Ser. B 32: 323–353 (with discussions).

4. Bain, L.J. 1978. Statistical Analysis of Reliability and Life-testing Models. Marcel Dekker, New York.

5. Bain, L. J. and Engelhardt, M. 1980. Probability of correct selection of Weibull versus gamma based on likelihood

ratio. Communications in Statistics, Ser. A 9: 375-381.

6. Balakrishnan, N. 1992. Handbook of the logistic distribution. New York, Marcel Dekker, Inc.

7. Chambers, E. A. and Cox, D. R. 1967. Discriminating between alternative binary response models. Biometrika 54:

573–578.

8. Chandra, M., Singpurwalla, N. D. and Stephens, M. A. 1981. Kolmogorov statistics for tests of fit for the extreme

value and weibull distributions. Journal of the American Statistical Association 76(375): 729-731.

9. Chen, W. W. 1980. On the tests of separate families of hypotheses with small sample size. Journal of Statistical Computation and Simulation 2: 183–187.

10. Cox, D. R. 1961. Tests of separate families of hypotheses. Proceedings of the Fourth Berkeley Symposium in

Mathematical Statistics and Probability, University of California Press, Berkeley: 105–123.

11. Cox, D. R. 1962. Further results on tests of separate families of hypotheses. J. of Royal Statistical Society, Ser. B 24:

406–424.

12. D'Agostino, R. and Stephens, M. 1986. Goodness-of-Fit Techniques. Marcel Dekker Inc., New York.

13. Dumonceaux, R. and Antle, C. E. 1973. Discriminating between the log-normal and Weibull distribution.

Technometrics 15: 923–926.

14. Dyer, A. R. 1973. Discrimination procedure for separate families of hypotheses. Journal of the American Statistical

Association 68: 970–974.

15. Gupta, R.D. and Kundu, D. 1999. Generalized exponential distributions. Australian and New Zealand Journal of

Statistics 41: 173-188.

16. Gupta, R.D. and Kundu, D. 2001b. Exponentiated Exponential Distribution; An alternative to gamma or Weibull

distribution. Biometrical Journal 43: 117-130.

17. Gupta, R.D. and Kundu, D. 2001a. Generalized exponential distributions; Different Method of Estimations. Journal of

Statistical Computation and Simulation 69: 315-338.

18. Gupta, R.D. and Kundu, D. 2002. Generalized exponential distributions; Statistical Inferences. Journal of Statistical

Theory and Applications 1: 101-118.

19. Gupta, R.D. and Kundu, D. 2003a. Closeness of gamma and generalized exponential distributions. Commun. Stat.-

Theory Methods 32: 705–721.

20. Gupta, R.D. and Kundu, D. 2003b. Discriminating between Weibull and generalized exponential distributions.

Comput. Stat. Data Anal. 43: 179–196.

21. Gupta, R.D. and Kundu, D. 2004. Discriminating between gamma and generalized exponential distributions. Journal of Statistical Computation and Simulation 74: 107–121.

22. Jackson, O.A.Y. 1969. Fitting a gamma or lognormal distribution to fibre-diameter measurements on wool tops.

Journal of Applied Statistics 8: 70–75.

23. Kim, J. S. and Yum, B. J. 2008. Selection between Weibull and lognormal distributions: A comparative simulation

study. Computational Statistics and Data Analysis 53: 477-485.

24. Kundu, D. and Raqab, M. Z. 2007. Discriminating between the generalized Rayleigh and Log-normal distribution.

Statistics 41(6): 505-515.


25. Kundu, D. and Manglick, A. 2004. Discriminating between the Weibull and Log-normal distributions. Naval Research Logistics 51(6): 893-905.

26. Kundu, D., Gupta, R. D. and Manglick, A. 2005. Discriminating between Log-normal and generalized exponential distributions. Journal of Statistical Planning and Inference 127: 213-227.

27. Mudholkar, G.S. & Hutson, A.D. 1998. LQ-moments: Analogs of L-moments. Journal of Statistical Planning and

Inference 71: 191-208.

28. Prentice, R.L. 1975. Discrimination among Some Parametric Models. Biometrika 62 (3): 607-614.

29. Strupczewski, W.G., Mitosek, H.T., Kochanek, K., Singh, V.P. and Weglarczyk, S. 2006. Probability of correct

selection from lognormal and convective diffusion models based on the likelihood ratio. Stochastic Environmental

Research and Risk Assessment, 20(3): 152-163.
