Page 1: Shahid Lecture-4-MKAG1273

MAL1303: STATISTICAL HYDROLOGY

Hypothesis Test

Dr. Shamsuddin Shahid

Department of Hydraulics and Hydrology, Faculty of Civil Engineering, Universiti Teknologi Malaysia

Room No.: M46-332; Phone: 07-5531624; Mobile: 0182051586 Email: [email protected]

11/23/2015 Shamsuddin Shahid, FKA, UTM

Page 2

How can we solve it?

Groundwater depth (m) data are collected from two aquifers, X and Y. We want to know whether the groundwater depth in both aquifers is the same or not.

Page 3

How can we solve it?

After a new technique was introduced, groundwater yield has increased significantly. How can we prove it?

Page 4

How can we solve it?

Environmental activists claim that the groundwater quality of the area has deteriorated since fertilizer-based agriculture was introduced. Is it possible to prove this?

Page 5

Is it the solution?

Sixteen (16) randomly selected discharge records were collected from two rivers. From the mean of the discharge data, River-B appears to have a higher discharge than River-A. Is it possible to say that the discharge of River-B is higher than that of River-A?

Page 6

Interval of Mean Discharge

For River-A at 95% level of confidence: 30.2 ≤ µA ≤ 215.5

For River-B at 95% level of confidence: 60.4 ≤ µB ≤ 190.7

The two intervals overlap, so River-A and River-B can have the same mean discharge value.

Is it the solution?

Page 7

One-tailed test: rejection region for a one-sided Ha about µ = 520 when α = .025

Two-tailed test: rejection region for Ha: µ ≠ 520 when α = .025
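The following is a minimal sketch of where the rejection boundary falls for α = .025. It assumes a z-based test on a mean, which the figure suggests but does not state, and the helper name `z_critical` is illustrative. It uses Python's standard `statistics.NormalDist`. In a one-tailed test all of α sits in one tail. In a two-tailed test α is split as α/2 per tail, which pushes the critical value further out.

```python
from statistics import NormalDist


def z_critical(alpha, two_tailed):
    """Standard-normal critical value for significance level alpha.

    One-tailed: the whole of alpha lies in a single tail.
    Two-tailed: alpha is split, alpha/2 in each tail.
    """
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)


alpha = 0.025
print("one-tailed:", round(z_critical(alpha, two_tailed=False), 4))
print("two-tailed:", round(z_critical(alpha, two_tailed=True), 4))
```

For the same α, the one-tailed boundary is about 1.96 and the two-tailed boundary is about 2.24.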

Page 8

Comparing two sets of data

Page 9

Comparing two sets of data

Page 10

Hypothesis Tests

One important use of hypothesis tests is to evaluate and compare groups of data. Statistical tests are the most quantitative way to determine whether hypotheses can be substantiated, or whether they must be modified or rejected outright.

Hypothesis tests have at least two advantages over educated opinion:

1. They ensure that every analyst who applies the same methods to a data set will arrive at the same result.

2. They present a measure of the strength of the evidence (the p-value).

Page 11

1) Choose the appropriate test.
2) Establish the null and alternate hypotheses.
3) Decide on an acceptable error rate α.
4) Compute the test statistic from the data.
5) Compute the p-value.
6) Reject the null hypothesis if p ≤ α.

Structure of Hypothesis Tests

Page 12

Selection of Appropriate Test

There are a large number of hypothesis tests. They are classified based on:

1. The measurement scale of the data
2. The distribution of the data

If the measurement scale is interval/ratio and the data are normally distributed, we use parametric hypothesis tests.

If the measurement scale is not interval/ratio (such as ordinal or categorical), or if it is interval/ratio but the data are not normally distributed, we use non-parametric hypothesis tests.

Page 13

Null Hypothesis and Alternative Hypothesis

The 'null' often refers to the common view of something, while the alternative hypothesis is what the researcher really thinks is the cause of a phenomenon. The null hypothesis is a hypothesis that the researcher tries to disprove, reject or nullify.

The null hypothesis is denoted as H0.

The alternative hypothesis is denoted as Ha.

Page 14

We want to test whether the mean can be 190.

H0: µ = 190 when α = 0.05 [Null hypothesis: the mean value can be 190]
Ha: µ ≠ 190 when α = 0.05 [Alternative hypothesis: the mean value cannot be 190]

Comparing two population means, µ1 and µ2:

Null hypothesis, H0: µ1 = µ2.

The alternative hypothesis, H1: µ1 ≠ µ2 (two-tailed t test),

H1: µ1 < µ2 (one-tailed t test), or H1: µ1 > µ2 (one-tailed t test).

Example: Null and Alternative Hypothesis

Page 15

1) Choose the appropriate test.
2) Establish the null and alternate hypotheses.
3) Decide on an acceptable error rate α.
4) Compute the test statistic from the data.
5) Compute the p-value.
6) Reject the null hypothesis if p ≤ α.

Structure of Hypothesis Tests

Page 16

Groundwater permeability is found to vary very widely in an area. One hundred (n = 100) permeability measurements are made in the area, and their calculated mean is 190. For some engineering purpose, we need to know whether groundwater permeability in the area can have a mean value of 180. We want to determine this at the 95% level of confidence.

H0: µ = 180 when α = 0.05 [Null hypothesis: the mean value can be 180]

Ha: µ ≠ 180 when α = 0.05 [Alternative hypothesis: the mean value cannot be 180]

A Simple Example

Page 17

A Simple Example

At the 95% level of confidence, 180 lies outside the accepted region.

Result: 180 cannot be the mean permeability in the region.
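The formula for the accepted region did not survive extraction. This sketch assumes it was the usual large-sample interval x̄ ± z(α/2)·s/√n, which is reasonable for n = 100 but is still an assumption. The values x̄ = 190 and n = 100 come from the example. The standard deviation s = 40 is purely hypothetical because the slide's value is not recoverable, so the printed interval only illustrates the method.

```python
import math
from statistics import NormalDist


def mean_acceptance_region(xbar, s, n, confidence=0.95):
    """Large-sample interval xbar +/- z * s / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width


def can_be_mean(mu0, xbar, s, n, confidence=0.95):
    """True if the hypothesised mean mu0 lies inside the accepted region."""
    lo, hi = mean_acceptance_region(xbar, s, n, confidence)
    return lo <= mu0 <= hi


# xbar = 190 and n = 100 are from the example; s = 40 is HYPOTHETICAL.
lo, hi = mean_acceptance_region(190, 40, 100)
print(round(lo, 2), round(hi, 2), can_be_mean(180, 190, 40, 100))
```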

Page 18

Comparing Two sets of Data: Student t-test

Underlying assumptions made in using the t test to compare two population means:

1. The underlying distributions for both populations are normal.

2. The variances of the two populations are approximately equal:

σ1² ≈ σ2²

Page 19

Null Hypothesis

The null hypothesis, denoted as H0, is expressed as follows for the t test comparing two population means, µ1 and µ2:

H0: µ1 = µ2.

Alternative Hypothesis

The alternative hypothesis, denoted as H1, is expressed as one of the following for the t test comparing two population means, µ1 and µ2:

H1: µ1 ≠ µ2 (two-tailed t test),

H1: µ1 < µ2 (one-tailed t test), or H1: µ1 > µ2 (one-tailed t test).

Page 20

Student t-test: Comparing two sets of data

Standard Error in Mean

t-statistic estimated using:

where n1 is the number of xi observations, n2 is the number of yi observations, Sx² is the sample variance of xi, Sy² is the sample variance of yi, x̄ is the sample average of xi, and ȳ is the sample average of yi.
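The formula image on this slide did not survive extraction. The slides assume equal population variances and use n1 + n2 − 2 degrees of freedom, so a pooled-variance t statistic is the natural candidate. The sketch below implements that form as an assumption, not as a transcription of the slide. The toy samples are made up for illustration.

```python
import math
from statistics import mean, variance


def pooled_t(x, y):
    """Two-sample t statistic assuming equal population variances.

    Returns (t, df) with df = n1 + n2 - 2.
    """
    n1, n2 = len(x), len(y)
    sx2, sy2 = variance(x), variance(y)  # sample variances (n - 1 divisor)
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((n1 - 1) * sx2 + (n2 - 1) * sy2) / (n1 + n2 - 2)
    # Standard error of the difference between the two sample means.
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean(x) - mean(y)) / se, n1 + n2 - 2


t, df = pooled_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(t, df)
```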

Page 21

1. Once the t-statistic has been computed, we can compare our estimated t value with the critical t values given in a table for the t distribution.

2. We can reject the null hypothesis if the estimated t value exceeds the critical t value in the t table for a significance level of α (one-sided t test) or α/2 (two-sided t test). For a two-sided test, use the absolute value of t. For a one-sided test, t must also have the sign that Ha predicts.

3. Thus, we compare our t value with the t distribution table entry for:

t(α, n1 + n2 − 2) (one-sided) or

t(α/2, n1 + n2 − 2) (two-sided)

where α is the level of significance (equal to 1 − level of confidence), and n1 and n2 are the numbers of samples from the two populations being compared.

Making Decision
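Critical values are normally read from a printed t table or from a library routine such as SciPy's `scipy.stats.t.ppf`. Readers without either can use this standard-library sketch. It integrates the t density numerically with Simpson's rule and inverts the result by bisection. The helper names are hypothetical. The search bracket `upper=50` assumes conventional α values and df ≥ 2.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0: 0.5 minus the area on [0, t] (Simpson's rule)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def t_critical(alpha, df, upper=50.0):
    """Table entry t(alpha, df): the point with upper-tail area alpha."""
    lo, hi = 0.0, upper
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > alpha:
            lo = mid  # tail area still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2


alpha = 0.05
print(round(t_critical(alpha / 2, 28), 3))  # two-sided entry t(alpha/2, 28)
```

For df = 28 this gives about 2.048 for t(0.025, 28), the two-sided critical value at α = 0.05.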

Page 22

Student t-test: Example

Groundwater samples collected near an underground mining area before mining started and after mining are given below. Many scientists anticipate that mining has increased the concentration of Chemical-X in groundwater. Is this true?

Null Hypothesis, H0: µ1 = µ2 [No change in groundwater quality]

Alternative Hypothesis, H1: µ1 ≠ µ2 [Groundwater quality has changed]

Page 23

Student t-test: Example

t(calculated) = 0.7968

Degrees of freedom = n1 + n2 − 2 = 16 + 14 − 2 = 28

At α = 0.05 (two-sided): t(critical) = t(0.025, 28) = 2.048

|t(calculated)| < t(critical)

Decision: The null hypothesis cannot be rejected at the 95% level of confidence.

Page 24

ANalysis Of VAriance (ANOVA)

Analysis of variance (ANOVA) is a method for testing the hypothesis that there is no difference between two or more population means (usually at least three).

Why can't the t-test be applied?

• The t-test is based on the standard error of the difference between two means, so it can only test the difference between two means.

• With more than two means, we could compare each mean with every other mean using t-tests. However, conducting multiple t-tests inflates the overall probability of a Type I error (falsely rejecting a true null hypothesis) and is NOT RECOMMENDED.
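The sketch below shows why multiple t-tests are discouraged. Each test carries an α chance of a false rejection, so running all pairwise tests inflates the overall error. It treats the tests as independent. That is a simplifying assumption, because pairwise tests that share groups are correlated, so the figures are approximate.

```python
from math import comb


def familywise_error(n_groups, alpha=0.05):
    """Number of pairwise t-tests and the chance of at least one false
    rejection among them, treating the tests as independent (a simplification)."""
    n_tests = comb(n_groups, 2)
    return n_tests, 1 - (1 - alpha) ** n_tests


for k in (3, 4, 5):
    n_tests, fwer = familywise_error(k)
    print(f"{k} groups -> {n_tests} t-tests, "
          f"P(at least one false rejection) = {fwer:.3f}")
```

With three groups the overall error is already about 0.14 instead of 0.05.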

Page 25

Three groups tightly spread about their respective means: the variability within each group is relatively small.

Three groups with the same means as in the previous figure, but with much larger variability within each group.

ANOVA examines the differences between the groups as well as the differences within each group.

Analysis of Variance (ANOVA)

Page 26

Assumptions of ANOVA

1. The observations are sampled independently, and the groups under consideration are independent: selecting one sample has no effect on another.

2. Each of the populations is normally distributed with the same variance (homogeneity of variance).

3. Population variances are equal.

Page 27

Calculating an ANOVA means calculating the F statistic. There are six steps to calculating the F statistic:

1. Calculation of the “sum of squares” between the groups
2. Calculation of the “sum of squares” within the groups
3. Determination of the degrees of freedom for each
4. Calculation of the “mean square between” and the “mean square within”
5. Calculation of the F ratio (or F statistic)
6. Making a decision

Calculating an ANOVA

Page 28

Calculating an ANOVA

Mean Square Between (MSB) = SSB / DFB

Mean Square Within (MSW) = SSW / DFW

F-statistic: F = MSB / MSW

A larger F statistic means more variation between the groups than within the groups. A larger F statistic therefore supports the conclusion that the groups come from different populations.

Page 29

Calculation of Degree of Freedom

The degrees of freedom between (DFB) and the degrees of freedom within (DFW) can be calculated as follows:

DFB = No. of groups − 1

DFW = Total no. of observations − No. of groups
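The calculation steps listed on the previous slides can be sketched as a single function. The three toy groups are made-up numbers chosen so that the results are easy to check by hand (SSB = 54, SSW = 6, F = 27). They are not the lecture's data.

```python
from statistics import mean


def one_way_anova(groups):
    """Return SSB, SSW, DFB, DFW, MSB, MSW and F for a list of groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    # Between groups: group size times squared deviation of group mean.
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within groups: squared deviations from each group's own mean.
    ssw = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    dfb = len(groups) - 1                # no. of groups - 1
    dfw = len(all_values) - len(groups)  # total observations - no. of groups
    msb, msw = ssb / dfb, ssw / dfw
    return {"SSB": ssb, "SSW": ssw, "DFB": dfb, "DFW": dfw,
            "MSB": msb, "MSW": msw, "F": msb / msw}


result = one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(result)
```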

Page 30

Example ANOVA Test

Page 31

Hypotheses

We may test the

Null Hypothesis: There is no difference in groundwater depth among the three catchments

against the

Alternative Hypothesis: The groundwater depths of at least one pair of catchments are not equal

Page 32

Example ANOVA Test

Page 33

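Sum of Squares Between (SSB)

$$SSB = \sum_{j=1}^{k} n_j(\bar{x}_j - \bar{x})^2 = 38.798$$

where k is the number of groups, n_j and x̄_j are the size and mean of group j, and x̄ is the grand mean.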

Page 34

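Total Sum of Squares (TSS)

Total sum of squares = Sum of squares between (SSB) + Sum of squares within (SSW)

$$TSS = \sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{ij} - \bar{x})^2 = 44.735$$

where x_ij is the i-th observation in group j and x̄ is the grand mean of all observations.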

Page 35

Total sum of squares (TSS) = Sum of squares between (SSB) + Sum of squares within (SSW)

Therefore,

SSW = TSS − SSB = 44.735 − 38.798 = 5.937

Mean Square Within (MSW)

Page 36

Determine Degree of Freedoms

Between-group degrees of freedom (BDF) = Number of groups − 1 = 3 − 1 = 2

Within-group degrees of freedom (WDF) = Total number of observations − Number of groups = 30 − 3 = 27

Page 37

Mean Squares

Between-group mean square = SSB / BDF = 38.798 / 2 = 19.399

Within-group mean square = SSW / WDF = 5.937 / 27 = 0.2199

Page 38

F-Statistics

F = Between-group mean square / Within-group mean square

= 19.399 / 0.2199

= 88.2

F(0.05; 2, 27) = 3.35

F(calculated) > F(critical). Therefore, we reject the null hypothesis.

Important: The F statistic does not tell us which groups are different. It only says whether the mean values differ significantly among the groups. In this case, it only says that groundwater depth differs significantly among the catchments.
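The arithmetic on the preceding slides can be checked in a few lines. SSB, TSS, the 30 observations and the 3 groups are the slide values. The critical value of about 3.35 is read from a standard F table for (2, 27) degrees of freedom.

```python
ssb, tss = 38.798, 44.735  # sums of squares from the slides
n_total, n_groups = 30, 3

ssw = tss - ssb            # SSW = TSS - SSB
dfb = n_groups - 1         # between-group degrees of freedom
dfw = n_total - n_groups   # within-group degrees of freedom
f_stat = (ssb / dfb) / (ssw / dfw)
f_critical = 3.35          # F(0.05; 2, 27) from an F table

print(round(ssw, 3), round(f_stat, 1), f_stat > f_critical)
```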

Page 39

One-way and Two-way ANOVA

When there is only one qualitative variable that denotes the groups and only one measurement (quantitative) variable, a one-way ANOVA is carried out. The purpose of one-way ANOVA is to find out whether data from several groups have a common mean, that is, whether the groups actually differ in the measured characteristic.

The purpose of two-way ANOVA is to test the effects of two independent variables on several groups. One-way and two-way ANOVA differ in that the groups in two-way ANOVA are defined by two characteristics instead of one.

Suppose sediment samples are collected from three different areas, and the contents of two minerals (A and B) are measured in each sample. We want to see whether the samples differ from area to area as well as by type of mineral.

Page 40

Chi-square Test of Normality

Page 41

Chi-square Test of Normality

Page 42

Chi-square Test of Normality

Page 43

Chi-square Test of Normality

Page 44

NORMSDIST(z) [Excel function]

NORMSDIST(−1) = 0.158655
NORMSDIST(0) − NORMSDIST(−1) = 0.341345
NORMSDIST(1) − NORMSDIST(0) = 0.341345
NORMSDIST(2) − NORMSDIST(1) = 0.135905

Expected frequency = n × [probability of the z-value falling in that class interval]

Example: 12 × 0.158655 = 1.903863

Chi-square Test of Normality

Page 45

Example: (2 − 1.903863)² / 1.903863 = 0.004855

χ²(calculated) = 0.09292; χ²(critical)(α, df) = ?

Degrees of freedom (df) = m − k − 1

where m is the number of classes (here 4). We estimated ȳ and s, so k = 2. Therefore, df = 4 − 2 − 1 = 1.

χ²(0.05, 1) = 3.841459; χ²(calculated) < χ²(critical)

The null hypothesis cannot be rejected.

Chi-square Test of Normality
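The class probabilities and the first chi-square term can be reproduced with Python's `statistics.NormalDist` in place of Excel's NORMSDIST. Only the first class is checked, because the full observed-frequency table is not recoverable. For df = 1 the chi-square critical value equals the square of the two-sided z value. That is a standard identity, used here instead of a table lookup. The helper names are illustrative.

```python
from statistics import NormalDist


def expected_frequencies(n, z_edges):
    """Expected counts n * P(z_lo < Z <= z_hi) for consecutive class edges."""
    phi = NormalDist().cdf
    return [n * (phi(hi) - phi(lo)) for lo, hi in zip(z_edges, z_edges[1:])]


def chi_square_stat(observed, expected):
    """Sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# First class from the slide: n = 12, z up to -1, observed count 2.
e_first = expected_frequencies(12, [float("-inf"), -1.0])[0]
term = chi_square_stat([2], [e_first])
print(round(e_first, 6), round(term, 6))

# For df = 1, the chi-square critical value is the squared two-sided z value.
chi_crit_df1 = NormalDist().inv_cdf(1 - 0.05 / 2) ** 2
print(round(chi_crit_df1, 6))
```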

Page 46

At the 95% level of confidence, we cannot reject the hypothesis that the measurements come from a normal distribution.

Chi-square Test of Normality

Page 47

Parametric and Non-parametric Tests

Page 48

Mann-Whitney U-Test

Computational Steps

1. Two samples are taken.

2. The data are put into order, based on size.

3. Data can be ranked from highest to lowest or lowest to highest values

4. Calculate Mann-Whitney U statistic

U = n1n2 + n1(n1+1)/2 − R1

Page 49

Example of Mann-Whitney U-test

Two-tailed null hypothesis that there is no difference in transmissivity between the two aquifers:

H0: Aquifer-A and Aquifer-B have the same transmissivity.

HA: The transmissivities of Aquifer-A and Aquifer-B are not the same.

Page 50

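Transmissivity, Aquifer-A | Transmissivity, Aquifer-B | Rank of A | Rank of B
193 | 175 | 1 | 7
188 | 173 | 2 | 8
185 | 168 | 3 | 10
183 | 165 | 4 | 11
180 | 163 | 5 | 12
178 |  | 6 |
170 |  | 9 |

(Ranks run from the highest value, rank 1, to the lowest, rank 12.)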

Example of Mann-Whitney U test

n1 = 7 (Aquifer-A), n2 = 5 (Aquifer-B), R1 = 30 (rank sum of A), R2 = 48 (rank sum of B)

U1 = n1n2 + n1(n1+1)/2 − R1 = (7)(5) + (7)(8)/2 − 30 = 35 + 28 − 30 = 33

U2 = n1n2 + n2(n2+1)/2 − R2 = (7)(5) + (5)(6)/2 − 48 = 35 + 15 − 48 = 2

Check: U1 + U2 = n1n2 = 35

U = min(U1, U2) = 2

U(0.05; 7, 5) = 5 (two-tailed)

Since U = 2 ≤ 5, H0 is rejected.

At the 95% level of confidence, we can say that the transmissivities of the two aquifers are different.
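This sketch recomputes U from the raw transmissivity values in the table. It ranks the pooled data in ascending order and averages the ranks of ties. Ranking in descending order instead swaps U1 and U2 but leaves min(U1, U2), the quantity compared with the critical value, unchanged. The function names are illustrative.

```python
def average_ranks(values):
    """Ranks from 1 (smallest) upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of positions i..j
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Return (U1, U2) with U1 = n1*n2 + n1(n1+1)/2 - R1, U2 likewise for b."""
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    r1, r2 = sum(ranks[:n1]), sum(ranks[n1:])
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return u1, u2


aquifer_a = [193, 188, 185, 183, 180, 178, 170]
aquifer_b = [175, 173, 168, 165, 163]
u1, u2 = mann_whitney_u(aquifer_a, aquifer_b)
u = min(u1, u2)
u_critical = 5  # tabulated U(0.05; 7, 5), two-tailed, as given on the slide
print(u1, u2, u, "reject H0" if u <= u_critical else "cannot reject H0")
```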

Page 51

Page 52

• The Kruskal-Wallis test is a nonparametric (distribution-free) test used to compare three or more groups of sample data.

• The Kruskal-Wallis test is used when the assumptions of ANOVA are not met. ANOVA assumes that each group is normally distributed. The Kruskal-Wallis test makes no assumption about the form of the distribution, so it is a distribution-free test.

• If the normality assumptions are met, the Kruskal-Wallis test is not as powerful as ANOVA.

• The Kruskal-Wallis test was developed jointly by Kruskal and Wallis and is named after them.

Kruskal-Wallis Test

Page 53

Steps of Kruskal-Wallis Test

1. Arrange the data of all samples in a single series in ascending order.
2. Assign ranks to them in ascending order. Where a value is repeated, give each copy the average of their rank positions.
3. Separate the different samples and sum their ranks as R1, R2, R3, etc.
4. To calculate the Kruskal-Wallis statistic, apply the following formula:

$$H = \frac{12}{n(n+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(n+1)$$

where H = Kruskal-Wallis statistic, n = total number of observations in all samples, Ri = rank sum of sample i, ni = number of observations in sample i, and k = number of samples.

Page 54

Calculation of degrees of freedom: Degrees of freedom = k − 1, where k is the number of groups. The number of observations in each group should be more than 5.

The Kruskal-Wallis statistic approximately follows a chi-square distribution.

If H < the chi-square table value: the null hypothesis cannot be rejected, and the samples may come from the same population.

If H > the chi-square table value: the null hypothesis is rejected, and the samples come from different populations.

Kruskal-Wallis Test
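This is a sketch of the standard Kruskal-Wallis H formula. It applies no correction for ties, which assumes there are few or no tied values. Two helpers cover the 2-degrees-of-freedom case. For df = 2 the chi-square upper tail is exactly exp(−x/2), so the critical value is −2 ln α. The example's raw depths did not survive extraction, so the toy groups are made up. The last line evaluates the example's H = 9.84. All function names are illustrative.

```python
import math


def average_ranks(values):
    """Ranks from 1 (smallest) upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """H = 12 / (n(n+1)) * sum(R_i^2 / n_i) - 3(n+1), without tie correction."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])  # R_i for this group
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


def chi2_critical_df2(alpha):
    """Chi-square critical value for 2 degrees of freedom: -2 ln(alpha)."""
    return -2 * math.log(alpha)


def chi2_p_value_df2(h):
    """Upper-tail probability of chi-square with 2 degrees of freedom."""
    return math.exp(-h / 2)


print(kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # toy data
print(round(chi2_critical_df2(0.05), 2), round(chi2_p_value_df2(9.84), 4))
```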

Page 55

Example: Groundwater depth is measured in three catchments (A, B, C). Is there any variation in groundwater depth among the three catchments?

Kruskal-Wallis Test: Example

Page 56

Example (cont.)

H = 9.84

Degrees of freedom = No. of groups − 1 = 3 − 1 = 2

H(critical) = χ²(0.05, 2) = 5.99

H(calculated) > H(critical). H also exceeds χ²(0.01, 2) = 9.21, so p < 0.01.

Null hypothesis rejected.

Result: A significant difference exists in groundwater depth among the three catchments.

Page 57

Chi-square Table

Page 58

Nonparametric Methods

• Mann-Whitney-Wilcoxon Test
• Kruskal-Wallis Test
• Sign Test
• Wilcoxon Signed-Rank Test
• Run Test

Page 59

Example: Sign Test

As part of research, studies were carried out to determine whether the new method you proposed (Method-A) removes more arsenic from water than the well-known existing method (Method-B). A total of 36 case studies were conducted, with the results given below. Do the data indicate a significant difference between the two methods?

18 found Method-A better (+ sign recorded)
12 found Method-B better (− sign recorded)

6 cases: both methods gave similar results (ties, excluded from the analysis)

The analysis is based on a sample size of 18 + 12 = 30.

Page 60

Hypotheses
H0: No preference for one method over the other exists.
Ha: A preference for one method over the other exists.

Rejection rule
Reject H0 if the binomial p-value is less than the chosen significance level (such as 0.05).

Test statistic
Under H0, the number of − signs follows a binomial distribution with n = 30 and p = 0.5. BINOM.DIST(12, 30, 0.5, TRUE) = 0.1808 (cumulative value), so the two-sided p-value is 2 × 0.1808 ≈ 0.36.

Conclusion
Do not reject H0. There is insufficient evidence in the sample to conclude that a difference between the methods exists.

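With 20 successes and 10 failures, the cumulative value would be P(X ≤ 10) ≈ 0.049. That is enough to reject H0 in a one-sided test at α = 0.05, but not in a two-sided test (p ≈ 0.099).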

Example
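This is a sketch of the exact sign test using `math.comb`. In Excel, `BINOM.DIST(k, n, 0.5, TRUE)` returns the same cumulative value. Tied cases must already be removed. The one-sided option assumes that the direction of Ha matches the larger count. Doubling the smaller tail is one common convention for the two-sided p-value. The function names are illustrative.

```python
from math import comb


def binom_cdf(k, n, p=0.5):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def sign_test(n_plus, n_minus, two_sided=True):
    """Exact sign-test p-value from the counts of + and - signs (ties removed)."""
    n = n_plus + n_minus
    tail = binom_cdf(min(n_plus, n_minus), n)  # P(X <= smaller count) under H0
    return min(1.0, 2 * tail) if two_sided else tail


print(round(sign_test(18, 12), 4))                   # arsenic example, two-sided
print(round(sign_test(11, 3, two_sided=False), 4))   # mineral example, one-sided
```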

Page 61

Example: Sign Test - Prevalence of One Mineral

Problem: As part of a study, we want to see whether the concentration of Mineral-A is higher than that of Mineral-B in a place. We collected 14 samples and measured the concentrations of Mineral-A and Mineral-B in them. Is there any difference in the concentration of the minerals in the samples?

Page 62

Example: Prevalence of one mineral

Test statistic
Yes (Mineral-A higher) = 11, No = 3. Cumulative binomial value P(X ≤ 3; n = 14, p = 0.5) = 0.029 (one-sided, matching the question of whether Mineral-A is higher).

Conclusion
The binomial value is less than 0.05. Therefore, reject H0 at the 95% level of confidence.

Decision: There is sufficient evidence in the sample to conclude that the concentration of Mineral-A is higher than that of Mineral-B.

Page 63

Example: Wilcoxon Signed-Rank Test

This test is the nonparametric alternative to the parametric matched-sample (paired) t test.

As part of a study, we want to see whether the concentration of Mineral-A is higher than that of Mineral-B in a place. We collected 10 samples and measured the concentrations of Mineral-A and Mineral-B in them. Is there any difference in the concentration of the minerals in the samples?

Page 64

Wilcoxon Signed-Rank Test

Preliminary Steps of the Test
• Compute the differences between the paired observations.
• Discard any differences of zero.
• Rank the absolute values of the differences from lowest to highest. Tied differences are assigned the average ranking of their positions.
• Give the ranks the sign of the original difference in the data.
• Sum the signed ranks separately (“+” together and “−” together).
• Wilcoxon statistic W = minimum(“+” rank sum, “−” rank sum).
• Compare the calculated value with the tabulated Wilcoxon value.
• If your value is less than the tabulated value, reject the null hypothesis.
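This sketch follows the signed-rank steps above. It also computes an exact two-sided critical value by counting subsets of the ranks 1..N, which is valid when there are no ties or zero differences. The function names are hypothetical. The raw 10-sample data of the example did not survive extraction, so the usage line prints only the critical value for N = 10. The paired values in the test are made up.

```python
def average_ranks(values):
    """Ranks from 1 (smallest) upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_w(x, y):
    """Return (W+, W-, W) for paired data; zero differences are discarded."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus, min(w_plus, w_minus)


def wilcoxon_critical(n, alpha=0.05):
    """Two-sided critical W from the exact null distribution (no ties):
    the largest w with P(W <= w) <= alpha / 2, or None if none exists."""
    max_sum = n * (n + 1) // 2
    counts = [1] + [0] * max_sum  # counts[s]: subsets of {1..n} summing to s
    for r in range(1, n + 1):
        for s in range(max_sum, r - 1, -1):
            counts[s] += counts[s - r]
    cumulative, critical = 0, None
    for w in range(max_sum + 1):
        cumulative += counts[w]
        if cumulative / 2 ** n > alpha / 2:
            break
        critical = w
    return critical


print(wilcoxon_critical(10))  # N = 10, two-tailed alpha = 0.05
```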

Page 65

Example: Wilcoxon Signed-Rank Test

+ Rank = 49.5; − Rank = 5.5; W = minimum(+Rank, −Rank) = 5.5

H0: The concentrations of the minerals are the same.
Ha: The concentrations of the minerals are not the same.

Page 66

Wilcoxon Critical Value Table

W = 5.5

N = 10

W(calculated) < W (critical)

Important note: If W(calculated) is less than the critical table value, the null hypothesis is rejected.

Decision: Reject H0. There is sufficient evidence in the sample to conclude that a difference exists in mineral concentration.

Page 67

• The runs test is used to test for serial randomness: whether or not observations occur in a random sequence in time or over space.

• The runs test is used for nominal data.

• In hydrological studies, the runs test is most often used to determine whether observations are random or follow some pattern.

Run Test

Page 68

For example, suppose we have recorded whether a hydrological disaster occurred in each year, resulting in the data set:

Run Test

where A denotes a “No Disaster” year and B denotes a “Disaster” year. We are interested in determining whether the order of the disastrous years is random or not. In some cases, a phenomenon follows a pattern, like the one below:

Page 69

Unlike other tests, the runs test has no equation unless the sample size of either group is greater than 30. One only needs to count the number of runs (u), a run being a series of the same nominal value when counting from left to right.

Run Test

Page 70

Run Test: Example (Two-tailed)

Flood years in a place during the last twenty-one years (1990-2010) are given in the table below. Different studies have reported that climate change has increased flood frequency in recent years. We want to check whether this is true in the place of our interest.

Page 71

Run Test: Example (Two-tailed)

YNYNNYNNYNYYYNYYYNYYY

Hypothesis
H0: The occurrence of floods is random.
Ha: The occurrence of floods is not random.

Computation of Test
n1 = 13 ← there are 13 occurrences of flood.
n2 = 8 ← there are 8 occurrences of no flood.
u = 13 ← there are 13 runs.

Decision
At α = 0.05, u(critical) = 6 and 16 ← there are two critical values of u. If the calculated value falls between them, H0 is not rejected.

Since 6 < 13 < 16, H0 is not rejected: the distribution of flood years is consistent with randomness.
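For small samples, counting runs is the only computation the test needs. The sketch also includes the standard large-sample normal approximation for the number of runs, with mean 2·n1·n2/(n1 + n2) + 1. That formula is not on the slides. According to the slides it applies only when a group exceeds 30, so for this 21-year record the tabulated critical values remain the appropriate reference. The function names are illustrative, and the sequence must contain exactly two symbols.

```python
import math


def count_runs(sequence):
    """Number of runs: maximal blocks of identical consecutive symbols."""
    if not sequence:
        return 0
    return 1 + sum(1 for prev, cur in zip(sequence, sequence[1:]) if cur != prev)


def runs_normal_approx(sequence):
    """Return (u, expected u, z) using the large-sample normal approximation."""
    first, second = sorted(set(sequence))  # exactly two symbols assumed
    n1, n2 = sequence.count(first), sequence.count(second)
    u = count_runs(sequence)
    mean_u = 2 * n1 * n2 / (n1 + n2) + 1
    var_u = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
             / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
    return u, mean_u, (u - mean_u) / math.sqrt(var_u)


floods = "YNYNNYNNYNYYYNYYYNYYY"  # 1990-2010, Y = flood year
print(floods.count("Y"), floods.count("N"), count_runs(floods))
```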

Page 72

Critical Values for Run Test

Page 73

If a one-tailed runs test is used, we can determine whether the data are random, non-random due to clustering, or non-random due to uniformity.

u has two critical values:
If u < the lower u(critical), the data are non-random due to clustering.
If u > the upper u(critical), the data are non-random due to uniformity.
If u falls between the lower and upper u(critical), the data are random.

Run Test: Example (One-tailed)
