shahid lecture-9- mkag1273

MAL1303: STATISTICAL HYDROLOGY Trend Analysis Dr. Shamsuddin Shahid Department of Hydraulics and Hydrology Faculty of Civil Engineering, Universiti Teknologi Malaysia Room No.: M46-332; Phone: 07-5531624; Mobile: 0182051586 Email: [email protected] 11/23/2015 Shamsuddin Shahid, FKA, UTM

Upload: nchakori

Post on 18-Jan-2017


Category:

Engineering



TRANSCRIPT

Page 1: Shahid Lecture-9- MKAG1273

MAL1303: STATISTICAL HYDROLOGY

Trend Analysis

Dr. Shamsuddin Shahid

Department of Hydraulics and Hydrology

Faculty of Civil Engineering, Universiti Teknologi Malaysia

Room No.: M46-332; Phone: 07-5531624; Mobile: 0182051586 Email: [email protected]

11/23/2015 Shamsuddin Shahid, FKA, UTM


Page 2: Shahid Lecture-9- MKAG1273

Time Series

Measurements of a variable taken at regular intervals over time form a time series.

• The times are usually equally spaced and form a continuous sequence, but they can be unequally spaced and often contain gaps.

• Each sample can be a snapshot of the variable or some form of average value taken over a time interval.

Examples:

• Daily rainfall recorded at a station
• Monthly record of river water chemistry
• Weekly fluctuation of groundwater level
• Daily record of river discharge


Page 3: Shahid Lecture-9- MKAG1273

Plotting of Time Series Data

A time plot of a variable plots each observation against the time at which it was measured. Time is marked on the horizontal scale, and the variable of interest is marked on the vertical scale. Connecting sequential data points by lines helps emphasize changes over time.

Annual Total Rainfall in Different Years


Page 4: Shahid Lecture-9- MKAG1273

Components of Time Series

• Secular Trend (T): Relatively smooth, long-term movement of a time series. It can be linear or nonlinear.
• Cyclical Variation (C): Rises and falls over periods longer than one year.
• Seasonal Variation (S): Patterns of change within a year, typically repeating themselves.
• Irregular Variation (I): Effects of unexpected or irregular occurrences.


Page 5: Shahid Lecture-9- MKAG1273

Trend Analysis

Trends: A trend in a time series is a persistent, long-term rise or fall.

Seasonal Variation: A pattern in a time series that repeats itself at known regular intervals of time is called seasonal variation.


Page 6: Shahid Lecture-9- MKAG1273

Trend Analysis: Example Questions

• We have a time series of rainfall data. Has the rainfall of the area changed over time?

• Groundwater level data have been recorded for years. Is there any change in groundwater level?

• We have temperature records of an area for a few decades. Is global warming also evident in the area?

• Concentrations and loads of phosphorus have been observed at a channel over a 20-year period. Have concentrations and/or loads changed over time?

Tests for trend have been of keen interest in environmental sciences over the last two decades.


Page 7: Shahid Lecture-9- MKAG1273

Trend Analysis: Example Questions

Consider the rainfall record of a station for fifty years. Neither visual inspection nor simple descriptive statistics can establish whether the rainfall has changed significantly. We need to use trend analysis.


Page 8: Shahid Lecture-9- MKAG1273

Trend Analysis

Trend analysis refers to collecting information and attempting to spot a pattern, or trend, in the information.

In other words, trend analysis can be defined as a mathematical technique that uses historical results to predict future outcomes.

Although trend analysis is often used to predict future events, it can also be used to estimate uncertain events in the past, such as how many floods probably happened between two dates, based on data such as the average number of floods that occurred historically.


Page 9: Shahid Lecture-9- MKAG1273

Linear and Non-linear Trends

In a linear trend, the model function is a linear combination of the parameters, such as y = mx + c, i.e. the model can be represented by a straight line.

In a non-linear trend, the model function is a non-linear combination of the parameters, such as y = x^3 + 5e-3.


Page 10: Shahid Lecture-9- MKAG1273

Spatio-Temporal Modeling

Spatio-temporal analysis or modeling tells us how a variable is changing over space and time.

1. Temporal Change: How a variable changes over time
2. Spatial Change: How a variable changes over space

Spatio-temporal models arise when data are collected across time as well as space.


Page 11: Shahid Lecture-9- MKAG1273

Temporal Modeling

A series of observations of a random variable (e.g., rainfall, concentration, well yield) is collected over some period of time. We would like to determine whether the values generally increase or decrease (getting "better" or "worse"). In statistical terms, this is a determination of whether the probability distribution from which they arise has changed over time.


Page 12: Shahid Lecture-9- MKAG1273

Spatio-Temporal Modeling

Trends are spatially interpolated to model the spatio-temporal pattern of rainfall.

This is extremely important in the context of environmental change for environmental management, policy planning, decision making, disaster mitigation, adaptation, etc.


Page 13: Shahid Lecture-9- MKAG1273

Temporal Modeling: Trend Analysis

We test the hypotheses:

Null Hypothesis, H0: There is no trend.
Alternative Hypothesis, HA: There is a trend.

• Failing to reject the null hypothesis does not prove that there is no trend in the data. It only says that a trend is not evident from the data.

• If the null hypothesis is rejected, we try to find the magnitude of the trend.


Page 14: Shahid Lecture-9- MKAG1273

Trend Analysis: Selection of Method

One of the major tasks in trend analysis is to find a suitable method to test for trend. The selection of the method depends on the nature of the data.

• A test may be slightly more powerful in one instance but much less powerful in other reasonable cases.

• The test selected should therefore be robust: it should have relatively high power over all situations and types of data that might reasonably be expected to occur.


Page 15: Shahid Lecture-9- MKAG1273

Trend Analysis: Selection of Method

Some of the characteristics commonly found in water resources data, and discussed in this chapter, are:

• Distribution (normal, skewed, symmetric, heavy tailed)
• Outliers (but true measurements)
• Cycles (seasonal, weekly, tidal, diurnal)
• Missing values (a few isolated values or large gaps)
• Censored data (less-than values, historical floods)


Page 16: Shahid Lecture-9- MKAG1273

Trend Analysis: Selection of Method

First, the suitable method depends on whether the data have been adjusted for an explanatory variable X or not. For example, suppose we want to find the trend of chemical concentration in a stream, and we have collected concentration data at different flow levels. As our intention is to see the change in chemical concentration at constant streamflow, we first need to calculate the flow-adjusted concentration before trend analysis.

Types of method:

• Simple trend tests (not adjusted for X).
• Tests adjusted for X. We use these tests when there is some attempt to remove variation caused by other associated variables.


Page 17: Shahid Lecture-9- MKAG1273

Trend Analysis: Selection of Method

The methods used for trend tests:

• Parametric method – Regression analysis
• Non-parametric method – Mann-Kendall test

A trend test only tells us whether or not there is a significant change over time.

Magnitude of change:

• Parametric method – Regression analysis
• Non-parametric method – Sen’s slope method


Page 18: Shahid Lecture-9- MKAG1273

Trend Estimation by Linear Regression

Linear regression is a statistical technique used for finding the best-fitting straight line for a set of data. The resulting line is called the regression line.

Regression requires one dependent variable and one or more independent variables. In trend analysis, the independent variable is always time.

Trend analysis through regression is the process of finding the equation that best describes the change of a variable with time.

Trend analysis using linear regression is therefore a line-fitting problem: find the straight line that best fits the time series.


Page 19: Shahid Lecture-9- MKAG1273

The Least Squares Solution

We define the best-fitting line as the one that has the smallest total squared error. This line is commonly called the least squares solution. In symbols, the linear equation is

$$y = mx + c \quad \text{or} \quad y = a + bx$$

For each value of x in the data, this equation determines the point on the line that gives the best prediction of y. The problem is to find the specific values of m and c that make this line the best fitting.

Least squares estimates of m and c:

$$m = \frac{SP}{SS_x}, \qquad c = \bar{Y} - m\bar{X}$$

where $SP = \sum (X - \bar{X})(Y - \bar{Y})$ is the sum of products, $SS_x = \sum (X - \bar{X})^2$ is the sum of squares for the X scores, and $\bar{X}$ and $\bar{Y}$ are the means of X and Y.
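The least squares estimates above can be sketched in a few lines of Python. The function name and the sample values are illustrative assumptions, not data from the lecture.

```python
def least_squares_line(x, y):
    """Return slope m and intercept c of the least-squares line y = m*x + c."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # SP: sum of products of deviations; SSx: sum of squared deviations of x
    sp = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    if ss_x == 0:
        raise ValueError("all x values are identical")
    m = sp / ss_x
    c = y_bar - m * x_bar
    return m, c


# Hypothetical readings: time in hours, dissolved oxygen in mg/l
hours = [1, 2, 3, 4, 5]
do_mg_l = [8.0, 7.5, 7.1, 6.4, 6.0]
m, c = least_squares_line(hours, do_mg_l)
print(f"DO = {m:.2f} * hour + {c:.2f}")
```

A negative slope m here would indicate a falling trend, but the slope alone says nothing about whether the change is statistically significant.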


Page 20: Shahid Lecture-9- MKAG1273



Page 21: Shahid Lecture-9- MKAG1273

Trend Analysis Example

Variation of Dissolved Oxygen (mg/l) with Time (hour)

[Table columns: Time (hour), DO (mg/l)]


Page 22: Shahid Lecture-9- MKAG1273

Trend Analysis Example


Page 23: Shahid Lecture-9- MKAG1273

Test of Significance of Change

Null Hypothesis, H0: There is no trend, m = 0

Alternative Hypothesis, HA: There is a trend, m ≠ 0

If |t(calculated)| > t(critical, α, n−2), the null hypothesis is rejected and the change is significant.
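The transcript does not reproduce the formula for t(calculated). A minimal sketch, assuming the standard statistic t = m / SE(m) with SE(m) = sqrt(SSE / (n − 2) / SSx), is shown below. The data are made up, and the critical value is read from a t table rather than computed.

```python
import math


def slope_t_statistic(x, y):
    """Return (t, degrees_of_freedom) for the least-squares slope of y on x."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    if ss_x == 0:
        raise ValueError("all x values are identical")
    sp = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    m = sp / ss_x
    c = y_bar - m * x_bar
    # Residual sum of squares around the fitted line
    sse = sum((yi - (m * xi + c)) ** 2 for xi, yi in zip(x, y))
    if sse == 0:
        raise ValueError("perfect fit: the t statistic is undefined")
    se_m = math.sqrt(sse / (n - 2) / ss_x)  # standard error of the slope
    return m / se_m, n - 2


# Hypothetical series of five observations
t, df = slope_t_statistic([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
T_CRITICAL = 3.182  # two-tailed, alpha = 0.05, df = 3, from a t table
significant = abs(t) > T_CRITICAL
print(f"t = {t:.3f}, df = {df}, significant = {significant}")
```

With so few observations the slope of 0.6 is not significant, which shows why a visible slope is not by itself evidence of a trend.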


Page 24: Shahid Lecture-9- MKAG1273

Confidence Interval Estimates of Y

A confidence interval gives the range within which the mean value of Y for a given X is expected to lie at a chosen confidence level.


Page 25: Shahid Lecture-9- MKAG1273

Assumptions in Linear Regression Trend Test

Assumptions in parametric regression for trend analysis:

For each value of X, the Y values are normally distributed. The means of these normal distributions of Y values all lie on the straight line of regression.

If these assumptions are not met, we need to use a non-parametric trend test.


Page 26: Shahid Lecture-9- MKAG1273

Mann-Kendall trend test

In the Mann-Kendall test (Mann 1945; Kendall 1975), the data are evaluated as an ordered time series.

1. Each data value is compared to all subsequent data values.
2. The initial value of the Mann-Kendall statistic, S, is assumed to be 0 (i.e., no trend).
3. If a data value from a later time period is higher than a data value from an earlier time period, S is incremented by 1.
4. On the other hand, if the data value from a later time period is lower than a data value sampled earlier, S is decremented by 1.
5. The net result of all such increments and decrements yields the final value of S.
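The counting procedure for S can be written directly as a double loop. This is a minimal sketch; the function name is an assumption, and tied values are taken to contribute nothing to S.

```python
def mann_kendall_s(values):
    """Compute the Mann-Kendall S statistic for a time-ordered series."""
    s = 0  # start from S = 0 (no trend)
    n = len(values)
    for i in range(n - 1):
        for j in range(i + 1, n):  # compare each value with all later values
            if values[j] > values[i]:
                s += 1  # later value is higher
            elif values[j] < values[i]:
                s -= 1  # later value is lower
            # equal values leave S unchanged
    return s


print(mann_kendall_s([2, 1, 3]))
```

A steadily rising series of n values gives the maximum S = n(n − 1)/2, and a steadily falling series gives the minimum, −n(n − 1)/2.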


Page 27: Shahid Lecture-9- MKAG1273

Mann-Kendall trend test


Page 28: Shahid Lecture-9- MKAG1273

Mann-Kendall trend test


Page 29: Shahid Lecture-9- MKAG1273

Mann-Kendall trend test


Page 30: Shahid Lecture-9- MKAG1273

Null Hypothesis: There is no trend in time series data

Mann-Kendall trend test


Page 31: Shahid Lecture-9- MKAG1273

Variation of Temperature with time


Page 32: Shahid Lecture-9- MKAG1273

Mann-Kendall trend test: Example


Page 33: Shahid Lecture-9- MKAG1273

Mann-Kendall trend test: Example


Page 34: Shahid Lecture-9- MKAG1273

Mann-Kendall trend test: Example

VAR (S) = 487.7


Page 35: Shahid Lecture-9- MKAG1273

Mann-Kendall trend test: Example

S = 93
VAR (S) = 487.7

Z (calculated) > Z (critical)

The null hypothesis is rejected. There is a trend at the 99% confidence level (1% significance level).

Result: Temperature is increasing with time
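The slide that defines Z is not reproduced in this transcript. The sketch below therefore assumes the commonly used continuity-corrected form, Z = (S − 1)/sqrt(VAR(S)) for S > 0 and Z = (S + 1)/sqrt(VAR(S)) for S < 0, and applies it to the values of S and VAR(S) given above. The critical value 2.576 is the two-sided standard normal value for α = 0.01.

```python
import math


def mann_kendall_z(s, var_s):
    """Standardised Mann-Kendall statistic with continuity correction (assumed form)."""
    if s > 0:
        return (s - 1) / math.sqrt(var_s)
    if s < 0:
        return (s + 1) / math.sqrt(var_s)
    return 0.0


z = mann_kendall_z(93, 487.7)       # S and VAR(S) from the example
Z_CRITICAL_99 = 2.576               # two-sided, alpha = 0.01
reject_h0 = abs(z) > Z_CRITICAL_99
print(f"Z = {z:.2f}, reject H0: {reject_h0}")
```

Under this assumption Z is about 4.17, well above 2.576, which agrees with the conclusion on the slide. The positive sign of S indicates that the trend is upward.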


Page 36: Shahid Lecture-9- MKAG1273

Linear Regression Analysis


Page 37: Shahid Lecture-9- MKAG1273

Magnitude of Change

• Kendall-Theil non-parametric rank-based method
• Sen’s Slope Method

Related to the Kendall tau rank correlation, the Kendall-Theil line is a robust nonparametric line applicable when Y is linearly related to X.

The advantages of the Kendall-Theil or Sen’s slope methods over OLS regression are:

• They do not depend on the normality of residuals for the validity of significance tests

• They are not strongly affected by outliers
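As a minimal sketch of the idea, Sen's slope can be computed as the median of the slopes between all pairs of observations. The function name and the sample data are illustrative assumptions.

```python
from statistics import median


def sens_slope(times, values):
    """Median of the slopes between all pairs of points (Sen's slope)."""
    n = len(times)
    slopes = [
        (values[j] - values[i]) / (times[j] - times[i])
        for i in range(n - 1)
        for j in range(i + 1, n)
        if times[j] != times[i]  # skip pairs observed at the same time
    ]
    if not slopes:
        raise ValueError("need at least two observations at different times")
    return median(slopes)


# A series rising by 1 per time step, with one extreme outlier at the end
print(sens_slope([1, 2, 3, 4, 5], [1, 2, 3, 4, 100]))
```

The outlier changes only four of the ten pairwise slopes, so the median stays at 1.0, whereas an OLS slope fitted to the same data would be pulled far upward.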


Page 38: Shahid Lecture-9- MKAG1273

Simple Trend Analysis: Comparison

If the model form specified in a regression equation were known to be correct (Y is linear with T) and the residuals were truly normal, then fully parametric regression would be optimal (most powerful and lowest error variance for the slope). Of course, we can never know this in any real-world situation.

If the actual situation departs, even to a small extent, from these assumptions, then the Mann-Kendall procedures will perform either as well or better.

When one is forced, by the sheer number of analyses that must be performed, to work without detailed case-by-case checking of assumptions, nonparametric procedures are ideal.

Non-parametric tests are always nearly as powerful as regression, and the failure to edit out or correctly transform a small percentage of outlying data will not have a substantial effect on the results.


Page 39: Shahid Lecture-9- MKAG1273

Trend Analysis: Adjusted Variable

• Variables other than time often have considerable influence on the response variable Y. These influencing variables are called exogenous variables.

• These exogenous variables are usually natural, random phenomena such as rainfall, temperature or streamflow.

• It is necessary to remove the influence of exogenous variables on Y to find the desired trend.

• By removing the variation in Y caused by these variables, the background variability or "noise" is reduced so that any trend "signal" present can be seen.

• The ability (power) of a trend test to discern changes in Y with T is then increased.

• The removal process is often not easy. It often involves modelling.


Page 40: Shahid Lecture-9- MKAG1273

Adjusted Variable

Let us consider whether there is any trend in chemical concentration in a stream. We have collected concentration data at different flow levels. As our intention is to see the change in chemical concentration at constant streamflow, we first need to calculate the flow-adjusted concentration before trend analysis.

Concentration data before adjustment (upper plot) and after adjustment (lower plot) will give different trends.


Page 41: Shahid Lecture-9- MKAG1273

Adjust A Variable

We can adjust a variable by simple regression analysis.

For our example, suppose the regression equation between concentration and flow is found to be

Concentration = 1.3 Flow + 0.87

Using this equation, we calculate the residuals of the concentration values. The residuals R from the regression describe the values of the Y variable "adjusted for" exogenous variables. The resulting residual values are then used for the trend test.
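The adjustment step can be sketched as follows, using the example equation Concentration = 1.3 Flow + 0.87. The function name and the sample observations are hypothetical, and units are omitted because the lecture does not state them.

```python
def flow_adjusted_residuals(concentrations, flows, slope=1.3, intercept=0.87):
    """Residual = observed concentration minus concentration predicted from flow."""
    return [
        c - (slope * q + intercept)
        for c, q in zip(concentrations, flows)
    ]


# Hypothetical paired observations in time order
flows = [1.0, 2.0, 1.5, 3.0]
concentrations = [2.17, 3.57, 2.90, 4.60]
residuals = flow_adjusted_residuals(concentrations, flows)
print([round(r, 2) for r in residuals])
```

The residual series, kept in time order, is what the trend test (regression or Mann-Kendall) is then applied to.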


Page 42: Shahid Lecture-9- MKAG1273

Adjust A Variable: Example

We want to see if there is any change in salinity in a coastal estuary with time. We have time series data of water salinity. The problem is that water salinity depends on rainfall: high rainfall dilutes the salt and reduces the salinity. Therefore, the first aim is to remove the influence of rainfall on salinity.


Page 43: Shahid Lecture-9- MKAG1273

Adjust A Variable: Example

Calculate the regression equation between rainfall and salinity. Use that equation to find the residuals of the predicted values. These residuals are due to error in prediction and change over time.


Page 44: Shahid Lecture-9- MKAG1273

Adjust More than One Exogenous Variable

Sometimes a variable may be influenced by many variables. For example, some chemical concentrations in water depend on streamflow, rainfall and temperature.

We need multiple regression to find the relation between the variable whose trend is to be found and the exogenous variables.

$$Y = a_0 + a_1 x_1 + a_2 x_2 + a_3 x_3$$

The equation is then used to predict the values and calculate the residuals. Trend analysis is then carried out on the residuals.


Page 45: Shahid Lecture-9- MKAG1273

Adjustment With Non-parametric Regression

• Kendall-Theil non-parametric rank-based method
• Sen’s Slope Method

If the variables are not normally distributed or not linearly related, we need to transform the data before adjustment.


Page 46: Shahid Lecture-9- MKAG1273

Transformation before Adjustment

What can we do to explore other (nonlinear) relationships?

One possibility is to transform one of the variables. For example, instead of using Y as the dependent variable, we might use its log, reciprocal, square, or square root.

Another possibility is to transform both of the variables in the same way.

There are many other transformations, but log, reciprocal, square, and square root are the most common.


Page 47: Shahid Lecture-9- MKAG1273

Mixed Approach: Mann-Kendall on Regression Residuals

Once adjusted, we can use either a parametric or a non-parametric method on the adjusted variable to find the trends.

Usually, if the adjusted variable does not satisfy the assumptions necessary for the application of parametric methods, we use non-parametric methods for the trend test.

Therefore, in the mixed method, the data are adjusted with a parametric method and the trend is analyzed by a non-parametric method.


Page 48: Shahid Lecture-9- MKAG1273

Dealing With Seasonality

• There are many instances where changes between different seasons of the year are a major source of variation in the Y variable.

• For example, rainfall differs between seasons, river water salinity varies with season, etc.

• As with other exogenous effects, seasonal variation must be compensated for or "removed" in order to better discern the trend in Y over time.

• If not, little power may be available to detect trends which are truly present.

• We may also be interested in modeling the seasonality to allow different predictions of Y for differing seasons.


Page 49: Shahid Lecture-9- MKAG1273

Dealing With Seasonality

Methods used to deal with seasonality:

• Seasonal Kendall test for trend on Y
• Parametric regression of Y on T and seasonal terms
• Mixed: regression of deseasonalized Y on T

The seasonal Kendall test can also be applied to residuals from a regression of Y versus X.


Page 50: Shahid Lecture-9- MKAG1273

Trend Analysis: Linear Regression

We can forecast the future temperature using the regression equation:

Temperature = -0.004 x Month + 27.752

where Month is the index of the month in the time series.

What will be the temperature in January, 2008?
January, 1999 is the 1st month of the time series. January, 2008 is the 85th month.
Therefore, temperature in January, 2008 = -0.004 x 85 + 27.752 = 27.412 degrees Celsius.

The trend line ignores seasonal variation in the temperature. Using the equation above to forecast the temperature for, say, July 2008 will result in a gross underestimate.
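The forecast is a direct evaluation of the trend line. A minimal sketch, with a hypothetical function name and the coefficients and month index taken from the example:

```python
def forecast_temperature(month_index, slope=-0.004, intercept=27.752):
    """Evaluate the fitted trend line at a given month index of the series."""
    return slope * month_index + intercept


jan_2008 = forecast_temperature(85)  # month index used in the example
print(f"{jan_2008:.3f} deg C")
```

Because the line carries no seasonal term, the same function returns almost the same value for every month of a given year.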


Page 51: Shahid Lecture-9- MKAG1273

Kendall Test for Seasonality

• The seasonal Kendall test accounts for seasonality by computing the Mann-Kendall test on each of m seasons separately, and then combining the results.

• Therefore, the data of a particular season in one year are compared only with the data of the same season in other years.

• No comparisons are made across season boundaries.

• Kendall's S statistics S_i for each season are summed to form the overall statistic S_k:

$$S_k = \sum_{i=1}^{m} S_i$$

The null hypothesis is rejected at significance level α if |Z_Sk| > Z_crit, where Z_crit is the value of the standard normal distribution with a probability of exceedance of α/2.
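The seasonal Kendall procedure can be sketched as below. The transcript does not show how Z_Sk is formed, so this sketch makes three assumptions: the variance of S_k is the sum of the seasonal variances, each seasonal variance uses the no-ties formula n(n − 1)(2n + 5)/18, and Z_Sk uses the usual continuity correction. Function names and data are illustrative.

```python
import math


def mk_s(values):
    """Mann-Kendall S for one season (values in year order)."""
    s = 0
    for i in range(len(values) - 1):
        for j in range(i + 1, len(values)):
            s += (values[j] > values[i]) - (values[j] < values[i])
    return s


def mk_var_no_ties(n):
    """Variance of S when there are no tied values (assumed here)."""
    return n * (n - 1) * (2 * n + 5) / 18


def seasonal_kendall(series_by_season):
    """Return (S_k, Var(S_k), Z_Sk) from one list of yearly values per season."""
    s_k = sum(mk_s(v) for v in series_by_season)
    var_k = sum(mk_var_no_ties(len(v)) for v in series_by_season)
    if s_k > 0:
        z = (s_k - 1) / math.sqrt(var_k)
    elif s_k < 0:
        z = (s_k + 1) / math.sqrt(var_k)
    else:
        z = 0.0
    return s_k, var_k, z


# Two hypothetical seasons, four years each; values are compared only within a season
wet = [10, 12, 13, 15]
dry = [3, 4, 6, 7]
s_k, var_k, z_sk = seasonal_kendall([wet, dry])
print(s_k, round(var_k, 2), round(z_sk, 2))
```

The four-year series keep the example short; the normal approximation behind Z_Sk is intended for much longer records.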


Page 52: Shahid Lecture-9- MKAG1273

Kendall Test for Seasonality


Page 53: Shahid Lecture-9- MKAG1273

Kendall Test for Seasonality


Page 54: Shahid Lecture-9- MKAG1273

Deseasonalizing Data

• Mixed procedure: deseasonalize the data by subtracting seasonal medians from all data within the season, and then regress these deseasonalized data against time.

• Multiple regression with periodic functions


Page 55: Shahid Lecture-9- MKAG1273

Mixture Methods

The mixed procedure involves deseasonalizing the data by subtracting seasonal medians from all data within the season, and then regressing these deseasonalized data against time.

One advantage of this procedure is that it produces a description of the pattern of the seasonality (in the form of the set of seasonal medians).

However, this method generally has lower power to detect trend than other methods, and is not preferred over the other alternatives.

Subtracting seasonal means would be equivalent to using dummy variables for m−1 seasons in a fully parametric regression.
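The deseasonalizing step of the mixed procedure can be sketched as follows. The function name, season labels and values are hypothetical.

```python
from collections import defaultdict
from statistics import median


def deseasonalize(values, seasons):
    """Subtract each season's median from the values observed in that season."""
    groups = defaultdict(list)
    for value, season in zip(values, seasons):
        groups[season].append(value)
    seasonal_medians = {s: median(vs) for s, vs in groups.items()}
    adjusted = [v - seasonal_medians[s] for v, s in zip(values, seasons)]
    return adjusted, seasonal_medians


# Hypothetical record: two seasons per year for three years, in time order
values = [10, 20, 12, 22, 14, 24]
seasons = ["dry", "wet", "dry", "wet", "dry", "wet"]
adjusted, medians = deseasonalize(values, seasons)
print(adjusted)
print(medians)
```

The adjusted series is then regressed against time, and the returned medians are the description of the seasonal pattern mentioned above.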


Page 56: Shahid Lecture-9- MKAG1273

Test for Seasonality: Mixture Methods


Page 57: Shahid Lecture-9- MKAG1273

Test for Seasonality: Mixture Methods


Page 58: Shahid Lecture-9- MKAG1273

Deseasonalization: Multiple Regression With Periodic Functions

The simplest case, one that is sufficient for most purposes, is:

$$Y = \beta_0 + \beta_1 \sin(2\pi T) + \beta_2 \cos(2\pi T) + \beta_3 T + \text{other terms} \qquad [12.3]$$

where "other terms" are exogenous explanatory variables such as flow, rainfall, or level of some human activity (e.g. waste discharge, basin population, production). They may be continuous, or binary "dummy" variables as in analysis of covariance.

The residuals must be approximately normal. Time is commonly, but not always, expressed in units of years.


Page 59: Shahid Lecture-9- MKAG1273

Comparing Deseasonalizing Methods

• One advantage of the mixed method is that it produces a description of the pattern of the seasonality (in the form of the set of seasonal medians).

• However, the mixed method generally has lower power to detect trend than other methods, and is not preferred over the other alternatives.

• The Mann-Kendall and mixed approaches have the disadvantage of being applicable only to univariate data; they are not amenable to simultaneous analysis of multiple sources of variation.

• Multiple regression allows many variables to be considered easily and simultaneously by a single model.


Page 60: Shahid Lecture-9- MKAG1273

Differences Between Seasonal Patterns

The approaches described above all assume a single pattern of trend across all seasons. This may be a gross over-simplification and can fail to reveal large differences in behavior between different seasons. It is entirely possible that the Y variable exhibits a strong trend in its summer values and no trend in the other seasons. Even worse, it could be that spring and summer have strong up-trends and fall and winter have strong down-trends, cancelling each other out and resulting in an overall seasonal Kendall test statistic stating no trend.

No overall test statistic will provide any clue of these differences. This is not to suggest they are not useful. Many times we desire a single number to characterize what is happening in a data set. Particularly when dealing with several data sets (multiple stations and/or multiple variables), breaking the problem down into 4 seasons or 12 months simply swamps us with more results than can be absorbed.


Page 61: Shahid Lecture-9- MKAG1273

Seasonal Patterns: Test of Homogeneity

The test for homogeneity examines "contrasts" between the different seasonal statistics. This provides a single statistic which indicates whether the seasons are behaving in a similar fashion (homogeneous) or behaving differently from each other (heterogeneous).

For each season i (i = 1, 2, ..., m) compute

$$Z_i = \frac{S_i}{\sqrt{\mathrm{Var}(S_i)}}$$

Sum these to compute the "total" chi-square statistic, then compute the "trend" and "homogeneous" chi-squares:

$$\chi^2_{(total)} = \sum_{i=1}^{m} Z_i^2$$


Page 62: Shahid Lecture-9- MKAG1273

Seasonal Patterns: Test of Homogeneity

Trend:

$$\chi^2_{(trend)} = m\,\bar{Z}^2, \quad \text{where } \bar{Z} = \frac{1}{m}\sum_{i=1}^{m} Z_i$$

Homogeneous:

$$\chi^2_{(homogeneous)} = \chi^2_{(total)} - \chi^2_{(trend)}$$

The null hypothesis that the seasons are homogeneous with respect to trend (τ1 = τ2 = . . . = τm) is tested by comparing $\chi^2_{(homogeneous)}$ to tables of the chi-square distribution with m−1 degrees of freedom. If it exceeds the critical value for the pre-selected α, reject the null hypothesis and conclude that different seasons exhibit different trends.
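The homogeneity statistics above can be sketched in a few lines of Python. This is a minimal illustration, not a full seasonal Kendall implementation: it assumes the seasonal Mann-Kendall statistics S_i and their variances Var(S_i) have already been computed, and the function name and the input values are hypothetical.

```python
import math


def seasonal_homogeneity(s_values, var_values):
    """Return (chi2_total, chi2_trend, chi2_homogeneous, degrees_of_freedom).

    s_values   -- Mann-Kendall S statistic for each season (assumed precomputed)
    var_values -- Var(S) for each season (assumed precomputed)
    """
    # Z_i = S_i / sqrt(Var(S_i)) for each season
    z = [s / math.sqrt(v) for s, v in zip(s_values, var_values)]
    m = len(z)
    chi2_total = sum(zi ** 2 for zi in z)
    z_bar = sum(z) / m
    chi2_trend = m * z_bar ** 2
    chi2_homogeneous = chi2_total - chi2_trend
    return chi2_total, chi2_trend, chi2_homogeneous, m - 1


# Hypothetical four-season example: two seasons trend up, two trend down.
total, trend, homog, df = seasonal_homogeneity(
    [20, 18, -19, -21], [100, 100, 100, 100]
)
print(round(total, 2), round(trend, 2), round(homog, 2), df)
```

In this made-up case the trend chi-square is close to zero because the seasonal trends cancel, while the homogeneous chi-square (about 15.25) exceeds the tabulated critical value of 7.815 for 3 degrees of freedom at α = 0.05, so homogeneity would be rejected.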


Page 63: Shahid Lecture-9- MKAG1273

Use of Transformations in Trend Studies

• Water resources data commonly exhibit substantial departures from a normal distribution. Surface-water concentration, load, and flow data are often positively skewed.

• Trends which are nonlinear will be poorly described by a linear slope coefficient, whether from regression or a nonparametric method. It is quite possible that negative predictions may result for some values of time or X. By transforming the data so that the trend is linear, a Mann-Kendall or regression slope can later be re-expressed back into original units.

• One way is to take a transformation (log, square, inverse, etc.) of the data prior to trend analysis. With a log transformation, the trend slope is then expressed in log units. A linear trend in log units translates to an exponential trend in original units.

• If m is the estimated slope of a linear trend in natural log units per year, then the percentage change from the beginning of any year to the end of that year will be $(e^{m} - 1) \times 100$.
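The back-transformation in the last bullet is a one-line calculation. The sketch below assumes the slope is in natural log units per year; the function name is hypothetical.

```python
import math


def percent_change_per_year(log_slope):
    """Convert a trend slope in natural-log units per year to percent per year."""
    return (math.exp(log_slope) - 1.0) * 100.0


# A slope of 0.043 log units per year is roughly a 4.4% increase per year.
print(round(percent_change_per_year(0.043), 1))
```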


Page 64: Shahid Lecture-9- MKAG1273

Use of Transformations in Trend Studies


Page 65: Shahid Lecture-9- MKAG1273

Use of Transformations in Trend Studies

Change = 0.043 (in log scale)

Actual change = $e^{0.043} - 1$ = 0.044

Change (%) = 0.044 × 100 = 4.4%

Sometimes it is argued that data should always be transformed to normality, and parametric procedures computed on the transformed data. Transformations to normality are not always possible, as some data are non-normal due not to skewness but to heavy tails of the distribution. Nonparametric tests can be used in those situations.


Page 66: Shahid Lecture-9- MKAG1273

Step Trend

Study of long-term changes in hydrologic variables can be carried out in either of two modes:

1. Monotonic Trends: the overall trend over the record, as discussed so far in this lecture.

2. Step Trends: a comparison of two non-overlapping sets of data, an "early" and a "late" period of record.

Step Trends:

• Changes between the periods are called "step trends", as values of Y step up or down from one time period to the next.

• Testing for differences between these two groups involves procedures similar or identical to the rank-sum test, two-sample t-tests, and analysis of covariance. Each of them can also be modified to account for seasonality.


Page 67: Shahid Lecture-9- MKAG1273

Step Trend

t-Test (parametric)

The basic parametric test for step trends is the two-sample t-test. The magnitude of change is measured by the difference in sample means between the two periods.

The disadvantages of using a t-test for step trends on non-normal data are loss of power, inability to incorporate data below the detection limit, and an inappropriate measure of the step trend size.

Rank-sum Test (non-parametric)

The primary nonparametric alternative is the rank-sum test, with the associated Hodges-Lehmann estimator as the measure of step-trend magnitude. The rank-sum test can be implemented in a seasonal manner just like the Mann-Kendall test; this is called the seasonal rank-sum test. It computes the rank-sum statistic separately for each season, sums the test statistics, their expectations and variances, and then evaluates the overall summed test statistic.
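The seasonal rank-sum procedure described above can be sketched as follows. This is a simplified illustration under stated assumptions: the statistic W is taken as the sum of the joint ranks of the "late" period, the variance formula carries no correction for ties, and no continuity correction is applied to Z. All function names and the sample values are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum(early, late):
    """Return (W, E[W], Var[W]) where W is the rank sum of the late period."""
    n, m = len(early), len(late)
    ranks = average_ranks(list(early) + list(late))
    w = sum(ranks[n:])
    expected = m * (n + m + 1) / 2
    variance = n * m * (n + m + 1) / 12  # no tie correction (simplification)
    return w, expected, variance


def seasonal_rank_sum_z(seasons):
    """seasons: list of (early_values, late_values), one pair per season."""
    results = [rank_sum(early, late) for early, late in seasons]
    w = sum(r[0] for r in results)
    expected = sum(r[1] for r in results)
    variance = sum(r[2] for r in results)
    return (w - expected) / math.sqrt(variance)


# Hypothetical two-season record in which the late period is always higher.
z = seasonal_rank_sum_z([([1, 2, 3], [4, 5, 6]), ([2, 3, 4], [5, 6, 7])])
print(round(z, 2))
```

The resulting Z would be compared with the standard normal distribution; for small samples an exact table is the safer choice.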


Page 68: Shahid Lecture-9- MKAG1273

Step Trend: Wilcoxon Signed-Rank Test

• Compute the differences between the paired observations.

• Discard any differences of zero.

• Rank the absolute values of the differences from lowest to highest. Tied differences are assigned the average of the ranks for their positions.

• Give the ranks the sign of the original difference in the data.

• Sum the signed ranks separately ("+" ranks together and "–" ranks together).

• Wilcoxon statistic: W = minimum ("+" rank sum, "–" rank sum), taking the "–" rank sum as an absolute value.

• Compare the calculated value to the tabulated Wilcoxon critical value.

• If the calculated value is less than the tabulated value, reject the null hypothesis.
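The steps above can be sketched in Python as follows. This is a minimal illustration with hypothetical names and made-up paired values; it returns the statistic only, and the comparison against a tabulated critical value is left to the reader.

```python
def wilcoxon_signed_rank(before, after):
    """Return (sum of + ranks, sum of - ranks, W, number of non-zero pairs)."""
    # Step 1-2: paired differences, discarding zeros.
    diffs = [a - b for b, a in zip(before, after) if a != b]
    abs_diffs = [abs(d) for d in diffs]

    # Step 3: rank absolute differences, averaging the ranks of ties.
    # (Exact equality is used to detect ties; this is only safe for
    # integer or otherwise exactly representable data.)
    order = sorted(range(len(diffs)), key=lambda k: abs_diffs[k])
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs_diffs[order[j + 1]] == abs_diffs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1

    # Steps 4-6: split ranks by the sign of the difference and take the minimum.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus, min(w_plus, w_minus), len(diffs)


# Hypothetical paired observations for an "early" and a "late" period.
print(wilcoxon_signed_rank([10, 12, 9, 14, 11], [12, 15, 8, 18, 11]))
```

A useful check is that the two rank sums always add up to N(N+1)/2, where N is the number of non-zero differences.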


Page 69: Shahid Lecture-9- MKAG1273

Example: Wilcoxon Signed-Rank Test

H0: There is no difference.

Ha: There is a difference.

Sum of "+" ranks = 49.5; sum of "–" ranks = 5.5; W = minimum ("+" rank sum, "–" rank sum) = 5.5


Page 70: Shahid Lecture-9- MKAG1273

Wilcoxon Critical Value Table

W = 5.5

N = 10

W(cal) < W(tab)

Decision: Reject H0. There is sufficient evidence to conclude that a difference exists between the two periods.


Page 71: Shahid Lecture-9- MKAG1273

Step Trend: Applicability

Step trend procedures should be used in two situations:

1. The first is when the record (or records) being analyzed are naturally broken into two distinct time periods with a relatively long gap between them. There is no specific rule to determine how long the gap should be to make this the preferred procedure. If the length of the gap is more than about one-third of the entire period of data collection, then the step trend procedure is probably best.

2. The second situation to test for step trend is when a known event has occurred at a specific time during the record which is likely to have changed water quality. The record is first divided into "before" and "after" periods at the time of this known event. Example events are the completion of a dam or diversion, the introduction of a new source of contaminants, reduction in some contaminant due to completion of treatment plant improvements, or the closing of some facility.


Page 72: Shahid Lecture-9- MKAG1273

Step Trend: Applicability

Trend Tests

Rank Tests


Page 73: Shahid Lecture-9- MKAG1273

Trends with Censored Data

Censored samples are records in which some of the data are known only to be "less than" or "greater than" some threshold.

The two most common examples in hydrology are constituent concentrations less than the detection limit and floods which are known to be less than some threshold of perception.

Example:

Arsenic (As) concentrations in groundwater recorded in ppm as 9.1, 7.3, <5.0, 6.2, 5.2, <5.0, etc.

The annual flood of 1987 was not large enough for local record keepers to bother recording the maximum stage.

The existence of censored values complicates the use of trend tests and the procedures involving removal of the effect of an exogenous variable.


Page 74: Shahid Lecture-9- MKAG1273

Mann-Kendall Test with Censored Data: Single Threshold

The Mann-Kendall test can be used without any difficulty when only one censoring threshold exists.

Comparisons between all pairs of observations are possible. All the "less thans" are less than the other values and are considered to be tied with each other.

For example, if the data are:

9.1, 7.3, <5.0, 6.2, 5.2, <5.0, …

All <5.0 data are considered to be less than the other recorded values. All <5.0 data are also considered to be tied with each other.
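The pairwise comparisons for the single-threshold case can be sketched as below. This is a minimal illustration that computes only the Mann-Kendall S statistic (not its variance or p-value). Representing each observation as a (value, is_censored) pair and coding censored values as negative infinity are assumptions made here for convenience; the function name is hypothetical.

```python
import math


def mann_kendall_s_single_threshold(data):
    """Mann-Kendall S for time-ordered (value, is_censored) pairs.

    Assumes a single censoring threshold that lies below every
    uncensored value, as in the example above.
    """
    # Every censored value gets the same code, lower than any real value,
    # so "less thans" tie with each other and rank below everything else.
    coded = [-math.inf if censored else value for value, censored in data]
    s = 0
    for i in range(len(coded) - 1):
        for j in range(i + 1, len(coded)):
            # sign of (later - earlier), using comparisons so ties give 0
            s += (coded[j] > coded[i]) - (coded[j] < coded[i])
    return s


# The series 9.1, 7.3, <5.0, 6.2, 5.2, <5.0 from the example above.
series = [(9.1, False), (7.3, False), (5.0, True),
          (6.2, False), (5.2, False), (5.0, True)]
print(mann_kendall_s_single_threshold(series))  # -10
```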


Page 75: Shahid Lecture-9- MKAG1273

Mann-Kendall Test with Censored Data: Multiple Thresholds

When more than one detection limit exists, the Mann-Kendall test cannot be performed without further censoring the data.

For example, if the data are:

9.1, 7.3, <5.0, 6.2, <3.0, <1.0, 5.2, <5.0, …

How can a <1.0 and a <5.0 be compared? These ambiguities make the test impossible to compute.

The only way to perform a Mann-Kendall test is to censor and recode the data at the highest detection limit. Thus,

the data series: 9.1, 7.3, <5.0, 6.2, <3.0, <1.0, 5.2, <5.0, …

becomes: 9.1, 7.3, <5.0, 6.2, <5.0, <5.0, 5.2, <5.0, …

There is certainly a loss of information in making this change, and a loss of power to detect any trends which may exist.
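The recoding step can be sketched as follows. The (value, is_censored) representation and the function name are assumptions made for illustration. The sketch also recodes any uncensored value that falls below the highest detection limit, which goes one step beyond the slide's example (where no such value occurs) but is needed for the pairwise comparisons to remain unambiguous.

```python
def recode_to_highest_limit(data):
    """Recode (value, is_censored) pairs at the highest detection limit."""
    limits = [value for value, censored in data if censored]
    if not limits:
        return list(data)  # nothing is censored, nothing to recode
    highest = max(limits)
    recoded = []
    for value, censored in data:
        if censored or value < highest:
            # Becomes "<highest", tied with all other censored values.
            recoded.append((highest, True))
        else:
            recoded.append((value, False))
    return recoded


# The series 9.1, 7.3, <5.0, 6.2, <3.0, <1.0, 5.2, <5.0 from the example above.
series = [(9.1, False), (7.3, False), (5.0, True), (6.2, False),
          (3.0, True), (1.0, True), (5.2, False), (5.0, True)]
for value, censored in recode_to_highest_limit(series):
    print(("<" if censored else "") + str(value))
```

After recoding, the series has a single threshold and can be analyzed as in the single-threshold case.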


Page 76: Shahid Lecture-9- MKAG1273

Trend Test with Missing Data

If the amount of missing data is not very large, trend tests can be used without major difficulty.

The major question is how much missing data (or how complete a record) is tolerable in a trend test.

One reasonably objective rule for deciding whether to include a record is:

1. Divide the study period into thirds (three periods of equal length).

2. Determine the coverage in each period (e.g. if the record is generally monthly, count the months for which there are data).

3. If any of the thirds has less than 20% of the total coverage, then the record should not be included in the analysis.
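The coverage rule can be sketched as a short check. This is one reading of the rule, stated as an assumption: "total coverage" is taken to mean the total number of observations in the record, so each third must hold at least 20% of them. The function name, its parameters, and the month-index inputs are hypothetical.

```python
def passes_coverage_rule(times, start, end, min_share=0.20):
    """Return (passes, counts_per_third) for observation times in [start, end).

    times      -- times at which data exist (e.g. month indices)
    start, end -- bounds of the study period, in the same units as times
    min_share  -- minimum share of all observations required in each third
    """
    third = (end - start) / 3
    counts = [0, 0, 0]
    for t in times:
        if start <= t < end:
            idx = min(int((t - start) / third), 2)
            counts[idx] += 1
    total = sum(counts)
    if total == 0:
        return False, counts
    return all(c >= min_share * total for c in counts), counts


# A hypothetical 36-month study period with data for the first 26 months only.
print(passes_coverage_rule(range(26), 0, 36))
```

In this made-up case the last third holds only 2 of the 26 observations, so the record would be excluded.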


Page 77: Shahid Lecture-9- MKAG1273

Tolerable Missing Data
