

Page 1: Shahid Lecture-6- MKAG1273

MAL1303: STATISTICAL HYDROLOGY

Regression Analysis

Dr. Shamsuddin Shahid
Associate Professor

Department of Hydraulics and Hydrology
Faculty of Civil Engineering

Room No.: M46-332; Phone: 07-5531624; Mobile: 0182051586

Email: [email protected]

11/23/2015 Shamsuddin Shahid, FKA, UTM


Page 2: Shahid Lecture-6- MKAG1273

Regression

Questions:

Two variables are associated with one another. If one variable changes, by how much does the other change?

How can we mathematically formalize the functional relationship between two variables?

Answer: Regression Analysis

Definition: Regression is a statistical technique used to determine the functional relationship between two variables. It yields the equation that best describes that relationship.


Page 3: Shahid Lecture-6- MKAG1273

Research question: Are two variables related?

Example questions in hydrology:
• “Is there any relation between rainfall and river discharge?”
• “Is there any relation between low river flow and river water quality?”
• “Is there any relation between elevation and rainfall?”
• “Is there any relation between rainfall intensity and landslides?”

To test whether a relationship exists: Correlation

If the question changes from “Is” to “How” or “What”, e.g. “How are rainfall and river discharge related?”, we need to go for: Regression Analysis


Page 4: Shahid Lecture-6- MKAG1273

Simple Regression

The dependent variable is the variable for which we want to make a prediction, and the independent variable is the variable used to make that prediction.

Simple regression analysis is a statistical tool for estimating the mathematical relationship between a dependent variable (usually called y) and an independent variable (usually called x).

Regression can take linear or non-linear forms, but simple linear regression models are the most common in hydrology.


Page 5: Shahid Lecture-6- MKAG1273

Regression: Main Goals

The goal is to find a functional relation between the response variable y and the predictor variable x:

y = f(x)

Another primary goal of quantitative analysis is to use current information about a phenomenon to predict its future behavior.


Page 6: Shahid Lecture-6- MKAG1273

What is Regression?

Data on the height of sea waves and the erosion of the seashore are collected to find out how far sea waves are responsible for beach erosion.

The correlation coefficient calculated between wave height and erosion is 0.79.

Regression gives the functional relation between wave height and erosion as:

Erosion = 7.32 + 0.62 × Height


Page 7: Shahid Lecture-6- MKAG1273

Pictorial Presentation of Linear Regression Model


Page 8: Shahid Lecture-6- MKAG1273

Uses of Regression Analysis

Regression analysis serves three major purposes:

1. Description
2. Control
3. Prediction


Page 9: Shahid Lecture-6- MKAG1273

Difference between Correlation and Regression

Correlation quantifies the degree to which two variables are related.

Correlation does not find a functional relation. We simply compute a correlation coefficient that tells us how much one variable tends to change when the other one does.

With correlation we don't have to think about cause and effect; we simply quantify how well two variables relate to each other. With regression, we do have to think about cause and effect, because the regression line is determined as the best way to predict Y from X.

With correlation, it doesn't matter which of the two variables we call "X" and which we call "Y"; we get the same correlation coefficient if we swap the two. With linear regression, the choice of which variable to call "X" and which to call "Y" matters a lot, because we get a different best-fit line if we swap the two. The line that best predicts Y from X is not the same as the line that best predicts X from Y.
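The asymmetry described above can be checked numerically. The following is a minimal sketch, not taken from the lecture: the data values and the helper names `pearson_r` and `fit_line` are invented for illustration. It shows that r is unchanged when the variables are swapped, while the fitted line is not.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return sp / (ssx * ssy) ** 0.5

def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    mx, my = mean(x), mean(y)
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    m = sp / ssx
    return m, my - m * mx

# Hypothetical data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

r_xy, r_yx = pearson_r(x, y), pearson_r(y, x)  # identical values
m_yx, c_yx = fit_line(x, y)  # line that predicts y from x
m_xy, c_xy = fit_line(y, x)  # line that predicts x from y

print(f"r(x, y) = {r_xy:.4f}, r(y, x) = {r_yx:.4f}")
print(f"y from x: y = {m_yx:.2f} x + {c_yx:.2f}")
print(f"x from y: x = {m_xy:.2f} y + {c_xy:.2f}")
```

Rearranging the second line into the form y = ... gives a slope of 1/m_xy, which equals m_yx only when r is exactly ±1.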


Page 10: Shahid Lecture-6- MKAG1273

Linear and Non-linear Regression

In linear regression, the model function is a linear combination of the parameters, such as y = mx + c; i.e. the model can be represented by a straight line.

In non-linear regression, the model function is a non-linear combination of the parameters, such as y = x³ + 5e⁻³.


Page 11: Shahid Lecture-6- MKAG1273

Construction of Regression Models

• Selection of independent variables
• Functional form of regression relation
• Scope of model

– Least squares and correlation based


Page 12: Shahid Lecture-6- MKAG1273

Linear Regression – General Principle

A linear relationship between two variables x and y can be expressed by the equation

y = mx + c

where
y is the dependent variable,
x is the independent variable,
m and c are constants.

In the general linear equation:

The value of m is called the slope. The slope determines how much the y variable changes when x is increased or decreased by one unit.

The value of c is called the Y-intercept. It is the value of y when x = 0.


Page 13: Shahid Lecture-6- MKAG1273

Least Squares Regression Principle


Page 14: Shahid Lecture-6- MKAG1273

The Least Squares Solution

For each value of x in the data, the equation y = mx + c gives the point on the line that is the best prediction of y.

The problem is to find the specific values of m and c that make this line the best fit.

Least squares estimate of m:

m = SP / SSx

where
SP is the sum of products of the X and Y deviations, and
SSx is the sum of squares for the X scores.
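The slope estimate can be sketched in a few lines of Python. This is an illustrative sketch only: the data values are hypothetical and the function name `least_squares` is an assumption. The intercept step uses the standard result that the least-squares line passes through the point of means, c = mean(Y) - m × mean(X), which does not appear in the transcript of this slide.

```python
from statistics import mean

def least_squares(x, y):
    """Return (m, c) for the line y = m*x + c using m = SP / SSx."""
    mean_x, mean_y = mean(x), mean(y)
    # SP: sum of products of the deviations of x and y from their means
    sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # SSx: sum of squared deviations of x from its mean
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    m = sp / ss_x
    # Standard result (assumed here): the fitted line passes through the means
    c = mean_y - m * mean_x
    return m, c

# Hypothetical wave heights (m) and erosion (cm), invented for illustration.
heights = [1.5, 2.0, 2.5, 3.0, 3.5]
erosion = [13.0, 18.5, 23.0, 27.5, 33.0]

m, c = least_squares(heights, erosion)
print(f"Erosion = {m:.3f} x Height + ({c:.3f})")
```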


Page 15: Shahid Lecture-6- MKAG1273

Example of Regression Analysis


Page 16: Shahid Lecture-6- MKAG1273

Standard Error of Estimate

A regression equation, by itself, allows you to make predictions, but it does not provide any information about the accuracy of those predictions.

The standard error of estimate gives a measure of the standard distance between the regression line and the actual data points.


Page 17: Shahid Lecture-6- MKAG1273

Error Estimation Formula

To calculate the standard error of estimate, first find a sum of squared deviations (SS). This sum of squares is commonly called SSerror:

SSerror = Σ(Y - Ŷ)²

The obtained SS value is then divided by its degrees of freedom to obtain a measure of variance. The degrees of freedom for the standard error of estimate are

df = n - 2

The standard error of estimate, which measures how accurately the regression equation predicts the y values, is the square root of this variance:

Standard Error = √(SSerror / df)
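The calculation can be sketched as follows. This is a minimal illustration, assuming a line y = mx + c has already been fitted; the data values, the coefficients and the function name are hypothetical.

```python
import math

def standard_error_of_estimate(x, y, m, c):
    """Standard error of estimate for the line y = m*x + c."""
    # SSerror: sum of squared differences between observed and predicted y
    ss_error = sum((yi - (m * xi + c)) ** 2 for xi, yi in zip(x, y))
    df = len(x) - 2  # two parameters (m and c) are estimated from the data
    return math.sqrt(ss_error / df)

# Hypothetical data and line, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]

se = standard_error_of_estimate(x, y, m=2.0, c=0.0)
print(f"Standard error of estimate = {se:.4f}")
```

The result is in the same units as y, which is why the lecture's example reports it in cm.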


Page 18: Shahid Lecture-6- MKAG1273

Error Estimation Example


Page 19: Shahid Lecture-6- MKAG1273

Assumptions

• The relationship between the variables is linear.

• Both variables must be at least interval scale.

• The least squares criterion is used to determine the equation.


Page 20: Shahid Lecture-6- MKAG1273

Example

It is anticipated that climate change will make the sea rougher than ever before, which may affect erosion along the shoreline. Data are collected on average wave height (in meters) during cyclones and erosion of the seashore (cm per cyclone event). Try to find a relation for predicting future seashore erosion due to a rougher sea.


Page 21: Shahid Lecture-6- MKAG1273

Example: Solution

[Figure: scatter plot with fitted line Y = mX + c; vertical axis ticks from 10.0 to 35.0, horizontal axis ticks from 1.5 to 3.5]


Page 22: Shahid Lecture-6- MKAG1273

Example: Solution

Y = mX + c

Calculate m and c:

m = 9.585
c = -1.00

Y = 9.585X - 1.00

Erosion = 9.585 × Height - 1.00


Page 23: Shahid Lecture-6- MKAG1273

Example: Solution

Erosion = 9.585 × Height - 1.00

Standard error of estimate = 4.7778 cm


Page 24: Shahid Lecture-6- MKAG1273

Example: Solution

Erosion = 9.585 × Height - 1.00

with standard error of estimate = 4.7778 cm

If Height is 4.0 m:

Erosion = 9.585 × Height - 1.00 = 39.36 cm

Range = 34.14 to 44.59 cm
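As a quick check of the arithmetic, the fitted equation can be evaluated directly. The sketch below uses only the rounded coefficients printed on the slide; the function name `predict_erosion` is an assumption. With these rounded coefficients the equation gives 37.34 cm at a height of 4.0 m, not the 39.36 cm shown above, and the transcript does not contain enough information to say where the difference comes from. A height of 4.0 m also appears to lie beyond the range of wave heights plotted earlier (ticks from 1.5 to 3.5), so this is an extrapolation and should be read with the scope of the model in mind.

```python
def predict_erosion(height_m, m=9.585, c=-1.00):
    """Evaluate Erosion = m * Height + c with the slide's rounded coefficients."""
    return m * height_m + c

for h in (2.5, 4.0):
    print(f"Height {h} m -> predicted erosion {predict_erosion(h):.2f} cm")
```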


Page 25: Shahid Lecture-6- MKAG1273

Regression Analysis – Least Squares Principle

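The least squares principle is used to obtain a and b in the regression equation Y = a + bX, where b is the slope (m in Y = mX + c) and a is the intercept (c in Y = mX + c).

The equations to determine a and b are:

$$b = \frac{n\sum XY - \left(\sum X\right)\left(\sum Y\right)}{n\sum X^2 - \left(\sum X\right)^2}$$

$$a = \frac{\sum Y}{n} - b\,\frac{\sum X}{n}$$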
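These raw-sum formulas translate directly into code. The sketch below is illustrative only: the data values are invented and the function name `fit_raw_sums` is an assumption. It needs nothing but the running totals ΣX, ΣY, ΣXY and ΣX², which is what makes this form convenient for hand or spreadsheet calculation.

```python
def fit_raw_sums(x, y):
    """Return (a, b) for Y = a + b*X from the raw sums of the data."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    # Slope: b = (n*SumXY - SumX*SumY) / (n*SumX2 - SumX^2)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: a = SumY/n - b * SumX/n
    a = sum_y / n - b * sum_x / n
    return a, b

# Hypothetical data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

a, b = fit_raw_sums(x, y)
print(f"Y = {a:.2f} + {b:.2f} X")
```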


Page 26: Shahid Lecture-6- MKAG1273

Correlation Based Method: Computing the Slope

Y = mX + c

Calculate slope m; calculate intercept c


Page 27: Shahid Lecture-6- MKAG1273

Computing the Y-Intercept


Page 28: Shahid Lecture-6- MKAG1273

Illustration of the Least Squares Regression Principle


Page 29: Shahid Lecture-6- MKAG1273

Regression Equation - Example

It is anticipated that climate change will make the sea rougher than ever before, which may affect erosion along the shoreline. Data are collected on average wave height (in meters) and erosion of the seashore (cm/year). Try to find a relation for predicting future seashore erosion due to a rougher sea.


Page 30: Shahid Lecture-6- MKAG1273

Regression Equation - Example

Correlation coefficient, r = 0.99257
Sx = 0.5243
Sy = 5.0652

m = r (Sy/Sx) = 0.99257 × (5.0652/0.5243) = 9.589

c = -1.01

Y = 9.589X - 1.01

Erosion = 9.589 × Height - 1.01

By the least squares method it was: Erosion = 9.585 × Height - 1.00
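The slope calculation above can be reproduced from the three summary statistics on the slide. This is a small sketch; the function name `slope_from_correlation` is an assumption. The intercept is not recomputed here because it requires the means of X and Y, which the transcript does not give.

```python
def slope_from_correlation(r, s_x, s_y):
    """Correlation-based slope estimate: m = r * (Sy / Sx)."""
    return r * s_y / s_x

# Summary statistics as printed on the slide.
r, s_x, s_y = 0.99257, 0.5243, 5.0652

m = slope_from_correlation(r, s_x, s_y)
print(f"m = {m:.3f}")
```

The small difference from the least squares slope of 9.585 is of the size one would expect from rounding r, Sx and Sy before dividing.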


Page 31: Shahid Lecture-6- MKAG1273

Assumptions in Linear Regression Model

• For each value of X, there is a group of Y values, and these Y values are normally distributed.
• The means of these normal distributions of Y values all lie on the straight line of regression.
• The standard deviations of these normal distributions are equal.
• The Y values are statistically independent. This means that in the selection of a sample, the Y values chosen for a particular X value do not depend on the Y values for any other X value.


Page 32: Shahid Lecture-6- MKAG1273

Confidence Interval Estimates of Y

A confidence interval reports a range for the mean value of Y at a given X.


Page 33: Shahid Lecture-6- MKAG1273

Confidence Interval Estimates of Y

Erosion = 9.585 x Height – 1.00

If Height is 4.0 m

Erosion = 9.585 x 4.0 – 1.00= 39.36 cm


Page 34: Shahid Lecture-6- MKAG1273

Confidence Interval Estimates of Y

Erosion = 9.585 x Height – 1.00

If Height is 2.5 m:

Erosion = 9.585 × Height - 1.00 = 23.0 cm

Degrees of freedom, df = n - 2 = 11 - 2 = 9
t(0.05; 9) = 2.262
Serr = 4.7778
Y(predicted) = 23.0
Confidence interval = 23.0 ± 3.32


Page 35: Shahid Lecture-6- MKAG1273

Confidence Interval Estimates of Y


Page 36: Shahid Lecture-6- MKAG1273

Confidence Interval of Y


Page 37: Shahid Lecture-6- MKAG1273

Prediction Interval Estimates of Y

A prediction interval reports the range of values of Y for a particular value of X.


Page 38: Shahid Lecture-6- MKAG1273

Prediction Interval Estimates of Y

Erosion = 9.585 x Height – 1.00

If Height is 4.0 m:

Erosion = 9.585 × Height - 1.00 = 39.365 cm

Degrees of freedom, df = n - 2 = 11 - 2 = 9
t(0.05; 9) = 2.262
Serr = 4.7778
Y(predicted) at 4.0 m height = 22.26
Prediction interval = 39.365 ± 15.37
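The interval formulas themselves did not survive in the transcript, so the sketch below assumes the standard textbook forms: half-width t × Serr × √(1/n + (x - x̄)²/SSx) for the confidence interval of the mean of Y, and t × Serr × √(1 + 1/n + (x - x̄)²/SSx) for the prediction interval of an individual Y. The eleven wave heights are hypothetical (the original data are not in the transcript); only t = 2.262 and Serr = 4.7778 are taken from the slides. The function name is an assumption.

```python
import math
from statistics import mean

def interval_half_widths(x_data, x_new, s_err, t_crit):
    """Half-widths of the confidence and prediction intervals at x_new."""
    n = len(x_data)
    mean_x = mean(x_data)
    ss_x = sum((xi - mean_x) ** 2 for xi in x_data)
    # Grows as x_new moves away from the mean of the observed x values
    leverage = 1 / n + (x_new - mean_x) ** 2 / ss_x
    ci = t_crit * s_err * math.sqrt(leverage)      # mean of Y at x_new
    pi = t_crit * s_err * math.sqrt(1 + leverage)  # single new Y at x_new
    return ci, pi

# Hypothetical wave heights (m): 11 values from 1.5 to 3.5, for illustration.
heights = [1.5 + 0.2 * i for i in range(11)]

ci_25, pi_25 = interval_half_widths(heights, 2.5, s_err=4.7778, t_crit=2.262)
ci_40, pi_40 = interval_half_widths(heights, 4.0, s_err=4.7778, t_crit=2.262)
print(f"x = 2.5: CI half-width {ci_25:.2f}, PI half-width {pi_25:.2f}")
print(f"x = 4.0: CI half-width {ci_40:.2f}, PI half-width {pi_40:.2f}")
```

Under these assumptions the prediction interval is always wider than the confidence interval, and both widen as x moves away from the mean of the observed heights, which matches the pattern in the slide values (± 3.32 near the middle of the data, ± 15.37 at 4.0 m).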


Page 39: Shahid Lecture-6- MKAG1273

Confidence Interval and Prediction Interval of Y


Page 40: Shahid Lecture-6- MKAG1273

Transforming Data

The coefficient of correlation describes the strength of the linear relationship between two variables. Two variables may be closely related even though their relationship is not linear.

Be cautious when interpreting the coefficient of correlation. A low value of r may indicate that there is no linear relationship, but there could still be a relationship of some other nonlinear or curvilinear form.


Page 41: Shahid Lecture-6- MKAG1273

Non-linear Data

The correlation between rainfall and river discharge is 0.782. This is a fairly strong positive relationship.

However, when we plot the data on a scatter diagram, the relationship does not appear to be linear; it does not seem to follow a straight line.


Page 42: Shahid Lecture-6- MKAG1273

Transforming Data

What can we do to explore other (nonlinear) relationships?

One possibility is to transform one of the variables. For example, instead of using Y as the dependent variable, we might use its log, reciprocal, square, or square root.

Another possibility is to transform both variables in the same way.

There are many other transformations, but log, reciprocal, square, and square root are the most common.
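A log transformation of the dependent variable can be sketched as follows. The rainfall and discharge values are synthetic, generated from an assumed exponential relation purely for illustration, and the helper name `fit_line` is an assumption. The line is fitted to log10(discharge), and a prediction is converted back to discharge units with the antilog.

```python
import math
from statistics import mean

def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    mx, my = mean(x), mean(y)
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    m = sp / ssx
    return m, my - m * mx

# Synthetic data (not from the lecture): discharge grows exponentially
# with rainfall, so a straight line fits log10(discharge), not discharge.
rain = [10.0, 20.0, 30.0, 40.0, 50.0]
discharge = [10 ** (0.02 * r + 0.1) for r in rain]

log_q = [math.log10(q) for q in discharge]  # transform the dependent variable
m, c = fit_line(rain, log_q)                # fit: log10(Q) = m * rain + c

# Predict at 70 mm, then take the antilog to return to discharge units.
pred = 10 ** (m * 70.0 + c)
print(f"log10(Q) = {m:.4f} x rain + {c:.4f}; Q(70 mm) = {pred:.2f}")
```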


Page 43: Shahid Lecture-6- MKAG1273

Transforming Data


Page 44: Shahid Lecture-6- MKAG1273

Transforming Data

After log transformation of the river discharge data, we got the regression equation as:


Page 45: Shahid Lecture-6- MKAG1273

Transforming Data

Prediction of river discharge from rainfall: what is the discharge when rainfall is 70 mm?

• The value 6.4372 is the log to the base 10 of discharge.
• The antilog of 6.4372 is 2.736.
• Therefore, when rainfall is 70 mm, discharge is 2.736 cumec.
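The back-transformation step is a single power of ten and can be checked directly. The sketch below uses only the two numbers printed on the slide. Note that 10 raised to 6.4372 is about 2.736 × 10⁶, while 2.736 is the antilog of 0.4372, so the two printed values agree only up to a factor of 10⁶. The transcript does not show the fitted equation, so which of the two was intended cannot be recovered here.

```python
import math

log_q = 6.4372          # log10 value as printed on the slide
q = 10 ** log_q         # antilog (base 10) of that value

print(f"10 ** {log_q} = {q:,.0f}")
print(f"log10(2.736) = {math.log10(2.736):.4f}")
```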


Page 46: Shahid Lecture-6- MKAG1273

Interpretation of Regression Equation

Y = mX + c

What does m mean?
What does c mean?

Suppose we obtain the regression equation:

Y = 10.2 X + 21.9

How will you interpret it?


Page 47: Shahid Lecture-6- MKAG1273

Interpretation of Regression Equation

How will you interpret the following regression equations?

Y = 10.2 X + 21.9

Y = 10.2 X - 21.9

Y = 21.9 - 10.2 X
