shahid lecture-6- mkag1273

Posted on 15-Apr-2017


MAL1303: STATISTICAL HYDROLOGY

Regression Analysis

Dr. Shamsuddin Shahid, Associate Professor

Department of Hydraulics and Hydrology, Faculty of Civil Engineering

Room No.: M46-332; Phone: 07-5531624; Mobile: 0182051586

Email: sshahid@utm.my

11/23/2015 Shamsuddin Shahid, FKA, UTM

Regression

Questions: Two variables are associated with one another. If one variable changes, how much does the other change?

How can we mathematically formalize the functional relationship between two variables?

Answer: Regression Analysis

Definition: Regression is a statistical technique used to determine the functional relationship between two variables. It gives the equation that best describes the relationship between them.

Research Questions: Are two variables related?

Example questions in hydrology:
• “Is there any relation between rainfall and river discharge?”
• “Is there any relation between low river flow and river water quality?”
• “Is there any relation between elevation and rainfall?”
• “Is there any relation between rainfall intensity and landslides?”

Test the relationship: Correlation

If you change the question from “Is” to “How” or “What”, e.g. “How are rainfall and river discharge related?”

Then we need to go for: Regression Analysis

Simple Regression

The dependent variable is the variable for which we want to make a prediction, and the independent variable is the variable used to make the prediction.

Simple regression analysis is a statistical tool that gives us the ability to estimate the mathematical relationship between a dependent variable (usually called y) and an independent variable (usually called x).

Regression can take linear or non-linear forms, but simple linear regression models are the most common in hydrology.

Regression: Main Goals

The goal is to find a functional relation between the response variable y and the predictor variable x:

y = f(x)

Another primary goal of quantitative analysis is to use current information about a phenomenon to predict its future behavior.

What is Regression?

Data on sea-wave height and seashore erosion are collected to find how far sea waves are responsible for beach erosion. The correlation coefficient between wave height and erosion is 0.79. Regression gives the functional relation between wave height and erosion as Erosion = 7.32 + 0.62 × Height.

Pictorial Presentation of Linear Regression Model

Uses of Regression Analysis

Regression analysis serves three major purposes:

1. Description
2. Control
3. Prediction

Difference between Correlation and Regression

Correlation quantifies the degree to which two variables are related.

Correlation does not find a functional relation. We simply compute a correlation coefficient that tells us how much one variable tends to change when the other one does.

With correlation, we don't have to think about cause and effect; we simply quantify how well two variables relate to each other. With regression, we do have to think about cause and effect, because the regression line is determined as the best way to predict Y from X.

With correlation, it doesn't matter which of the two variables we call "X" and which we call "Y": we get the same correlation coefficient if we swap the two. With linear regression, the choice of which variable we call "X" and which we call "Y" matters a lot, because we get a different best-fit line if we swap the two. The line that best predicts Y from X is not the same as the line that best predicts X from Y.
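A minimal Python sketch of the last point, using only the standard library. The data values and the helper names `pearson_r` and `ls_slope` are hypothetical, chosen for illustration: r is the same whichever variable is called X, but the least squares slope for predicting Y from X differs from the slope for predicting X from Y.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sp = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))  # sum of products
    ss_x = sum((a - x_bar) ** 2 for a in x)
    ss_y = sum((b - y_bar) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)


def ls_slope(x, y):
    """Least squares slope of the line that predicts y from x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sp = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    ss_x = sum((a - x_bar) ** 2 for a in x)
    return sp / ss_x


# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 5.0, 4.0, 9.0]

print(pearson_r(x, y), pearson_r(y, x))  # identical values
print(ls_slope(x, y))  # 2.0 (Y predicted from X)
print(ls_slope(y, x))  # about 0.385 (X predicted from Y)
```

Because the product of the two slopes equals r², the two lines coincide only when |r| = 1.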

Linear and Non-linear Regression

In linear regression, the model function is a linear combination of the parameters, such as y = mx + c, whose graph is a straight line.

In non-linear regression, the parameters appear in a non-linear combination, such as y = a·e^(bx), where the parameter b enters the model non-linearly.

Construction of Regression Models

• Selection of independent variables
• Functional form of regression relation
• Scope of model

– Least squares and correlation based

Linear Regression – General Principle

A linear relationship between two variables x and y can be expressed by the equation

y = mx + c

where y is the dependent variable, x is the independent variable, and m and c are constants.

In the general linear equation:

The value of m is called the slope. The slope determines how much the y variable will change when x is increased or decreased by one unit.

The value of c is called the y-intercept. It determines the value of y when x = 0.

Least Squares Regression Principle

The Least Squares Solution

For each value of x in the data, the regression equation determines the point on the line that gives the best prediction of y.

The problem is to find the specific values of m and c that make this line the best fit. The least squares estimate of m is

$$m = \frac{SP}{SS_x}$$

where SP is the sum of products, $SP = \sum (X - \bar{X})(Y - \bar{Y})$, and $SS_x$ is the sum of squares for the X scores, $SS_x = \sum (X - \bar{X})^2$. The intercept is then $c = \bar{Y} - m\bar{X}$.
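The least squares solution can be sketched in Python as follows. This is an illustrative implementation of the formulas m = SP/SSx and c = Ȳ − mX̄, not the lecture's own code; the function name `least_squares_fit` and the data are hypothetical.

```python
def least_squares_fit(x, y):
    """Return (m, c) for y = m*x + c using m = SP / SSx and c = y_bar - m*x_bar."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, with at least 2 points")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sp = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # sum of products, SP
    ss_x = sum((xi - x_bar) ** 2 for xi in x)                       # sum of squares of X, SSx
    if ss_x == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    m = sp / ss_x
    c = y_bar - m * x_bar  # the fitted line passes through (x_bar, y_bar)
    return m, c


# Hypothetical data, for illustration only (not the erosion data set)
m, c = least_squares_fit([1.0, 2.0, 3.0, 4.0], [2.0, 5.0, 4.0, 9.0])
print(m, c)  # 2.0 0.0
```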

Example of Regression Analysis

Standard Error of Estimate

A regression equation, by itself, allows you to make predictions, but it does not provide any information about the accuracy of those predictions.

The standard error of estimate measures the standard distance between the regression line and the actual data points.

Error Estimation Formula

To calculate the standard error of estimate, first find the sum of squared deviations (SS) of the actual Y values from the predicted values Ŷ. This sum of squares is commonly called SSerror:

$$SS_{error} = \sum (Y - \hat{Y})^2$$

The SS value is then divided by its degrees of freedom to obtain a measure of variance. The df for the standard error of estimate are

$$df = n - 2$$

The standard error of estimate, which measures how accurately the regression equation predicts the y values, is the square root of this variance:

$$S_{err} = \sqrt{\frac{SS_{error}}{n - 2}}$$
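A sketch of the standard error of estimate in Python, following the three steps above (SSerror, df = n − 2, square root). The function name and data are hypothetical; the fit step repeats the SP/SSx formulas so the block runs on its own.

```python
import math


def standard_error_of_estimate(x, y):
    """Serr = sqrt(SSerror / (n - 2)) for the least squares line y = m*x + c."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need equal-length x and y with at least 3 points (df = n - 2 > 0)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sp = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    m = sp / ss_x
    c = y_bar - m * x_bar
    # SSerror: squared deviations of observed Y from predicted Y-hat
    ss_error = sum((yi - (m * xi + c)) ** 2 for xi, yi in zip(x, y))
    df = n - 2
    return math.sqrt(ss_error / df)


# Hypothetical data: the fitted line is y = 2x, residuals are 0, 1, -2, 1
print(standard_error_of_estimate([1.0, 2.0, 3.0, 4.0], [2.0, 5.0, 4.0, 9.0]))  # sqrt(3), about 1.732
```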

Error Estimation Example

Assumptions

• The relationship between the variables is linear.

• Both variables must be at least interval scale.

• The least squares criterion is used to determine the equation.

Example

It is anticipated that climate change will make the sea rougher than ever before, which may increase erosion along the seashore. Data are collected on average wave height (in meters) during cyclones and seashore erosion (cm per cyclone event). Find a relation for predicting future seashore erosion due to a rougher sea.

Example: Solution

[Figure: scatter plot of seashore erosion (cm) against average wave height (m)]

Y = mX + c

Example: Solution

Y = mX + c

Calculate m and c:

m = 9.585, c = -1.00

Y = 9.585X – 1.00

Erosion = 9.585 x Height – 1.00

Example: Solution

Erosion = 9.585 x Height – 1.00

Standard error of estimate, Serr = 4.7778 cm

Example: Solution

Erosion = 9.585 x Height – 1.00

with standard error of estimate Serr = 4.7778 cm.

If Height is 4.0 m:

Erosion = 9.585 x 4.0 – 1.00 = 37.34 cm

Erosion ≈ 37.34 ± 4.78 cm, i.e. 32.56 to 42.12 cm

Regression Analysis – Least Squares Principle

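The least squares principle is used to obtain a and b in the regression equation Y = a + bX, where a is the intercept (c) and b is the slope (m).

The equations to determine a and b are:

$$b = \frac{n\sum XY - \left(\sum X\right)\left(\sum Y\right)}{n\sum X^2 - \left(\sum X\right)^2}$$

$$a = \frac{\sum Y}{n} - b\,\frac{\sum X}{n}$$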
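The slope and intercept can be computed directly from the raw sums in these formulas. A minimal sketch with a hypothetical helper `least_squares_from_sums` and made-up data:

```python
def least_squares_from_sums(x, y):
    """Return (a, b) for Y = a + bX using the raw-sum least squares formulas."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
    a = sum_y / n - b * sum_x / n                                  # intercept
    return a, b


# Hypothetical data, for illustration only
a, b = least_squares_from_sums([1.0, 2.0, 3.0, 4.0], [2.0, 5.0, 4.0, 9.0])
print(a, b)  # 0.0 2.0
```

Algebraically this gives the same result as m = SP/SSx and c = Ȳ − mX̄. With large X values and a small spread, the raw-sum form can lose floating-point precision through cancellation, so the deviation form is generally the safer one to code.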

Correlation Based Method: Computing the Slope

Y = mX + c

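Calculate the slope: $m = r\,(S_y/S_x)$, where r is the correlation coefficient and $S_x$, $S_y$ are the standard deviations of X and Y.

Calculate the intercept: $c = \bar{Y} - m\bar{X}$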

Computing the Y-Intercept

Illustration of the Least Squares Regression Principle

Regression Equation - Example

It is anticipated that climate change will make the sea rougher than ever before, which may increase erosion along the seashore. Data are collected on average wave height (in meters) and seashore erosion (cm/year). Find a relation for predicting future seashore erosion due to a rougher sea.

Regression Equation - Example

Correlation coefficient, r = 0.99257; Sx = 0.5243; Sy = 5.0652

m = r (Sy/Sx) = 0.99257 x (5.0652/0.5243) = 9.589

c = -1.01

Y = 9.589X - 1.01

Erosion = 9.589 x Height - 1.01

By the least squares method, the equation was: Erosion = 9.585 x Height – 1.00
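Why the two methods agree: r = SP/√(SSx·SSy) and Sy/Sx = √(SSy/SSx), so r·(Sy/Sx) simplifies to SP/SSx, the least squares slope. The sketch below computes the slope the correlation-based way; the function name and data are hypothetical, and the standard deviations are sample values from Python's `statistics.stdev`.

```python
import math
import statistics


def correlation_based_fit(x, y):
    """Return (m, c) using m = r * (Sy / Sx) and c = y_bar - m * x_bar."""
    x_bar = statistics.mean(x)
    y_bar = statistics.mean(y)
    sp = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ss_y = sum((yi - y_bar) ** 2 for yi in y)
    r = sp / math.sqrt(ss_x * ss_y)  # Pearson correlation coefficient
    s_x = statistics.stdev(x)        # sample standard deviations (n - 1 divisor)
    s_y = statistics.stdev(y)
    m = r * (s_y / s_x)
    c = y_bar - m * x_bar
    return m, c


# Hypothetical data, for illustration only
m, c = correlation_based_fit([1.0, 2.0, 3.0, 4.0], [2.0, 5.0, 4.0, 9.0])
print(m, c)  # approximately 2.0 and 0.0, up to floating-point rounding
```

In exact arithmetic the correlation-based and least squares slopes are identical; differences in the reported digits come from rounding in the intermediate values.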

Assumptions in Linear Regression Model

• For each value of X, there is a group of Y values, and these Y values are normally distributed.
• The means of these normal distributions of Y values all lie on the straight line of regression.
• The standard deviations of these normal distributions are equal.
• The Y values are statistically independent. This means that in the selection of a sample, the Y values chosen for a particular X value do not depend on the Y values for any other X value.

Confidence Interval Estimates of Y

A confidence interval gives the range within which the mean value of Y is expected to lie for a given value of X.

Confidence Interval Estimates of Y

Erosion = 9.585 x Height – 1.00

If Height is 4.0 m

Erosion = 9.585 x 4.0 – 1.00 = 37.34 cm

Confidence Interval Estimates of Y

Erosion = 9.585 x Height – 1.00

If Height is 2.5 m:

Erosion = 9.585 x 2.5 – 1.00 = 22.96 ≈ 23.0 cm

Degrees of freedom, df = n – 2 = 11 – 2 = 9
t(0.05; 9) = 2.262
Serr = 4.7778
Y(predicted) = 23.0
Confidence interval = 23.0 ± 3.32
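The formula behind these numbers is not reproduced in this transcript. The sketch below assumes the standard textbook form of the confidence interval for the mean of Y at X = X₀: Ŷ ± t · Serr · √(1/n + (X₀ − X̄)²/Σ(X − X̄)²). Python's standard library has no t distribution, so the critical t value is passed in from a t table, as on the slide. The function name and data are hypothetical.

```python
import math


def confidence_interval_mean_y(x, y, x0, t_crit):
    """Return (y_hat, half_width) of the confidence interval for the mean of Y at x0.

    t_crit is the two-tailed critical t value for df = n - 2 from a t table
    (e.g. 2.262 for 95% confidence with df = 9).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sp = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    m = sp / ss_x
    c = y_bar - m * x_bar
    s_err = math.sqrt(sum((yi - (m * xi + c)) ** 2 for xi, yi in zip(x, y)) / (n - 2))
    y_hat = m * x0 + c
    half_width = t_crit * s_err * math.sqrt(1 / n + (x0 - x_bar) ** 2 / ss_x)
    return y_hat, half_width


# Hypothetical 4-point data set, so df = 2 and t(0.05; 2) = 4.303
y_hat, hw = confidence_interval_mean_y([1.0, 2.0, 3.0, 4.0], [2.0, 5.0, 4.0, 9.0], 2.5, 4.303)
print(f"{y_hat:.2f} ± {hw:.2f}")  # 5.00 ± 3.73
```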

Confidence Interval Estimates of Y

Confidence Interval of Y

Prediction Interval Estimates of Y

A prediction interval gives the range within which an individual value of Y is expected to lie for a particular value of X.

Prediction Interval Estimates of Y

Erosion = 9.585 x Height – 1.00

If Height is 4.0 m:

Erosion = 9.585 x 4.0 – 1.00 = 37.34 cm

Degrees of freedom, df = n – 2 = 11 – 2 = 9
t(0.05; 9) = 2.262
Serr = 4.7778
Y(predicted) at 4.0 m height = 37.34
Prediction interval = 37.34 ± 15.37
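Assuming the standard textbook form of the prediction interval for an individual Y at X = X₀, Ŷ ± t · Serr · √(1 + 1/n + (X₀ − X̄)²/Σ(X − X̄)²), a minimal sketch follows. The function name and data are hypothetical, and the critical t value is supplied from a t table.

```python
import math


def prediction_interval_y(x, y, x0, t_crit):
    """Return (y_hat, half_width) of the prediction interval for a single Y at x0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sp = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    m = sp / ss_x
    c = y_bar - m * x_bar
    s_err = math.sqrt(sum((yi - (m * xi + c)) ** 2 for xi, yi in zip(x, y)) / (n - 2))
    y_hat = m * x0 + c
    # The leading 1 adds the scatter of an individual observation about the line
    half_width = t_crit * s_err * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / ss_x)
    return y_hat, half_width


# Hypothetical 4-point data set, so df = 2 and t(0.05; 2) = 4.303
y_hat, hw = prediction_interval_y([1.0, 2.0, 3.0, 4.0], [2.0, 5.0, 4.0, 9.0], 4.0, 4.303)
print(f"{y_hat:.2f} ± {hw:.2f}")  # 8.00 ± 9.72
```

The extra 1 under the square root accounts for the scatter of individual Y values about the line, which is why a prediction interval is always wider than the confidence interval for the mean at the same X.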

Confidence Interval and Prediction Interval of Y

Transforming Data

The coefficient of correlation describes the strength of the linear relationship between two variables. Two variables could be closely related even though their relationship is not linear.

Be cautious when interpreting the coefficient of correlation. A value of r near zero may indicate that there is no linear relationship, but there could still be a relationship of some other nonlinear or curvilinear form.

Non-linear Data

The correlation between rainfall and river discharge is −0.782. This is a fairly strong inverse relationship.

However, when we plot the data on a scatter diagram, the relationship does not appear to be linear; it does not seem to follow a straight line.

Transforming Data

What can we do to explore other (nonlinear) relationships?

One possibility is to transform one of the variables. For example, instead of using Y as the dependent variable, we might use its log, reciprocal, square, or square root.

Another possibility is to transform both variables in the same way.

There are many other transformations, but log, reciprocal, square, and square root are the most common.
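A minimal sketch of the log transformation in Python: fit a straight line to log10(Y) against X, then take the antilog to return a prediction in the original units. The function names are hypothetical, and the data are made up to follow Y = 10^(1 + 0.5X) exactly, so the recovered coefficients are easy to check.

```python
import math


def fit_log10_y(x, y):
    """Fit log10(Y) = a + b*X by least squares; every Y must be positive."""
    if any(v <= 0 for v in y):
        raise ValueError("log transformation needs positive Y values")
    log_y = [math.log10(v) for v in y]
    n = len(x)
    x_bar = sum(x) / n
    ly_bar = sum(log_y) / n
    sp = sum((xi - x_bar) * (li - ly_bar) for xi, li in zip(x, log_y))
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    b = sp / ss_x
    a = ly_bar - b * x_bar
    return a, b


def predict_original_units(a, b, x0):
    """Predict log10(Y) from the fitted line, then take the antilog (10 ** value)."""
    return 10 ** (a + b * x0)


# Hypothetical data that follow Y = 10**(1 + 0.5*X) exactly
x = [0.0, 1.0, 2.0, 3.0]
y = [10 ** (1 + 0.5 * xi) for xi in x]
a, b = fit_log10_y(x, y)
print(round(a, 6), round(b, 6))                     # 1.0 0.5
print(round(predict_original_units(a, b, 4.0), 3))  # 1000.0
```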

Transforming Data

Transforming Data

After log transformation of the river discharge data, we got the regression equation:

Transforming Data

Prediction of river discharge from rainfall: what is the discharge when rainfall is 70 mm?

• The value 6.4372 is the base-10 log of discharge.
• The antilog of 6.4372 is 10^6.4372 ≈ 2.736 × 10^6.
• Therefore, when rainfall is 70 mm, the predicted discharge is about 2.736 × 10^6 cumec.

Interpretation of Regression Equation

Y = mX + c

What does m mean? What does c mean?

Suppose we get the regression equation:

Y = 10.2 X + 21.9

How will you interpret it?
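A short worked check of what m and c mean, using the equation above: increasing X by one unit changes Y by exactly the slope, and setting X = 0 leaves only the intercept.

```latex
\begin{aligned}
Y(X+1) - Y(X) &= [10.2(X+1) + 21.9] - [10.2X + 21.9] = 10.2 \\
Y(0) &= 10.2 \times 0 + 21.9 = 21.9
\end{aligned}
```

So Y increases by 10.2 units for each one-unit increase in X, and 21.9 is the value of Y when X = 0 (a meaningful value only if X = 0 lies within, or close to, the range of the observed data).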

Interpretation of Regression Equation

How will you interpret the following regression equations?

Y = 10.2 X + 21.9

Y = 10.2 X – 21.9

Y = 21.9 – 10.2 X
