multiple linear regression 2014

Upload: sentinel-berg

Post on 11-Feb-2018


  • 7/23/2019 Regresi Linear Berganda 2014

    Master's Program (S2), Civil Engineering

    Multiple Linear Regression

    Statistics

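    Multiple Linear Regression Model

    Examines the linear relationship between a dependent variable (y) and two or more independent variables ($x_i$).

    Population multiple linear regression model:

    $$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$

    where $\beta_0$ is the y-intercept, $\beta_1, \dots, \beta_k$ are the population slopes, and $\varepsilon$ is the random error.

    Estimated multiple linear regression model:

    $$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k$$

    where $\hat{y}$ is the estimated (or predicted) value of y, $b_0$ is the estimated intercept, and $b_1, \dots, b_k$ are the estimated slope coefficients.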

    Multiple Linear Regression Model

    Model with two x variables:

    $$\hat{y} = b_0 + b_1 x_1 + b_2 x_2$$

    (Figure: fitted regression plane for y over the axes x1 and x2.)

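    Model with two x variables:

    $$\hat{y} = b_0 + b_1 x_1 + b_2 x_2$$

    (Figure: the same regression plane over x1 and x2, marking an observed value $y_i$ and its fitted value $\hat{y}_i$.)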

    The errors are normally distributed.

    The mean of the errors is 0.

    The errors have constant variance (homogeneous variance).

    The model errors are independent of one another.

    Residual: $e = y - \hat{y}$

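    Multiple Linear Regression Model in Matrix Notation

    Multiple linear regression model with a vector of n observations on the variable y and K x variables:

    $$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1K} \\ x_{21} & x_{22} & \cdots & x_{2K} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nK} \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_K \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}$$

    $$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$$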
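As an illustration only (Python here, while the slides use SAS), this is a minimal sketch of how X is typically assembled. It uses one row per observation. Assuming the model includes an intercept, it also adds a leading column of ones so that the first coefficient acts as the intercept. `design_matrix` and `mat_vec` are hypothetical helper names. The three rows are the first three weeks of the pie-sales example, and the coefficients are the estimates reported for that example.

```python
def design_matrix(rows, intercept=True):
    """Build X as a list of rows, one per observation.

    With intercept=True a leading 1.0 is prepended to every row, so the first
    coefficient in beta acts as the intercept (an assumption of this sketch).
    """
    return [([1.0] if intercept else []) + [float(v) for v in row] for row in rows]


def mat_vec(X, beta):
    """Return X @ beta, i.e. the fitted values for coefficient vector beta."""
    return [sum(x_ij * b_j for x_ij, b_j in zip(row, beta)) for row in X]


# First three weeks of the pie data as (price, advertising in $100s) pairs
X = design_matrix([(5.50, 3.3), (7.50, 3.3), (8.00, 3.0)])
print(X)  # [[1.0, 5.5, 3.3], [1.0, 7.5, 3.3], [1.0, 8.0, 3.0]]

# Fitted values X @ b using the pie-sales estimates (b0, b1, b2)
y_hat = mat_vec(X, [306.52619, -24.97509, 74.13096])
print([round(v, 2) for v in y_hat])
```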

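    Parameter Estimation by the OLS Method

    Least squares chooses b to minimize the sum of squared residuals:

    $$\sum_{i=1}^{n} e_i^2 = \mathbf{e}'\mathbf{e} = (\mathbf{y} - \mathbf{X}\mathbf{b})'(\mathbf{y} - \mathbf{X}\mathbf{b})$$

    A digression on multivariate calculus: matrix and vector derivatives.

    Derivative of a scalar with respect to a vector

    Derivative of a column vector with respect to a row vector

    Other derivatives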

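    Parameter Estimation by the OLS Method

    Note: the derivative of a 1×1 scalar with respect to a K×1 vector is a K×1 vector.

    Solution:

    $$\frac{\partial (\mathbf{y} - \mathbf{X}\mathbf{b})'(\mathbf{y} - \mathbf{X}\mathbf{b})}{\partial \mathbf{b}} = -2\mathbf{X}'(\mathbf{y} - \mathbf{X}\mathbf{b}) = \mathbf{0}$$

    Dimensions: (1×1)/(K×1) gives (-2)(n×K)'(n×1) = (-2)(K×n)(n×1) = K×1.

    Setting the derivative to zero gives the normal equations: $\mathbf{X}'\mathbf{y} = \mathbf{X}'\mathbf{X}\mathbf{b}$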
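This is a quick numerical check of the normal equations, sketched in Python. It assumes that the coefficients printed in the pie-sales example are the OLS solution. At that solution the residual vector e = y - Xb satisfies X'e = 0, so every entry below should be zero apart from the rounding of the reported coefficients.

```python
# Pie data from the example slides: sales (pies/week), price ($), advertising ($100s)
sales = [350, 460, 350, 430, 350, 380, 430, 470, 450, 490, 340, 300, 440, 450, 300]
price = [5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50, 6.40, 7.00, 5.00, 7.20, 7.90, 5.90, 5.00, 7.00]
adv = [3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7, 3.5, 4.0, 3.5, 3.2, 4.0, 3.5, 2.7]
b = [306.52619, -24.97509, 74.13096]  # coefficients as reported (rounded to 5 decimals)

X = [[1.0, p, a] for p, a in zip(price, adv)]
# Residuals e = y - Xb
e = [y - sum(x * bj for x, bj in zip(row, b)) for row, y in zip(X, sales)]

# X'e: one entry per column of X; the normal equations require each to be 0
Xt_e = [sum(row[j] * ei for row, ei in zip(X, e)) for j in range(len(b))]
print([round(v, 3) for v in Xt_e])  # all close to 0 (only rounding error remains)
```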

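    Parameter Estimation by the OLS Method

    Assuming the inverse exists:

    $$\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$$

    Note the analogy:

    $$\boldsymbol{\beta} = [\operatorname{Var}(\mathbf{x})]^{-1}[\operatorname{Cov}(\mathbf{x}, y)] \quad\text{and}\quad \mathbf{b} = \left(\tfrac{1}{n}\mathbf{X}'\mathbf{X}\right)^{-1}\left(\tfrac{1}{n}\mathbf{X}'\mathbf{y}\right)$$

    This suggests something desirable about least squares.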
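This is a minimal, standard-library-only Python sketch of the OLS estimator applied to the pie-sales data. It solves the normal equations X'X b = X'y by Gaussian elimination rather than forming (X'X)^{-1} explicitly, which is the usual numerically safer route. The helper names (`transpose`, `solve`, `ols`) are hypothetical, and the singularity tolerance is a crude assumption.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in A[i]] + [float(rhs[i])] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-12:  # crude absolute tolerance, adequate for this data
            raise ValueError("X'X is singular, so (X'X)^-1 does not exist")
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def ols(X, y):
    """Least squares estimate b solving the normal equations X'X b = X'y."""
    Xt = transpose(X)
    XtX = [[sum(a * b for a, b in zip(ri, rj)) for rj in Xt] for ri in Xt]
    Xty = [sum(a * b for a, b in zip(ri, y)) for ri in Xt]
    return solve(XtX, Xty)


# Pie data from the example slides
sales = [350, 460, 350, 430, 350, 380, 430, 470, 450, 490, 340, 300, 440, 450, 300]
price = [5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50, 6.40, 7.00, 5.00, 7.20, 7.90, 5.90, 5.00, 7.00]
adv = [3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7, 3.5, 4.0, 3.5, 3.2, 4.0, 3.5, 2.7]

X = [[1.0, p, a] for p, a in zip(price, adv)]  # leading 1.0 = intercept column
b = ols(X, sales)
print([round(v, 3) for v in b])  # [306.526, -24.975, 74.131]
```

The result should agree with the Intercept, Price and Advertising estimates in the regression output for this example.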

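    Parameter Estimation by the OLS Method

    The second derivative is the derivative of a column vector with respect to a row vector:

    $$\frac{\partial^2 (\mathbf{y} - \mathbf{X}\mathbf{b})'(\mathbf{y} - \mathbf{X}\mathbf{b})}{\partial \mathbf{b}\,\partial \mathbf{b}'} = \frac{\partial\,[-2\mathbf{X}'(\mathbf{y} - \mathbf{X}\mathbf{b})]}{\partial \mathbf{b}'} = 2\mathbf{X}'\mathbf{X}$$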

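    Does b minimize e'e?

    $$\frac{\partial^2 \mathbf{e}'\mathbf{e}}{\partial \mathbf{b}\,\partial \mathbf{b}'} = 2\mathbf{X}'\mathbf{X} = 2\begin{bmatrix} \sum_{i=1}^{n} x_{i1}^2 & \sum_{i=1}^{n} x_{i1}x_{i2} & \cdots & \sum_{i=1}^{n} x_{i1}x_{iK} \\ \sum_{i=1}^{n} x_{i2}x_{i1} & \sum_{i=1}^{n} x_{i2}^2 & \cdots & \sum_{i=1}^{n} x_{i2}x_{iK} \\ \vdots & \vdots & \ddots & \vdots \\ \sum_{i=1}^{n} x_{iK}x_{i1} & \sum_{i=1}^{n} x_{iK}x_{i2} & \cdots & \sum_{i=1}^{n} x_{iK}^2 \end{bmatrix}$$

    If there were a single b, we would require this to be positive, which it would be: $2\mathbf{x}'\mathbf{x} = 2\sum_{i=1}^{n} x_i^2 > 0$.

    The matrix counterpart of a positive number is a positive definite matrix.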
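One standard way to check positive definiteness numerically, not a method taken from the slides, is to attempt a Cholesky factorisation X'X = LL'. For a symmetric matrix, this succeeds with positive diagonal entries exactly when the matrix is positive definite. The sketch below uses hypothetical helpers `is_positive_definite` and `gram`. It contrasts the pie-sales design with a made-up collinear design whose X'X is singular, so b would not be a unique minimizer for that design.

```python
import math


def is_positive_definite(A, tol=1e-12):
    """Cholesky test for a symmetric matrix A: A = L L' with a positive diagonal
    exists exactly when A is positive definite. Only the lower triangle is read."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= tol:  # non-positive pivot: not positive definite
                    return False
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True


def gram(X):
    """Return X'X for X given as a list of rows."""
    K = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(K)] for i in range(K)]


price = [5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50, 6.40, 7.00, 5.00, 7.20, 7.90, 5.90, 5.00, 7.00]
adv = [3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7, 3.5, 4.0, 3.5, 3.2, 4.0, 3.5, 2.7]
X_pie = [[1.0, p, a] for p, a in zip(price, adv)]
print(is_positive_definite(gram(X_pie)))  # True: the columns are linearly independent

# Hypothetical collinear design: the third column is exactly twice the second
X_bad = [[1.0, 1.0, 2.0], [1.0, 2.0, 4.0], [1.0, 3.0, 6.0]]
print(is_positive_definite(gram(X_bad)))  # False: X'X is singular
```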

    Multiple Coefficient of Determination

    Reports the proportion of the total variation in y explained by all x variables taken together:

    $$R^2 = \frac{SSR}{SST} = \frac{\text{sum of squares regression}}{\text{total sum of squares}}$$
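This short Python sketch computes the sums of squares behind R² for the pie-sales data. It assumes the fitted values come from the coefficients reported in the example output. SST is computed from deviations around the mean, SSE from the residuals, and SSR = SST - SSE. That identity holds for OLS with an intercept.

```python
# Pie data from the example slides
sales = [350, 460, 350, 430, 350, 380, 430, 470, 450, 490, 340, 300, 440, 450, 300]
price = [5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50, 6.40, 7.00, 5.00, 7.20, 7.90, 5.90, 5.00, 7.00]
adv = [3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7, 3.5, 4.0, 3.5, 3.2, 4.0, 3.5, 2.7]
b0, b1, b2 = 306.52619, -24.97509, 74.13096  # estimates from the example output

y_hat = [b0 + b1 * p + b2 * a for p, a in zip(price, adv)]
y_bar = sum(sales) / len(sales)

sst = sum((y - y_bar) ** 2 for y in sales)               # total sum of squares
sse = sum((y - yh) ** 2 for y, yh in zip(sales, y_hat))  # residual sum of squares
ssr = sst - sse                                          # regression sum of squares
r2 = ssr / sst
print(f"SST = {sst:.3f}, SSR = {ssr:.3f}, R^2 = {r2:.5f}")
```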

    Adjusted R²

    R² never decreases when a new x variable is added to the model.

    This can be a disadvantage when comparing models.

    What is the net effect of adding a new variable?

    We lose a degree of freedom when a new x variable is added.

    Did the new x variable add enough explanatory power to offset the loss of one degree of freedom?

    Adjusted R²

    Shows the proportion of variation in y explained by all x variables, adjusted for the number of x variables used:

    $$R_A^2 = 1 - (1 - R^2)\left(\frac{n - 1}{n - k - 1}\right)$$

    (where n = sample size, k = number of independent variables)

    Penalizes excessive use of unimportant independent variables

    Smaller than R² (whenever R² < 1)

    Useful in comparing among models
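Here is the adjusted R² formula as a small Python function (the name `adjusted_r2` is hypothetical). The second call uses a made-up R² of 0.525 for a model with a third, weak x variable. It shows the penalty described above: R² rises slightly, but adjusted R² falls.

```python
def adjusted_r2(r2, n, k):
    """R_A^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1), with k = number of x variables."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Pie example: R^2 = 0.52148 with n = 15 weeks and k = 2 (price, advertising)
print(round(adjusted_r2(0.52148, n=15, k=2), 4))  # 0.4417

# Hypothetical: a third, weak x variable nudges R^2 up to 0.525 ...
print(round(adjusted_r2(0.525, n=15, k=3), 4))  # ... but adjusted R^2 drops to 0.3955
```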

    Is the Model Significant?

    F-test for overall significance of the model

    Shows whether there is a linear relationship between all of the x variables considered together and y

    Uses the F test statistic

    Hypotheses:

    H0: $\beta_1 = \beta_2 = \cdots = \beta_k = 0$ (no linear relationship)

    HA: at least one $\beta_i \neq 0$ (at least one independent variable affects y)

    F-Test for Overall Significance

    Test statistic:

    $$F = \frac{MSR}{MSE} = \frac{SSR/k}{SSE/(n - k - 1)}$$

    where F has D1 = k (numerator) and D2 = n - k - 1 (denominator) degrees of freedom.
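This minimal Python sketch computes the F statistic from the ANOVA sums of squares. The inputs are the SSR and SSE values reported for the pie-sales example, and `f_statistic` is a hypothetical helper.

```python
def f_statistic(ssr, sse, n, k):
    """Overall F test: F = MSR/MSE with MSR = SSR/k and MSE = SSE/(n - k - 1)."""
    msr = ssr / k
    mse = sse / (n - k - 1)
    return msr, mse, msr / mse


# ANOVA sums of squares from the pie-sales example (n = 15, k = 2)
msr, mse, f = f_statistic(ssr=29460.027, sse=27033.306, n=15, k=2)
print(f"MSR = {msr:.3f}, MSE = {mse:.3f}, F = {f:.4f}")
```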

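    F-Test for Overall Significance

    H0: $\beta_1 = \beta_2 = 0$

    HA: $\beta_1$ and $\beta_2$ not both zero

    $\alpha = 0.05$

    df1 = 2, df2 = 12

    Critical value: $F_{0.05} = 3.885$

    Test statistic:

    $$F = \frac{MSR}{MSE} = 6.5386$$

    Decision: Reject H0 at $\alpha = 0.05$, because 6.5386 > 3.885.

    Conclusion: The regression model does explain a significant portion of the variation in pie sales (there is evidence that at least one independent variable affects y).

    (Figure: F distribution with the rejection region of area $\alpha = 0.05$ to the right of $F_{0.05} = 3.885$ and "Do not reject H0" to the left.)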
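The critical value and p-value above can be checked without statistical tables in the special case D1 = 2. For that case, the F(2, D2) upper-tail probability has the closed form (1 + 2f/D2)^(-D2/2). The Python sketch below relies on that identity, so it is valid only for models with two x variables. For a general D1, a library F distribution would be needed. The function names are hypothetical.

```python
def f_sf_d1_is_2(f, d2):
    """Upper-tail probability P(F > f) for F(2, d2): (1 + 2f/d2)^(-d2/2).

    Valid ONLY when the numerator df is 2.
    """
    return (1.0 + 2.0 * f / d2) ** (-d2 / 2.0)


def f_crit_d1_is_2(alpha, d2):
    """Upper-alpha critical value of F(2, d2), from inverting the tail formula."""
    return d2 / 2.0 * (alpha ** (-2.0 / d2) - 1.0)


crit = f_crit_d1_is_2(0.05, d2=12)
p_value = f_sf_d1_is_2(6.5386, d2=12)
print(f"F_0.05(2, 12) = {crit:.3f}")  # F_0.05(2, 12) = 3.885
print(f"p-value = {p_value:.5f}")
print("reject H0" if 6.5386 > crit else "do not reject H0")  # reject H0
```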

    Are Individual Variables Significant?

    Use t-tests of individual variable slopes

    Shows whether there is a linear relationship between the variable $x_i$ and y

    Hypotheses:

    H0: $\beta_i = 0$ (no linear relationship)

    HA: $\beta_i \neq 0$ (a linear relationship does exist between $x_i$ and y)

    Example: Multiple Linear Regression Model

    A distributor of frozen dessert pies wants to evaluate the factors that affect demand:

    Dependent variable: pie sales (units per week)

    Independent variables: price (in $) and advertising (in $100s)

    Data are collected over 15 weeks.

    Example: Multiple Linear Regression Model

    Estimated regression model:

    Sales = b0 + b1(Price) + b2(Advertising)

    Week  Pie sales  Price ($)  Advertising ($100s)
    1     350        5.50       3.3
    2     460        7.50       3.3
    3     350        8.00       3.0
    4     430        8.00       4.5
    5     350        6.80       3.0
    6     380        7.50       4.0
    7     430        4.50       3.0
    8     470        6.40       3.7
    9     450        7.00       3.5
    10    490        5.00       4.0
    11    340        7.20       3.5
    12    300        7.90       3.2
    13    440        5.90       4.0
    14    450        5.00       3.5
    15    300        7.00       2.7

    Example: Multiple Linear Regression Model

    Slope ($b_i$): Estimates that the average value of y changes by $b_i$ units for each 1-unit increase in $x_i$, holding all other variables constant.

    Example: if b1 = -20, then sales (y) is expected to decrease by an estimated 20 pies per week for each $1 increase in selling price (x1), net of the effects of changes due to advertising (x2).

    y-intercept ($b_0$): The estimated average value of y when all $x_i = 0$ (assuming all $x_i = 0$ is within the range of observed values).

    Multiple Coefficient of Determination

    Regression Statistics
    Multiple R          0.72213
    R Square            0.52148
    Adjusted R Square   0.44172
    Standard Error      47.46341
    Observations        15

    ANOVA        df   SS          MS          F         Significance F
    Regression   2    29460.027   14730.013   6.53861   0.01201
    Residual     12   27033.306   2252.776
    Total        14   56493.333

                 Coefficients   Standard Error   t Stat     P-value   Lower 95%   Upper 95%
    Intercept    306.52619      114.25389        2.68285    0.01993   57.58835    555.46404
    Price        -24.97509      10.83213         -2.30565   0.03979   -48.57626   -1.37392
    Advertising  74.13096       25.96732         2.85478    0.01449   17.55303    130.70888

    $$R^2 = \frac{SSR}{SST} = \frac{29460.0}{56493.3} = 0.52148$$

    52.1% of the variation in pie sales is explained by the variation in price and advertising.

    Example: Multiple Linear Regression Model

    Sales = 306.526 - 24.975(Price) + 74.131(Advertising)

    where Sales is in number of pies per week, Price is in $, and Advertising is in $100s.

    b1 = -24.975: sales will decrease, on average, by 24.975 pies per week for each $1 increase in selling price, net of the effects of changes due to advertising.

    b2 = 74.131: sales will increase, on average, by 74.131 pies per week for each $100 increase in advertising, net of the effects of changes due to price.

    Using the Model to Make Predictions

    Predict sales for a week in which the selling price is $5.50 and advertising is $350:

    Sales = 306.526 - 24.975(Price) + 74.131(Advertising)
          = 306.526 - 24.975(5.50) + 74.131(3.5)
          = 428.62

    Predicted sales is 428.62 pies.

    Note that Advertising is in $100s, so $350 means that x2 = 3.5.
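The same prediction can be written as a tiny Python function. `predict_sales` is a hypothetical name, and the default coefficients are the rounded estimates from the fitted equation. The conversion from dollars to $100s is left explicit because it is the easiest step to get wrong.

```python
def predict_sales(price, advertising_100s, b=(306.526, -24.975, 74.131)):
    """Predicted pies per week from Sales = b0 + b1*Price + b2*Advertising."""
    b0, b1, b2 = b
    return b0 + b1 * price + b2 * advertising_100s


advertising_dollars = 350
x2 = advertising_dollars / 100  # Advertising is measured in $100s, so x2 = 3.5
print(round(predict_sales(5.50, x2), 2))  # 428.62
```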

    Multiple Linear Regression Model with SAS

    options nodate nonumber;
    /* Pie data: week, sales (pies/week), price ($), advertising ($100s) */
    data pie;
      input week sales price adverts;
      cards;
    1 350 5.5 3.3
    2 460 7.5 3.3
    3 350 8 3
    4 430 8 4.5
    5 350 6.8 3
    6 380 7.5 4
    7 430 4.5 3
    8 470 6.4 3.7
    9 450 7 3.5
    10 490 5 4
    11 340 7.2 3.5
    12 300 7.9 3.2
    13 440 5.9 4
    14 450 5 3.5
    15 300 7 2.7
    ;
    run;

    /* Regress sales on price and advertising */
    proc reg data = pie;
      model sales = price adverts;
    run;

    Multiple Linear Regression Model with SAS

    The SAS System
    The REG Procedure
    Model: MODEL1
    Dependent Variable: sales

    Analysis of Variance

                          Sum of       Mean
    Source           DF   Squares      Square       F Value   Pr > F
    Model             2   29460        14730           6.54   0.0120
    Error            12   27033        2252.77554
    Corrected Total  14   56493

    Root MSE          47.46341    R-Square   0.5215
    Dependent Mean   399.33333    Adj R-Sq   0.4417
    Coeff Var         11.88566

    Pr > F: the p-value of the model. R-Square: the R² of the model.

    Multiple Linear Regression Model with SAS

    Parameter Estimates

                     Parameter   Standard
    Variable    DF   Estimate    Error       t Value   Pr > |t|
    Intercept    1   306.52619   114.25389      2.68     0.0199
    price        1   -24.97509    10.83213     -2.31     0.0398
    adverts      1    74.13096    25.96732      2.85     0.0145

    Parameter Estimate: the estimated value of each parameter. Pr > |t|: the p-value of each variable.
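The t Value column is each estimate divided by its standard error, t = b_i / s_{b_i}, compared against Student's t with n - k - 1 = 12 degrees of freedom. The small Python sketch below reproduces that column from the table. The critical value 2.179 is taken from a standard t table, not from the SAS output.

```python
# (estimate, standard error) pairs copied from the SAS Parameter Estimates table
estimates = {
    "Intercept": (306.52619, 114.25389),
    "price": (-24.97509, 10.83213),
    "adverts": (74.13096, 25.96732),
}
# Two-sided 5% critical value of Student's t with n - k - 1 = 12 df (standard t table)
T_CRIT_12 = 2.179

t_values = {name: b / se for name, (b, se) in estimates.items()}
for name, t in t_values.items():
    decision = "reject H0" if abs(t) > T_CRIT_12 else "do not reject H0"
    print(f"{name}: t = {t:.2f} -> {decision}")
```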

    Example 2: Multiple Linear Regression Model

    Many possible factors, not all significant

    Fungal toxin contamination of seed pods to be used as a drug source

    Data collection

    Collect batches of seed pods from a variety of locations.

    Possible predictors: for each location during June-July (when the seeds are forming), note:

    Temp: mean noon temperature (°C)

    Wind: mean wind speed (km/h)

    Sun: mean daily sunshine (h)

    Rain: total rainfall (cm/month)

    Dependent variable: for each batch of pods, note:

    Concentration of toxin (mg/100 g)

    Toxin content (mg/100 g) and weather conditions at ten sites

    Temp   Wind   Sun     Rain   Toxin
    20.9   13.3   6.23    13.0   18.1
    25.4   10.8   8.13    22.8   28.6
    28.2   10.9   10.21   11.1   15.9
    23.7   8.2    6.96    7.4    19.2
    26.5   9.8    9.04    13.2   19.3
    23.9   12.3   7.84    5.1    14.8
    26.7   10.0   6.69    15.6   21.7
    30.0   12.2   8.30    13.2   16.5
    24.9   10.7   9.22    20.5   23.8
    22.0   15.0   8.37    13.7   19.0

    Toxin Example with SAS

    options nodate nonumber;
    /* Toxin data: weather conditions and toxin content at ten sites */
    data toxin;
      input Temp Wind Sun Rain Toxin;
      cards;
    20.9 13.3 6.23 13.0 18.1
    25.4 10.8 8.13 22.8 28.6
    28.2 10.9 10.21 11.1 15.9
    23.7 8.2 6.96 7.4 19.2
    26.5 9.8 9.04 13.2 19.3
    23.9 12.3 7.84 5.1 14.8
    26.7 10.0 6.69 15.6 21.7
    30.0 12.2 8.30 13.2 16.5
    24.9 10.7 9.22 20.5 23.8
    22.0 15.0 8.37 13.7 19.0
    ;
    run;

    /* Full model with all four candidate predictors */
    proc reg data = toxin;
      model Toxin = Temp Wind Sun Rain;
    run;

    Toxin Example with SAS

    The SAS System
    The REG Procedure
    Model: MODEL1
    Dependent Variable: Toxin

    Analysis of Variance

                          Sum of       Mean
    Source           DF   Squares      Square     F Value   Pr > F
    Model             4   139.78209    34.94552     14.11   0.0062
    Error             5    12.38691     2.47738
    Corrected Total   9   152.16900

    Root MSE          1.57397    R-Square   0.9186
    Dependent Mean   19.69000    Adj R-Sq   0.8535
    Coeff Var         7.99375

    The equation is significant (Pr > F = 0.0062).

    Toxin Example with SAS

    Parameter Estimates

                     Parameter   Standard
    Variable    DF   Estimate    Error      t Value   Pr > |t|
    Intercept    1   31.60838    7.10506       4.45     0.0067
    Temp         1   -0.42013    0.24131      -1.74     0.1421
    Wind         1   -0.79356    0.29770      -2.67     0.0446
    Sun          1   -0.23747    0.50857      -0.47     0.6602
    Rain         1    0.70676    0.10031       7.05     0.0009

    Two predictors (Temp and Sun) are not significant at the 0.05 level.

    Removing predictors

    Temperature and Sunshine are both non-significant.

    DO NOT REMOVE BOTH AT ONCE.

    We may find that if we remove one of these, the other becomes significant.

    Removing predictors

    Remove non-significant predictors one at a time until all remaining predictors are significant.

    Removing predictors

    Which is removed first?

    Usually remove the least significant factor (highest p-value) first. But use knowledge of the system concerned: if you think a particular factor is especially important, but its p-value is greater than that of some other factor, you might modify the order of removal to try to preserve the important variable.
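The removal procedure can be sketched as backward elimination in standard-library Python. The sketch makes three assumptions. An intercept is always kept. Within one fitted model all slopes share the same degrees of freedom, so "highest p-value" is equivalent to "smallest |t|". The two-sided 5% critical values of t are hard-coded from a standard table. The sketch applies only the mechanical rule, so it cannot capture the subject-matter judgement mentioned above. The function names are hypothetical.

```python
import math

# Toxin data from the slides: (Temp, Wind, Sun, Rain, Toxin)
DATA = [
    (20.9, 13.3, 6.23, 13.0, 18.1), (25.4, 10.8, 8.13, 22.8, 28.6),
    (28.2, 10.9, 10.21, 11.1, 15.9), (23.7, 8.2, 6.96, 7.4, 19.2),
    (26.5, 9.8, 9.04, 13.2, 19.3), (23.9, 12.3, 7.84, 5.1, 14.8),
    (26.7, 10.0, 6.69, 15.6, 21.7), (30.0, 12.2, 8.30, 13.2, 16.5),
    (24.9, 10.7, 9.22, 20.5, 23.8), (22.0, 15.0, 8.37, 13.7, 19.0),
]
NAMES = ["Temp", "Wind", "Sun", "Rain"]
# Two-sided 5% critical values of Student's t by residual df (standard t table)
T_CRIT = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
          6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262}


def invert(A):
    """Gauss-Jordan inverse with partial pivoting (assumes A is non-singular)."""
    n = len(A)
    M = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                factor = M[r][c]
                M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def fit(cols):
    """OLS of Toxin on an intercept plus the named columns; returns (b, t, df)."""
    idx = [NAMES.index(c) for c in cols]
    X = [[1.0] + [row[i] for i in idx] for row in DATA]
    y = [row[4] for row in DATA]
    n, p = len(X), len(X[0])
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    Xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(p)]
    inv = invert(XtX)
    b = [sum(inv[i][j] * Xty[j] for j in range(p)) for i in range(p)]
    resid = [yi - sum(xj * bj for xj, bj in zip(x, b)) for x, yi in zip(X, y)]
    mse = sum(e * e for e in resid) / (n - p)
    # SE(b_i) = sqrt(MSE * [(X'X)^-1]_ii)
    t = [b[i] / math.sqrt(mse * inv[i][i]) for i in range(p)]
    return b, t, n - p


def backward_eliminate(cols):
    """Drop the least significant predictor (smallest |t|) one at a time."""
    cols = list(cols)
    while cols:
        b, t, df = fit(cols)
        abs_t = [abs(v) for v in t[1:]]  # skip the intercept
        worst = min(range(len(cols)), key=lambda i: abs_t[i])
        if abs_t[worst] >= T_CRIT[df]:
            return cols, b
        print(f"remove {cols[worst]}: |t| = {abs_t[worst]:.2f} < {T_CRIT[df]} (df = {df})")
        cols.pop(worst)
    return cols, fit(cols)[0]


kept, coefs = backward_eliminate(NAMES)
print(kept, [round(v, 3) for v in coefs])
```

On the slide data this should remove Sun first and then stop with Temp, Wind and Rain, mirroring the steps that follow.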

    Toxin: Remove the Factor with the Highest p-value

    Parameter Estimates

                     Parameter   Standard
    Variable    DF   Estimate    Error      t Value   Pr > |t|
    Intercept    1   31.60838    7.10506       4.45     0.0067
    Temp         1   -0.42013    0.24131      -1.74     0.1421
    Wind         1   -0.79356    0.29770      -2.67     0.0446
    Sun          1   -0.23747    0.50857      -0.47     0.6602
    Rain         1    0.70676    0.10031       7.05     0.0009

    Remove Sunshine: it is the least significant.

    Toxin: Remove the Factor with the Highest p-value

    options nodate nonumber;
    /* Toxin data: weather conditions and toxin content at ten sites */
    data toxin;
      input Temp Wind Sun Rain Toxin;
      cards;
    20.9 13.3 6.23 13.0 18.1
    25.4 10.8 8.13 22.8 28.6
    28.2 10.9 10.21 11.1 15.9
    23.7 8.2 6.96 7.4 19.2
    26.5 9.8 9.04 13.2 19.3
    23.9 12.3 7.84 5.1 14.8
    26.7 10.0 6.69 15.6 21.7
    30.0 12.2 8.30 13.2 16.5
    24.9 10.7 9.22 20.5 23.8
    22.0 15.0 8.37 13.7 19.0
    ;
    run;

    /* Refit without Sun, the least significant predictor */
    proc reg data = toxin;
      model Toxin = Temp Wind Rain;
    run;

    Analysis of Variance

                          Sum of       Mean
    Source           DF   Squares      Square     F Value   Pr > F
    Model             3   139.24195    46.41398     21.54   0.0013
    Error             6    12.92705     2.15451
    Corrected Total   9   152.16900

    Root MSE          1.46782    R-Square   0.9150
    Dependent Mean   19.69000    Adj R-Sq   0.8726
    Coeff Var         7.45467

    Parameter Estimates

                     Parameter   Standard
    Variable    DF   Estimate    Error      t Value   Pr > |t|
    Intercept    1   31.56513    6.62535       4.76     0.0031
    Temp         1   -0.47896    0.19193      -2.50     0.0468
    Wind         1   -0.82177    0.27184      -3.02     0.0233
    Rain         1    0.70108    0.09285       7.55     0.0003

    Fungal toxin, 3 predictors: the equation is significant.

    All three predictors are now significant.

    Toxin: Final Equation

    Toxin = 31.6 - 0.479 x Temp - 0.822 x Wind + 0.701 x Rain

    Note the plus or minus signs on the three terms:

    Warm sites produce less toxin.

    Windy sites produce less toxin.

    Wet sites produce more toxin.

    All predictions should be made using this equation, not the full equation.

    Toxin: Using the Equation

    Consider a potential site with Temp: 26, Wind: 11 km/h, Rain: 21 cm/month.

    Predicted toxin would be:

    Toxin = 31.6 - 0.479 x Temp - 0.822 x Wind + 0.701 x Rain

    = 31.6 - 0.479 x 26 - 0.822 x 11 + 0.701 x 21

    = 31.6 - 12.45 - 9.04 + 14.72

    = 24.8 mg/100g

    (A high value. Predict that this is not a good site to choose.)

    Complete Multiple Regression Analysis of the Toxin Data with SAS

    options nodate nonumber;
    /* Toxin data: weather conditions and toxin content at ten sites */
    data toxin;
      input Temp Wind Sun Rain Toxin;
      cards;
    20.9 13.3 6.23 13.0 18.1
    25.4 10.8 8.13 22.8 28.6
    28.2 10.9 10.21 11.1 15.9
    23.7 8.2 6.96 7.4 19.2
    26.5 9.8 9.04 13.2 19.3
    23.9 12.3 7.84 5.1 14.8
    26.7 10.0 6.69 15.6 21.7
    30.0 12.2 8.30 13.2 16.5
    24.9 10.7 9.22 20.5 23.8
    22.0 15.0 8.37 13.7 19.0
    ;
    run;

    /* Step 1: full model with all four predictors */
    proc reg data = toxin;
      model Toxin = Temp Wind Sun Rain;
    run;

    /* Step 2: reduced model after removing Sun */
    proc reg data = toxin;
      model Toxin = Temp Wind Rain;
    run;

    Example: Anscombe's Four Data Sets

    No Y1 X1 Y2 X2 Y3 X3 Y4 X4

    1 8,04 10 9,14 10 7,46 10 6,58 8

    2 6,95 8 8,14 8 6,77 8 5,76 8

    3 7,58 13 8,74 13 12,74 13 7,71 8

    4 8,81 9 8,77 9 7,11 9 8,84 8

    5 8,33 11 9,26 11 7,81 11 8,47 8

    6 9,96 14 8,10 14 8,84 14 7,04 8

    7 7,24 6 6,13 6 6,08 6 5,25 8

    8 4,26 4 3,10 4 5,39 4 12,50 19

    9 10,84 12 9,13 12 8,15 12 5,56 8

    10 4,82 7 7,26 7 6,42 7 7,91 8

    11 5,68 5 4,74 5 5,73 5 6,89 8

    Anscombe's Four Data Sets

    Each of the four data sets was analyzed with simple linear regression, producing the following MINITAB output:

    1. Data 1: Regression Analysis: Y1 versus X1

    The regression equation is

    Y1 = 3,00 + 0,500 X1

    Predictor Coef SE Coef T P

    Constant 3,000 1,125 2,67 0,026

    X1 0,5001 0,1179 4,24 0,002

    S = 1,23660 R-Sq = 66,7% R-Sq(adj) = 62,9%

    Analysis of Variance

    Source DF SS MS F P

    Regression 1 27,510 27,510 17,99 0,002

    Residual Error 9 13,763 1,529

    Total 10 41,273

    Anscombe's Four Data Sets

    2. Data 2: Regression Analysis: Y2 versus X2

    The regression equation is

    Y2 = 3,00 + 0,500 X2

    Predictor Coef SE Coef T P

    Constant 3,001 1,125 2,67 0,026

    X2 0,5000 0,1180 4,24 0,002

    S = 1,23721 R-Sq = 66,6% R-Sq(adj) = 62,9%

    Analysis of Variance

    Source DF SS MS F P

    Regression 1 27,500 27,500 17,97 0,002

    Residual Error 9 13,776 1,531

    Total 10 41,276

    Anscombe's Four Data Sets

    3. Data 3: Regression Analysis: Y3 versus X3

    The regression equation is

    Y3 = 3,00 + 0,500 X3

    Predictor Coef SE Coef T P

    Constant 3,002 1,124 2,67 0,026

    X3 0,4997 0,1179 4,24 0,002

    S = 1,23631 R-Sq = 66,6% R-Sq(adj) = 62,9%

    Analysis of Variance

    Source DF SS MS F P

    Regression 1 27,470 27,470 17,97 0,002

    Residual Error 9 13,756 1,528

    Total 10 41,226

    Anscombe's Four Data Sets

    4. Data 4: Regression Analysis: Y4 versus X4

    The regression equation is

    Y4 = 3,00 + 0,500 X4

    Predictor Coef SE Coef T P

    Constant 3,002 1,124 2,67 0,026

    X4 0,4999 0,1178 4,24 0,002

    S = 1,23570 R-Sq = 66,7% R-Sq(adj) = 63,0%

    Analysis of Variance

    Source DF SS MS F P

    Regression 1 27,490 27,490 18,00 0,002

    Residual Error 9 13,742 1,527

    Total 10 41,232

    Anscombe's Four Data Sets

    It turns out that the four data sets, each analyzed with simple linear regression, produce nearly identical output!

    Viewed as a scatter plot, however, each data set looks as follows:

    Anscombe's Four Data Sets

    It turns out that the four data sets have different conditions, or different forms of relationship!
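This Python sketch refits all four data sets with simple linear regression, using the values from the table with decimal commas written as points. The near-identical intercepts, slopes and R² match the MINITAB output. A simple count of distinct x values already hints at how different data set 4 is, but only a plot shows the full picture. `simple_fit` is a hypothetical helper name.

```python
# Anscombe's four data sets from the slide table (decimal commas written as points)
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # X1 = X2 = X3
X4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
Y = {
    1: [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    2: [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
    3: [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
    4: [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
}


def simple_fit(x, y):
    """Least squares line y = b0 + b1 x; returns (b0, b1, R^2)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    sst = sum((yi - ybar) ** 2 for yi in y)
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, 1 - sse / sst


results = {}
for k, y in Y.items():
    x = X4 if k == 4 else X123
    results[k] = simple_fit(x, y)
    b0, b1, r2 = results[k]
    print(f"Data {k}: y = {b0:.2f} + {b1:.3f} x, R-Sq = {100 * r2:.1f}%, "
          f"distinct x values = {len(set(x))}")
```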