SGDY 5063 - Reliability (Group 2)



    SGDY 5063

Educational & Psychological Measurement & Evaluation

Presented by:
Mohd Hafiz Bin Mohd Salleh (810350)
Syahrul Azrin Binti Ghafar (810398)
Aza Nurain Binti Azmey (811333)
Nur Zahirah Binti Othman (811562)

    Supervised by: Tuan Haji Shuib Bin Hussain


PRESENTATION OVERVIEW

What is reliability?
Practical methods of estimating reliability
The reliability of criterion-referenced and mastery tests
Conditions affecting reliability coefficients
The standard error of measurement
Sources of error
Error components of different reliability coefficients
The reliability of individual and group scores
The reliability of difference scores
Ways to improve norm-referenced and criterion-referenced reliability


    Introduction to Test Reliability


    Are the scores consistent?

    Are they stable?



    WHAT IS RELIABILITY?

O Reliability refers to the reliability of a test score or set of test scores, not the reliability of the test itself.
O Reliability refers to the consistency of a measure. A test is considered reliable if we get the same result repeatedly (C. Kendra, 2012).
O Reliability is not the same as validity; validity asks whether a test measures what it is supposed to measure.
O Technically, reliability is defined as true variance divided by obtained variance:

    Reliability = true variance / obtained variance
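O As a hypothetical worked example: if the true-score variance were 8 and the obtained-score variance were 10, the reliability would be 8/10 = .80, and the remaining .20 of the obtained variance would be error variance.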


    Practical Methods of Estimating Reliability

O Because teachers do not know an individual's true score, they cannot know the amount of error for any given person.
O We can estimate the effect of chance on measurements in general.
O Consistency of measurement means high reliability.
O The degree of consistency can be determined by a correlation coefficient.



O Correlation is the generic term for all measures of relationship; reliability may be thought of as a special type of correlation that measures the consistency of observed scores.



O The perfect test would be unaffected by the sources of unreliability, and on this perfect test each examinee would get his or her true score. Unfortunately, we know that the observed score we get was likely affected by one or more of the sources of unreliability.
O So our observed score is likely too high or too low. The difference between the observed score and the true score is called the error score, and this score can be positive or negative.
O We can express this mathematically as:

    True Score = Obtained Score +/- Error, i.e. T = O +/- E (or, looking at it another way, O = T +/- E)
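O Below is a minimal simulation sketch in Python (the numbers are invented, not from the slides) of the O = T +/- E model; it shows that the variance-ratio definition of reliability recovers the expected value.

    import numpy as np

    rng = np.random.default_rng(0)
    n_students = 10_000

    true_scores = rng.normal(loc=50, scale=8, size=n_students)  # T
    errors = rng.normal(loc=0, scale=4, size=n_students)        # E, mean zero
    obtained = true_scores + errors                             # O = T + E

    # Reliability = true variance / obtained variance
    # Expected value here: 8**2 / (8**2 + 4**2) = 0.80
    print(round(true_scores.var() / obtained.var(), 2))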



O Test-Retest (stability)
O Alternate Forms (equivalence)
O Internal Consistency



O The test-retest reliability method is one of the simplest ways of testing the stability and reliability of an instrument over time (Martyn Shuttleworth, 2009).
O For example, if a group of students takes a test, we would expect them to show very similar results if they take the same test a few months later. This definition relies upon there being no confounding factor during the intervening time interval.
O Instruments such as IQ tests and surveys are prime candidates for test-retest methodology, because there is little chance of people experiencing a sudden jump in IQ or suddenly changing their opinions.
O Educational tests are often not suitable, because students will learn much more information over the intervening period and show better results in the second test.
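O As a rough sketch in Python (the scores are invented), the test-retest coefficient is simply the Pearson correlation between the two administrations:

    import numpy as np

    # Scores for the same ten students on the same test, a few months apart.
    first_testing = np.array([72, 65, 80, 58, 90, 77, 63, 85, 70, 68])
    second_testing = np.array([74, 63, 82, 60, 88, 75, 66, 84, 69, 71])

    # Test-retest (stability) reliability = correlation between the two sets.
    r_stability = np.corrcoef(first_testing, second_testing)[0, 1]
    print(round(r_stability, 2))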



O Alternate-form reliability is established by administering two different forms of the same test to the same individuals (psychwiki.com, 2012).
O This method is convenient for avoiding the problems that come with the test-retest method. With the alternate-form method, an individual is tested on one form of the test and then again on a comparable second form; the in-between time is about one week.
O This method is used more than the test-retest method because it has fewer associated problems, including a reduction in practice effects.



O Internal consistency is examined when the focus of the investigation is on the consistency of scores on the same occasion and on similar content, but when conducting repeated testing or alternate-forms testing is not possible. Common procedures include the following (a small worked example follows this list):
- the Kuder-Richardson formulas, a series of formulas based on dichotomously scored items
- Cronbach's coefficient alpha, the most widely used, as it can be used with continuous item types
- the split-half (odd-even) method with the Spearman-Brown correction applied to the full test, the easiest to do and understand
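O As a concrete sketch of one internal-consistency estimate, the following Python snippet computes Cronbach's coefficient alpha from a hypothetical item-score matrix (with 0/1 items this equals the value KR-20 would give):

    import numpy as np

    # Rows = examinees, columns = items scored 0/1 (hypothetical data).
    items = np.array([
        [1, 1, 1, 0, 1],
        [1, 0, 1, 1, 1],
        [0, 0, 1, 0, 0],
        [1, 1, 1, 1, 1],
        [0, 1, 0, 0, 1],
        [1, 1, 0, 1, 1],
    ])

    k = items.shape[1]                              # number of items
    item_variances = items.var(axis=0, ddof=1)      # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)  # variance of total scores

    # Cronbach's alpha = k/(k-1) * (1 - sum of item variances / total variance)
    alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)
    print(round(alpha, 2))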


Reliability of Classroom Tests
O We would recommend doing split-half reliability; a sketch of the computation follows the steps below.
O Step 1: Split the test into two halves (odd items and even items).
O Step 2: Use the Pearson product-moment correlation (ungrouped data) to determine r_xy, where r_xy represents the correlation between the two halves of the test. By doing the split-half we reduce the number of items, which we know will automatically reduce the reliability.
O Step 3: To estimate the reliability of the whole test, use the Spearman-Brown correction formula

    r_sb = 2 r_xy / (1 + r_xy)

where r_sb is the split-half reliability coefficient.
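O A short Python sketch of Steps 1-3 (the item matrix is invented for illustration):

    import numpy as np

    # Rows = students, columns = items (1 = correct); hypothetical data.
    items = np.array([
        [1, 1, 1, 0, 1, 1, 0, 1],
        [0, 1, 0, 0, 1, 0, 0, 1],
        [1, 1, 1, 1, 1, 1, 1, 1],
        [0, 0, 1, 0, 0, 1, 0, 0],
        [1, 0, 1, 1, 0, 1, 1, 0],
    ])

    # Step 1: split into odd-numbered and even-numbered items.
    odd_half = items[:, 0::2].sum(axis=1)
    even_half = items[:, 1::2].sum(axis=1)

    # Step 2: Pearson product-moment correlation between the two half scores.
    r_xy = np.corrcoef(odd_half, even_half)[0, 1]

    # Step 3: Spearman-Brown correction for the full-length test.
    r_sb = 2 * r_xy / (1 + r_xy)
    print(round(r_xy, 2), round(r_sb, 2))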


As a Teacher, What Do We Need to Know Most About Reliability?
O For tests we can create:
- Increasing the number of items increases reliability.
- A moderate difficulty level increases reliability.
- Having items measure similar content increases reliability.
O For standardized tests we can use:
- Look for each test's published reliability data.
- Use the published reliability coefficient to determine the standard error of measurement (SEM) found in the data.
- See the following illustration.


    Estimating Reliability

O If we could re-administer a test to one person an infinite number of times, what would we expect the distribution of their scores to look like?
O Answer: the bell-shaped curve.
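O A tiny Python sketch of that thought experiment (the true score and error spread are invented): repeated administrations pile up in a roughly normal, bell-shaped distribution around the person's true score.

    import numpy as np

    rng = np.random.default_rng(1)
    true_score = 75.0
    error_sd = 3.0  # hypothetical spread of measurement error

    # Imagine re-administering the test to this one person 100,000 times.
    administrations = true_score + rng.normal(0, error_sd, size=100_000)

    # The centre of the pile is the true score; its spread reflects the error.
    print(round(administrations.mean(), 1), round(administrations.std(), 1))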


THE RELIABILITY OF CRITERION-REFERENCED TESTS

O Because the purpose of a criterion-referenced test is quite different from that of a norm-referenced test, it should not be surprising to find that the approaches used for reliability are different too. With criterion-referenced tests, scores are often used to sort candidates into performance categories.
O Variation in candidate scores is not so important if candidates are still assigned to the same performance category. Therefore, it has been common to define reliability for a criterion-referenced test as the extent to which performance classifications are consistent over parallel-form administrations.
O For example, it might be determined that 80% of the candidates are classified in the same way by parallel forms of a criterion-referenced test administered with little or no instruction in between test administrations. This is similar to parallel-form reliability for a norm-referenced test, except the focus with criterion-referenced tests is on the decisions rather than the scores.


THE LIVINGSTON APPROACH
O Livingston's formula for estimating the reliability of criterion-referenced measures:

    K^2 = [r_xx * SD^2 + (Mean - Criterion)^2] / [SD^2 + (Mean - Criterion)^2]

where r_xx is the reliability obtained by conventional norm-referenced methods, SD^2 is the variance of the obtained scores, Mean is the test mean, and Criterion is the cutoff score (see the numeric sketch below).
O If Mean score = Criterion score, the Livingston coefficient equals the reliability obtained by conventional norm-referenced methods.
O If Mean score ≠ Criterion score, the Livingston coefficient is greater than the reliability obtained by conventional norm-referenced methods.
O WEAKNESS
- Chester Harris pointed out that the standard error of measurement stays the same, so a higher coefficient does not imply a more dependable determination.
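O A quick numeric sketch of the Livingston coefficient in Python (the reliability, SD, mean, and cutoff values are invented); it shows the coefficient exceeding the conventional reliability when the mean is far from the cutoff:

    # Hypothetical inputs.
    r_xx = 0.75        # conventional norm-referenced reliability
    sd = 6.0           # standard deviation of obtained scores
    mean_score = 70.0  # test mean
    criterion = 60.0   # cutoff score

    gap_squared = (mean_score - criterion) ** 2
    k_squared = (r_xx * sd**2 + gap_squared) / (sd**2 + gap_squared)
    print(round(k_squared, 2))  # 0.93, larger than r_xx = 0.75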


THE PERCENTAGE-AGREEMENT APPROACH

O Criterion-referenced and mastery tests are often administered to classify applicants/students into one of two groups (whether or not they have mastered the test content according to some criterion).
O Simple to compute (see the sketch below):

    Percentage agreement = (number classified as masters on both administrations + number classified as non-masters on both administrations) / total number of persons tested

O WEAKNESSES
- Will be affected by the number of persons being tested
- Will be affected by the cutoff score and its closeness to the test's mean score
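O A minimal Python sketch of the computation, using invented mastery classifications from two parallel forms:

    # True = classified as having mastered the content; hypothetical data.
    form_a = [True, True, False, True, False, True, True, False, True, False]
    form_b = [True, True, False, False, False, True, True, False, True, True]

    masters_on_both = sum(a and b for a, b in zip(form_a, form_b))
    nonmasters_on_both = sum(not a and not b for a, b in zip(form_a, form_b))

    percentage_agreement = (masters_on_both + nonmasters_on_both) / len(form_a)
    print(percentage_agreement)  # 0.8 -> 80% classified the same way by both forms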


Conditions Affecting Reliability Coefficients

    Test Scoring

    Test Content

    Test Administration

    Personal Conditions


Test Scoring
O Scorer reliability is the extent to which different observers or raters agree with one another as they mark the same set of papers.
O Scoring error can arise from differences between two scorers' judgments, or from a single scorer changing over time (fatigue) and/or the halo effect.
O The higher the extent of agreement, the higher the scorer reliability will be.


Test Content
O The sample of test items is too small.
O The sample of test items is not evenly selected across the material.


Test Administration
O Instructions accompanying the test may contain errors that create another type of systematic error. These errors exist in the instructions provided to the test-taker. Instructions that interfere with accurately gathering information (such as a time limit when the trait the test is seeking to measure has nothing to do with speed) reduce the reliability of a test.
O Noise and surroundings
O Physical conditions


Personal Conditions
O Factors related to the test-taker, such as poor sleep, feeling ill, anxious, or "stressed out", are integrated into the test score itself.
O Temporary ups and downs


THE STANDARD ERROR OF MEASUREMENT (SEM)

O The formula for the standard error of measurement is

    SEmeas = SD * sqrt(1 - reliability)

where SD equals the standard deviation of the obtained scores.
O So, if you have a test with a standard deviation (SD) of 4.89 and an estimated reliability of .91, the standard error of measurement would be calculated as follows:

    SEmeas = 4.89 * sqrt(1 - .91) = 4.89 * 0.3 = 1.467 ≈ 1.47

O The standard error of measurement is expressed in the same unit as the standard deviation.
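O The slide's arithmetic as a small Python sketch; the obtained score of 75 used for the band is invented, and reading the +/- 1 SEM band as roughly a 68% range assumes normally distributed error:

    import math

    sd = 4.89           # standard deviation of obtained scores
    reliability = 0.91  # published reliability coefficient

    sem = sd * math.sqrt(1 - reliability)
    print(round(sem, 2))  # 1.47, in the same units as the test scores

    # A common use: bracket an obtained score with +/- 1 SEM.
    obtained = 75
    print(round(obtained - sem, 2), round(obtained + sem, 2))  # about 73.53 to 76.47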


THE RELIABILITY OF INDIVIDUAL & GROUP SCORES

O Decisions involving a single individual require a much higher degree of reliability than is necessary for the evaluation of groups.
O Truman Kelley (1927) suggested:
- reliabilities of at least .94 for decisions about individual students
- reliabilities of not less than .50 for decisions about groups of students
O Ideally, reliability should be +1.00, but in practice, test users will have to settle for less than that.


    THE RELIABILITY OF DIFFERENCE SCORES

O Sometimes testers want to compare differences between pretest and posttest scores.
O The reliability of difference scores depends on the reliability of each test and the correlation between them.
O The following formula, from Gulliksen (1950), can be used to estimate the reliability of difference scores.



O For example, if the reliability of each of two tests is .90 and they correlate .80:

    Reliability of difference scores = (average reliability - correlation between tests) / (1 - correlation between tests)
                                     = (.90 - .80) / (1 - .80) = .50
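O A small Python helper sketching the Gulliksen estimate (the function name is ours; the numbers reproduce the example above):

    def difference_score_reliability(r_xx, r_yy, r_xy):
        # (average reliability of the two tests - correlation) / (1 - correlation)
        average_reliability = (r_xx + r_yy) / 2
        return (average_reliability - r_xy) / (1 - r_xy)

    # Both tests have reliability .90 and they correlate .80, as in the example.
    print(round(difference_score_reliability(0.90, 0.90, 0.80), 2))  # 0.5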



O When the reliability of each test equals the correlation between the tests, the reliability of difference scores will be 0.
O When the reliability of each test is +1.0, the reliability of difference scores will also equal +1.0.
O The reason difference scores typically have low reliability is that each of the two tests being compared contains error of measurement.
O To increase the reliability of difference scores, testers can:
a) select or construct highly reliable tests
b) increase the number of items on the tests



O When scores do not measure improvement but are merely comparisons of two different test scores, the correlation between the two tests is usually lower than the correlation between obtained and gain scores.



O Difference (or gain) scores suffer from other deficiencies:
- Gain scores do not necessarily measure improvement in knowledge.
- Students who initially perform at a high level cannot be expected to make gains equal to those made by students in the middle or lower parts of the distribution.
- The regression effect can account for differences among students, especially when they are initially selected on the basis of exceptionally high or low scores.


    SOURCES OF ERROR


O Reliability coefficients provide estimates of error that vary in magnitude depending on the conditions allowed to influence them.
O There are three major sources of error:
- characteristics of students
- characteristics of the test
- conditions affecting test administration and scoring


CHARACTERISTICS OF STUDENTS
O The true level of a student's knowledge or ability is an example of a desired source of true variance.
O The outcome of a student's true level of knowledge is sometimes debatable.
O Student-related sources of error are usually temporary situations and undesirable conditions.
O They are not under the direct control of teachers.
O Factors related to this source include:
- test-wiseness
- illness and fatigue (sickness, not enough rest)
- lack of motivation (nervousness, anxiety, fear, family problems)
O Any temporary and unpredictable change in a student can be considered intra-individual error, or error within test-takers.


CHARACTERISTICS OF THE TEST
O Tricky questions
O Ambiguous questions
O Confusing format
O Excessively difficult questions
O Too few items
O Dissimilar item content
O A reading level that is too high


CONDITIONS AFFECTING TEST ADMINISTRATION & SCORING
O Physical environment (temperature, humidity, lighting, seating conditions/arrangements, avoidance of distractions)
O Instructions given to examinees (their clarity, complexity, consistency, and ambiguity; age differences; idiosyncratic mannerisms; racial and ethnic backgrounds)
O Scoring (scorers may make arithmetic errors, fail to understand or abide by the scoring criteria, make mistakes in recording scores, show bias, etc.)


ERROR COMPONENTS OF DIFFERENT RELIABILITY COEFFICIENTS
O Stability: a different stability coefficient will be obtained over different time intervals.
O Equivalence: the error component becomes apparent whenever one form of a test is substituted for another.
O Stability & equivalence: error will be increased due to changes in students or items.
O Internal consistency (homogeneity): stresses the importance of making sure that guessing, item ambiguity, difficulty levels, directions, and scoring criteria and methods are controlled as much as possible.


WAYS TO IMPROVE NORM-REFERENCED RELIABILITY
1. Increase the number of good-quality items.
2. Construct easy items to reduce guessing.
3. Construct items that measure the same trait or ability.
4. Avoid using tricky and ambiguous questions.
5. Prepare the test to permit objective scoring.
6. Make sure pictorial material is readily identifiable, and do not crowd items on the page.
7. Remember that group means tend to be more reliable than individual scores.
8. Remember that high scores tend to be more unreliable than average scores.
9. Avoid the use of gain or difference scores.
10. Make test instructions to students clear and consistent.


WAYS TO IMPROVE CRITERION-REFERENCED RELIABILITY
1. If reasonable, develop tests in which there is a substantial difference between the test mean and the cutoff score.
2. Include as many items as possible.
3. Make sure that objectives are as specific as possible.
4. Make sure that scoring is as specific as possible.
5. Provide students with practice items that are similar in format to the test they will be taking.
6. Design the test format to be clear and uncluttered.
7. Assuming score variability, use the methods that improve reliability for norm-referenced tests.


REFERENCES
O Gilbert S., & James W. N. (1997). Principles of Educational and Psychological Measurement and Evaluation. United States: Wadsworth Publishing.
O Lammers, W. J., & Badia, P. (2005). Fundamentals of Behavioral Research. California: Thomson and Wadsworth.
O Reliability, http://www.psychwiki.com, retrieved on 3 April 2012.
O Martyn Shuttleworth (2009), http://www.experiment-resources.com/test-retest-reliability.html, retrieved on 3 April 2012.


O Note: Confounding variables are variables that the researcher failed to control or eliminate, damaging the internal validity of an experiment (http://www.experiment-resources.com/internal-validity.html).