sgdy 5063- reliability (group 2)
-
7/31/2019 SGDY 5063- Reliability (Group 2)
SGDY 5063
Educational & Psychological Measurement & Evaluation
Presented by:
Mohd Hafiz Bin Mohd Salleh (810350)
Syahrul Azrin Binti Ghafar (810398)
Aza Nurain Binti Azmey (811333)
Nur Zahirah Binti Othman (811562)
Supervised by: Tuan Haji Shuib Bin Hussain
-
PRESENTATION OVERVIEW
What is reliability?
Practical methods of estimating reliability
The reliability of criterion-referenced and mastery tests
Conditions affecting reliability coefficients
The standard error of measurement
Sources of error
Error components of different reliability coefficients
The reliability of individual and group scores
The reliability of difference scores
Ways to improve norm-referenced and criterion-referenced reliability
-
Introduction to Test Reliability
-
Are the scores consistent?
Are they stable?
Introduction to Test Reliability
-
WHAT IS RELIABILITY?
O Reliability is a property of a test score or set of test scores, not of the test itself.
O Reliability refers to the consistency of a measure. A test is considered reliable if we get the same result repeatedly. (C. Kendra, 2012)
O Reliability is not the same as validity. Validity asks: does a test measure what it is supposed to?
O Technically, reliability is defined as true variance divided by obtained variance: $r_{xx} = \sigma_T^2 / \sigma_O^2$
-
Practical Methods of Estimating Reliability
O Because teachers do not know an individual's true score, they cannot know the amount of error for any given person.
O We can estimate the effect of chance on measurements in general.
Consistency of measurement means high reliability
The degree of consistency can be determined by a correlation coefficient
-
Practical Methods of Estimating Reliability
Consistency of measurement means high reliability
The degree of consistency can be determined by a correlation coefficient
Correlation is the generic term for all measures of relationship, and reliability may be thought of as a special type of correlation that measures the consistency of observed scores
-
Practical Methods of Estimating Reliability
O The perfect test would be unaffected by the sources of unreliability, and on this perfect test each examinee would get his or her true score. Unfortunately, we know the observed score we get was likely affected by one or more of the sources of unreliability.
O So our observed score is likely too high or too low. The difference between the observed score and the true score is called the error score, and it can be positive or negative.
O We can express this mathematically as:
O True Score = Obtained Score +/- Error
O T = O +/- E (or, looking at it another way, O = T +/- E)
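The relation O = T +/- E and the definition of reliability as true variance divided by obtained variance can be illustrated with a small simulation. This is only a sketch: the normal distributions, the mean of 50, the standard deviations and the function name `simulate_reliability` are all assumptions made for illustration, not values from the presentation.

```python
import random
import statistics


def simulate_reliability(n=5000, true_sd=10.0, error_sd=5.0, seed=0):
    """Simulate obtained = true + error and compare two reliability values.

    Returns (empirical, theoretical), where empirical is the ratio of the
    simulated true-score variance to the simulated obtained-score variance
    and theoretical is true_sd^2 / (true_sd^2 + error_sd^2).
    """
    rng = random.Random(seed)
    # Assumed: true scores and errors are independent and normally distributed.
    true_scores = [rng.gauss(50.0, true_sd) for _ in range(n)]
    errors = [rng.gauss(0.0, error_sd) for _ in range(n)]
    obtained = [t + e for t, e in zip(true_scores, errors)]

    empirical = statistics.pvariance(true_scores) / statistics.pvariance(obtained)
    theoretical = true_sd ** 2 / (true_sd ** 2 + error_sd ** 2)
    return empirical, theoretical


empirical, theoretical = simulate_reliability()
print(f"empirical={empirical:.3f} theoretical={theoretical:.3f}")
```

In practice the true scores are never observed, which is why the estimation methods that follow rely on correlations between obtained scores instead.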
-
Practical Methods of Estimating Reliability
Test-Retest (stability)
Alternate Forms (equivalence)
Internal Consistency
-
Practical Methods of Estimating Reliability
O The test-retest reliability method is one of the simplest ways of testing the stability and reliability of an instrument over time. (Martyn Shuttleworth, 2009)
O For example, if a group of students takes a test, we would expect them to show very similar results if they take the same test a few months later. This definition relies upon there being no confounding factor during the intervening time interval.
O Instruments such as IQ tests and surveys are prime candidates for test-retest methodology, because there is little chance of people experiencing a sudden jump in IQ or suddenly changing their opinions.
O Educational tests are often not suitable, because students will learn much more information over the intervening period and show better results in the second test.
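A test-retest (stability) coefficient is simply the correlation between the scores from the two administrations. The sketch below computes a Pearson correlation by hand; the six pairs of scores are hypothetical and the helper name `pearson_r` is chosen here for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation for two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores for six students on two administrations of the same test.
first_administration = [12, 15, 9, 18, 14, 11]
second_administration = [13, 14, 10, 19, 13, 12]

stability = pearson_r(first_administration, second_administration)
print(f"test-retest coefficient: {stability:.2f}")
```

A coefficient close to 1 means students kept roughly the same rank order across the two occasions.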
http://www.experiment-resources.com/confounding-variables.html
-
Practical Methods of Estimating Reliability
O Alternate-form reliability is the consistency established by administering two different forms of the same test to the same individuals. (psychwiki.com, 2012)
O This method is a convenient way to avoid the problems that come with the test-retest method. With the alternate-form method, an individual is tested on one form of the test and then again on a comparable second form, with an interval of about one week between the two.
O This method is used more than the test-retest method because it has fewer associated problems, including a substantial reduction in practice effects.
-
Practical Methods of Estimating Reliability
O Internal consistency: a procedure for studying reliability when the focus of the investigation is on the consistency of scores on the same occasion and on similar content, but when conducting repeated testing or alternate forms testing is not possible.
O Kuder-Richardson: a series of formulas based on dichotomously scored items
O Coefficient alpha (Cronbach's α): most widely used, as it can be used with continuous item types
O Split-half (odd-even): apply the Spearman-Brown correction to estimate full-test reliability (easiest to do and understand)
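To make the Kuder-Richardson and coefficient alpha entries concrete, here is a minimal sketch of KR-20 and Cronbach's alpha using their usual textbook forms, k/(k-1) x (1 - sum of item variances / variance of total scores). The response matrix is hypothetical, and population variances are used throughout so that the two coefficients coincide for 0/1 items; that choice is an assumption of this sketch.

```python
import statistics


def cronbach_alpha(item_scores):
    """Coefficient alpha; item_scores has one row per examinee, one column per item."""
    k = len(item_scores[0])
    item_var_sum = sum(statistics.pvariance(col) for col in zip(*item_scores))
    total_var = statistics.pvariance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)


def kr20(item_scores):
    """KR-20 for dichotomous (0/1) items, where each item variance is p * (1 - p)."""
    k = len(item_scores[0])
    n = len(item_scores)
    pq_sum = 0.0
    for col in zip(*item_scores):
        p = sum(col) / n  # proportion answering the item correctly
        pq_sum += p * (1 - p)
    total_var = statistics.pvariance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - pq_sum / total_var)


# Hypothetical responses: 5 examinees x 4 items, 1 = correct, 0 = incorrect.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

print(f"KR-20 = {kr20(responses):.2f}, alpha = {cronbach_alpha(responses):.2f}")
```

For dichotomous items the two functions give the same value; alpha additionally accepts items scored on a continuous or multi-point scale.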
-
Reliability of Classroom Tests
O We would recommend doing split-half reliability.
O Step 1: Split the test into two parts (odd-even).
O Step 2: Use the Pearson product-moment correlation (ungrouped data) to determine $r_{xy}$, the correlation between the two halves of the scale. Splitting the test in half reduces the number of items, which we know automatically reduces the reliability.
O Step 3: To estimate the reliability of the whole test, use the Spearman-Brown correction formula $r_{sb} = 2r_{xy} / (1 + r_{xy})$, where $r_{sb}$ is the split-half reliability coefficient.
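The three steps above can be sketched as follows. The response matrix (six examinees, six items) is hypothetical, and the function names are chosen for illustration only.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation for ungrouped data."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def split_half_reliability(item_scores):
    """Return (r_xy, r_sb) for an odd-even split of the items."""
    # Step 1: split each examinee's items into odd- and even-numbered halves.
    odd_totals = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    # Step 2: correlate the two half-test totals.
    r_xy = pearson_r(odd_totals, even_totals)
    # Step 3: Spearman-Brown correction to full test length.
    r_sb = 2 * r_xy / (1 + r_xy)
    return r_xy, r_sb


# Hypothetical item responses, 1 = correct, 0 = incorrect.
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

r_xy, r_sb = split_half_reliability(responses)
print(f"half-test correlation = {r_xy:.3f}, corrected reliability = {r_sb:.3f}")
```

The corrected value is higher than the half-test correlation, which reflects the point in Step 2 that shortening a test lowers its reliability.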
-
As a Teacher, What Do We Need to Know Most About Reliability?
O For tests we can create:
- Increasing the number of items increases reliability.
- Moderate difficulty level increases reliability.
- Having items that measure similar content increases reliability.
O For standardized tests we can use:
- Look for each test's published reliability data.
- Use the published reliability coefficient to determine the Standard Error of Measurement (abbreviated SEM) found in the data.
- See the following illustration
-
Estimating Reliability
O If we could re-administer a test to one person an infinite number of times, what would we expect the distribution of their scores to look like?
O Answer: The bell-shaped curve.
-
THE RELIABILITY OF CRITERION-REFERENCED TESTS
O Because the purpose of a criterion-referenced test is quite different from that of a norm-referenced test, it should not be surprising to find that the approaches used for reliability are different too. With criterion-referenced tests, scores are often used to sort candidates into performance categories.
O Variation in candidate scores is not so important if candidates are still assigned to the same performance category. Therefore, it has been common to define reliability for a criterion-referenced test as the extent to which performance classifications are consistent over parallel-form administrations.
O For example, it might be determined that 80% of the candidates are classified in the same way by parallel forms of a criterion-referenced test administered with little or no instruction in between test administrations. This is similar to parallel-form reliability for a norm-referenced test, except the focus with criterion-referenced tests is on the decisions rather than the scores.
-
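THE LIVINGSTON APPROACH
O Formula for estimating the reliability of criterion-referenced measures:
$K^2 = \dfrac{r_{xx}\,s^2 + (\bar{X} - C)^2}{s^2 + (\bar{X} - C)^2}$
where $r_{xx}$ is the reliability by conventional norm-referenced methods, $s^2$ is the variance of obtained scores, $\bar{X}$ is the mean score and $C$ is the criterion score.
O Mean score = criterion score: Livingston coefficient = reliability by conventional norm-referenced methods
O Mean score ≠ criterion score: Livingston coefficient > reliability by conventional norm-referenced methods
O WEAKNESSES
- Chester Harris: the SEM is the same, so a larger coefficient does not imply a more dependable determination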
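The two cases on this slide can be checked numerically. The sketch below assumes the usual form of Livingston's coefficient, K² = (r·s² + (mean − criterion)²) / (s² + (mean − criterion)²); the reliability of .80, variance of 25 and the means and cutoffs are hypothetical values chosen for illustration.

```python
def livingston_k2(reliability, variance, mean, criterion):
    """Livingston's coefficient for a criterion-referenced test (assumed form).

    reliability: conventional norm-referenced reliability estimate
    variance:    variance of obtained scores
    mean:        mean obtained score
    criterion:   criterion (cutoff) score
    """
    offset_sq = (mean - criterion) ** 2
    return (reliability * variance + offset_sq) / (variance + offset_sq)


# Mean equals the criterion: the coefficient equals the conventional reliability.
same = livingston_k2(reliability=0.80, variance=25.0, mean=70.0, criterion=70.0)

# Mean differs from the criterion: the coefficient exceeds the conventional reliability.
apart = livingston_k2(reliability=0.80, variance=25.0, mean=70.0, criterion=65.0)

print(f"mean = criterion: {same:.2f}; mean != criterion: {apart:.2f}")
```

Note that only the coefficient changes between the two calls; the score variance and conventional reliability, and therefore the standard error of measurement, are identical, which is the substance of the weakness attributed to Harris.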
-
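THE PERCENTAGE-AGREEMENT APPROACH
O Criterion-referenced & mastery tests are often administered to classify applicants/students into one of two groups (whether or not they have mastered the test content according to some criterion)
O Simple to compute
O $P = \dfrac{a + d}{N}$, where $a$ and $d$ are the numbers of persons classified the same way (as masters or as non-masters) on both administrations and $N$ is the total number of persons tested
O WEAKNESSES
- Will be affected by the number of persons being tested
- Will be affected by the cutoff score & its closeness to the test's mean score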
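A minimal sketch of the percentage-agreement idea: classify each person as master or non-master on two administrations and count how often the two classifications match. The scores, the cutoff of 80 and the function name are hypothetical.

```python
def percentage_agreement(scores_a, scores_b, cutoff):
    """Proportion of persons classified the same way on both administrations.

    A person counts as a master on an administration when the score is at or
    above the cutoff (an assumption of this sketch).
    """
    same = sum((a >= cutoff) == (b >= cutoff) for a, b in zip(scores_a, scores_b))
    return same / len(scores_a)


# Hypothetical scores for ten students on two parallel forms.
form_a = [85, 72, 90, 60, 78, 81, 55, 79, 92, 68]
form_b = [88, 70, 86, 65, 82, 77, 58, 81, 95, 71]

agreement = percentage_agreement(form_a, form_b, cutoff=80)
print(f"consistent classifications: {agreement:.0%}")
```

In this made-up data the three inconsistent classifications all involve scores within a few points of the cutoff, which illustrates why the closeness of the cutoff to the bulk of the scores affects the result.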
-
Conditions Affecting Reliability Coefficients
Test Scoring
Test Content
Test Administration
Personal Conditions
-
Test Scoring
O Scorer reliability is the extent to which different observers or raters agree with one another as they mark the same set of papers
O Difference between two scorers' judgments
O One scorer over time (fatigue) and/or halo effect
The higher the extent of agreement, the higher the scorer reliability will be
-
Test Content
O The sample of test items is too small
O The sample of test items is not evenly selected across the material
-
Test Administration
O Instructions with the test may contain errors that create another type of systematic error. These errors exist in the instructions provided to the test-taker. Instructions that interfere with accurately gathering information (such as a time limit when the trait the test is measuring has nothing to do with speed) reduce the reliability of a test.
O Noise and surroundings
O Physical condition
-
Personal Conditions
O Factors related to the test-taker, such as poor sleep, feeling ill, anxious or "stressed-out", are integrated into the test itself
O Temporary ups and downs
-
THE STANDARD ERROR OF MEASUREMENT (SEM)
The formula for the standard error of measurement is
$SE_{meas} = SD\sqrt{1 - \text{reliability}}$
where SD equals the standard deviation of obtained scores.
So, if you have a test with a standard deviation (SD) of 4.89 and an estimated reliability of .91, the standard error of measurement would be calculated as follows:
$SE_{meas} = SD\sqrt{1 - \text{reliability}} = 4.89\sqrt{1 - .91} = 4.89\,(0.3) = 1.467 \approx 1.47$
The standard error of measurement is expressed in the same unit as the standard deviation
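The worked example can be reproduced in a few lines. The second function, which builds a band of plus or minus one SEM around an obtained score, is an illustration added here: the obtained score of 30 is hypothetical, and reading the band as covering roughly 68% of repeated scores assumes normally distributed errors.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability), in the same units as SD."""
    return sd * math.sqrt(1 - reliability)


def score_band(obtained, sem, z=1.0):
    """Band of +/- z SEM around an obtained score (illustrative helper)."""
    return obtained - z * sem, obtained + z * sem


sem = standard_error_of_measurement(sd=4.89, reliability=0.91)
low, high = score_band(obtained=30.0, sem=sem)  # hypothetical obtained score
print(f"SEM = {sem:.2f}; band = {low:.2f} to {high:.2f}")
```

A smaller SEM gives a narrower band, so a more reliable test pins down an individual's score more precisely.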
-
THE RELIABILITY OF INDIVIDUAL & GROUP SCORES
O Decisions involving a single individual will require a much higher degree of reliability than is necessary for evaluating groups
O Truman Kelley (1927) suggested:
- Reliabilities of at least .94 for individual students
- Reliabilities of not less than .50 for groups of students
O Ideally, reliability should be +1.00, but in practice, test users will have to settle for less than that.
-
THE RELIABILITY OF DIFFERENCE SCORES
O Sometimes testers want to compare differences between pretest and posttest.
O The reliability of difference scores depends on the reliability of each test and the correlation between them.
O The following formula from Gulliksen (1950) can be used to estimate the reliability of difference scores.
-
THE RELIABILITY OF DIFFERENCE SCORES
$\text{Reliability of difference scores} = \dfrac{\text{average reliability} - \text{correlation between tests}}{1 - \text{correlation between tests}}$
O e.g.: The reliability of each of two tests is .90 and they correlate .80
$\text{Reliability of difference scores} = \dfrac{.90 - .80}{1 - .80} = .50$
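The same calculation as a small function, which also makes it easy to try other combinations of reliabilities and correlations. The function and parameter names are chosen here for illustration.

```python
def difference_score_reliability(reliability_1, reliability_2, correlation):
    """(average reliability - correlation) / (1 - correlation)."""
    average_reliability = (reliability_1 + reliability_2) / 2
    return (average_reliability - correlation) / (1 - correlation)


# The slide's example: both tests have reliability .90 and correlate .80.
example = difference_score_reliability(0.90, 0.90, 0.80)

# Reliability equal to the correlation between the tests gives zero.
zero_case = difference_score_reliability(0.80, 0.80, 0.80)

# Perfectly reliable tests give perfectly reliable difference scores.
perfect_case = difference_score_reliability(1.0, 1.0, 0.80)

print(f"{example:.2f} {zero_case:.2f} {perfect_case:.2f}")
```

Two tests that each look quite reliable (.90) can therefore yield difference scores that are only moderately reliable (.50) when the tests are highly correlated.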
-
THE RELIABILITY OF DIFFERENCE SCORES
O When the test reliability equals the correlation between the tests, the reliability of difference scores will be 0
O When the reliability of each test is +1.0, the reliability of difference scores will also equal +1.0
O The reason for the typically low reliability is that each of the two tests compared contains error of measurement.
O To increase the reliability of difference scores, testers can:
a) Select or construct highly reliable tests
b) Increase the number of items on the test
-
THE RELIABILITY OF DIFFERENCE SCORES
O When scores do not measure improvement but are merely comparisons of two different test scores, the correlation between the two tests is usually lower than the correlation between obtained and gain scores.
-
THE RELIABILITY OF DIFFERENCE SCORES
O Difference or gain scores suffer from other deficiencies:
- Gain scores do not necessarily measure improvement in knowledge.
- Students who initially perform at a high level cannot be expected to make gains equal to those made by students in the middle or lower parts of the distribution.
- The regression effect can account for differences among students, especially when they are initially selected on the basis of exceptionally high or low scores.
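The regression effect mentioned in the last point can be illustrated with the standard linear prediction of a second score from a first one. This sketch assumes both testings have the same mean and standard deviation and that the relationship is linear; the mean of 50, the correlation of .80 and the function name are hypothetical.

```python
def expected_retest_score(first_score, group_mean, correlation):
    """Expected second score given the first, under the stated assumptions.

    With equal means and standard deviations on both occasions, the predicted
    score is pulled toward the group mean in proportion to (1 - correlation).
    """
    return group_mean + correlation * (first_score - group_mean)


# Students selected for exceptionally high or low first scores.
high_scorer = expected_retest_score(first_score=90.0, group_mean=50.0, correlation=0.80)
low_scorer = expected_retest_score(first_score=10.0, group_mean=50.0, correlation=0.80)

print(f"expected retest: high scorer {high_scorer:.0f}, low scorer {low_scorer:.0f}")
```

Under these assumptions the high scorer appears to lose points and the low scorer appears to gain them even if no real change in knowledge has occurred, which is why gain scores for students chosen at the extremes are hard to interpret.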
-
SOURCES OF ERROR
-
SOURCES OF ERROR
O Reliability coefficients provide estimates of error that vary in magnitude depending on the conditions allowed to influence them.
O 3 major sources of error:
- characteristics of students
- characteristics of the test
- conditions affecting test administration and scoring
-
CHARACTERISTICS OF STUDENTS
O The true level of a student's knowledge or ability is an example of a desired source of true variance.
O The outcome of a student's true level of knowledge is sometimes debatable.
O Sources of error are usually temporary situations and undesirable conditions
O Not under the direct control of teachers
O The factors related to this:
- Test wiseness
- Illness and fatigue (sick, not enough rest)
- Lack of motivation (nervousness, anxiety, fear, family problems)
O Any temporary and unpredictable change in a student can be considered intra-individual error, or error within test takers.
-
CHARACTERISTICS OF THE TEST
O Tricky questions
O Ambiguous questions
O Confusing format
O Questions excessively difficult
O Contain too few items
O Include dissimilar item content
O Reading level that is too high
-
CONDITIONS AFFECTING TEST ADMINISTRATION & SCORING
O Physical environment (temperature, humidity, lighting, seating conditions / arrangements, avoidance of distractions)
O Instructions given to examinees (their clarity, complexity, consistency, ambiguity, age differences, idiosyncratic mannerisms, racial and ethnic backgrounds)
O Scoring
- Scorers make arithmetic errors, fail to understand and abide by scoring criteria, make mistakes in recording scores, show bias, etc.
-
ERROR COMPONENTS OF DIFFERENT RELIABILITY COEFFICIENTS
STABILITY: Different stability coefficients will be obtained over different time intervals
EQUIVALENCE: Will be known whenever one form of a test is substituted for another
STABILITY & EQUIVALENCE: Will be increased due to changes in students or items
INTERNAL CONSISTENCY or HOMOGENEITY: Stresses the importance of making sure that guessing, item ambiguity, difficulty levels, directions, and scoring criteria and methods are controlled as far as possible
-
WAYS TO IMPROVE NORM-REFERENCED RELIABILITY
1. Increase the number of good quality items.
2. Construct easy items to reduce guessing.
3. Construct items to measure the same trait or ability.
4. Avoid using tricky and ambiguous questions.
5. Prepare the test to permit objective scoring.
6. Make sure the pictorial material is readily identifiable and do not crowd items on the page.
7. Remember that means tend to be more reliable than individual scores.
8. High scores tend to be more unreliable than average scores.
9. Avoid the use of gain or difference scores.
10. Make test instructions to students clear and consistent.
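Item 1 in the list above can be quantified with the general Spearman-Brown formula, of which the split-half correction (doubling the length) is the special case. This is a sketch under the formula's usual assumption that the added items are comparable in quality to the existing ones; the starting reliability of .60 is hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when test length is multiplied by length_factor.

    Assumes the added items are parallel to (as good as) the existing ones.
    """
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


current = 0.60  # hypothetical reliability of the existing test
doubled = spearman_brown(current, 2)
tripled = spearman_brown(current, 3)

print(f"x1: {current:.2f}, x2: {doubled:.2f}, x3: {tripled:.2f}")
```

Each additional block of items helps less than the previous one, so lengthening a test is most effective when the starting reliability is low.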
-
WAYS TO IMPROVE CRITERION-REFERENCED RELIABILITY
1. If reasonable, develop tests in which there is a substantial difference between the test mean and the cutoff score.
2. Include as many items as possible.
3. Make sure that objectives are as specific as possible.
4. Make sure that scoring is as specific as possible.
5. Provide students with practice items that are similar in format to the test they will be taking.
6. Design the test format to be clear and uncluttered.
7. Assuming score variability, use the methods that improve reliability for norm-referenced tests.
-
REFERENCES
O Gilbert S., James W. N. (1997). Principles of Educational and Psychological Measurement and Evaluation. United States: Wadsworth Publishing.
O Lammers, W. J., and Badia, P. (2005). Fundamentals of Behavioral Research. California: Thomson and Wadsworth.
O Reliability, http://www.psychwiki.com, retrieved on 3 April 2012.
O Martyn Shuttleworth (2009), http://www.experiment-resources.com/test-retest-reliability.html, retrieved on 3 April 2012.
-
O Confounding variables are variables that the researcher failed to control or eliminate, damaging the internal validity of an experiment.
http://www.experiment-resources.com/internal-validity.html