sgdy 5063- reliability (group 2)

7/31/2019 SGDY 5063- Reliability (Group 2)


SGDY 5063

Educational & Psychological Measurement & Evaluation

Presented by:
Mohd Hafiz Bin Mohd Salleh (810350)
Syahrul Azrin Binti Ghafar (810398)
Aza Nurain Binti Azmey (811333)
Nur Zahirah Binti Othman (811562)

Supervised by: Tuan Haji Shuib Bin Hussain



PRESENTATION OVERVIEW

- What is reliability?
- Practical methods of estimating reliability
- The reliability of criterion-referenced and mastery tests
- Conditions affecting reliability coefficients
- The standard error of measurement
- Sources of error
- Error components of different reliability coefficients
- The reliability of individual and group scores
- The reliability of difference scores
- Ways to improve norm-referenced and criterion-referenced reliability



Introduction to Test Reliability



Are the scores consistent?

Are they stable?




WHAT IS RELIABILITY?

- Reliability refers to the reliability of a test score or set of test scores, not the reliability of the test itself.
- Reliability refers to the consistency of a measure. A test is considered reliable if we get the same result repeatedly (C. Kendra, 2012).
- Reliability is not the same as validity. Validity asks: does a test measure what it is supposed to measure?
- Technically, reliability is defined as true variance divided by obtained variance:

  $\text{reliability} = \dfrac{\sigma_T^2}{\sigma_O^2}$
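A minimal Python sketch of this ratio, assuming the classical test theory decomposition in which obtained variance is the sum of true variance and error variance. The function name and the variance figures are hypothetical, chosen only for illustration.

```python
def reliability_from_variances(true_variance: float, obtained_variance: float) -> float:
    """Return reliability as true variance divided by obtained variance."""
    if obtained_variance <= 0:
        raise ValueError("obtained variance must be positive")
    if not 0 <= true_variance <= obtained_variance:
        raise ValueError("true variance must lie between 0 and the obtained variance")
    return true_variance / obtained_variance


# Hypothetical figures: true variance 81 plus error variance 9 gives obtained variance 90.
true_var = 81.0
error_var = 9.0
print(reliability_from_variances(true_var, true_var + error_var))  # 0.9
```

The more of the obtained variance that is error, the further the ratio falls below 1.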



Practical Methods of Estimating Reliability

- Because teachers do not know an individual's true score, they cannot know the amount of error for any given person.
- We can, however, estimate the effect of chance on measurements in general.
- Consistency of measurement means high reliability.
- The degree of consistency can be determined by a correlation coefficient.





- Correlation is the generic term for all measures of relationship, and reliability may be thought of as a special type of correlation that measures the consistency of observed scores.




- The perfect test would be unaffected by the sources of unreliability, and on this perfect test each examinee would get his or her true score. Unfortunately, the observed score we get was probably affected by one or more of the sources of unreliability.
- So our observed score is likely to be too high or too low. The difference between the observed score and the true score is called the error score, and it can be positive or negative.
- We can express this mathematically as:

  True Score = Obtained Score ± Error, that is, $T = O \pm E$ (or, looking at it another way, $O = T \pm E$)




- Test-retest (stability)
- Alternate forms (equivalence)
- Internal consistency




- The test-retest reliability method is one of the simplest ways of testing the stability and reliability of an instrument over time (Martyn Shuttleworth, 2009).
- For example, if a group of students takes a test, we would expect them to show very similar results if they take the same test a few months later. This definition relies on there being no confounding factor during the intervening time interval.
- Instruments such as IQ tests and surveys are prime candidates for test-retest methodology, because there is little chance of people experiencing a sudden jump in IQ or suddenly changing their opinions.
- Educational tests are often not suitable, because students will learn much more information over the intervening period and show better results in the second test.
http://www.experiment-resources.com/confounding-variables.html
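The test-retest coefficient is the correlation between the two sets of scores. The sketch below computes a Pearson product-moment correlation by hand; the function name and the two score lists are hypothetical, and it assumes the same students appear in the same order in both lists.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation between two paired lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired lists with at least two scores each")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("scores must vary on both occasions")
    return cov / sqrt(ss_x * ss_y)


# Hypothetical scores for the same five students on two occasions.
first_testing = [12, 15, 9, 18, 14]
second_testing = [13, 14, 10, 19, 15]
print(round(pearson_r(first_testing, second_testing), 3))
```

For these made-up scores the coefficient is about .96, which would indicate high stability over the interval.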




- Alternate-form reliability is established by administering two different forms of the same test to the same individuals (psychwiki.com, 2012).
- This method avoids the problems that come from the test-retest method. With the alternate-form method, an individual is tested on one form of the test and then again on a comparable second form; the interval between the two is about one week.
- This method is used more than the test-retest method because it has fewer associated problems, including a substantial reduction in practice effects.




- Internal consistency is a procedure for studying reliability when the focus of the investigation is on the consistency of scores on the same occasion and on similar content, but when repeated testing or alternate-forms testing is not possible. Common approaches:
  - Split-half (odd-even), with the Spearman-Brown correction applied to the full test (easiest to do and understand)
  - Kuder-Richardson: a series of formulas based on dichotomously scored items
  - Cronbach's coefficient alpha (most widely used, as it can be used with continuous item types)
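A sketch of two of these coefficients, assuming a score matrix with one row per examinee and one column per item, and population (divide-by-n) variances throughout. Under that convention the variance of a 0/1 item equals pq, so KR-20 and coefficient alpha give the same value for dichotomous items. The function names and the response data are hypothetical.

```python
def _population_variance(values: list[float]) -> float:
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Coefficient alpha; scores has one row per examinee and one column per item."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    item_variance_sum = sum(_population_variance(list(col)) for col in zip(*scores))
    total_variance = _population_variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores must vary")
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


def kr20(scores: list[list[int]]) -> float:
    """Kuder-Richardson formula 20 for items scored 1 (right) or 0 (wrong)."""
    n, k = len(scores), len(scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    pq_sum = 0.0
    for col in zip(*scores):
        p = sum(col) / n        # proportion answering the item correctly
        pq_sum += p * (1 - p)   # q = 1 - p
    total_variance = _population_variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores must vary")
    return k / (k - 1) * (1 - pq_sum / total_variance)


# Hypothetical 0/1 responses: five examinees (rows) by four items (columns).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(responses), 3), round(cronbach_alpha(responses), 3))  # 0.8 0.8
```

Alpha accepts any numeric item scores, which is why it can be used with continuous item types, while KR-20 is restricted to right/wrong scoring.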



Reliability of Classroom Tests

- We would recommend doing split-half reliability.
- Step 1: Split the test into two parts (odd and even items).
- Step 2: Use the Pearson product-moment correlation (ungrouped data) to determine $r_{xy}$, the correlation between the two halves of the scale. Splitting the test halves the number of items, which we know will automatically reduce the reliability.
- Step 3: To estimate the reliability of the whole test, apply the Spearman-Brown correction formula

  $r_{sb} = \dfrac{2r_{xy}}{1 + r_{xy}}$

  where $r_{sb}$ is the split-half reliability coefficient corrected to full test length.
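The three steps can be sketched in Python as follows, assuming a score matrix with one row per student and the items in test order. The function names and the response data are hypothetical; the correlation is computed by hand so the example needs only the standard library.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation for ungrouped, paired data."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("both halves must vary")
    return cov / sqrt(ss_x * ss_y)


def spearman_brown(r_xy: float) -> float:
    """Step 3: step the half-test correlation up to full test length."""
    return 2 * r_xy / (1 + r_xy)


def split_half_reliability(scores: list[list[int]]) -> tuple[float, float]:
    """Return (r_xy, r_sb) using an odd-even split of the items."""
    odd = [sum(row[0::2]) for row in scores]    # Step 1: items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in scores]   # Step 1: items 2, 4, 6, ...
    r_xy = pearson_r(odd, even)                 # Step 2
    return r_xy, spearman_brown(r_xy)           # Step 3


# Hypothetical 0/1 responses: five students (rows) by four items (columns).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
r_xy, r_sb = split_half_reliability(responses)
print(round(r_xy, 3), round(r_sb, 3))  # 0.786 0.88
```

The corrected coefficient (.88) is higher than the half-test correlation (.79), reflecting the restoration of the full number of items.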



As a Teacher, What Do We Need to Know Most About Reliability?

- For tests we can create:
  - Increasing the number of items increases reliability.
  - A moderate difficulty level increases reliability.
  - Having items that measure similar content increases reliability.
- For standardized tests we can use:
  - Look for each test's published reliability data.
  - Use the published reliability coefficient to determine the standard error of measurement (abbreviated SEM) found in the data.
  - See the following illustration.



Estimating Reliability

- If we could re-administer a test to one person an infinite number of times, what would we expect the distribution of their scores to look like?
- Answer: a bell-shaped curve.



THE RELIABILITY OF CRITERION-REFERENCED TESTS

- Because the purpose of a criterion-referenced test is quite different from that of a norm-referenced test, it should not be surprising that the approaches used for reliability are different too. With criterion-referenced tests, scores are often used to sort candidates into performance categories.
- Variation in candidate scores is not so important if candidates are still assigned to the same performance category. Therefore, it has been common to define reliability for a criterion-referenced test as the extent to which performance classifications are consistent over parallel-form administrations.
- For example, it might be determined that 80% of the candidates are classified in the same way by parallel forms of a criterion-referenced test administered with little or no instruction in between. This is similar to parallel-form reliability for a norm-referenced test, except that the focus with criterion-referenced tests is on the decisions rather than the scores.



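THE LIVINGSTON APPROACH

- Formula for estimating the reliability of criterion-referenced measures:

  $K^2 = \dfrac{r_{xx}\,\sigma_x^2 + (\bar{X} - C)^2}{\sigma_x^2 + (\bar{X} - C)^2}$

  where $r_{xx}$ is the reliability obtained by conventional norm-referenced methods, $\sigma_x^2$ is the variance of obtained scores, $\bar{X}$ is the mean score and $C$ is the criterion (cutoff) score.
- If the mean score equals the criterion score, the Livingston coefficient equals the reliability obtained by conventional norm-referenced methods.
- If the mean score differs from the criterion score, the Livingston coefficient is greater than the reliability obtained by conventional norm-referenced methods.
- Weakness: Chester Harris pointed out that the standard error of measurement stays the same, so a higher coefficient does not imply a more dependable determination.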
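A sketch of the Livingston coefficient, assuming the inputs are the conventional reliability, the standard deviation of obtained scores, the mean and the cutoff score. The function name and all figures are hypothetical.

```python
def livingston_k2(reliability: float, sd: float, mean: float, cutoff: float) -> float:
    """Livingston's criterion-referenced reliability coefficient."""
    variance = sd ** 2
    squared_distance = (mean - cutoff) ** 2  # (mean - criterion score) squared
    denominator = variance + squared_distance
    if denominator == 0:
        raise ValueError("scores have no variance and the mean equals the cutoff")
    return (reliability * variance + squared_distance) / denominator


# Hypothetical test: conventional reliability .80, SD 5, mean 30.
print(livingston_k2(0.80, 5, 30, 30))  # cutoff at the mean: same as the conventional .80
print(livingston_k2(0.80, 5, 30, 25))  # cutoff 5 points below the mean: larger
```

Moving the cutoff away from the mean raises the coefficient (here to .90) even though the test and its standard error of measurement are unchanged, which is the point of Harris's criticism.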



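THE PERCENTAGE-AGREEMENT APPROACH

- Criterion-referenced and mastery tests are often administered to classify applicants or students into one of two groups (whether or not they have mastered the test content according to some criterion).
- Simple to compute:

  $P_A = \dfrac{A + D}{N}$

  where $A$ is the number of examinees classified as masters on both administrations, $D$ is the number classified as non-masters on both, and $N$ is the number of examinees tested.
- Weaknesses:
  - It will be affected by the number of persons being tested.
  - It will be affected by the cutoff score and its closeness to the test's mean score.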
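A sketch of percentage agreement, assuming that "mastery" means scoring at or above the cutoff on a form. It counts the examinees given the same decision on both forms (master on both, or non-master on both) and divides by the number tested. The function name, scores and cutoff are hypothetical.

```python
def percentage_agreement(form_a: list[float], form_b: list[float], cutoff: float) -> float:
    """Proportion of examinees given the same mastery decision on two forms."""
    if len(form_a) != len(form_b) or not form_a:
        raise ValueError("need two non-empty, paired score lists")
    same_decision = sum(
        (a >= cutoff) == (b >= cutoff)  # master on both, or non-master on both
        for a, b in zip(form_a, form_b)
    )
    return same_decision / len(form_a)


# Hypothetical scores for five students on two parallel forms; cutoff of 6.
form_a = [8, 6, 9, 4, 7]
form_b = [7, 7, 5, 3, 8]
print(percentage_agreement(form_a, form_b, cutoff=6))  # 0.8
```

Only the third student is classified differently by the two forms, so four of five decisions agree.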



Conditions Affecting Reliability Coefficients

- Test scoring
- Test content
- Test administration
- Personal conditions



Test Scoring

- Scorer reliability is the extent to which different observers or raters agree with one another as they mark the same set of papers.
- Differences between two scorers' judgments
- One scorer over time (fatigue) and/or the halo effect
- The higher the extent of agreement, the higher the scorer reliability will be.
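One simple index of scorer agreement, assumed here for illustration, is the proportion of papers on which two scorers give identical marks. The function name and the marks below are hypothetical.

```python
def exact_agreement(scorer_1: list[int], scorer_2: list[int]) -> float:
    """Proportion of papers to which two scorers gave identical marks."""
    if len(scorer_1) != len(scorer_2) or not scorer_1:
        raise ValueError("need two non-empty, paired lists of marks")
    matches = sum(m1 == m2 for m1, m2 in zip(scorer_1, scorer_2))
    return matches / len(scorer_1)


# Hypothetical marks (out of 10) given by two teachers to the same five essays.
teacher_1 = [7, 5, 9, 6, 8]
teacher_2 = [7, 6, 9, 6, 7]
print(exact_agreement(teacher_1, teacher_2))  # 0.6
```

A higher proportion indicates higher scorer reliability; clear scoring criteria are one way to raise it.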



Test Content

- The sample of test items is too small.
- The sample of test items is not evenly selected across the material.



Test Administration

- Instructions accompanying the test may contain errors that create another type of systematic error. These errors exist in the instructions provided to the test-taker. Instructions that interfere with accurately gathering information (such as a time limit when what the test is measuring has nothing to do with speed) reduce the reliability of a test.
- Noise and surroundings
- Physical conditions



Personal Conditions

- Factors related to the test-taker, such as poor sleep, feeling ill, or being anxious or "stressed-out", become integrated into the test score itself.
- Temporary ups and downs



THE STANDARD ERROR OF MEASUREMENT (SEM)

The formula for the standard error of measurement is

$SE_{meas} = SD\sqrt{1 - \text{reliability}}$

where SD equals the standard deviation of obtained scores.

So, if you have a test with a standard deviation (SD) of 4.89 and an estimated reliability of .91, the standard error of measurement would be calculated as follows:

$SE_{meas} = 4.89\sqrt{1 - .91} = 4.89\sqrt{.09} = 4.89(0.3) = 1.467 \approx 1.47$

The standard error of measurement is expressed in the same unit as the standard deviation.
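The same calculation as a small Python function, reproducing the worked example above. The function name is hypothetical; the inputs are the SD of obtained scores and a reliability coefficient, for instance one taken from a test's published data.

```python
from math import sqrt


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEmeas = SD * sqrt(1 - reliability), in the same units as SD."""
    if sd < 0 or not 0 <= reliability <= 1:
        raise ValueError("need SD >= 0 and reliability between 0 and 1")
    return sd * sqrt(1 - reliability)


# The worked example: SD = 4.89, reliability = .91.
print(round(standard_error_of_measurement(4.89, 0.91), 2))  # 1.47
```

A perfectly reliable test (reliability 1.0) would have an SEM of 0.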



THE RELIABILITY OF INDIVIDUAL & GROUP SCORES

- Decisions involving a single individual require a much higher degree of reliability than is necessary for evaluating groups.
- Truman Kelley (1927) suggested:
  - reliabilities of at least .94 for evaluating individual students
  - reliabilities of not less than .50 for evaluating groups of students
- Ideally, reliability should be +1.00, but in practice test users will have to settle for less than that.



THE RELIABILITY OF DIFFERENCE SCORES

- Sometimes testers want to compare differences between pretest and posttest scores.
- The reliability of difference scores depends on the reliability of each test and the correlation between them.
- The following formula from Gulliksen (1950) can be used to estimate the reliability of difference scores.




- For example, two tests each have a reliability of .90, and they correlate .80:

  $\text{Reliability of difference scores} = \dfrac{\text{average reliability} - \text{correlation between tests}}{1 - \text{correlation between tests}}$

  $\text{Reliability of difference scores} = \dfrac{.90 - .80}{1 - .80} = \dfrac{.10}{.20} = .50$
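A sketch of this formula in Python. The average-reliability form used here corresponds to the case where the two tests have equal standard deviations, which is an assumption of this sketch; the function name is hypothetical.

```python
def difference_score_reliability(r_xx: float, r_yy: float, r_xy: float) -> float:
    """(average reliability - correlation) / (1 - correlation), assuming equal SDs."""
    if r_xy >= 1:
        raise ValueError("the correlation between tests must be less than 1")
    average_reliability = (r_xx + r_yy) / 2
    return (average_reliability - r_xy) / (1 - r_xy)


# The worked example: both tests have reliability .90 and correlate .80.
print(round(difference_score_reliability(0.90, 0.90, 0.80), 2))  # 0.5
```

Even with two highly reliable tests, a high correlation between them pulls the reliability of the differences down to .50.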




- When the average test reliability equals the correlation between the tests, the reliability of difference scores will be 0.
- When the reliability of each test is +1.0 (and the tests do not correlate perfectly), the reliability of difference scores will also equal +1.0.
- Difference-score reliability is typically low because each of the two tests being compared contains error of measurement.
- To increase the reliability of difference scores, testers can:
  a) select or construct highly reliable tests
  b) increase the number of items on the tests




- When scores do not measure improvement but are merely comparisons of two different test scores, the correlation between the two tests is usually lower than the correlation between obtained and gain scores.




- Difference or gain scores suffer from other deficiencies:
  - Gain scores do not necessarily measure improvement in knowledge.
  - Students who initially perform at a high level cannot be expected to make gains equal to those made by students in the middle or lower parts of the distribution.
  - The regression effect can account for differences among students, especially when they are initially selected on the basis of exceptionally high or low scores.



SOURCES OF ERROR



- Reliability coefficients provide estimates of error that vary in magnitude depending on the conditions allowed to influence them.
- There are 3 major sources of error:
  - characteristics of students
  - characteristics of the test
  - conditions affecting test administration and scoring



CHARACTERISTICS OF STUDENTS

- The true level of a student's knowledge or ability is an example of a desired source of true variance.
- The outcome of a student's true level of knowledge is sometimes debatable.
- Undesired student factors are usually temporary situations and conditions that are not under the direct control of teachers. The factors related to this include:
  - test-wiseness
  - illness and fatigue (being sick, not enough rest)
  - lack of motivation (nervousness, anxiety, fear, family problems)
- Any temporary and unpredictable change in a student can be considered intra-individual error, or error within test-takers.



CHARACTERISTICS OF THE TEST

- Tricky questions
- Ambiguous questions
- Confusing format
- Excessively difficult questions
- Too few items
- Dissimilar item content
- A reading level that is too high



CONDITIONS AFFECTING TEST ADMINISTRATION & SCORING

- Physical environment (temperature, humidity, lighting, seating conditions and arrangements, avoidance of distractions)
- Instructions given to examinees (their clarity, complexity, consistency and ambiguity; also age differences, idiosyncratic mannerisms, and racial and ethnic backgrounds)
- Scoring: scorers may make arithmetic errors, fail to understand and abide by scoring criteria, make mistakes in recording scores, show bias, etc.


ERROR COMPONENTS OF DIFFERENT RELIABILITY COEFFICIENTS

- Stability: different stability coefficients will be obtained over different time intervals.
- Equivalence: error will be revealed whenever one form of a test is substituted for another.
- Stability and equivalence: error will be increased due to changes in students or items.
- Internal consistency or homogeneity: stresses the importance of making sure that guessing, item ambiguity, difficulty levels, directions, and scoring criteria and methods are controlled as much as possible.


WAYS TO IMPROVE NORM-REFERENCED RELIABILITY

1. Increase the number of good-quality items.
2. Construct easy items to reduce guessing.
3. Construct items to measure the same trait or ability.
4. Avoid using tricky and ambiguous questions.
5. Prepare the test to permit objective scoring.
6. Make sure pictorial material is readily identifiable, and do not crowd items on the page.
7. Remember that means tend to be more reliable than individual scores.
8. High scores tend to be less reliable than average scores.
9. Avoid the use of gain or difference scores.
10. Make test instructions to students clear and consistent.
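One way to see why adding good-quality items helps is the general Spearman-Brown formula, of which the split-half correction is the case k = 2. The sketch assumes the added items are comparable in quality to the existing ones; the function name and the starting reliability of .60 are hypothetical.

```python
def lengthened_reliability(r: float, k: float) -> float:
    """General Spearman-Brown: reliability after changing test length by a factor k."""
    if not 0 <= r <= 1:
        raise ValueError("reliability must be between 0 and 1")
    if k <= 0:
        raise ValueError("length factor must be positive")
    return k * r / (1 + (k - 1) * r)


# Hypothetical classroom test with reliability .60, halved, kept, doubled, tripled.
for k in (0.5, 1, 2, 3):
    print(k, round(lengthened_reliability(0.60, k), 3))
```

Under these assumptions the predicted reliabilities are roughly .43, .60, .75 and .82: gains continue but shrink as the test grows.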


WAYS TO IMPROVE CRITERION-REFERENCED RELIABILITY

1. If reasonable, develop a test in which there is a substantial difference between the test mean and the cutoff score.
2. Include as many items as possible.
3. Make sure that objectives are as specific as possible.
4. Make sure that scoring is as specific as possible.
5. Provide students with practice items that are similar in format to the test they will be taking.
6. Design the test format to be clear and uncluttered.
7. Assuming there is score variability, use the methods that improve reliability for norm-referenced tests.


REFERENCES

- Gilbert S., & James W. N. (1997). Principles of Educational and Psychological Measurement and Evaluation. United States: Wadsworth Publishing.
- Lammers, W. J., & Badia, P. (2005). Fundamentals of Behavioral Research. California: Thomson Wadsworth.
- Reliability. http://www.psychwiki.com, retrieved on 3 April 2012.
- Shuttleworth, M. (2009). http://www.experiment-resources.com/test-retest-reliability.html, retrieved on 3 April 2012.


- Confounding variables are variables that the researcher failed to control or eliminate, damaging the internal validity of an experiment.
http://www.experiment-resources.com/internal-validity.html
