item analysis


Post on 11-Oct-2015






Item Analysis


  • Item Analysis

  • Uses of Item Analysis

    Assess the quality of each item. Rationale: item quality determines test quality (i.e., reliability and validity).

    It can suggest improvements to the test's measurement.

    It can help explain why a particular test is able to predict a given criterion.

  • Item Analysis: When analysing test items, several questions arise about the performance of each item:

    Are the items congruent with the test objectives?
    Are the items valid? Do they measure what they're supposed to measure?
    Are the items reliable? Do they measure consistently?
    How long does it take an examinee to complete each item?
    What items are most difficult to answer correctly? What items are easy?
    Are there any poorly performing items that need to be discarded?

  • Types of item analysis

    1. Assessing the quality of the distractors
    2. Assessing item difficulty
    3. Assessing how well the item discriminates between high-achieving and low-achieving students
    4. Item reliability
    5. Item validity

  • Item Validity: The extent to which a test measures what it is intended to measure. Validity can be viewed from 3 aspects: content validity, construct validity, and criterion validity.

  • Content validity: The extent to which a test covers the content domain to be assessed. To determine it, the curriculum content must be defined. It can be laid out in a JSU (test specification table). Consult expert teachers.

  • Construct validity: Refers to the relationship between test scores and achievement on the test.

  • Criterion validity: Refers to the strength of the relationship between achievement on a test and an independent criterion. It ensures that the test measures the criterion it is meant to measure. Two types of criterion can be used: a concurrent criterion and a predictive criterion.

  • Concurrent criterion: a criterion observed at the same time, or nearly the same time, as the measurement taken against it. Predictive criterion: a criterion observed at some interval of time after the test is administered.

  • Reliability: The consistency of a test in measuring what it is intended to measure. A test with consistent scores has high reliability. A test with fluctuating scores has low reliability.

  • A reliable test is not necessarily valid. A valid test is reliable.

  • Reliability is influenced by unsystematic factors. Unsystematic factors are factors that produce unexpected or random variation. There are 3 sources of unsystematic factors: individual variation, variation in the task, and variation in construct sampling.

  • Reliability index. Determined by: the correlation between two sets of test scores, or the equivalent-forms/split-half method.

  • Correlation between two sets of test scores: The two sets are obtained using the test-retest method. The test is given to the same students at different times.

  • Equivalent-forms/split-half method: The two sets are obtained from a single test given to the same students. One set of scores comes from the odd-numbered questions and the other from the even-numbered questions.

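  • Reliability index

    $$ r_{xy} = \frac{N \sum xy - (\sum x)(\sum y)}{\sqrt{\left[ N \sum x^2 - (\sum x)^2 \right] \left[ N \sum y^2 - (\sum y)^2 \right]}} $$

    $$ \text{Reliability of the whole test} = \frac{2 \times \text{reliability of the half test}}{1 + \text{reliability of the half test}} $$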
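
The two reliability formulas above can be sketched in Python as follows. This is a minimal illustration, assuming each student's answers are stored as a list of 0/1 item scores. The function names and the sample data are hypothetical and do not come from the slides.

```python
from math import sqrt


def pearson_r(x, y):
    """Raw-score Pearson correlation between two equal-length score lists."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


def whole_test_reliability(r_half):
    """Step a half-test correlation up to the full test length."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(responses):
    """Correlate odd-item and even-item half scores, then step up."""
    odd = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, ...
    return whole_test_reliability(pearson_r(odd, even))


# Hypothetical data: 4 students, 4 items (1 = correct, 0 = incorrect).
responses = [
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 1, 0],
]
print(round(split_half_reliability(responses), 3))
```

For the test-retest method, the same `pearson_r` function would be applied directly to the two lists of total scores from the two administrations.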

  • DISTRACTOR ANALYSIS

    A. Multiple-Choke
    B. Multiply-Choice
    C. Multiple-Choice
    D. Multi-Choice

  • Distractor Analysis. First question of item analysis: How many people choose each response? If there is only one best response, then all other response options are distractors. Example from in-class assignment (N = 35):

    Which method has the best internal consistency?

    Response                 # choosing
    a) projective test        1
    b) peer ratings           1
    c) forced choice         21
    d) differences n.s.      12

  • Distractor Analysis (contd). A perfect test item would have 2 characteristics:

    1. Everyone who knows the item gets it right.

    2. People who do not know the item will have responses equally distributed across the wrong answers.

    It is not desirable to have one of the distractors chosen more often than the correct answer.

    This result indicates a potential problem with the question. This distractor may be too similar to the correct answer and/or there may be something in either the stem or the alternatives that is misleading.

  • Distractor Analysis (contd). Calculate the number of people expected to choose each of the distractors. If responding is random, the expected number is the same for each wrong response (Figure 10-1).

    $$ \text{Number of persons expected to choose each distractor} = \frac{N \text{ answering incorrectly}}{\text{Number of distractors}} = \frac{14}{3} \approx 4.7 $$
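
The expected-count calculation can be sketched in Python using the response counts from the in-class example. This is an illustration only. It assumes option c is the keyed answer, which is consistent with the 14 incorrect responses reported above. The function name is hypothetical, and the comparison shown is a simple ratio rather than a formal significance test.

```python
def expected_per_distractor(counts, correct):
    """Expected number choosing each distractor if wrong answers were random."""
    n_incorrect = sum(n for option, n in counts.items() if option != correct)
    n_distractors = len(counts) - 1
    return n_incorrect / n_distractors


# Counts from the in-class example (N = 35); "c" is assumed to be the key.
counts = {"a": 1, "b": 1, "c": 21, "d": 12}
expected = expected_per_distractor(counts, correct="c")
print(round(expected, 1))

# Ratio of observed to expected choices for each distractor.
# A ratio well above 1 marks a distractor worth a closer look.
for option, n in counts.items():
    if option != "c":
        print(option, round(n / expected, 2))
```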

  • Distractor Analysis (contd)When the number of persons choosing a distractor significantly exceeds the number expected, there are 2 possibilities:

    1. It is possible that the choice reflects partial knowledge

    2. The item is a poorly worded trick question

    An unpopular distractor may lower item and test difficulty because it is easily eliminated.

    An extremely popular distractor is likely to lower the reliability and validity of the test.

  • Item Difficulty Analysis: Description and How to Compute. Examples: a) (6 × 3) + 4 = ? b) 9[ln(-3.68) × (1 ln(+3.68))] = ?

    It is often difficult to explain or define difficulty in terms of some intrinsic characteristic of the item

    The only common thread of difficult items is that individuals did not know the answer

  • Item Difficulty

    Percentage of test takers who respond correctly

    What if p = .00?

    What if p = 1.00?

  • Item Difficulty

    An item with a p value of .0 or 1.0 does not contribute to measuring individual differences and is therefore useless for that purpose.

    When comparing 2 test scores, we are interested in who had the higher score or in the difference between the scores.

    Items with a p value of .5 have the most variation, so seek items in this range and remove those with extreme values.

    p values can also be examined to determine the proportion answering in a particular way for items that don't have a correct answer.
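
The claim that items near p = .5 vary the most can be checked with the variance of a right/wrong (0/1) item, which is p(1 - p). The short Python sketch below is an illustration added here, not part of the original slides, and the function name is hypothetical.

```python
def item_variance(p):
    """Variance of a 0/1 scored item whose proportion correct is p."""
    return p * (1 - p)


# Evaluate the variance on a grid of p values from 0.0 to 1.0.
grid = [i / 10 for i in range(11)]
best_p = max(grid, key=item_variance)

for p in grid:
    print(p, round(item_variance(p), 2))
```

The variance is zero at p = .0 and p = 1.0, which is why such items carry no information about individual differences.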

  • Item Difficulty (cont.)

    What is the best p-value?

    Optimal p-value = .50: maximum discrimination between good and poor performers.

    Should we only choose items of .50? When shouldn't we?

  • Should we only choose items of .50?

    Not necessarily ...

    When screening for the very top group of applicants (e.g., admission to university or medical school), cutoffs may be much higher.

    Other institutions want a minimum level (e.g., a minimum reading level), so cutoffs may be much lower.

  • Item Difficulty (cont.)

    Interpreting the p-value. Example: 100 people take a test and 15 get question 1 right.

    What is the p-value? Is this an easy or hard item?

  • Item Difficulty (cont.)

    Interpreting the p-value. Example: 100 people take a test and 70 get question 1 right.

    What is the p-value? Is this an easy or hard item?

  • Item Difficulty (contd): General Rules of Item Difficulty

    p value                  Interpretation
    low (< .20)              difficult test item
    moderate (.20 - .80)     moderately difficult item
    high (> .80)             easy item
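
The p-value and the general rules above can be sketched in Python and applied to the two examples from the earlier slides (15 of 100 correct, and 70 of 100 correct). The function names are hypothetical. The cutoffs are the ones listed in the rules.

```python
def item_difficulty(n_correct, n_total):
    """Proportion of test takers who answered the item correctly."""
    return n_correct / n_total


def classify_difficulty(p):
    """Apply the general rules: < .20 difficult, > .80 easy, else moderate."""
    if p < 0.20:
        return "difficult"
    if p > 0.80:
        return "easy"
    return "moderately difficult"


for n_correct in (15, 70):
    p = item_difficulty(n_correct, 100)
    print(n_correct, p, classify_difficulty(p))
```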

  • ITEM DISCRIMINATION: The extent to which an item differentiates people on the behavior that the test is designed to assess.

    It is computed as the difference between the percentage of high achievers and the percentage of low achievers who got the item right.

  • Item Discrimination (cont.)

    Compares the performance of the upper group (high test scores) and the lower group (low test scores) on each item, using the percentage of test takers in each group who answered correctly.

  • Item Discrimination (contd): Discrimination Index (D)

    1. Divide the sample into a TOP half and a BOTTOM half (or a TOP and a BOTTOM third).

    2. Compute the Discrimination Index (D).

  • Item Discrimination

    $$ D = U - L $$

    $$ U = \frac{\text{number in the upper group with a correct response}}{\text{total number in the upper group}} $$

    $$ L = \frac{\text{number in the lower group with a correct response}}{\text{total number in the lower group}} $$

    The higher the value of D, the more adequately the item discriminates. (The highest value is 1.0.)
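
A minimal Python sketch of the discrimination index follows. It assumes total test scores and 0/1 item scores are stored in parallel lists, one entry per examinee. The function name, the `fraction` parameter, and the sample data are hypothetical. Tied total scores are kept in their original order, which is only one of several possible conventions.

```python
def discrimination_index(total_scores, item_scores, fraction=0.5):
    """D = U - L for one item, using the top and bottom `fraction` of examinees."""
    # Rank examinees from highest to lowest total score.
    order = sorted(range(len(total_scores)),
                   key=lambda i: total_scores[i], reverse=True)
    k = int(len(order) * fraction)
    if k == 0:
        raise ValueError("groups are empty; use more examinees or a larger fraction")
    upper, lower = order[:k], order[-k:]
    u = sum(item_scores[i] for i in upper) / k  # proportion correct, upper group
    l = sum(item_scores[i] for i in lower) / k  # proportion correct, lower group
    return u - l


# Hypothetical data: 8 examinees, total scores and one item scored 0/1.
totals = [10, 9, 8, 7, 3, 2, 1, 0]
item = [1, 1, 1, 0, 0, 1, 0, 0]
print(discrimination_index(totals, item))          # top half vs bottom half
print(discrimination_index(totals, item, 1 / 3))   # top third vs bottom third
```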

  • Item Discrimination: Seek items with high positive numbers (those who do well on the test tend to get the item correct).

    Items with negative numbers (lower scorers on the test are more likely to get the item correct) and items with low positive numbers (about the same proportion of low and high scorers get the item correct) don't discriminate well and are discarded.

  • Item Discrimination (contd): Item-Total Correlation. The correlation between each item (a correct response usually receives a score of 1 and an incorrect response a score of 0) and the total test score. To what degree do the item and the test measure the same thing?

    Positive: the item discriminates between high and low scorers

    Near 0: the item does not discriminate between high and low scorers

    Negative: scores on the item and scores on the test disagree

  • Item Discrimination (contd): Item-Total Correlation. Item-total correlations are directly related to reliability. Why? Because the more each item correlates with the test as a whole, the more highly all items correlate with each other (= higher alpha, internal consistency).
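
Item-total correlations can be sketched in Python as below. This is an illustrative version, assuming a list of per-examinee 0/1 item scores. It computes the uncorrected correlation, in which the item itself is included in the total score. The function names and the sample data are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation; returns NaN when either list has no variation."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    numerator = n * sum(a * b for a, b in zip(x, y)) - sum_x * sum_y
    denominator = sqrt((n * sum(a * a for a in x) - sum_x ** 2)
                       * (n * sum(b * b for b in y) - sum_y ** 2))
    return numerator / denominator if denominator else float("nan")


def item_total_correlations(responses):
    """Correlate each item's 0/1 scores with the total test score."""
    totals = [sum(row) for row in responses]
    n_items = len(responses[0])
    return [pearson_r([row[j] for row in responses], totals)
            for j in range(n_items)]


# Hypothetical data: 4 examinees, 3 items.
responses = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]
correlations = item_total_correlations(responses)
print([round(r, 3) for r in correlations])
```

In this made-up data set the third item correlates less strongly with the total than the first two, so it would be the first candidate for review.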

  • Quantitative Item Analysis. The inter-item correlation matrix displays the correlation of each item with every other item. It provides important information for increasing the test's internal consistency. Each item should be highly correlated with every other item measuring the same construct and not correlated with items measuring a different construct.

  • Quantitative Item Analysis. Items that are not highly correlated with other items measuring the same construct can and should be dropped to increase internal consistency.
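
The effect of dropping a poorly correlated item on internal consistency can be illustrated with coefficient alpha. The slides mention alpha but do not give its formula. The sketch below uses the standard form, alpha = k/(k - 1) × (1 - sum of item variances / variance of total scores), and the function names and sample data are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Coefficient alpha for a list of per-examinee item-score lists."""
    k = len(responses[0])
    item_vars = [pvariance([row[j] for row in responses]) for j in range(k)]
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def alpha_if_item_deleted(responses, j):
    """Alpha recomputed after removing item j from every examinee's scores."""
    reduced = [row[:j] + row[j + 1:] for row in responses]
    return cronbach_alpha(reduced)


# Hypothetical data: items 1-3 agree with each other, item 4 does not.
responses = [
    [1, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
print(round(cronbach_alpha(responses), 3))
print(round(alpha_if_item_deleted(responses, 3), 3))
```

In this made-up example alpha rises when the fourth item is removed, which matches the advice to drop items that do not correlate with the others measuring the same construct.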

  • Item Discrimination (contd): Inter-item Correlation. Possible causes for low inter-item correlation: a. I