epidemiology statistics

FF2613 MEDICINE & SOCIETY II

PRACTICAL SESSIONS: EPIDEMIOLOGY & STATISTICS

FOR YEAR 2 STUDENTS ONLY

DEPARTMENT OF COMMUNITY HEALTH FACULTY OF MEDICINE

UNIVERSITI KEBANGSAAN MALAYSIA KUALA LUMPUR

Scenario

An outbreak of gastroenteritis occurred in Bandar Tun Razak, a suburban neighborhood, on the evening of April 28. A total of 89 people went to the emergency departments of the three local hospitals during that evening. No more cases were reported afterward. The patients complained of headache, fever, nausea, vomiting and diarrhea. The disease was severe enough in 19 patients to require hospitalization for rehydration. The local health department was immediately notified of a potential food-borne outbreak of gastroenteritis in Bandar Tun Razak.

Exercise 1

1. Define epidemic, endemic and pandemic.
2. Describe the gastroenteritis outbreak according to disease transmission and the epidemiological triad.
3. What are the possible causes of the outbreak?
4. List and discuss the steps that should be taken in outbreak investigations.
5. What further information is needed?

Exercise 2 The epidemic team, including a medical epidemiologist (public health physician – Health Officer), health inspectors and a nurse, visited the local hospitals to interview the attending physicians, the patients and some of their relatives. Some stool samples were obtained from patients for microbiologic identification of the causative agent. The distribution of the disease by person (age and gender) was found as follows:

Gastroenteritis Outbreak Findings by Person, Case Distribution by Age and Gender

                     Female              Male                Total by age
Age group            No    %Females      No    %Male         No    %
0 - 5 yr              1                   1
6 - 10 yr            38                  37
11 yr and older      10                   2
Total by gender

Please calculate the totals for each column and row and their corresponding percentages to try to determine if there are any important differences by age or by gender. Interpret your findings.

Discuss the epidemic curve above

Exercise 3

Therefore the epidemic team investigated the places where affected persons, their relatives and neighbors ate that day (April 28). The following table shows the team's findings:

Gastroenteritis Outbreak Findings by Place

                               People who attended                   People who did not attend
Place                          No. people  Ill people  Attack rate   No. people  Ill people  Attack rate   Relative risk
Cafeteria LRT                  207         61                        157         47
Kedai Makan Ali                246         25                        122         13
Restaurant ABC                 475         68                        189         29
Elementary school cafeteria    239         67                        495         22

Please calculate the attack rates per 100 (incidence rates per 100) by place to try to determine where the contaminated meal was served. For each place compare attack rates (AR) for those who attended with attack rates for those who did not, by using the relative risk (i.e., RR = AR in attendees/AR in non attendees). Interpret your findings.
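The attack rate and relative risk arithmetic described above can be sketched in Python. This is only an illustration and not part of the original handout: the helper names `attack_rate` and `relative_risk` are hypothetical, and the counts are copied from the Findings by Place table.

```python
# Counts from the "Findings by Place" table:
# (people who attended, ill among them, people who did not attend, ill among them)
places = {
    "Cafeteria LRT": (207, 61, 157, 47),
    "Kedai Makan Ali": (246, 25, 122, 13),
    "Restaurant ABC": (475, 68, 189, 29),
    "Elementary school cafeteria": (239, 67, 495, 22),
}


def attack_rate(ill, total):
    """Attack rate per 100 people (hypothetical helper name)."""
    return 100 * ill / total


def relative_risk(n_attended, ill_attended, n_not, ill_not):
    """RR = AR in attendees / AR in non-attendees."""
    return attack_rate(ill_attended, n_attended) / attack_rate(ill_not, n_not)


results = {name: relative_risk(*counts) for name, counts in places.items()}

for name, (n_att, ill_att, n_not, ill_not) in places.items():
    print(f"{name}: AR attended = {attack_rate(ill_att, n_att):.1f}, "
          f"AR not attended = {attack_rate(ill_not, n_not):.1f}, "
          f"RR = {results[name]:.2f}")
```

A relative risk close to 1 suggests that eating at that place made little difference to the risk of illness, while a relative risk well above 1 points to the place as a likely source.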

Exercise 4

Once the implicated place was determined, the investigation centered on the food. The following table includes the food items served in that place on April 28:

Gastroenteritis Outbreak Findings by Person

                  Ate the food item                     Did not eat the food item
Food item         No. people  Ill people  Attack rate   No. people  Ill people  Attack rate   Relative risk
Beef rendang      276         28                        266         27
Burger            218         21                        131         14
Salad             105         49                        297         15
Baked potato      139         11                        213         31
Fruit cocktail    88          48                        279         25
Ice cream         175         18                        203         49

Important note: None of the kitchen personnel were ill. The names of the kitchen personnel and their participation in the food preparation are as follows: Ms Mary prepared the beef rendang and the potatoes, Johan prepared the salad and the fruit, Salmah served all dishes except the ice cream, and Jamilah prepared the burgers and served the ice cream. The ice cream was a commercial brand and was bought at a nearby supermarket.

Please calculate the attack rates per 100 (incidence rates per 100) by food item to try to determine the one that was probably contaminated. Compare attack rates (AR) for those who ate the food item with attack rates for those who did not eat the food item, by using the relative risk (i.e., RR = AR in those who ate the food/AR in those who did not eat the food). Interpret your findings.

Exercise 5

Given that the epidemic team worked fast enough and the implicated meal(s) was (were) identified before all food leftovers were discarded, food samples from some meal leftovers were taken to the laboratory. In addition, stool samples were taken from the kitchen personnel who prepared or handled each different food item. The laboratory confirmed that Salmonella toxin was present in some of the food samples and that one of the kitchen personnel of that place had the same Salmonella species. Furthermore, the Salmonella species found in the food and the kitchen worker was the same species found in stool samples of the patients.

Please discuss these findings and identify the kitchen worker possibly responsible for the outbreak. Discuss the general principles of prevention and control of gastroenteritis outbreaks.


Screening Test

Screening: Definition

The identification, amongst apparently healthy individuals, of those who are sufficiently at risk of a specific disorder

Screening vs Diagnosis

In screening, there is no intention to make a definitive diagnosis or offer therapeutic intervention solely based on a positive result

Screening program: Requirements (I)

• Natural history of disease must be understood
• Have an agreed policy on whom to treat
• Prevalence of undiagnosed disease high
• Disease has high morbidity and mortality
• Of public health concern
• Early treatment easier and more effective

Screening program: Requirements (II)

• Signs present to indicate disease presence
• Screening test acceptable and harmless
• Screening test must be valid
• Yield of screening must be high
• Diagnostic work-up for a positive test must have acceptable morbidity
• Screening exercise must be cost-effective

The ‘ideal’ screening test

Would always give the right answer

Quick, safe and simple

Painless, reliable and inexpensive


Structure of a study involving a screening test

• Resembles an observational study
• Same concepts applied for 'diagnostic test'
• Designed to determine how well a test can discriminate between diseased and non-diseased
• A predictor variable (the test result)
• An outcome variable (presence or absence of disease)

Structure of a study involving a screening test

The test result

– Dichotomous: +ve or -ve

– Categorical: +, ++, +++, ++++

– Continuous: mg/dl, ng/L, etc.

The disease as outcome variable

– Presence or absence determined by a gold standard

Measures of accuracy for screening tests:

Validity
– Sensitivity and specificity

Predictive values (PV)
– Positive PV and Negative PV

Evaluation of a screening test

                    TRUTH
TEST RESULT         Disease                 No disease
Positive            A (True-positive)       B (False-positive)
Negative            C (False-negative)      D (True-negative)

Sensitivity = A / (A + C) x 100
Specificity = D / (B + D) x 100

Sensitivity Specificity

Sensitivity is the proportion of those with the disease who tested positive. It indicates how good a test is at identifying the diseased.

Specificity is the proportion of those without the disease who tested negative. It indicates how good a test is at identifying the non-diseased.

Sensitivity and specificity

Describe the performance of a test

A test with a high sensitivity is useful to RULE OUT the disease

A test with a high specificity is useful to CONFIRM the presence of disease


Predictive values (PV)

• Assess the usefulness of a test
• A test of efficient use of time and resources
• PV estimate the probability of disease
• PV describe the frequency of correct identification
• Positive PV and Negative PV

Predictive values

                    TRUTH
TEST RESULT         Disease                 No disease
Positive            A (True-positive)       B (False-positive)
Negative            C (False-negative)      D (True-negative)

PV+ = A / (A + B) x 100
PV- = D / (C + D) x 100

Predictive values

PV of a positive test is the proportion of individuals who test +ve and have the disease

PV of a negative test is the proportion of individuals who test -ve and don’t have the disease

The positive PV estimates the likelihood that a person who tests positive has the disease

The negative PV indicates the likelihood that a person who tests negative is actually disease free

Predictive values

Greatest value in deciding whether to implement a screening program

Not useful if positive PV is low

Sensitivity, specificity and PVs

                    Disease status
Mammography         Cancer      No cancer    Total
Positive            132         985          1117
Negative            47          62295        62342
Total               179         63280        63459

Prevalence: 179/63459 x 100 = 0.3%
Sensitivity: 132/179 x 100 = 73.7%
Specificity: 62295/63280 x 100 = 98.4%
+ve PV: 132/1117 x 100 = 11.8% (False +ve 88.2%)
-ve PV: 62295/62342 x 100 = 99.9% (False -ve 0.1%)

Shapiro et al., 1988

Comments

• Mammography had an excellent specificity (98%)
• False +ve tests outnumber the true +ve tests by over 7:1 (PV+ = 12%)
• ~7 in every 8 patients who had positive mammograms had normal biopsies
• Predictive value for a positive test is low (12%)
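The mammography figures can be reproduced with a short Python sketch. It is an illustration only: the function name `screening_measures` is hypothetical, and the four counts are the A, B, C and D cells of the mammography table (132, 985, 47, 62295).

```python
def screening_measures(tp, fp, fn, tn):
    """Return validity and predictive values (in %) for a 2x2 screening table."""
    return {
        "prevalence": 100 * (tp + fn) / (tp + fp + fn + tn),
        "sensitivity": 100 * tp / (tp + fn),   # A / (A + C)
        "specificity": 100 * tn / (tn + fp),   # D / (B + D)
        "ppv": 100 * tp / (tp + fp),           # A / (A + B)
        "npv": 100 * tn / (tn + fn),           # D / (C + D)
    }


# Mammography counts: A = 132, B = 985, C = 47, D = 62295
mammography = screening_measures(tp=132, fp=985, fn=47, tn=62295)

for name, value in mammography.items():
    print(f"{name}: {value:.1f}%")
```

The same function can be applied to any 2x2 table of test result against gold standard.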


Predictive Value Of A Test Is Affected By Prevalence Of Disease

SUMMARY

A screening test study determines the usefulness of a test in identifying those at risk of a disease.

Students must be able to calculate and interpret sensitivity, specificity & predictive values.


Year-2, Semester-1


Trigger:

You are the State Medical Officer for AIDS/HIV of Negeri Sembilan and you are expected to conduct a sentinel surveillance for HIV amongst;

o Pregnant mothers (Antenatal Screening)

o STD clinic patients


Data Information Sheet-1 – Choosing The Appropriate Screening Test

To select the appropriate screening test, you did a literature review and collated the following tables. Calculate the sensitivity, specificity, PPV and NPV of each test to help you decide.

              Disease Present    Disease Absent    Total
Positive      TP                 FP                TP + FP
Negative      FN                 TN                FN + TN
Total         TP + FN            FP + TN           N

HIV Enzyme Immuno Assay (EIA)

                  Gold Standard
EIA (blood)       +        -        total
+                 1000     9        1009
-                 0        8991     8991
total             1000     9000     10,000

HIV Particle Agglutination Test

                  Gold Standard
PA                +        -        total
+                 999      270      1269
-                 1        8730     8731
total             1000     9000     10,000

HIV Rapid Test Kit

Rapid Test        +        -        total
+                 998      180      1178
-                 2        8820     8822
total             1000     9000     10000

Oral Rapid Test Kit

Oral Test Kit     +        -        total
+                 930      180      1110
-                 70       8820     8890
total             1000     9000     10000

                  EIA      PA       Rapid    Oral
Sensitivity
Specificity
PPV
NPV

Which is the best screening test?

TP = True Positive
FP = False Positive
FN = False Negative
TN = True Negative

Sensitivity = TP/(TP+FN) x 100%
Specificity = TN/(TN+FP) x 100%
PPV = TP/(TP+FP) x 100%
NPV = TN/(TN+FN) x 100%


Data Information Sheet-2 – Effect of Prevalence on Sensitivity & Specificity

Based on the earlier analysis, HIV EIA, a test with sensitivity of 100.0% and specificity of 99.9%, was selected to be used for the sentinel surveillance in Negeri Sembilan. You decided to include the inmates of Pusat Serenti Tampin and Pusat Serenti Jelebu in the sentinel surveillance. Each study population consisted of 10,000 people. Calculate the PPV, NPV and prevalence rate of HIV for each study population.

Antenatal mothers

              Disease Present    Disease Absent    Total
Positive      3                  10                13
Negative      0                  9987              9987
Total         3                  9997              10000

Blood donors

              Disease Present    Disease Absent    Total
Positive
Negative
Total         9                  9991              10000

IVDU

              Disease Present    Disease Absent    Total
Positive
Negative
Total         2000               8000              10000

Population           Population with HIV    Population without HIV    TOTAL     Prevalence rate
Antenatal mothers    3                      9997                      10,000
Blood donors         9                      9991                      10,000
IVDUs                2000                   8000                      10,000

Since the sensitivity and specificity are the same for all three study populations, please discuss how PPV and NPV are affected by the prevalence of the disease in each study population.

PPV and NPV can also be calculated using the following formulas;

PPV = (Prev x Sen) / [(Prev x Sen) + (1 - Prev) x (1 - Sp)]

NPV = [(1 - Prev) x Sp] / [(1 - Prev) x Sp + Prev x (1 - Sen)]

Antenatal mothers:   PPV =            NPV =

Blood donors:        PPV =            NPV =

IVDUs:               PPV =            NPV =
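The two prevalence-based formulas can be written as small Python functions. This is a sketch for illustration, with hypothetical function names `ppv` and `npv`; prevalence, sensitivity and specificity are passed as proportions between 0 and 1, not as percentages. The example values (a test with 99% sensitivity and 99% specificity) are taken from the PPV table in Data Information Sheet-3.

```python
def ppv(prevalence, sensitivity, specificity):
    """PPV = (Prev x Sen) / [(Prev x Sen) + (1 - Prev) x (1 - Sp)]"""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)


def npv(prevalence, sensitivity, specificity):
    """NPV = [(1 - Prev) x Sp] / [(1 - Prev) x Sp + Prev x (1 - Sen)]"""
    true_neg = (1 - prevalence) * specificity
    false_neg = prevalence * (1 - sensitivity)
    return true_neg / (true_neg + false_neg)


# Same test (99% sensitivity, 99% specificity) applied at different prevalences
for prev in (0.001, 0.01, 0.05, 0.10, 0.20):
    print(f"prevalence {prev:.1%}: "
          f"PPV = {ppv(prev, 0.99, 0.99):.1%}, "
          f"NPV = {npv(prev, 0.99, 0.99):.1%}")
```

Running the loop shows the PPV falling sharply as prevalence drops, even though the test itself is unchanged.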


Data Information Sheet-3 – Effect of Prevalence on Sensitivity & Specificity

Population     10,000
Sensitivity    100.00%
Specificity    99.90%

         +        -
+        a        b        a+b
-        c        d        c+d
         a+c      b+d      a+b+c+d

              TP       FP     FN     TN       a+c      b+d      PPV        NPV
Prevalence    a        b      c      d                          a/(a+b)    d/(c+d)
0.01%         1        10     0      9,989    1        9,999    9.09%      100.00%
0.02%         2        10     0      9,988    2        9,998    16.67%     100.00%
0.03%         3        10     0      9,987    3        9,997    23.08%     100.00%   <- Antenatal
0.05%         5        10     0      9,985    5        9,995    33.34%     100.00%
0.09%         9        10     0      9,981    9        9,991    47.39%     100.00%   <- Blood Donors
1.00%         100      10     0      9,890    100      9,900    90.99%     100.00%
5.00%         500      10     0      9,491    500      9,500    98.14%     100.00%
10.00%        1000     9      0      8991     1000     9,000    99.11%     100.00%
20.00%        2000     8      0      7992     2000     8,000    99.60%     100.00%   <- Pusat Serenti
30.00%        3000     7      0      6993     3000     7,000    99.77%     100.00%

Hypothetical Illustration of Screening Programme with Test Kit

PPV based on Prevalence, Sensitivity & Specificity

sensitivity %    99%      95%      90%      80%
specificity %    99%      95%      90%      80%
prevalence
20.0%            96.1%    82.6%    69.2%    50.0%
10.0%            91.7%    67.9%    50.0%    30.8%
5.0%             83.9%    50.0%    32.1%    17.4%
1.0%             50.0%    16.1%    8.3%     3.9%
0.1%             9.0%     1.9%     0.9%     0.4%



PRACTICALS GUIDE Medicine & Society Module (FF2613)

INTRODUCTION

In this module there will be 4 practical sessions for the research project and statistical exercises. Students will be guided by the respective lecturer/tutor assigned to each lab. The schedule for the practical sessions for this semester is as stated below;

DATE, TIME: TOPIC
CONTENT

14/07/10, 10.30 – 12.30: Descriptive Statistics & Research Project 1
Manipulation and presentation of data using the given dataset, including calculating the measures of central tendency and variability using statistical formulas. Determine the title, objective, problem framework, hypothesis and methodology. Once the above has been agreed upon, as homework, the students are expected to write up the proposal, including the questionnaire, which will be discussed during the second practical session.

21/07/10, 10.30 – 12.30: Analysis of Quantitative Data & Research Project 2
Calculation and interpretation of t-tests and proportionate tests using the given dataset. Presentation of the complete research proposal. Upon acceptance, as homework, the students are expected to distribute the questionnaires and collect the data for the study. All completed forms are to be brought to the third practical session.

20/08/10, 2.30 – 4.30: Correlation & Research Project 3
Calculation and interpretation of correlation and regression using the given dataset. Students are guided on how to enter the data into the computer using Excel or SPSS. Each lab is required to prepare a notebook for the session. For homework, students will complete the data entry for all collected data and bring the complete file to the fourth practical session.

27/08/10, 10.30 – 12.30: Chi-Square, Non-Parametric and Research Project 4
Calculation and interpretation of non-parametric and chi-square tests using the given dataset. Each lecturer will demonstrate how to analyse the data using the computer and advise on the interpretation of results. For homework, the students will complete the analysis and prepare a PowerPoint presentation for the final practical session.

20/09/10, 2.00 – 4.00 and 24/09/10, 10.00 – 12.00: Research Project 5
Presentation of the findings. For homework, the students will prepare a written report of the study, to be submitted within two weeks of their presentation.

Practical 1 Descriptive Statistics

Introduction

In the old curriculum, the practical sessions were slotted immediately after the respective lectures. In the past we had 25 hours of lectures and 8 practical sessions just for statistics and research methodology. Now we only have 7 hours of lectures and 4 practical sessions for statistics and research methodology in the new curriculum. Whenever possible, we try to slot the practical sessions according to lectures. But we can't cover everything; therefore students are also expected to learn on their own. Please be patient and persist in doing the exercises.

For this session, we will learn about measures of central tendency and variability. We use these measures to describe the data that we collected. The measures of central tendency are mean, mode and median. The measure of variability is the standard deviation (sd). Kindly refer to your formula sheet or your books for help.

Measures of Central Tendency for Quantitative Data
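As a quick illustration of these measures, the sketch below uses Python's standard `statistics` module on a small hypothetical sample. The five ages are made up for illustration and are not from the exercises; the variance and standard deviation are the sample versions, which divide by n - 1 (an assumption about which formula the formula sheet uses).

```python
import statistics

# Hypothetical ages of five respondents (illustrative only, not from the handout)
ages = [21, 24, 24, 27, 30]

mean = statistics.mean(ages)          # sum of the values divided by n
median = statistics.median(ages)      # middle value of the sorted data
modes = statistics.multimode(ages)    # most frequent value(s); there may be several
variance = statistics.variance(ages)  # sample variance, divides by n - 1
sd = statistics.stdev(ages)           # square root of the sample variance

print(mean, median, modes, variance, round(sd, 2))
```

`multimode` is used rather than `mode` because a dataset can have more than one most frequent value.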

1. Write down the formulas for mean in the boxes below;

Basic Formula                    Formula for grouped data (Formula A)

2. Calculate the mean, mode and median for the age xi of the following respondents;

35 24 36 21 21 20 34 29 37 30 26 27 29 34 33 33 27 25 21 26 32 30 33 36 28 33 19 29 27 29 22 23 31 32 31

Total = ___________     n = ________
Median = __________     Mean = __________     Mode = __________

3. Write down the formulas for standard deviation in the boxes below;

Basic Formula                    Formula for grouped data (Formula A)

4. Using the data from Q.2, calculate the standard deviation and variance of the age xi of respondents.

x        x-mean    (x-mean)²        x        x-mean    (x-mean)²
19.00                               29.00
20.00                               30.00
21.00                               30.00
21.00                               31.00
21.00                               31.00
22.00                               32.00
23.00                               32.00
24.00                               33.00
25.00                               33.00
26.00                               33.00
26.00                               33.00
27.00                               34.00
27.00                               34.00
27.00                               35.00
28.00                               36.00
29.00                               36.00
29.00                               37.00
29.00
Total                               Total

Total (x-mean)² = _______________
Therefore standard deviation s = _________________

It is easy to calculate the mean and standard deviation for data with few observations. But for studies with a large number of samples, it is much harder. Therefore for large studies, the quantitative data are sorted in frequency tables such as the one below;

5. These are data from a case-control study to identify factors that are associated with small for gestational age amongst newborn babies. For the table below, the factor being studied is the weight of the mothers during first trimester (first three months of pregnancy) and the incidence of babies with low birth weight.

Weight during first
trimester in kg        All Frequencies    Frequency of Cases    Frequency of Controls
30.0-39.9              5                  5                     0
40.0-49.9              69                 48                    21
50.0-59.9              82                 43                    39
60.0-69.9              45                 10                    35
70.0-79.9              10                 2                     8
80.0-89.9              3                  1                     2
90.0-99.9              4                  1                     3
Total                  218                110                   108

For the following exercise, calculate the mean, mode, median and standard deviation for both cases and controls. To simplify matters, just fill up the table below;

For cases;

Weight in kg    Frequency    m.p      f.mp    f.mp²    f cumulative
30.0-39.9       5            34.95                     5
40.0-49.9       48           44.95                     53
50.0-59.9       43           54.95                     96
60.0-69.9       10           64.95                     106
70.0-79.9       2            74.95                     108
80.0-89.9       1            84.95                     109
90.0-99.9       1            94.95                     110
Total           110

For controls;

Weight in kg    Frequency    m.p      f.mp    f.mp²    f cumulative
30.0-39.9       0            34.95    0       0        0
40.0-49.9       21           44.95                     21
50.0-59.9       39           54.95                     60
60.0-69.9       35           64.95                     95
70.0-79.9       8            74.95                     103
80.0-89.9       2            84.95                     105
90.0-99.9       3            94.95                     108
Total           108

☻ f.mp² means "frequency x (midpoint)²", not (f.mp)²

Fill up your answers in the table below;

                      Case                  Control
Mean                  =                     =
Mode                     +    .    =           +    .    =
Median                   +    .    =           +    .    =
Standard deviation    =                     =

The answers above will be used in the coming practical sessions.
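The f.mp and f.mp² columns can be filled in and checked with a short Python sketch. It is an illustration under an assumption: it uses the common computational formula for grouped data, s² = [Σf.mp² - (Σf.mp)²/n] / (n - 1), which may differ in layout from Formula A on the formula sheet. The function name `grouped_mean_sd` is hypothetical; the midpoints and frequencies are those of the cases and controls tables.

```python
import math

midpoints = [34.95, 44.95, 54.95, 64.95, 74.95, 84.95, 94.95]
case_freq = [5, 48, 43, 10, 2, 1, 1]
control_freq = [0, 21, 39, 35, 8, 2, 3]


def grouped_mean_sd(freqs, mids):
    """Mean and sample standard deviation from a frequency table."""
    n = sum(freqs)
    sum_fm = sum(f * m for f, m in zip(freqs, mids))         # total of f.mp
    sum_fm2 = sum(f * m ** 2 for f, m in zip(freqs, mids))   # total of f.mp²
    mean = sum_fm / n
    variance = (sum_fm2 - sum_fm ** 2 / n) / (n - 1)
    return mean, math.sqrt(variance)


case_mean, case_sd = grouped_mean_sd(case_freq, midpoints)
control_mean, control_sd = grouped_mean_sd(control_freq, midpoints)

print(f"cases:    mean = {case_mean:.2f}, sd = {case_sd:.2f}")
print(f"controls: mean = {control_mean:.2f}, sd = {control_sd:.2f}")
```

Note that `f * m ** 2` squares only the midpoint before multiplying by the frequency, matching the note that f.mp² is not (f.mp)².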

Copyright reserved Dr Azmi Mohd Tamil Amali1.doc 8-4-07.

Practical 1b Research Proposal

Each lab group is required to come up with a research proposal, collect the data required, analyse the data, present their findings and write up the final report for submission. For this session, the students are expected to agree on the;

• Title of the research
• Objectives
• Problem Framework
• Hypothesis
• Methodology

Once the above has been agreed upon, as homework, they are expected to write up the proposal, including the questionnaire, which will be discussed during the second practical session.

Practical 2 Inferential Statistics

Statistical Tests & Types of Variables

In general there are 2 types of variables: qualitative & quantitative. When you want to test the association between 2 variables, the type of test to be utilised depends on the type of variables. The tables below give a general guide on the correct statistical test for the respective variable types.

Qualitative Data Analysis

Parametric Analysis

Variable 1                   Variable 2      Criteria                                     Type of Test
Qualitative Dichotomous      Quantitative    Normally distributed data                    Student's t Test
Qualitative Polynomial       Quantitative    Normally distributed data                    ANOVA
Quantitative                 Quantitative    Repeated measurement of the same             Paired t Test
                                             individual & item (e.g. Hb level before
                                             & after treatment). Normally
                                             distributed data
Quantitative - continuous                    Normally distributed data                    Pearson Correlation & Linear Regression

Non-Parametric Analysis

Variable 1                   Variable 2                 Criteria                                     Type of Test
Qualitative Dichotomous      Qualitative Dichotomous    Sample size < 20 or (< 40 but with at        Fisher Test
                                                        least one expected value < 5)
Qualitative Dichotomous      Quantitative               Data not normally distributed                Wilcoxon Rank Sum Test or Mann-Whitney U Test
Qualitative Polynomial       Quantitative               Data not normally distributed                Kruskal-Wallis One Way ANOVA Test
Quantitative                 Quantitative               Repeated measurement of the same             Wilcoxon Signed Rank Test
                                                        individual & item
                                                        Data not normally distributed                Spearman/Kendall Rank Correlation

Practical 2

This is the second practical session for this module. In this session, we will be conducting exercises on Student's t-test, paired t-test and proportionate test.

Student's t-test

1a. Write down the formula for Student's t-test in the boxes below;

Basic Formula          Sample size > 30          Small sample size & equal variance

b. Based on results from the previous session, Q5, complete the boxes below;

                      Case      Control
Mean
Standard deviation
n                     110       108

The hypothesis that we want to test is that there is a difference of first trimester body weight between the cases (mothers with SGA babies) and controls (mothers with non-SGA babies).

c. Write down the null hypothesis;
d. Calculate the t for Student's t-test for the above exercise;
e. Please refer to table A1 and A3, and try to estimate the p value from the t value calculated. Discuss which table is more appropriate for this exercise.
f. Based on the above p value, is the null hypothesis rejected?
g. Is there a significant difference of first trimester weight between the two groups? Explain your answer.

2. During the examination, we will not tell you what test to use. Instead the students are expected to choose the appropriate one based on the problem and the data given. For example, try to do the exercise below;

A case-control study to identify factors that can cause small for gestational age (SGA) was conducted. Among the factors studied were the mothers' heights. It is believed that the shorter mothers were at higher risk of having SGA babies.

                        Case      Control
Total of samples n      110       108
Total of height ∑x      16620     16439
Total of (x-mean)²      2326      3605

                        Both groups
Total of samples n      218
Total of height ∑x      33059
Total of (x-mean)²      5931

a. State the hypothesis and null hypothesis for the above problem.
b. What is the appropriate statistical test to prove this hypothesis?
c. Using the data given, conduct the statistical test.
d. What is your conclusion, based on your answers in Q2c?
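When only group totals are given, as in question 2, a two-sample t statistic can still be computed. The sketch below is one possible approach, assuming an independent-samples t-test with a pooled (equal) variance; the function name `pooled_t` is hypothetical and the inputs are the n, ∑x and total of (x-mean)² values for cases and controls.

```python
import math


def pooled_t(n1, sum1, ss1, n2, sum2, ss2):
    """t statistic for two independent samples, assuming equal variances.

    sum1 and sum2 are the totals of the observations in each group;
    ss1 and ss2 are the totals of (x - mean)² within each group.
    """
    mean1, mean2 = sum1 / n1, sum2 / n2
    df = n1 + n2 - 2
    pooled_var = (ss1 + ss2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df


# Summary data from question 2: cases first, then controls
t_value, df = pooled_t(110, 16620, 2326, 108, 16439, 3605)
print(f"t = {t_value:.2f}, df = {df}")
```

The sign of t only reflects which group is entered first; the p value is then read from the t table using the degrees of freedom.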

Paired t-test

3a. Write down the formula for paired t-test in the box below;

Basic Formula

b. Thirty of the pregnant mothers were found to be anaemic during their second trimester follow-up. They were treated with haematinics for 2 months and their haemoglobin levels were measured again. To measure the effectiveness of the treatment, please complete the table below.

No.    Hb1     Hb2     D     D²
1      9.3     9.5
2      9.5     10.0
3      9.5     10.0
4      9.6     11.0
5      9.7     12.0
6      9.8     9.0
7      9.8     9.6
8      10.0    7.2
9      10.0    9.6
10     10.0    10.0
11     10.0    10.0
12     10.0    10.0
13     10.0    10.0
14     10.0    10.0
15     10.0    10.0
16     10.0    10.0
17     10.0    10.0
18     10.0    10.3
19     10.0    10.5
20     10.0    10.6
21     10.0    10.8
22     10.0    11.0
23     10.0    11.0
24     10.0    11.0
25     10.0    11.0
26     10.0    11.5
27     10.0    13.0
28     10.0    13.0
29     10.0    13.0
30     10.1    11.0
Total

c. Is the intervention effective? Do a paired t-test analysis using the data above.

d. Discuss the result of your statistical test.
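The paired t-test arithmetic for the 30 haemoglobin pairs can be checked with the Python sketch below. It is an illustration that assumes the usual formula t = mean(D) / (sd(D) / √n), with D taken as Hb2 minus Hb1; the variable names are hypothetical.

```python
import math

# Hb1 (before) and Hb2 (after) for mothers 1 to 30, copied from the table
hb1 = [9.3, 9.5, 9.5, 9.6, 9.7, 9.8, 9.8, 10.0, 10.0] + [10.0] * 20 + [10.1]
hb2 = [9.5, 10.0, 10.0, 11.0, 12.0, 9.0, 9.6, 7.2, 9.6,
       10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0,
       10.3, 10.5, 10.6, 10.8, 11.0, 11.0, 11.0, 11.0,
       11.5, 13.0, 13.0, 13.0, 11.0]

diffs = [after - before for before, after in zip(hb1, hb2)]   # D = Hb2 - Hb1
n = len(diffs)
mean_d = sum(diffs) / n
sd_d = math.sqrt(sum((d - mean_d) ** 2 for d in diffs) / (n - 1))
t_value = mean_d / (sd_d / math.sqrt(n))
df = n - 1

print(f"mean D = {mean_d:.2f}, sd D = {sd_d:.2f}, t = {t_value:.2f}, df = {df}")
```

The resulting t is compared with the critical value for n - 1 degrees of freedom in the t table.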

Proportionate Test

4a. Write down the formula for proportionate test in the box below;

Basic Formula

The rate of SGA for mothers exposed to cigarette smoke (passive smoker) was 89/156. The rate of SGA for mothers not exposed to cigarette smoke was 20/61.

b. State the appropriate null hypothesis.
c. Do the proportionate test and discuss its result using 0.05 as the level of significance (the z value in the normal distribution table for 0.05 as the level of significance is 1.96).
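A test comparing the two proportions 89/156 and 20/61 can be sketched in Python as follows. This assumes the common z test for two proportions with a pooled standard error, which may be laid out differently on the formula sheet; the function name `two_proportion_z` is hypothetical.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """z statistic for the difference between two proportions (pooled SE)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)          # overall proportion under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


z = two_proportion_z(89, 156, 20, 61)
significant = abs(z) > 1.96   # critical z for a 0.05 level of significance

print(f"z = {z:.2f}, significant at 0.05: {significant}")
```

For a 2x2 table, the square of this z equals the chi-square statistic calculated without continuity correction, which is a useful cross-check.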

Research Project 2

Presentation of the complete research proposal. Upon acceptance of the proposal, as homework, the students are expected to distribute the questionnaires and collect the data for the study. All completed forms are to be brought to the third practical session.


Practical 3 Inferential Statistics 2

Introduction

This is the third practical session. In this session we will do exercises on Pearson correlation and linear regression.

Pearson Correlation

1a. Write down the formula for Pearson Correlation in the boxes below;

Basic Formula for r          (x-mean x)²          (y-mean y)²          (x-mean x)(y-mean y)

As you can see from the formulas above, to calculate the correlation coefficient (r), you need to identify the following;

• Total of the first variable (∑x),
• Total of the first variable squared (∑x²),
• Total of the second variable (∑y),
• Total of the second variable squared (∑y²) and
• Total of the two variables multiplied (∑xy).

Just imagine the number of calculations that you have to do before you even get to calculate the correlation coefficient (r). If the sample size is 150, you will have to do more than 455 calculations. Since you'll be doing these calculations manually, the chance of an error occurring is quite high indeed ☺.
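The five totals listed above are all that is needed to compute r, which the following Python sketch illustrates. The data are a small hypothetical set of five pairs, not the exercise data, and the function name `pearson_r` is an illustrative choice. The last line shows the usual t statistic for testing whether r differs from zero, with n - 2 degrees of freedom.

```python
import math

# Hypothetical paired observations (illustrative only, not from the handout)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def pearson_r(n, sum_x, sum_x2, sum_y, sum_y2, sum_xy):
    """Correlation coefficient r from the five totals."""
    sxy = sum_xy - sum_x * sum_y / n   # total of (x - mean x)(y - mean y)
    sxx = sum_x2 - sum_x ** 2 / n      # total of (x - mean x)²
    syy = sum_y2 - sum_y ** 2 / n      # total of (y - mean y)²
    return sxy / math.sqrt(sxx * syy)


n = len(x)
r = pearson_r(
    n,
    sum(x), sum(v * v for v in x),
    sum(y), sum(v * v for v in y),
    sum(a * b for a, b in zip(x, y)),
)

# t statistic for testing r against zero, with n - 2 degrees of freedom
t_value = r * math.sqrt((n - 2) / (1 - r ** 2))

print(f"r = {r:.4f}, t = {t_value:.4f}, df = {n - 2}")
```

Because the function takes totals rather than raw data, it can also be applied when an exercise supplies only the summary totals.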

As an exercise, complete the following table. Measure the time required to complete it. Once done, please note that you may have to do the same thing again for a dataset 5 times larger than this... ☺

2. A case-control study to identify factors that can cause small for gestational age (SGA) was conducted.

In the past exercise, we have proven that there is an association between the mothers' first trimester weight and SGA.

Now we want to see whether there is an association between the mothers' first trimester weight (WEIGHT2) and the child's birth weight (BIRTHWGT).

Please complete the following table;

INDEX    WEIGHT2    WEIGHT2²    BIRTHWGT    BIRTHWGT²    ∑xy
9        42.00                  2.40
10       40.00                  2.30
12       66.00                  2.30
20       51.50                  2.10
21       47.50                  2.23
29       39.50                  2.49
31       40.00                  2.46
32       46.50                  2.52
34       55.00                  2.28
43       49.20                  2.20
60       45.00                  2.48
70       63.50                  2.00
72       52.40                  2.31
79       52.30                  2.15
90       47.50                  2.55
97       62.00                  2.41
117      55.10                  3.46
126      72.00                  3.50
131      61.50                  2.97
138      86.00                  3.48
145      60.80                  3.00
146      44.00                  2.84
156      58.00                  3.55
159      70.00                  3.19
171      44.00                  3.09
173      59.50                  3.56
174      47.50                  3.16
175      53.00                  3.10
178      62.50                  3.27
181      92.00                  3.00
TOTAL

a. State the null hypothesis for the correlation test between the two variables.
b. Conduct the correlation test and calculate the r (correlation coefficient). How strong is the relationship between the two variables?
c. Is the r significant? What is the p value? How is it calculated?

If the r is significant, it is best to demonstrate it using a scatter diagram like the one below;

[Figure: scatter diagram of Babies' Birthweight (vertical axis, 0.0 to 3.6) against Mothers' Weight (horizontal axis, 0 to 100). Rsq = 0.1874]

To expect the students to calculate all that during the examination would be rather cruel. Instead, usually, all the required data will be given, along with some extraneous data, just to confuse the students. It is up to the students to select the appropriate data and use it in the appropriate statistical test.

3. A case-control study to identify factors that can cause small for gestational age (SGA) was conducted. Among the questions studied was whether there is an association between the mothers' height in cm (HEIGHT) and the child's birth weight in kilograms (BIRTHWGT).

n = 218                               HEIGHT         BIRTHWGT
Mean                                  151.65         2.79
Standard deviation                    5.26           0.54
∑(observation)                        33059.00       608.46
∑(observation²)                       5019291.00     1760.98
∑(observation 1 x observation 2)      92386.35

a. Name the appropriate statistical test to test the association between the two variables.
b. State the null hypothesis for the above statistical test.
c. Conduct the statistical test including the test of significance. Discuss the result of the test.

r = 0.431, p = 0.017

Linear Regression

4a. Write down the formula for linear regression in the boxes below;

Basic Formula          b          a

b. Using the data from Q2, conduct the test for linear regression and calculate the regression co-efficient (b) and constant (a).
c. Write down the final equation of the calculation.
d. Draw a rough diagram of the final equation from the calculation.

Research Project 3

Students will be guided on how to enter the data that they have collected into the computer using Excel or SPSS. Each lab is required to prepare a notebook for the session. For homework, students are required to complete the data entry for all collected data and bring the completed file to the fourth practical session.
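Returning to the linear regression exercise in question 4, the calculation of the regression co-efficient (b) and constant (a) can be sketched in Python. The five data pairs are hypothetical and only illustrate the arithmetic; the sketch assumes the usual least squares formulas b = Σ(x - mean x)(y - mean y) / Σ(x - mean x)² and a = mean y - b x mean x, and `predict` is a hypothetical helper name.

```python
# Hypothetical paired observations (illustrative only, not from the handout)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)

b = sxy / sxx               # regression co-efficient (slope)
a = mean_y - b * mean_x     # constant (intercept)


def predict(value):
    """Predicted y for a given x, using the fitted equation y = a + bx."""
    return a + b * value


print(f"y = {a:.2f} + {b:.2f}x")
```

The fitted line always passes through the point (mean x, mean y), which is a quick way to check a hand calculation.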


Practical 4 Inferential Statistics 3

Introduction

In this practical session, we will be doing exercises on chi-square test and non-parametric analysis.

Chi-Square Test (X²)

This is the most frequent statistical analysis that is tested for during examination. So make sure that you really understand it. This analysis is done to test for association between two qualitative variables.

Observed data would be sorted accordingly in a contingency table. Then the expected value table is calculated, using the rows and column totals, as illustrated below:

Observation Table

         +        -
+        a        b        g
-        c        d        h
         e        f        n

Expected Value Table

         +        -
+        eg/n     fg/n     g
-        eh/n     fh/n     h
         e        f        n

Chi-square is calculated by summing up (observed - expected)²/expected for each cell.

X² = Σ (O - E)² / E

df = (r - 1)(c - 1)

Do the following exercise. Then compare with the answer for Q.4c from Practical 2.

1. The rate of SGA for mothers exposed to cigarette smoke ("passive smoker") is 89/156. The rate of SGA for mothers not exposed to cigarette smoke is 20/61.

Observation table

                  SGA      Normal
Passive Smoker    89       67        156
Non-Smoker        20       41        61
                  109      108       217

a. Complete the table of expected values below:

                  SGA      Normal
Passive Smoker
Non-Smoker

b. What is the null hypothesis?
c. What is the rate of SGA for passive smokers and non-smokers? Is there any difference?
d. Conduct the appropriate statistical test to prove your hypothesis. Discuss your findings.
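The expected values and the chi-square statistic for a 2x2 table can be computed as in the Python sketch below. It uses the counts behind the rates 89/156 and 20/61 (SGA and normal babies for passive smokers and non-smokers) and sums (observed - expected)²/expected over the cells. It is an illustration that assumes no continuity (Yates) correction is applied.

```python
observed = [
    [89, 67],   # passive smoker: SGA, normal
    [20, 41],   # non-smoker:     SGA, normal
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected value for each cell = row total x column total / grand total
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(f"chi-square = {chi_square:.2f}, df = {df}")
```

The result is compared with the critical value from the chi-square table, which is 3.84 for df = 1 at the 0.05 level of significance.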

Fisher's Exact Test

Fisher's Exact Test is conducted to test the association between 2 qualitative variables when the sample size is small: less than 20, or less than 40 with one of the expected values less than 5. The formula is as follows:

         +        -
+        a        b        g
-        c        d        h
         e        f        n

probability p = (e! f! g! h!) / (n! a! b! c! d!)
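The factorial formula can be evaluated directly in Python. The sketch below is an illustration for a 2x2 table with cells a = 10, b = 7, c = 0 and d = 6; the function name `table_probability` is hypothetical. It returns the probability of that single table. A full Fisher's exact p value also adds the probabilities of any more extreme tables with the same row and column totals; with c = 0 there is no more extreme table in the same direction.

```python
from math import factorial


def table_probability(a, b, c, d):
    """Probability of one 2x2 table: e! f! g! h! / (n! a! b! c! d!)."""
    g, h = a + b, c + d     # row totals
    e, f = a + c, b + d     # column totals
    n = g + h
    numerator = factorial(e) * factorial(f) * factorial(g) * factorial(h)
    denominator = (factorial(n) * factorial(a) * factorial(b)
                   * factorial(c) * factorial(d))
    return numerator / denominator


p = table_probability(10, 7, 0, 6)
print(f"p = {p:.4f}")
```

Python integers have no size limit, so the large factorials are computed exactly before the final division.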

2. From the earlier SGA study, 23 of the respondents had miscarriages in the past. By analysing this group of patients with poor obstetric history, is there an association between exposure to cigarette smoke and SGA? Based on the following contingency table, conduct the appropriate statistical test.

                  SGA      Normal
Passive Smoker    10       7         17
Non-Smoker        0        6         6
                  10       13        23

a. What is the p value?
b. What conclusion can you make from the above results?

Wilcoxon Rank Sum Test

This test is the non-parametric equivalent of the Student's t test, but conducted on data that are not normally distributed. It is used to test for the association between a qualitative dichotomous variable and a quantitative variable. The method is simple: sort the data in ascending order, rank them, sum up the ranks according to groups and compare the value with the table of critical values for the Wilcoxon Rank Sum Test.

3. For practice, do the following exercise. The data is a subset of the earlier study. We are trying to see whether there is any association between exposure to cigarette smoke and the weight of the baby. Since the sample size is quite small, the appropriate test is a non-parametric analysis.

a. What is the null hypothesis?
b. Conduct the appropriate statistical test to prove your hypothesis. Discuss your findings.

Non-Smoker (n=15)            Passive Smoker (n=15)
Birth Weight     Rank        Birth Weight     Rank
4.20                         3.76
3.96                         3.60
3.70                         3.55
3.61                         3.48
3.26                         3.25
3.15                         3.06
3.12                         3.05
3.00                         2.55
2.98                         2.47
2.84                         2.46
2.81                         2.45
2.57                         2.45
2.44                         2.43
2.43                         2.30
2.10                         2.09
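The ranking step of the Wilcoxon Rank Sum Test can be sketched in Python. The two lists are the 15 non-smoker and 15 passive smoker birth weights from the exercise; tied values receive the average of the ranks they occupy, which is the usual convention and an assumption about the method expected here. The helper name `average_ranks` is hypothetical.

```python
non_smoker = [4.20, 3.96, 3.70, 3.61, 3.26, 3.15, 3.12, 3.00,
              2.98, 2.84, 2.81, 2.57, 2.44, 2.43, 2.10]
passive_smoker = [3.76, 3.60, 3.55, 3.48, 3.25, 3.06, 3.05, 2.55,
                  2.47, 2.46, 2.45, 2.45, 2.43, 2.30, 2.09]


def average_ranks(values):
    """Map each distinct value to its rank, averaging the ranks of ties."""
    ordered = sorted(values)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # Positions i..j-1 (0-based) hold ranks i+1..j; use their average
        ranks[ordered[i]] = (i + 1 + j) / 2
        i = j
    return ranks


ranks = average_ranks(non_smoker + passive_smoker)
rank_sum_non_smoker = sum(ranks[v] for v in non_smoker)
rank_sum_passive = sum(ranks[v] for v in passive_smoker)

print(rank_sum_non_smoker, rank_sum_passive)
```

As a check, the two rank sums must add up to N(N + 1)/2 for the combined sample size N, here 30 x 31 / 2 = 465.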

Wilcoxon Signed Rank Test

This test is the non-parametric equivalent of the paired t test, but conducted on data that are not normally distributed. It is conducted to test whether there is any association between 2 quantitative variables which are repeated measures on the same individual, of the same thing, at different times. As indicated by the name, the calculation depends on the sign and relative magnitude of the data, not on the real value of the data.

4. For practice, do the following exercise. The data is a subset of the earlier study. We are trying to see whether the intervention of haematinics can increase the level of haemoglobin of the anaemic mothers. Since the sample is quite small, the appropriate test is a non-parametric analysis.

a. What is the null hypothesis?

b. Conduct the appropriate statistical test to prove your hypothesis. Discuss your findings.

INDEX    HB2     HB3     Hb Diff    Rank
1        10.4    10.0
5        10.7    11.0
13       10.5    11.0
17       10.6    10.8
19       10.5    11.0
20       10.7    11.0
29       10.3    9.5
60       9.3     9.5
61       10.0    11.5
92       10.5    10.8
94       10.4    8.2
95       10.0    7.2
134      10.2    11.0
188      10.0    10.0
197      10.2    10.0

Research Project 4

Each lecturer will demonstrate how to analyse the data using the computer and advise the students on how to interpret the results. For homework, the students will complete the analysis and prepare a PowerPoint presentation for the final practical session.
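For the Wilcoxon signed rank exercise in question 4 above, the differences, ranks and signed rank sums can be worked out with the Python sketch below. It is an illustration built on two common conventions that are assumptions here: pairs with a zero difference are dropped, and tied absolute differences share the average of their ranks. The lists hold the HB2 and HB3 values of the 15 mothers in table order, and the helper name `average_rank` is hypothetical.

```python
hb2 = [10.4, 10.7, 10.5, 10.6, 10.5, 10.7, 10.3, 9.3,
       10.0, 10.5, 10.4, 10.0, 10.2, 10.0, 10.2]
hb3 = [10.0, 11.0, 11.0, 10.8, 11.0, 11.0, 9.5, 9.5,
       11.5, 10.8, 8.2, 7.2, 11.0, 10.0, 10.0]

# Differences expressed in tenths as integers, so that ties compare exactly
diffs = [round((after - before) * 10) for before, after in zip(hb2, hb3)]
nonzero = [d for d in diffs if d != 0]      # zero differences are dropped

ordered = sorted(abs(d) for d in nonzero)


def average_rank(value):
    """Average of the ranks occupied by an absolute difference."""
    first = ordered.index(value) + 1             # lowest rank among the ties
    last = first + ordered.count(value) - 1      # highest rank among the ties
    return (first + last) / 2


w_plus = sum(average_rank(abs(d)) for d in nonzero if d > 0)
w_minus = sum(average_rank(abs(d)) for d in nonzero if d < 0)

print(f"n = {len(nonzero)}, W+ = {w_plus}, W- = {w_minus}")
```

The smaller of the two rank sums is then compared with the table of critical values for the number of non-zero differences.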

Copyright reserved Dr Azmi Mohd Tamil Amali6.doc 24-8-06.
