SPECIAL BASIC DOCTORAL LEVEL WRITTEN EXAMINATION IN BIOSTATISTICS
PART II
JANUARY 23-30, 1992

INSTRUCTIONS:

a) This is an open-book "take home" examination.
b) Answer any four (but only four) of the five questions which follow.
c) Put the answers to different questions on separate sets of papers.
d) Since your papers may be xeroxed for back-up purposes, type or write with a paper and pen/pencil combination that will xerox clearly. Do not, for example, use a hard pencil on yellow paper.
e) Most questions should be answered in the equivalent of less than five typewritten pages (300 words per page) and under no circumstances will more than the first 10 typewritten pages or the equivalent be read by the grader.
f) Put your code letter, not your name, on each page.
g) Return the examination with a signed statement of honor pledge on a page separate from your answers.

"In recognition of and in the spirit of the honor code, I certify that I have neither given nor received aid on this examination and that I will report all Honor Code violations observed by me."

(Signed) ____________________

Question 1.

Background

A design has been developed for a national opinion survey of adults (21 years and older) living in the coverage areas of rural community hospitals. The main objective of the survey is to determine how highly these adults value their local hospitals. In general the design calls for obtaining 10,000 telephone interviews in 100 randomly chosen coverage areas.

The sample will be chosen from the noninstitutionalized population of adults living in communities where hospitals are experiencing threatening operating margins or where they have been recently converted over to provide a different type of health care. The selection process will be as follows.
A national list of eligible hospitals will be created and then divided into the following groups:

(1) Negative profit margin;
(2) Positive profit margin of 0.0-2.0 percent;
(3) Positive profit margin of more than 2.0 percent; and
(4) Converted to a different type of health care facility (e.g., nursing home).

Within each group a random sample of 25 hospitals will be chosen with a selection probability for each hospital that is proportional to the number of beds for patient care in the hospital. The health care coverage area for each selected hospital (i.e., the geographic area from which most of the hospital's patients come) will next be determined and a random-digit sample of telephone numbers will be chosen from each area. These simple random samples of numbers will be purchased from the A. C. Nielsen Company, which purges sequences of known nonresidential numbers prior to selection and thereby guarantees that approximately 50 percent of the selected numbers they provide will be working residential numbers. With an assumed response rate of 70 percent among working numbers, approximately 29,000 numbers will be required to yield 10,000 participating households. One randomly chosen adult will be interviewed in each participating household, thus completing the sampling process.

The Problem

Stating any additional assumptions, and explicitly defining any terms that you may need, answer the following.

(4) a. Using standard sampling terminology, describe the proposed design in a few tightly worded sentences.

(4) b. Explicitly formulate and explain the selection probability for each sample adult.

(4) c. A measure of intraclass correlation (Roh) is needed to assess the likely statistical quality of estimates from the proposed design for the proportion (P) of adults in the survey population who possess some attribute (e.g., an opinion tied to how much the adult values the local hospital).
Since at this point no data are available to directly estimate Roh, a value will be conjectured as

    $\mathrm{Roh} = \dfrac{R^2}{12\,P(1 - P)}$,

where

    $R = P_\alpha^{(\mathrm{Max})} - P_\alpha^{(\mathrm{Min})}$

is the range (among all coverage areas) of the uniformly distributed values of $P_\alpha$, the proportion of adults in the $\alpha$-th coverage area who possess the attribute tied to P. Briefly but thoroughly discuss the rationale for this formulation of Roh.

(4) d. The actual respondent sample size in this survey will be 10,000. Assuming that $P = 0.5$ and that the coverage-area-specific proportions of this attribute vary uniformly over a range of size 0.1, determine how large a simple random sample of adults from the entire survey population would be needed to produce the same variance of the overall estimate of P as the proposed design. Also, how large is this "effective sample size" for the sample in each of the four hospital groups?

(4) e. Given the proposed design and assuming a two-sided alternative with Type I error at 0.05, determine the statistical power to detect a difference (in the same attribute proportion as in part d) of 5 percentage points between any two of the four hospital groups.

(5) f. Emphasizing breadth rather than depth, briefly critique the proposed sampling design, considering such things as frame error and the precision of sample estimates. Can you think of any reasonable alternatives to this basic design?

Question 2.

Background

A study of the 123 children who participated in Head Start in Chapel Hill in the summer of 1967 was undertaken in order to discover what measurements made during the program would be predictive of performance in first grade.
The resulting dataset includes the following predictor variables:

    PREDICT   Head Start teacher's prediction
              (1 = excellent, 2 = very good, 3 = average, 4 = poor, 5 = failure)
    WOOD1     Woodlawn Scale of School Adaptation (5 = best to 20 = worst)
    BENDER    Bender-Gestalt Visual Motor Test (1 = best to 30 = worst)
    DAP       Draw-a-Person Test (an IQ measure)
    WPPSI     Wechsler Preschool and Primary Scale of Intelligence
    WPPSIV    WPPSI "Verbal" IQ (subtest of WPPSI)
    SPPSIP    WPPSI "Performance" IQ (subtest of WPPSI)

(The WPPSI was given only to a random sample of children of each sex.)

A year later, at the end of the first grade, several of the children had been lost to follow-up because they were not enrolled in any of the Chapel Hill-Carrboro schools. Variables recorded for those who remained included the following:

    EVAL      First Grade teacher's evaluation (same scale as PREDICT)
    WOOD2     Woodlawn Scale (repeated)
    READING   First Grade final score in reading (1 = worst to 9 = best)
    LANGUAGE  ditto in language
    WRITING   ditto in writing
    MATH      ditto in mathematics
    MEAN      Mean of READING, LANGUAGE, WRITING, and MATH
    PMA       Primary Mental Abilities IQ Test
    PMAV      Verbal subtest of PMA
    PMANF     Number Facility subtest of PMA
    PMAPS     Perceptual Speed subtest of PMA
    PMASR     Spatial Relations subtest of PMA

Other variables in the dataset are SEX (0 = female, 1 = male), ID (within SEX), SCHOOL (coded 1, 2, 3, or 4 to indicate the four elementary schools operating in the district at that time), and FINWOOD (to be ignored). All the children were black except for 3 unidentified whites.

Assignment

(15) a. Prepare a descriptive study of these data.

(10) b. Investigate what Head Start measurements were predictive of first-grade performance (especially EVAL and MEAN).

Note 1: Aim your report at school administrators. Include a discussion of any differences between the two sexes and among the four schools. Also include suitable tables and graphs.
Note 2: The dataset is available, as raw data in the form of a SAS DATA step, with DSN equal to UBRARY.EXAM on the UNC mainframe.

Question 3.

Consider an $I \times J \times K$ contingency table obtained by classifying the outcomes of the variables $V_1$, $V_2$, $V_3$. Suppose that given $V_3 = k$, variables $V_1$ and $V_2$ are independent: i.e.,

    $p_{ijk} = P(V_1 = i \mid V_3 = k)\, P(V_2 = j \mid V_3 = k)\, P(V_3 = k)$.

Notation:

    data $\{y_{ijk}\}$, $i = 1, \ldots, I$; $j = 1, \ldots, J$; $k = 1, \ldots, K$
    $\{y_{ijk}\} \sim M_{IJK}(N, \{p_{ijk}\})$
    $\mu_{ijk} = E(y_{ijk}) = N p_{ijk}$

a. Show that

    $p_{ijk} = \dfrac{p_{i+k}\, p_{+jk}}{p_{++k}}$

and hence that the expected frequencies satisfy

    $\mu_{ijk} = \dfrac{\mu_{i+k}\, \mu_{+jk}}{\mu_{++k}}$.

b. What log-linear model satisfies this constraint? Derive your answer.

c. What are the minimal sufficient statistics in this case?

d. Obtain the number of degrees of freedom associated with this model.

e. Find the log-linear model corresponding to $V_1$ being independent of $(V_2, V_3)$, i.e.,

    $p_{ijk} = P(V_1 = i)\, P(V_2 = j, V_3 = k)$,

and hence show that the fitted values are given by

    $\hat{\mu}_{ijk} = \dfrac{y_{i++}\, y_{+jk}}{N}$.

f. This model could be appropriate in the following situation. The data below are from a General Social Survey conducted by the National Opinion Research Center of the University of Chicago in 1974. Married couples are cross-classified by the sex of the respondent (to the questionnaire), highest educational qualification (HEQ) obtained by the husband, and HEQ obtained by the wife. The results are as follows.

    Sex of        Husband's          Wife's HEQ
    Respondent    HEQ             (a)    (b)    (c)
    ----------------------------------------------
    Male          (a)             135     60      1
                  (b)              43    151     19
                  (c)               6     69     35

    Female        (a)             124     63      1
                  (b)              39    219     18
                  (c)               1     41     40

    (a) = < high school diploma
    (b) = high school diploma or junior college degree
    (c) = >= bachelor's degree

In principle, with a properly conducted survey, the two educational qualification variables should be independent of the sex of the respondent.
Calculate the fitted values under this model and test for goodness of fit. Examine closely any model failure.

Question 4.

The contingency table shown below is from a survey of dental health status in an elderly population. The subjects in this survey were selected by stratified simple random sampling with race being the stratification factor. The variables which were observed for each subject during an examination in their homes were as follows:

1. Caries (some present (S), none present (N))
2. Race (black, white)
3. Financial Situation (favorable (F), unfavorable (U))
4. Salivary Flow Rate (<1.5 ml/min, >1.5 ml/min)
5. Dentist Prior Utilization (regular (R), other (O))
6. Relative Perception of Mouth Appearance (good (G), bad or no opinion (B))

                                             Perception
    Financial   Salivary    Dentist Prior    of Mouth        Black         White
    Situation   Flow Rate   Utilization      Appearance     S     N       S     N
    -------------------------------------------------------------------------------
    F           <1.5        (rows missing from source)
    F           >1.5        R                G              6    58       2    14
    F           >1.5        R                B              0     1       0     1
    F           >1.5        O                G              4    23      28    36
    F           >1.5        O                B              0     1       7     2
    U           <1.5        R                G              1     2       0     1
    U           <1.5        R                B              0     0       1     0
    U           <1.5        O                G              6     3      14    16
    U           <1.5        O                B              2     1       2     3
    U           >1.5        R                G              0     0       0     1
    U           >1.5        R                B              0     0       0     0
    U           >1.5        O                G              2     1      10     9
    U           >1.5        O                B              1     0       0     2
    -------------------------------------------------------------------------------
    TOTAL                                                  52   218     130   142

(3) a. Assess the association between caries status and race. Provide a 0.95 confidence interval for the extent to which the odds of some versus none for caries is larger for blacks than whites. State assumptions and interpret results.

(4) b. Use statistical tests at the 0.05 level to evaluate the similarity of black and white populations for the distributions of financial situation, salivary flow rate, dentist prior utilization, and perception of mouth appearance. State assumptions, interpret results, and discuss any implications of results to conclusions for (a).
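
As a rough sketch of the kind of calculation part (a) calls for, the crude (unadjusted) odds ratio and a large-sample 0.95 interval can be computed from the TOTAL row of the table (52 some and 218 none for blacks, 130 some and 142 none for whites). The function name `odds_ratio_ci` and the choice of a Wald interval on the log scale are assumptions made here for illustration and are not part of the examination. The sketch treats the black and white samples as independent and makes no adjustment for the other four variables.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.959964):
    """Crude odds ratio (a/b)/(c/d) with a Wald interval on the log scale.

    a, b: counts with and without the outcome in the first group
    c, d: counts with and without the outcome in the second group
    z:    standard normal quantile (the default gives a two-sided 0.95 interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four counts must be positive for a Wald interval")
    log_or = math.log(a * d) - math.log(b * c)
    # Large-sample standard error of the log odds ratio
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se))


# TOTAL row of the caries table: blacks (some, none), whites (some, none)
estimate, lower, upper = odds_ratio_ci(52, 218, 130, 142)
print(f"odds ratio = {estimate:.3f}, 0.95 CI = ({lower:.3f}, {upper:.3f})")
```

An interval lying entirely on one side of 1 would point to a crude association between race and caries status. Whether that association persists after adjustment is the subject of the later parts of this question.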

(4) c. Under minimal assumptions, assess the association between caries status and race with adjustment for financial situation, salivary flow rate, dentist prior utilization, and perception of mouth appearance; and evaluate the homogeneity of such association relative to the structure for adjustment. State assumptions and interpret results.

(4) d. Under minimal assumptions, assess the association of caries status with financial situation, salivary flow rate, dentist prior utilization, and perception of mouth appearance, respectively, in appropriately adjusted settings. State assumptions and interpret results.

(10) e. Use an appropriate statistical model to describe the variation among proportions of subjects with some caries across the groups corresponding to the cross-classification of race, financial situation, salivary flow rate, dentist prior utilization, and perception of mouth appearance. Evaluate whether any interactions among explanatory variables are noteworthy. Use the model to determine predicted proportions for some caries among subjects in the respective cross-classification groups and discuss the pattern of these predicted values relative to groups with greater need for dental health services.

Question 5.

An experiment has been described as follows:

"In an experiment designed to investigate two driver training programs, students in the eleventh grade in schools I, II, and III were trained with one program and eleventh-grade students in schools IV, V, and VI were trained with another program. Each school had its own teacher; each teacher taught at only one school. After completion of a 6-week training period, a test measuring knowledge (K) of traffic laws and driving ability (A) was administered to the 138 students in the study."

The data from the study are shown in the table below.

(5) a. If the description is not sufficiently detailed to enable you to respond to the following questions/problems, state any additional assumptions you feel are necessary. Describe the following for this study: treatment design, experimental design, observational unit, experimental unit. Specify, in English-language sentences, the pair(s) of null and alternative hypotheses to be addressed by the study.

(10) b. Define and describe a linear model for the statistical analysis of the data, including appropriate definitions/descriptions of all parameters and hypotheses.

(10) c. Analyze the data and present the results of your analyses. (Before presenting parameter estimates and tests of hypotheses, present the results of your examination of the data for anomalies and describe actions you took to cope with data anomalies, if any.) Describe the conclusions you draw from the study.