BASIC DOCTORAL WRITTEN EXAMINATION IN BIOSTATISTICS
PART II
June 20 - June 27, 1992

INSTRUCTIONS

a) This is an open-book "take home" examination.
b) Answer any four (but only four) of the five questions which follow.
c) Put the answers to different questions on separate sets of papers.
d) Since your papers may be xeroxed for back-up purposes, type or write with a paper and pen/pencil combination that will xerox clearly. Do not, for example, use a hard pencil on yellow paper.
e) Most questions should be answered in the equivalent of less than five typewritten pages (300 words per page), and under no circumstances will more than the first 10 typewritten pages or the equivalent be read by the grader.
f) Put your code letter, not your name, on each page.
g) Return the examination with the signed statement of the Honor Pledge: "In recognition of and in the spirit of the honor code, I certify that I have neither given nor received aid on this examination and that I will report all Honor Code violations observed by me."

(Signed) ____________________ NAME

NOTE: The datasets which this examination states are available on the mainframe are also available on a floppy disk which may be checked out briefly (for copying) from the secretary in the main Biostatistics Department office. BLUNT is in the form of a SAS data step (PROB.TWO); RECALZIT is available both as a PC SAS file (S0814I5.SSD) and as an ASCII file (S0814I5C.DAT).

Question 1.

The hypothetical data shown in the accompanying table are from an occupational health study of workers in a certain industry. The jobs of the workers have been classified as involving (1) exposure to substantial mental stress, (2) exposure to excessive noise, or (3) no exposure (to either substantial mental stress or excessive noise).
The health histories of the workers during their first six months of employment were reviewed to determine (1) presence or (2) absence of an illness which led to at least three consecutive lost workdays. Note that the study did not include workers with exposure to both substantial mental stress and excessive noise, nor did it include workers with exposure to other conditions which might adversely affect health (because the numbers of those types of workers were small). Background characteristics reported by the workers were gender (0 if male, 1 if female), smoking status (0 if no, 1 if yes), age (1 if age ≤ 40, 2 if 40 < age ≤ 50, 3 if age > 50), and alcohol usage (0 if none, 1 if light consumer, 2 if at least moderate consumer). The data are available in the SAS dataset "INDUSTRY", with variables EXPOSURE, ILLNESS, GENDER, SMOKING, AGE, and ALCOHOL as described above, and COUNT giving the number of workers in the corresponding cell of the six-way classification.

a) For each gender separately, evaluate the association between no illness and no exposure (to either substantial mental stress or excessive noise as a pooled condition). Apply both 0.95 confidence intervals and statistical tests at the 0.05 level.
b) Assess the extent to which an appropriate measure of the association in (a) is homogeneous for the two genders through a statistical test at the 0.05 level.
c) Under minimal assumptions relative to the representativeness of the workers and their background characteristics, evaluate the association between no versus any illness and no versus any occupational exposure.
d) Under minimal assumptions, evaluate the association between no versus any illness and each of the background characteristics (i.e., each of gender, age, smoking, and alcohol).
e) Use a statistical model to describe the variation of rates of any illness with respect to exposure status, gender, smoking, age, alcohol status, and their interactions; evaluate goodness of fit of the model.
f) For the model in (e), evaluate whether the main effects and any interactions corresponding to substantial mental stress (versus no exposure) are equal to those corresponding to excessive noise (versus no exposure).
g) Under minimal assumptions, evaluate the association between exposure status and each of the background characteristics.
h) Use a statistical model to describe the relationship between the distribution of exposure status and the respective background characteristics of gender, age, smoking, alcohol status, and their interactions.
i) For the model in (h), evaluate whether the main effects and any interactions corresponding to substantial mental stress (versus no exposure) are equal to those corresponding to excessive noise (versus no exposure).
j) Summarize the findings from the analyses in (a) - (i).

Question 2.

Two laboratories, one in North Carolina and one in New York, each recruited 14 normal subjects, from whom they drew blood samples at 7 times: -30, -15, 0, 15, 30, 45, and 60 minutes after administering a standard dose of TSH (thyroid stimulating hormone). Each sample was split into two half-samples, and one was assayed by each laboratory. The results are in the SAS dataset "TWOLABS". Variables C1-C7 are the TSH levels as assayed in North Carolina, and Y1-Y7 in New York, at the 7 times in order. Note that the baseline levels (times -30, -15, and 0) are not zero, because TSH is a normal component of human blood. Also included are the DATE of the blood drawing, the AGE of the subject in years, the SEX (M or F), the LAB (NC or NY), and a character subject ID.

a) How closely do the two labs agree when assaying the same sample? Are there any systematic differences between their results?
b) Write a brief descriptive study of TSH blood level in normal subjects, and its response to administration of the standard dose.

Question 3.
There follows the description of a proposed sample survey whose main objective is to estimate the prevalence of illicit drug use by women during pregnancy in North Carolina. Prepare a concise critique of each of the four facets of this design given in (a)-(d) below. Support your discussion with results from the statistical literature, wherever it is appropriate to do so. Your critique of each facet should contain several elements. First, the discussion should point out any strengths and weaknesses of what has been proposed. Briefly suggest ways to deal with the weaknesses. Second, if you believe there are reasonable alternatives to what has been proposed, state the alternatives and sketch a strategy for choosing among the alternatives. Finally, if you feel there are important design details that are missing from this document, specifically state what would be needed to make the design description complete.

a) Sampling Procedure: Discuss the choice of sampling units, the number of sampling stages, use of stratification and/or clustering, and allocation of the sample among strata/stages.
b) Sample Size: Discuss the appropriateness of the approach taken to determining the needed sample size.
c) Nonsampling Aspects of Survey Design: Discuss the mode of data collection, questionnaire design, approach for soliciting a response from those selected in the sample, and the data collection plan.
d) Summary Assessment: Discuss the general feasibility of the design. Is it likely to achieve the survey objectives?

Question 4.

The data shown below come from a five-center, longitudinal, parallel, randomized, double-blind, placebo- and active-controlled clinical trial in outpatients with a diagnosis of General Anxiety Disorder (GAD).
The primary objectives stated in the protocol are: (1) to compare the efficacy of treatment with buspirone (a new drug), diazepam (active control), or placebo for the reduction of patient anxiety level, as measured by the Hamilton Anxiety scale (Ham-A), and (2) to compare the general safety and tolerance of the three treatments. [The safety and tolerance data are not a part of this problem.] Higher Ham-A scores correspond to higher levels of anxiety.

Study Design. The study was conducted in five clinical centers, each using the same protocol. A total of 240 Caucasian patients were screened and subjected to a placebo run-in period; 212 subjects met the inclusion criteria and were enrolled in the study. Baseline evaluations were performed, and patients were randomly assigned to treatment and given a week's supply of capsules containing the appropriate drug or placebo. At 7 ± 1 day intervals, patients returned to the clinic, were evaluated using the Ham-A, and were given a new supply of capsules. [Compliance was assessed by counting a patient's unused capsules; those data are not a part of this problem.]

Dropouts. A number of subjects withdrew before the end of the study. (The variable NUMWEEKS specifies how many weeks of the study a subject completed.) Some of the subjects withdrew for reasons related to the study (e.g., ineffective treatment), but for the purposes of this problem you may assume that data are missing completely at random.

Comment on Analysis Strategy. The results of this study will be submitted to the Food and Drug Administration (FDA) as a part of a New Drug Application. The dropouts affect analysis strategy. One approach is to perform a longitudinal analysis, perhaps using a mixed model.
One influential official at the FDA (not a statistician) strongly holds the opinion that the appropriate analysis strategy for data such as these (with a substantial number of dropouts) is a univariate model analysis, using the last available Ham-A score from each subject to compute the dependent variable. For example, if one were analyzing change-from-baseline, the dependent variable would be computed as: [last available Ham-A score from a subject] - [the subject's baseline Ham-A]. Such an analysis is sometimes called a "last value carried forward analysis" or an "endpoint analysis".

Additional Explanatory Variables. Data were collected on each subject's GENDER, marital status (MARRIED), AGE, whether the subject had previous psychiatric therapy (PREVTHER), and whether the subject had previous drug therapy for anxiety (PREVDRUG). [Names in capitals, e.g., "GENDER", are SAS variable names.] The relationship of anxiety to these variables, and possible treatment interactions involving these variables, are not the primary focus of the study. Any of these variables that are related to anxiety might be useful covariates. Substantial treatment interactions with any of these variables would be of interest (secondary to the primary objective of the study) and could also be useful for reducing variance estimates.

Datasets. Two versions of the data are available as SAS datasets: "BUSPIR" contains the data as shown in the accompanying listing, while "BUSPIRRO" contains the data in "rolled out" form, with a separate observation for each week. In BUSPIRRO, the variable WEEK specifies the week number and the variable HAM_A contains the Ham-A score for that week. The following table contains brief descriptions of the variables in the datasets.

Question 5.

Consider Figure 3 below, which was taken from an article in Science. It might clarify the context if the first sentence of the legend were rewritten to end "... bulbs of ewes before (Prepartum) and after (Postpartum) giving birth."
(You are welcome to read the entire article, if you can find it, but doing so will cast no light on the questions asked here.)

a) Assuming that the points have been plotted correctly, read the data from the graph and check the analysis reported in the legend. Obviously you cannot read with sufficient precision to get exactly the same results, but note any discrepancies large enough to raise the suspicion of a mistake in the article.
b) Do you have any reservations about the conclusion that the slopes are "significantly different"? Is this conclusion affected by the presence of outliers? Would it hold if the data were log-transformed? Do you see any good reason to make such a transformation?
c) A glance at the graph suggests that the intercepts of the two regressions may be the same. Comment on this possibility. How would your analysis change if theory required them to be the same?
d) Write a letter to the author(s) detailing your findings.