MS WRITTEN EXAMINATION IN BIOSTATISTICS
PART II
March 8, 1994: 9:00 AM to 1:00 PM

INSTRUCTIONS:
a) This is an open book examination.
b) Answer any three questions. (Point values for each part are given in square brackets.)
c) Put the answers to different questions on separate sheets of paper.
d) Put your CODE LETTER (not your name) on each page.
e) Return the examination with a signed statement of the honor pledge on a page separate from your answers.
f) You are required to answer only what is asked in the question, not all you know about the topic.

Question 1

Irritable bowel syndrome (IBS), the most common reason for referral of patients to gastroenterologists, is characterized by abdominal pain and an altered bowel habit. In a study recently published in Gastroenterology 102:102-108, Vassallo et al. (1992) set out to evaluate transit of solid residue through the entire, unprepared gut in patients with diarrhea-predominant IBS and to assess the relationships among colonic transit, large bowel capacitance, and stool weights.

[5] a. Summarize the data presented in Table 1 of the paper (page 104).
[6] b. Test the null hypothesis of homogeneity of stool weight between IBS patients and healthy subjects. Justify your choice of test(s). Provide a conclusion.
[9] c. The authors examine colonic transit over a 36-hour period in Table 2. Comment on the appropriateness of their method of analysis (see their results on page 105). Suggest an alternative way of testing for group differences over time. Do you agree that IBS patients 1, 2, and 6 had "normal overall colonic transit"? Justify your answer.
[5] d. State your best linear model for addressing the investigators' goal as stated above: to assess the relationships among colonic transit, large bowel capacitance, and stool weights.

Question 2

A nutritionist asks for help in designing a study of the effect of the amount of a compound (X) in a person's diet on the blood level of a metabolic product (Y).
He asks a series of questions, each concerning a different possible study design. Results of an earlier study by the nutritionist, already published, will be available for exploratory analysis. Experience with similar response variables suggests that a General Linear Univariate Model (GLUM) will hold either for the response or for an appropriate transformation of it (to be found by examining the earlier data).

[6] a. The nutritionist asks you to consider a study design involving four treatments: placebo, low, medium, and high levels of X. This leads to the following fixed-effects ANOVA model (Model 1):

$$
\begin{bmatrix} y_1 \\ \vdots \\ y_N \end{bmatrix}
=
\begin{bmatrix}
\mathbf{1} & \mathbf{0} & \mathbf{0} & \mathbf{0} \\
\mathbf{1} & \mathbf{1} & \mathbf{0} & \mathbf{0} \\
\mathbf{1} & \mathbf{0} & \mathbf{1} & \mathbf{0} \\
\mathbf{1} & \mathbf{0} & \mathbf{0} & \mathbf{1}
\end{bmatrix}
\begin{bmatrix} \beta_P \\ \beta_L \\ \beta_M \\ \beta_H \end{bmatrix}
+ \mathbf{e}
$$

where the four row blocks correspond to the placebo, low, medium, and high groups, and each $\mathbf{1}$ or $\mathbf{0}$ is a column vector of ones or zeros with one entry per subject in that group.
i) Specify C needed to estimate the response means for the four diets.
ii) Specify C and θ0 needed to test the hypothesis of no differences among the four diets.
iii) The usual (corrected) R² = [SSE(Model 2) - SSE(Model 1)] / SSE(Model 2). Describe how to modify Model 1 in order to generate Model 2.

[4] b. The nutritionist mentions that he intends to collect a pre-treatment baseline (Y0). Extend Model 1 to become a traditional analysis of covariance model.
i) Clearly specify the design matrix elements and dimensions.
ii) Clearly specify the parameter matrix elements and dimensions.

[4] c. Further discussion raises the possibility that the relationship between the response and baseline might vary among the four groups. Hence you decide to consider a model which includes the baseline as a predictor but allows unequal slopes for the various diets.
i) Clearly specify the design matrix elements and dimensions.
ii) Clearly specify the parameter matrix elements and dimensions.

[4] d. The nutritionist asks you to consider a study design involving gender (male, female) as one factor and diet as a second factor (four levels as described in part a). Assume 4 subjects per group. If this design is used, the baseline measure will not be collected, in order to save money.
Use cell mean coding for a fixed-effect factorial ANOVA.
i) Clearly specify the design matrix elements and dimensions.
ii) Clearly specify the parameter matrix elements and dimensions.

[7] e. The earlier data consist of Y values for ten men receiving a diet high in X, and Y values for ten men receiving a vehicle-control (placebo) diet. Given that Y is a serum concentration level, and so a ratio-scale variable, certain transformations are likely candidates: $Y^{-1}$ (reciprocal); $Y^{-1/2}$ (reciprocal square root); $\log(Y)$, which can be thought of as "$Y^{0}$"; $Y^{1/2}$ (square root); and $Y^{1}$ (identity).
i) Specify one or two graphical displays, and one or two statistics, to examine in choosing the best transformation.
ii) Indicate very briefly how you would decide which transformation is best.
iii) The obvious approach in a SAS-based analysis would be to compute the twenty transformed values in a data step, then use a procedure such as PROC UNIVARIATE to compute a statistic on a set of twenty values. Briefly describe the potential mistake in this approach, and indicate a way to avoid the mistake with these data.

Question 3

Two preliminary clinical studies were conducted to compare a new treatment for an infectious disease with a standard control treatment; a third study with only the control treatment was also conducted. The results for the pooled studies were as follows:

                         Response
Treatment     Favorable   Unfavorable   Total
Control           42           18         60
New               17            1         18

[2] a. Provide estimates of the rates of favorable response for each treatment.
[3] b. Provide a 0.90 confidence interval for the new versus control ratio of the odds of favorable versus unfavorable response. Use this confidence interval to test the equality of the rates of favorable response for the two treatments.

Since the patients from the respective studies had different distributions of baseline characteristics, a logistic model was fit to the data for the pooled studies.
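For parts (a) and (b), one common large-sample approach (an assumption here; an exact interval is an alternative) is Woolf's logit interval: take the log of the sample odds ratio, attach the standard error sqrt(1/a + 1/b + 1/c + 1/d), and exponentiate the endpoints. The Python sketch below applies it to the pooled counts; the helper names are illustrative, not part of the examination. With only one unfavorable response in the new-treatment group, the normal approximation is rough, so the interval should be read as indicative.

```python
import math

# Pooled counts from the Question 3 table: (favorable, unfavorable)
CONTROL = (42, 18)
NEW = (17, 1)


def favorable_rate(counts):
    """Proportion of favorable responses in one treatment group."""
    favorable, unfavorable = counts
    return favorable / (favorable + unfavorable)


def woolf_odds_ratio_ci(group1, group2, z):
    """Odds ratio (group1 vs group2) of favorable vs unfavorable response,
    with a Wald interval on the log-odds-ratio scale (Woolf's method).
    Assumes every cell count is positive."""
    a, b = group1
    c, d = group2
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se))


# A two-sided 0.90 interval uses the upper 0.05 standard normal point, 1.645.
odds_ratio, lower, upper = woolf_odds_ratio_ci(NEW, CONTROL, z=1.645)
print(f"control rate = {favorable_rate(CONTROL):.3f}, new rate = {favorable_rate(NEW):.3f}")
print(f"OR = {odds_ratio:.2f}, 90% CI = ({lower:.2f}, {upper:.2f})")
# If the interval excludes 1, equality of the rates is rejected at the two-sided 0.10 level.
print("interval excludes 1:", lower > 1 or upper < 1)
```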
The results of the logistic regression for the pooled studies were as follows:

Explanatory variable              Estimate   Standard error
Intercept                            4.82          1.26
Concomitant disease (x1)            -2.76          1.20
White blood cell count (x2)         -2.50          0.94
Respiratory status (x3)             -4.81          1.26

where x1 = 1 if concomitant disease is present at baseline, 0 if not;
x2 = 1 if low white blood cell count at baseline, 0 if not;
x3 = 1 if poor respiratory function at baseline, 0 if not.

[4] c. State the relevant assumptions for this application of logistic regression. Specify the mathematical structure of this logistic regression model, including the explanatory variable matrix (i.e., the X matrix) and the parameter vector β; interpret the elements of β.
[3] d. Provide a 0.95 confidence interval for the ratio of the odds of favorable versus unfavorable response for patients with no concomitant disease versus patients with concomitant disease.
[2] e. Apply a statistical test at the two-sided α = 0.05 level for no association of response with white blood cell count status.
[3] f. Use the logistic model to obtain an estimate of the probability of favorable response for patients without concomitant disease, low white blood cell count, or poor respiratory function; also provide an estimated probability for patients with concomitant disease, low white blood cell count, and poor respiratory function.
[3] g. For the model with x1, x2, and x3, the -2 log-likelihood criterion was 44.089. For a simplified model with x = x1 + x2 + x3 as the only explanatory variable, the -2 log-likelihood criterion was 48.987. Apply a statistical test to evaluate the appropriateness of simplifying the model with x1, x2, and x3 to the model with x = x1 + x2 + x3 only. Also, for the model with x only, interpret the corresponding parameter β. (Note that x = 0, 1, 2, 3.)

The variable x = x1 + x2 + x3 is a prognostic score which is strongly predictive of the rate of favorable response.
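Parts (d) through (g) reduce to a few arithmetic steps on the reported estimates. The sketch below assumes large-sample Wald inference for (d) and (e), uses the inverse logit for (f), and for (g) treats the simplified model as nested in the full one (using x = x1 + x2 + x3 forces the three slopes to be equal), so the drop in -2 log-likelihood is referred to chi-squared with 3 - 1 = 2 degrees of freedom. Critical values are those listed in the note at the end of the question; the dictionary and variable names are illustrative.

```python
import math
from statistics import NormalDist

# Estimates and standard errors transcribed from the logistic-regression table.
EST = {"intercept": 4.82, "x1": -2.76, "x2": -2.50, "x3": -4.81}
SE = {"intercept": 1.26, "x1": 1.20, "x2": 0.94, "x3": 1.26}


def inv_logit(eta):
    """Convert a linear predictor (log odds) into a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


# (d) Odds ratio for absence vs presence of concomitant disease is exp(-beta1);
#     a two-sided 0.95 Wald interval uses z = 1.96.
log_or_d = -EST["x1"]
ci_d = (math.exp(log_or_d - 1.96 * SE["x1"]),
        math.exp(log_or_d + 1.96 * SE["x1"]))

# (e) Wald statistic for H0: beta2 = 0 (white blood cell count), two-sided p-value.
z_x2 = EST["x2"] / SE["x2"]
p_x2 = 2.0 * (1.0 - NormalDist().cdf(abs(z_x2)))

# (f) Predicted probabilities at the two extreme covariate patterns.
p_no_risk = inv_logit(EST["intercept"])
p_all_risk = inv_logit(EST["intercept"] + EST["x1"] + EST["x2"] + EST["x3"])

# (g) Likelihood-ratio statistic: difference in -2 log-likelihood, 2 d.f.
lr_stat = 48.987 - 44.089
simplification_rejected = lr_stat > 5.99  # chi-squared(2) upper 0.05 point

print(f"(d) 95% CI for OR: ({ci_d[0]:.2f}, {ci_d[1]:.1f})")
print(f"(e) z = {z_x2:.2f}, p = {p_x2:.4f}")
print(f"(f) P(fav | no risk factors) = {p_no_risk:.4f}, P(fav | all three) = {p_all_risk:.4f}")
print(f"(g) LR = {lr_stat:.3f}, reject simplification: {simplification_rejected}")
```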
When adjustment is applied for the prognostic score x, the response distributions for the two treatments are as follows:

Prognostic score                          Response
x = x1 + x2 + x3    Treatment   Favorable   Unfavorable   Total
0 or 1              Control         39            5          44
0 or 1              New             17            1          18
2 or 3              Control          4           12          16
2 or 3              New              0            0           0

[3] h. Apply a statistical test to compare the response distributions for the two treatments with adjustment for the prognostic score.
[2] i. Provide a brief statement as to why the results from (b) and (h) agree or disagree.

Note that the upper 0.05 and 0.025 critical values for the standard normal distribution are 1.645 and 1.96, respectively; the upper 0.05 critical values for the chi-squared distributions with d.f. = 1, 2, 3, 4 are 3.84, 5.99, 7.81, and 9.49, respectively.

Question 4

You have counted a total of 400 colonies of a certain species of bacteria growing on a petri dish. These colonies are distributed among 200 small squares of equal area as shown in the table below.

[11] a. The literature suggests that each colony of this species produces a substance which inhibits the growth of other colonies nearby, thus causing the colonies to be rather uniformly distributed at a "safe" distance from each other on the dish. Test the null hypothesis of a random distribution, with this "uniform" alternative in mind.
[14] b. Based on the results of part (a), choose a distribution to fit the data, and estimate its parameters.

# of colonies:   0   1   2   3   4   5   6   7   8   9  10  11  12  >12  TOTAL
# of squares:   50  52  45  18  13   5   6   5   3   0   2   0   1    0    200
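For Question 4, a standard check of the Poisson (random) hypothesis is the index of dispersion, D = Σ(x - x̄)²/x̄ over the 200 squares, which is approximately chi-squared with 199 degrees of freedom under randomness. A uniform (regular) pattern shows up as variance below the mean, so the uniform alternative points to the lower tail. The sketch below assumes the large-d.f. normal approximation N(ν, 2ν) to chi-squared with ν = 199; that approximation is an assumption, not an exact test. If the data turn out to be overdispersed, a negative binomial is one common candidate for part (b); the sketch fits it by the method of moments rather than maximum likelihood, purely for brevity. The names used are illustrative.

```python
import math
from statistics import NormalDist

# Question 4 frequency table: SQUARES[k] = number of squares with k colonies, k = 0..12.
# The ">12" column is 0, so it is omitted.
SQUARES = [50, 52, 45, 18, 13, 5, 6, 5, 3, 0, 2, 0, 1]

n = sum(SQUARES)                                          # number of squares
total = sum(k * f for k, f in enumerate(SQUARES))         # number of colonies
mean = total / n
ss = sum(f * (k - mean) ** 2 for k, f in enumerate(SQUARES))
variance = ss / (n - 1)

# Index of dispersion: approx chi-squared(n - 1) if colonies are placed at random (Poisson).
dispersion = ss / mean
df = n - 1
# Large-d.f. normal approximation; the "uniform" alternative implies a small D, so use the lower tail.
z = (dispersion - df) / math.sqrt(2 * df)
p_lower = NormalDist().cdf(z)

# Method-of-moments negative binomial fit (only meaningful when variance > mean).
k_hat = mean ** 2 / (variance - mean)
p_hat = mean / variance


def nb_expected(n_units, k, p, max_count):
    """Expected frequencies for counts 0..max_count, plus the upper tail, under NB(k, p)."""
    probs = [p ** k]
    for j in range(max_count):
        # Recurrence: P(j + 1) = P(j) * (k + j) / (j + 1) * (1 - p)
        probs.append(probs[-1] * (k + j) / (j + 1) * (1 - p))
    expected = [n_units * pr for pr in probs]
    tail = n_units - sum(expected)
    return expected, tail


expected, tail = nb_expected(n, k_hat, p_hat, len(SQUARES) - 1)
print(f"mean = {mean:.3f}, variance = {variance:.3f}, D = {dispersion:.1f} on {df} d.f.")
print(f"z = {z:.2f}, lower-tail p = {p_lower:.3g}")
print(f"negative binomial: k = {k_hat:.3f}, p = {p_hat:.3f}")
for count, (obs, fitted) in enumerate(zip(SQUARES, expected)):
    print(f"{count:>2}: observed {obs:>3}, expected {fitted:6.1f}")
print(f">{len(SQUARES) - 1}: expected {tail:.1f}")
```

The printed observed-versus-expected listing is one way to judge the fit in part (b); pooling sparse upper categories before any chi-squared goodness-of-fit test is the usual precaution.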