MS WRITTEN EXAMINATION IN BIOSTATISTICS
PART I
March 1, 1996: 8:30 a.m. - 12:30 p.m.

INSTRUCTIONS:
a) This is a closed book examination.
b) Answer any three questions.
c) Put the answers to different questions on separate sets of paper.
d) Put your CODE LETTER, not your name, on each page.
e) Return the examination with a signed statement of the honor pledge on a page separate from your answers.
f) You are required to answer only what is asked in the questions, not all you know about the topics.

1. A study is designed to estimate the mean and variance of blood cholesterol in a certain population. A random sample of $K$ households is selected and within each such household a random sample of $m$ subjects is selected; so, the total number of subjects is $n = mK$. Let $Y_{ij}$ be a random variable denoting the cholesterol level of the $j$-th subject in the $i$-th household. Let $E(Y_{ij}) = \mu$ and $\mathrm{var}(Y_{ij}) = \sigma^2$. Since correlation among responses within a household is expected, the following is assumed: $\mathrm{corr}(Y_{ij}, Y_{ik}) = \rho$ if $1 \le j \ne k \le m$ and $\mathrm{corr}(Y_{ij}, Y_{rs}) = 0$ if $1 \le i \ne r \le K$. In other words, $\rho$ is the correlation between any pair of responses within the same household, and different households are assumed to be uncorrelated. Define

$\bar{Y} = \frac{1}{n} \sum_{i=1}^{K} \sum_{j=1}^{m} Y_{ij}$ and $S^2 = \frac{1}{n-1} \sum_{i=1}^{K} \sum_{j=1}^{m} (Y_{ij} - \bar{Y})^2$.

Show that the usual variance formula for $\bar{Y}$ involving uncorrelated observations does not hold and that the usual sample variance is a biased estimator of $\sigma^2$ unless $m = 1$; more specifically, show that

(10 pts.) $\mathrm{var}(\bar{Y}) = \frac{\sigma^2}{n} \left[ 1 + \rho(m-1) \right]$,

and

(15 pts.) $E[S^2] = \sigma^2 \left[ 1 - \frac{\rho(m-1)}{n-1} \right]$.

2. The geometric distribution is often used to model the probability of failure in discrete time periods (of equal length), so that $p_X(x; \theta) = \theta^{x-1}(1-\theta)$, $x = 1, 2, \ldots, \infty$, is the probability of failure in the $x$-th time period.
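As a reading aid for the parameterization just stated (not a solution to any part of the question), the following minimal Python sketch evaluates the probability function with the parameter, written theta here, taken as the probability of surviving a single period, so that 1 - theta is the per-period failure probability. The function names `geometric_pmf` and `partial_sum` and the value theta = 0.8 are illustrative assumptions, not part of the examination.

```python
def geometric_pmf(x: int, theta: float) -> float:
    """Probability of failure in period x: theta**(x - 1) * (1 - theta), x = 1, 2, ..."""
    if x < 1:
        return 0.0  # the support starts at x = 1
    return theta ** (x - 1) * (1.0 - theta)


def partial_sum(theta: float, upto: int) -> float:
    """Sum of the probability function over x = 1, ..., upto."""
    return sum(geometric_pmf(x, theta) for x in range(1, upto + 1))


if __name__ == "__main__":
    theta = 0.8  # illustrative (assumed) per-period survival probability
    for x in (1, 2, 3):
        print(x, geometric_pmf(x, theta))
    # With enough terms the probabilities should add up to (nearly) one.
    print(partial_sum(theta, 500))
```

Note that some libraries parameterize the geometric distribution by the per-trial event probability instead, which corresponds to 1 - theta in the notation used here.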
In this context, the probability $\psi$ of failure after period $r$ (or, equivalently, of no failure up through period $r$) is of interest.

(3 pts.) a) Show that $\psi = \mathrm{pr}(X > r) = \theta^r$.

(7 pts.) b) Consider a new random variable $Y$ defined as

$Y = X$ if $X = 1, 2, \ldots, r$; $Y = (r + 1)$ if $X > r$.

Let $Y_1, Y_2, \ldots, Y_n$ be a random sample of size $n$ from $p_Y(y; \theta)$, the distribution of the random variable $Y$. Further, let the random variable $M$ be the number of $Y_i$'s in the random sample which take the value $(r + 1)$. Show that the likelihood function $L(\underset{\sim}{y}; \theta)$ based on the data $\{Y_1, Y_2, \ldots, Y_n\}$ can be written in the form

$L(\underset{\sim}{y}; \theta) = \theta^{\left( \sum_{i=1}^{n} y_i - n \right)} (1-\theta)^{(n-M)}$.

(4 pts.) c) Find the maximum likelihood (ML) estimator $\hat{\theta}$ of $\theta$ using $L(\underset{\sim}{y}; \theta)$ given in part (b).

(11 pts.) d) Find the asymptotic variance of $\hat{\theta}$, and then use this expression to construct a large-sample 95% confidence interval for $\theta$ when $n = 100$, $r = 3$, $\sum_{i=1}^{n} y_i = 300$, and $M = 50$.

3. For a certain multiple choice question with $m$ possible choices (only one of which is correct), suppose that $n$ randomly chosen students of equal ability attempt the question. Let $\theta$ equal the probability that a student actually knows the right answer to the question. Then, $(1-\theta)$ is the probability that a student does not really know the answer to the question (i.e., the student is guessing); in this case, the probability that a student answers the question correctly, given that he or she is guessing, is $m^{-1}$.

(3 pts.) a) Prove (very precisely) that the probability $\pi$ that a student answers the question correctly is equal to

$\pi = [1 + \theta(m-1)]/m$.

(4 pts.) b) Let the random variable $X$ denote the number of students out of $n$ that answer the question correctly. Find the maximum likelihood estimator $\hat{\theta}$ of $\theta$.

(6 pts.) c) Given $X$ as defined in part (b), determine (as a function of $X$) the structure of the critical region for a most powerful test of $H_0$: $\theta = \theta_0$ versus $H_1$: $\theta = \theta_1$, where $0 < \theta_0 < \theta_1 < 1$.

(4 pts.) d) For $H_0$: $\theta = \theta_0$ and $H_1$: $\theta > \theta_0$, suppose that $n = 10$, $\theta_0 = 1/2$, $m = 5$, and further suppose that we observe $X = x = 9$ (i.e., exactly 9 out of the 10 students answer the question correctly). Given this situation, calculate the exact P-value (i.e., the exact probability of observing a result at least as rare as the one observed if $H_0$ is true) for the test developed in part (c).

(8 pts.) e) Suppose that $n = 100$, $\theta_0 = 1/2$, $\theta_1 = 3/4$, and $m = 5$. For a test of size $\alpha = 0.025$, provide a reasonable value for the power using the test developed in part (c).

4. For a population of $N$ elements, two strata are to be used for a proportionate stratified simple random sample of size $n = fN$ that you are asked to help design. Consider the following measures for these strata:

Population#(-27 Sample  #(-  Number of Mean per #Element(-Number of27 Sampling