MS WRITTEN EXAMINATION IN BIOSTATISTICS
PART I
March 14, 1992: 9:00-12:00 PM

INSTRUCTIONS:

a) This is a closed book examination.
b) Answer any three questions during the three hour time period.
c) Put the answers to different questions on separate sets of paper.
d) Put your code letter, not your name, on each page.
e) Return the examination with a signed statement of the honor pledge on a page separate from your answers.
f) You are required to answer only what is asked in the questions and not all you know about the topics.

1. Researchers believe that the prevalence (Y) of byssinosis in workers in textile manufacturing plants is linearly related to the mean daily cotton dust level X. Under the assumption that zero cotton dust level implies zero prevalence, the regression equation relating the conditional mean of Y to an observed value x of X is

   $$E(Y \mid X = x) = \beta x ,$$

   where $\beta$ is an unknown parameter and x is a fixed known constant.

pts.
 4  a) Given the n pairs of data points $(x_i, Y_i)$, $i = 1, 2, \ldots, n$, show that the least-squares estimator of $\beta$ is

       $$\hat{\beta} = \sum_{i=1}^{n} x_i Y_i \Big/ \sum_{i=1}^{n} x_i^2 ;$$

       i.e., show that $\hat{\beta}$ above minimizes the function

       $$\sum_{i=1}^{n} \left[ Y_i - E(Y_i \mid X = x_i) \right]^2$$

       with respect to $\beta$.

       Now, henceforth assume that each $Y_i$ (conditional on $X = x_i$) is normally distributed with conditional mean $\beta x_i$ and conditional variance $\sigma^2$, that the $Y_i$'s are mutually independent, and that the $x_i$'s are fixed known constants (i.e., they are not random variables).

 4  b) Find $E(\hat{\beta})$.

 5  c) Find $\mathrm{Var}(\hat{\beta})$.

 4  d) What is the distribution of $\hat{\beta}$?
 8  e) Suppose that the assumption of a straight-line model is incorrect, and that the true regression equation is really a quadratic function of the form

       $$E(Y \mid X = x) = \beta x + \theta x^2 .$$

       Show that the bias in using $\hat{\beta}$ to estimate $\beta$, namely

       $$E(\hat{\beta}) - \beta ,$$

       can be written as

       $$\theta \mu'_3 / \mu'_2 ,$$

       where $\mu'_r = \frac{1}{n} \sum_{i=1}^{n} x_i^r$.

2. According to a certain genetic theory, the expected proportions of three genotypes in offspring from certain crosses of rats are as follows:

       Genotype 1:  $1/(3+\theta)$ ;
       Genotype 2:  $2/(3+\theta)$ ;
       Genotype 3:  $\theta/(3+\theta)$ ;

   here, $\theta$ $(>0)$ is an unknown parameter. Suppose that n such offspring are selected at random and tested appropriately. Let the random variable $X_i$ denote the number of genotype i observed, $i = 1, 2, 3$; thus, $\sum_{i=1}^{3} X_i = n$.

pts.
 3  a) Using the fact that the joint distribution of $X_1$, $X_2$, and $X_3$ is multinomial, show that $X_3$ is a sufficient statistic for $\theta$.

12  b) It is of interest to test $H_0$: $\theta = 1$ (the value expected under standard Mendelian theory) versus $H_1$: $\theta \neq 1$. Find an explicit expression for the likelihood ratio statistic $\hat{\lambda}$ for testing $H_0$ versus $H_1$. For large n, how would you use $\hat{\lambda}$ to test $H_0$ versus $H_1$?

10  c) By expressing $\hat{\lambda}$ in an appropriate form, show that the P-value for a test of $H_0$: $\theta = 1$ versus $H_1$: $\theta \neq 1$ can be calculated using the binomial distribution.

3. For a certain multiple choice question with m possible choices (only one of which is correct), suppose that n randomly chosen students of equal ability attempt the question. Let $\theta$ equal the probability that a student actually knows the right answer to the question.
   Then, $(1-\theta)$ is the probability that a student does not really know the answer to the question (i.e., the student is guessing); in this case, the probability that a student answers the question correctly, given that he or she is guessing, is $m^{-1}$.

pts.
 4  a) Prove (very precisely) that the probability $\pi$ that a student answers the question correctly is equal to

       $$\pi = \left[ 1 + \theta(m-1) \right] / m .$$

 6  b) Let the random variable X denote the number of students out of n that answer the question correctly. Find the maximum likelihood estimator $\hat{\theta}$ of $\theta$.

 5  c) Show that $\hat{\theta}$ is an unbiased estimator of $\theta$, and also find the variance of $\hat{\theta}$.

10  d) For the special case $m = 2$ and $\theta = \frac{1}{2}$, show that $\hat{\theta}$ achieves the Cramér-Rao lower bound for the variance of any unbiased estimator of $\theta$.

4. For a population of N elements, two strata are to be used for a proportionate stratified simple random sample that you are asked to help develop. Consider the following measures for these strata:

                        POPULATION                            SAMPLE
             Number of       Mean per      Element       Number of   Sampling             Mean per
   Stratum   Elements        Element       Variance      Elements    Rate                 Element
   1         $N_1 = NW$      $\bar{Y}_1$   $S_1^2$       $n_1$       $f_1 = n_1/N_1$      $\bar{y}_1$
   2         $N_2 = N(1-W)$  $\bar{Y}_2$   $S_2^2$       $n_2$       $f_2 = n_2/N_2$      $\bar{y}_2$

pts.
 7  a) To estimate the population mean, $\bar{Y}$, we use the estimator $\bar{y}_{wo} = W\bar{y}_1 + (1-W)\bar{y}_2$. Assuming that $S_1^2 = S_2^2 = S_*^2$, show for this proportionate stratified design that the true variance of $\bar{y}_{wo}$ is

       $$\mathrm{Var}(\bar{y}_{wo}) = \frac{1-f}{n} S_*^2 .$$

 6  b) Formulate the "design effect" for $\bar{y}_{wo}$, given this design. Is it likely to be greater than one or less than one in size? Briefly explain your answer.
 5  c) Suppose that there are two stratification variables (I and II) from which we will choose one for this design, and that we know the following about each of them:

       Stratification
       Variable         $\bar{Y}_1$   $\bar{Y}_2$
       I                100           100
       II               50            120

       Furthermore, suppose that $S_1^2 = S_2^2 = S_*^2$ for each stratification variable. Based on this information, which of the two variables would be better for stratification when the object is to estimate $\bar{Y}$? Briefly explain your answer.

 7  d) To estimate the difference between the stratum means, $D = \bar{Y}_1 - \bar{Y}_2$, we use the estimator $d = \bar{y}_1 - \bar{y}_2$. Show that the true variance of d when $S_1^2 = S_2^2 = S_*^2$ is

       $$\mathrm{Var}(d) = \frac{1-f}{n} S_*^2 \left[ 1/\{W(1-W)\} \right] .$$
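
The two variance expressions stated in Question 4, parts a) and d), can be checked numerically. The sketch below is not part of the examination and is not a derivation. It assumes that n = n_1 + n_2, that f = n/N, that proportionate allocation means n_1 = nW and n_2 = n(1-W), and that each stratum mean from a simple random sample drawn without replacement has variance (1 - f_h) S_h^2 / n_h. The values of N, n, W and S2, and the helper name `srs_mean_variance`, are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def srs_mean_variance(n_h, N_h, S2):
    """Variance of a stratum sample mean under simple random sampling
    without replacement (assumed form): (1 - n_h/N_h) * S2 / n_h."""
    return (1 - n_h / N_h) * S2 / n_h


# Hypothetical values, not taken from the examination.
N = Fraction(1000)    # population size
n = Fraction(100)     # total sample size
W = Fraction(3, 10)   # proportion of the population in stratum 1
S2 = Fraction(25)     # common element variance S_*^2

# Proportionate allocation (assumption): both strata share the rate f = n/N.
N1, N2 = W * N, (1 - W) * N
n1, n2 = W * n, (1 - W) * n
f = n / N

var_y1 = srs_mean_variance(n1, N1, S2)
var_y2 = srs_mean_variance(n2, N2, S2)

# Part a): variance of the weighted mean, built from the stratum variances.
var_wo = W**2 * var_y1 + (1 - W)**2 * var_y2

# Part d): variance of the difference of two independent stratum means.
var_d = var_y1 + var_y2

# Closed forms as stated in the examination.
closed_wo = (1 - f) / n * S2
closed_d = (1 - f) / n * S2 / (W * (1 - W))

print("Var(y_wo): built up =", var_wo, " closed form =", closed_wo)
print("Var(d):    built up =", var_d, " closed form =", closed_d)
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the built-up and closed-form values can be compared for equality rather than within a floating-point tolerance.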