/************************************************************************************** geecorr1.03.macro Modified by John Preisser and Bing Lu based on Richard Conrad Zink's GEECORR SAS macro for binary data, logit link for marginal mean model, and identity link for linear correlation model based upon GEE procedure of Prentice (1988), Biometrics 44, 1033-1048. The procedure provides parameter (beta) estimates for a model for the logit of the mean in terms of covariates (X) in addition to parameter (alpha) estimates for a linear model (identity link) for pairwise within-cluster correlations as a function of covariates (Z). Model-based and empirical (robust) standard errors are provided for the beta estimates, and empirical standard errors are provided for the alpha estimates. Version 1.02 of the macro has the following including extensions to alpha:: 1. Bias-corrected covariance estimators (alpha and beta) extending Mancl and DeRouen (2001) as described in Lu, Preisser et al. (submitted) 2. Regression diagnostics for clusters and observations (alpha and beta) extending Preisser and Qaqish (1996) by Preisser and Perin (submitted) This version (1.03) of the macro has the following: 1. Correction to standardization of Cooks Distance (inverse taken of covariance matrix); 2. added leverage (h_it) and residual (r_it) to output data set for predicted probabilities (probout); 3. More detail printed for range violations. ***** TECHNICAL NOTE ON THE ESTIMATION PROCEDURE ***** The fitting algorithm used is not (17) of Prentice (1988) but rather the "more detailed iterative procedure" referred to in that paper based upon (15). Like (17), the estimation equation for beta is beta_{s+1} = beta_s + A*U_{beta} For alpha, however, the detailed procedure gives alpha_{s+1} = alpha_s + (B*U_{beta} + C*U_{alpha}) in contrast to (17) which fixes B=0. Equation (15) provides a robust variance estimator based upon use of \partial Z_i/ \partial beta in B. As shown at the top of page 1040, this is a random quantity. Instead, this program uses the expected information matrix based -E(\partial Z_i/ \partial beta) The row of this matrix corresponding to the (j,k) pair in cluster i is rho_{ijk}/2 [(1-2*mu_{ij})x'_{ij} + (1-2*mu_{ik})x'_{ik}]. The macro uses these modifications in the fitting algorithm and variance estimators for the estimating equations given in (14) of Prentice: U_{beta} = \sum_i D_i`*inv(V_i)*(Y_i - p_i) and U_{alpha} = \sum_i (E)`*diag(VEE_i)*(Z_{i} - \delta_{i}) ***** GENERAL COMMENTS ON USING THE SOFTWARE ***** 1. All inputs to macro are required (except makevone). These are: xydata: The data set containing the outcome and marginal mean covariates yvar: The 0/1 binary outcome in xydata xvar: A list of marginal mean covariates in xydata. Must all be numeric id: Cluster ID (consecutive integers) in xydata zdata: The data set containing the covariates for the marginal correlation regression zvar: A list of marginal correlation covariates in zdata. Must all be numeric wdata: The data set containing cluster weights wvar: The weight variable in wdata (numeric, i.e. cluster frequencies) clsout: output data set of cluster deletion diagnostics for beta and alpha obsout: output data set of observation deletion diagnostics for beta and alpha probout: output data set of predicted probabilities of Y=1 for each observation maxiter: Maximum number of iterations for Fisher Scoring epsilon: Tolerance for convergence printrange: Print details of range violations? (Should initially check) (YES, NO) Range is always checked, however. Range is always checked and printed when variance computed. shrink: THETA or ALPHA. See Readme file for details. makevone: Option sets VEE = 1 if makevone=YES instead of w_ijk from Prentice (1988). default. See Sharples and Breslow (1992). 2. ID should go from 1 to K, where K is the number of clusters. Data should be sorted by ID before calling the macro. This goes for the W, Z, and XY datasets. (IS 1 to K needed?) 3. ID is really only used to get sample size for each cluster before being dropped. It isn't really needed as "i", which is the indicator for cluster in loops goes from 1 to K. 4. XYdata has 1 row for each observation so the matrix (vector) X (y) will be of dimension [\sum_{i=1}^K n_i] x p ([\sum_{i=1}^K n_i] x 1). Note that p is number of beta parameters, and q is number of alpha parameters. 5. Zdata has one observation for all pairs (j,k) where j epsilon); n_modi = 0; do until(n_modi > max_modi | rangeflag = 0); run SCORE(U, UUtran,ustar, beta, alpha, y, X, Z, n, w, p, q, 0, rangeflag, niter); if rangeflag = 1 then do; %if %upcase(&shrink) = THETA %then %do; if niter = 1 then alpha = J(q, 1, 0); else do; theta = theta - (0.5)**(n_modi+1) * delta; beta = theta[1:p]; alpha = theta[p+1:p+q]; end; %end; %else %if %upcase(&shrink) = ALPHA %then %do; if niter = 1 then alpha = J(q, 1, 0); else do; alpha = 0.95 * alpha; end; %end; n_modi = n_modi + 1; %if %upcase(&printrange) = YES %then %do; print "Iteration and Shrink Number", niter n_modi; %end; end; end; %if %upcase(&printrange) = YES %then %do; if n_modi > max_modi then do; print "n_modi too great, more than 20 shrinks"; goto theend; end; %end; theta = beta // alpha; delta = solve(Ustar, U); theta = theta + delta; beta = theta[1:p]; alpha = theta[p+1:p+q]; converge = (max(abs(delta)) <= epsilon); end; niter = niter - 1; if converge = 1 then run MAKEVAR(robust, naive, bias, beta, alpha, y, X, Z, n, w, p, q, id, nobs, niter); theend:finish FITPRENTICE; /************************************************************************************** MODULE: INITVALS Generates initial values for beta and alpha. Uses 0.1 for initial value for alphas. Beta uses approximate parms from a logistic regression on all observations, treating them independently INPUT y: The 0/1 binary outcome X: Marginal mean covariates n: Vector of cluster sample sizes w: Weights q: Number of marginal correlation parameters OUTPUT beta: Vector of marginal mean parameters alpha: Vector of marginal correlation parameters **************************************************************************************/ start INITVALS(beta, alpha, y, X, n, w, q,startbeta,startalpha); alpha = J(q, 1, 0.01); /* User specified inital values */ %IF %length(&startbeta)^= 0 %then %do; use &startbeta; read into startb; beta=startb`; %end; %else %do; run INITBETA(beta, y, X, n, w); %end; %IF %length(&startalpha)^= 0 %then %do; use &startalpha; read into starta; alpha=starta`; %end; *print beta; *print alpha; finish INITVALS; /************************************************************************************** MODULE: INITBETA Generates initial values for beta. Approximates logistic regression INPUT y: The 0/1 binary outcome X: Marginal mean covariates n: Vector of cluster sample sizes w: Weights OUTPUT beta: Vector of marginal mean parameters **************************************************************************************/ start INITBETA(beta, y, X, n, w); run EXPANDW(obsw, n, w); z = y + y - 1; beta = solve(X` * (X # obsw), X` * (z # obsw)); do i = 1 to 2; u = X * beta; u = 1 / (1 + exp(-u)); v = u # (1 - u) # obsw; z = X` * ((y - u) # obsw); d = solve(X` * (X # v), z); beta = beta + d; end; finish INITBETA; /************************************************************************************** MODULE: EXPANDW Generates an observed weight vector INPUT n: Vector of cluster sample sizes w: Weights OUTPUT obsw: Vector of observed weights **************************************************************************************/ start EXPANDW(obsw, n, w); obsw = J(sum(n), 1, 0); run BEGINEND(first, last, n); do i = 1 to nrow(n); obsw[first[i]:last[i]] = w[i]; end; finish EXPANDW; /************************************************************************************** MODULE: NCHOOSE2 For a value n, returns n choose 2 INPUT n: Vector of cluster sample sizes OUTPUT n choose 2 *************************************************************************************/ start NCHOOSE2(n); return((n#n - n)/2); finish NCHOOSE2; /************************************************************************************ MODULE: FINDINV Calculate the inverse for symmetric and positive definite matrix ************************************************************************************/ start FINDINV (A); AHALF=ROOT(A); GINV=ginv(AHALF); AINV=GINV*GINV`; return (AINV); finish; /************************************************************************************** MODULE: VEERDB Embedded within SCORE MODULE to do the following: (i) create VEE, R, and DB matrices (ii) check to see if correlation is within range based on marginal means INPUT gamma_c: correlation vector for cluster mu_c: mean vector for cluster X_c: Covariates for beta estimating equations flag: Performs an Eigenanalysis of B to see if positive definite. Only called when computing the variance at the end (0 = no, 1 = yes). Prints warning for each cluster violation. n: vector of cluster sizes i: index for cluster p: dimension of X OUTPUT VEE: working variances for alpha estimating equatinons R: residual vector for alpha estimating equations DB: -E(d(corr)/dbeta) (i.e. Q) matrix rangeflag: Checks to see if correlation is within range based on marginal means (0 = in range, 1 = out of range). See Prentice (1988). **************************************************************************************/ start VEERDB(VEE,R,DB,rangeflag,mu_c,gamma_c,X_c,y_c,flag,n,i,p,niter); VEE = J(NCHOOSE2(n[i]), 1, 0); R = J(NCHOOSE2(n[i]), 1, 0); DB = J(NCHOOSE2(n[i]), p, 0); l = 1; do j = 1 to n[i] - 1; do k = j + 1 to n[i]; rho_upper_limit = min(sqrt((mu_c[j]*(1-mu_c[k]))/(mu_c[k]*(1-mu_c[j]))), sqrt((mu_c[k]*(1-mu_c[j]))/(mu_c[j]*(1-mu_c[k])))); rho_lower_limit = max(-sqrt((mu_c[j]*mu_c[k])/((1-mu_c[j])*(1-mu_c[k]))), -sqrt(((1-mu_c[j])*(1-mu_c[k]))/(mu_c[j]*mu_c[k]))); * RANGE CHECKS ; if ((gamma_c[l] >= rho_upper_limit) | (gamma_c[l] <= rho_lower_limit)) & flag = 0 then do; rangeflag = 1; %if %upcase(&printrange) = YES %then %do; print "Range Violation Detected for Cluster and Pair:", niter i j k; CovarJ = X_c[j,]; CovarK = X_c[k,]; print covarj covark; corrjk = gamma_c[l]; margmeanj = mu_c[j]; margmeank = mu_c[k]; print rho_lower_limit corrjk rho_upper_limit margmeanj margmeank; %end; * return; /* this statement breaks the loop when there's a range violation*/ end; if ((gamma_c[l] >= rho_upper_limit) | (gamma_c[l] <= rho_lower_limit)) & flag = 1 then do; print "Last Update pushes Parms Out of Range."; print "Range Violation Detected for Cluster and Pair:", niter i j k; end; %if %upcase(&makevone) = YES %then %do; VEE[l] = 1; * print &makevone; %end; %else %do; VEE[l] = 1 + ((1-2*mu_c[j])*(1-2*mu_c[k])*gamma_c[l]) / sqrt(mu_c[j]*(1-mu_c[j])*mu_c[k]*(1-mu_c[k])) - gamma_c[l]**2; %end; DB[l,] = GETROWB(mu_c, gamma_c[l], j, k, X_c); R[l] = ((y_c[j] - mu_c[j])*(y_c[k] - mu_c[k])) / sqrt(mu_c[j]*(1-mu_c[j])*mu_c[k]*(1-mu_c[k])) - gamma_c[l]; l = l + 1; end; end; finish; /************************************************************************************** MODULE: SCORE Generates the score matrix for each cluster and approximate information to be used to estimate parameters and generate standard errors INPUT beta: Vector of marginal mean parameters alpha: Vector of marginal correlation parameters y: The 0/1 binary outcome X: Marginal mean covariates Z: Marginal correlation covariates n: Vector of cluster sample sizes w: Weights p: Number of marginal mean parameters q: Number of marginal correlation parameters flag: Performs an Eigenanalysis of B to see if positive definite. Only called when computing the variance at the end (0 = no, 1 = yes). Prints warning for each cluster violation. rangeflag: Checks to see if correlation is within range based on marginal means (0 = in range, 1 = out of range). See Prentice (1988). OUTPUT U: Score vector UUtran: Sum of U_i*U_i` across all clusters Ustar: Approximate information matrix **************************************************************************************/ start SCORE(U, UUtran, Ustar, beta, alpha, y, X, Z, n, w, p, q, flag, rangeflag, niter); U = J(p+q, 1, 0); UUtran = J(p+q, p+q, 0); Ustar = J(p+q, p+q, 0); run BEGINEND(firstx, lastx, n); run BEGINEND(firstz, lastz, NCHOOSE2(n)); do i = 1 to nrow(n); X_c = X[firstx[i]:lastx[i],]; y_c = y[firstx[i]:lastx[i]]; if n[i] = 1 then Z_c = 0; else Z_c = Z[firstz[i]:lastz[i],]; if n[i] = 1 then do; mu_c = 1 / (1 + exp(-X_c * beta)); U_c = X_c` * (y_c - mu_c); UUtran_c = U_c * U_c`; Ustar_c = X_c` * (mu_c * (1 - mu_c)) * X_c; U[1:p] = U[1:p] + U_c * w[i]; UUtran[1:p, 1:p] = UUtran[1:p, 1:p] + UUtran_c * w[i]; Ustar[1:p, 1:p] = Ustar[1:p, 1:p] + Ustar_c * w[i]; end; else do; U_c = J(p+q, 1, 0); Ustar_c = J(p+q, p+q, 0); mu_c = 1 / (1 + exp(-X_c * beta)); gamma_c = Z_c * alpha; run VEERDB(VEE,R,DB,rangeflag,mu_c,gamma_c,X_c,y_c,flag,n,i,p,niter); C = CREATEC(X_c, mu_c); B = CREATEB(mu_c, gamma_c, n[i]); A = CREATEA(mu_c, y_c); * Check for pos def of B; if flag = 1 then do; if min(eigval(B)) <= 0 then do; print"Cluster:" i, "Var(Y) is not Positive-Definite", "Joint Distribution Does Not Exist"; end; end; INVB=Findinv(B); U_c[1:p] = C` * INVB * A; *print VEE; U_c[p+1:p+q] = Z_c` * (R # (1 / VEE)); UUtran_c = U_c * U_c`; Ustar_c[1:p, 1:p] = C` * INVB * C; * when commented out its treat alpha and beta estimating equations as orthogonal; *Ustar_c[p+1:p+q, 1:p] = Z_c` * (DB # (1 / VEE)); Ustar_c[p+1:p+q, p+1:p+q] = Z_c` * (Z_c # (1 / VEE)); U = U + U_c * w[i]; UUtran = UUtran + UUtran_c * w[i]; Ustar = Ustar + Ustar_c * w[i]; end; end; rangeflag = 0; finish SCORE; /************************************************************************************** MODULE: GETROWB Generates a row of the -E(d(corr)/dbeta) (ie Q) matrix INPUT mu: marginal means for cluster i gamma: pairwise correlation for outcome j and k for cluster i j: indicator for mean k: indicator for mean X: covariate matrix for cluster i OUTPUT Row of -E(d(corr)/dbeta) (ie Q) matrix **************************************************************************************/ start GETROWB(mu, gamma, j, k, X); row = (gamma/2) * ((1-2*mu[j])*X[j,] + (1-2*mu[k])*X[k,]); return(row); finish GETROWB; /************************************************************************************** MODULE: GETPAIRWISE Generates pairwise marginal means with marginal means and CORR INPUT muj: marginal mean for outcome j for cluster i muk: marginal mean for outcome k for cluster i gamma: linear predictor between outcome j and k for cluster i OUTPUT pairwise marginal means, mujk **************************************************************************************/ start GETPAIRWISE(muj, muk, gamma); mujk = sqrt(muj * muk * (1-muj) * (1-muk)) * gamma + muj * muk; return(mujk); finish GETPAIRWISE; /************************************************************************************** MODULE: CREATEA Creates residual for beta estimating equation, (Y - mu) INPUT mu: Vector of n_i marginal means y: Outcome vector for ith cluster OUTPUT residuals for beta estimating equation **************************************************************************************/ start CREATEA(mu, y); return(y - mu); finish CREATEA; /************************************************************************************** MODULE: CREATEC Creates derivative matrix for beta estimating equation, dmu/dbeta INPUT X: Covariate matrix for cluster i mu: Vector of n_i marginal means OUTPUT derivative matrix for beta estimating equation **************************************************************************************/ start CREATEC(X, mu); return(X # (mu # (1 - mu))); finish CREATEC; /************************************************************************************** MODULE: CREATEB Creates covariance matrix for beta estimating equation, var(Y) INPUT mu: Vector of n_i marginal means n: Sample size (scalar) for cluster i gamma: Vector of rho_jk between outcome j and k for cluster i OUTPUT covariance matrix for beta estimating equation **************************************************************************************/ start CREATEB(mu, gamma, n); d = mu # (1 - mu); B = diag(d); l = 1; do j = 1 to n - 1; do k = j + 1 to n; B[j,k] = GETPAIRWISE(mu[j], mu[k], gamma[l]) - mu[j] * mu[k]; l = l + 1; end; end; return(sqrsym(symsqr(B`))); finish CREATEB; /************************************************************************************** MODULE: RESULTS Creates printed output to screen of parameters and other information INPUT beta: Vector of marginal mean parameters alpha: Vector of marginal correlation Parameters robust: Robust covariance matrix for beta and alpha naive: Naive (Model-Based) covariance matrix for beta niter: Number of iterations until convergence n: Vector of cluster sample sizes w: Weights OUTPUT To Screen **************************************************************************************/ start RESULTS(beta, alpha, robust, naive, bias, niter, n, w); p = nrow(beta); q = nrow(alpha); print 'Prentice Method (1988)',, 'Number of Clusters:' (sum(w)), 'Maximum Cluster Size:' (max(n)), 'Minimum Cluster Size:' (min(n)), 'Number of Iterations:' niter[label=''], "Outcome Variable: &yvar"; print '***** Robust and Bias-corrected Standard Errors *****'; var = {&xvar}`; parm = beta; stderr = sqrt(vecdiag(robust[1:p, 1:p])); chisq = (beta / stderr) ## 2; pvalue = 1 - probchi(chisq, 1); bstd = sqrt(vecdiag(bias[1:p, 1:p])); bchi = (beta / bstd) ## 2; bpval = 1 - probchi(bchi, 1); print 'Marginal Mean Parameter Estimates',, var parm stderr chisq pvalue bstd bchi bpval; var = {&zvar}`; parm = alpha; stderr = sqrt(vecdiag(robust[p+1:p+q, p+1:p+q])); chisq = (alpha / stderr) ## 2; pvalue = 1 - probchi(chisq, 1); bstd = sqrt(vecdiag(bias[p+1:p+q, p+1:p+q])); bchi = (alpha / bstd) ## 2; bpval = 1 - probchi(bchi, 1); print 'Marginal Correlation Parameter Estimates',, var parm stderr chisq pvalue bstd bchi bpval; print '***** Naive (Model-Based) Standard Errors *****'; var = {&xvar}`; parm = beta; stderr = sqrt(vecdiag(naive)); chisq = (beta / stderr) ## 2; pvalue = 1 - probchi(chisq, 1); print 'Marginal Mean Parameter Estimates',, var parm stderr chisq pvalue; finish RESULTS; /************************************************************** /************************************************************** MODULE: INVBIG compute (A - mm`)^{-1}c without performing the inverse directly INPUT ainvc: inverse of matrix A times vector c ainvm: inverse of matrix A times matrix (with low no. of columns) M M: matrix of eigen column vectors m1,m2, .. c: vector start: of do loop end: of do loop, rank of X **************************************************************/ Start INVBIG(ainvc,ainvm,m,c,start,end); do i=start to end; b = ainvm[,i]; bt = b`; btm = bt*m; btmi = btm[,i]; gam = 1 - btmi; bg = b/gam; ainvc = ainvc + bg*(bt*c); if i< end then ainvm = ainvm + bg*btm; end; finish INVBIG; /************************************************************************************** MODULE: MAKEVAR Creates covariance matrix of beta and alpha. INPUT beta: Vector of marginal mean parameters alpha: Vector of marginal correlation parameters y: The 0/1 binary outcome X: Marginal mean covariates Z: Marginal correlation covariates n: Vector of cluster sample sizes w: Weights p: Number of marginal mean parameters q: Number of marginal correlation parameters OUTPUT robust: Robust covariance matrix for beta and alpha naive: Naive (Model-Based) covariance matrix for beta **************************************************************************************/ start MAKEVAR(robust, naive, bias, beta, alpha, y, X, Z, n, w, p, q, id,nobs,niter); run SCORE(U, UUtran, Ustar, beta, alpha, y, X, Z, n, w, p, q, 1, 0, niter); inustar = FINDINV(Ustar); robust = inustar * UUtran * inustar; naive = FINDINV(Ustar[1:p,1:p]); /* if B=0, this is upper left matrix of inustar */ naivealp = FINDINV(Ustar[p+1:p+q,p+1:p+q]); /* for use in cluster diagnostics */ * new commands to compute INV(I - H1); Call Eigen(evals1,evecs1,naive); sqrevals1 = sqrt(evals1); sqe1 = evecs1*diag(sqrevals1); * new commands to compute INV(I - H2); Call Eigen(evals2,evecs2,naivealp); sqrevals2 = sqrt(evals2); sqe2 = evecs2*diag(sqrevals2); /**************************************************** * Bias-corrected variance and influence diagnostics * ****************************************************/ U = J(p+q, 1, 0); UUbc = J(p+q, p+q, 0); /* alternative way */ biascov = J(p+q, p+q, 0); run BEGINEND(firstx, lastx, n); run BEGINEND(firstz, lastz, NCHOOSE2(n)); KK = nrow(n); dfbetcls = J(KK,p,0); /* dfbeta for a cluster */ dfalpcls = J(KK,q,0); /* dfalpha for a cluster */ Hbeta = J(KK,1,0); /* cluster leverage for beta */ Halpha = J(KK,1,0); /* cluster leverage for alpha */ cookdcls = J(KK,1,0); /* cooks distance overall */ cookbetcls = J(KK,1,0); /* cooks distance for beta */ cookalpcls = J(KK,1,0); /* cooks distance for alpha */ /*for observation level deletion diagnostics*/ Dfbetobs=J(nobs,p,0); Dfalpobs = J(nobs,q,0); cookdobs = J(nobs,1,0); CookBetObs = J(nobs,1,0); CookAlpObs = J(nobs,1,0); Clusterid = J(nobs,1,0); ClusterN = J(nobs,1,0); Levbetobs = J(nobs,1,0); /* observation leverage */ do i = 1 to KK; /*cluster*/ X_c = X[firstx[i]:lastx[i],]; y_c = y[firstx[i]:lastx[i]]; mu_c = 1 / (1 + exp(-X_c * beta)); if n[i] = 1 then do; B = mu_c * (1 - mu_c); C = X_c #B; Hi1=C*naive*X_c`; U_c = X_c` * (y_c - mu_c)/(1 - Hi1); UUbc_c = U_c * U_c`; U[1:p] = U[1:p] + U_c * w[i]; UUbc[1:p, 1:p] = UUbc[1:p, 1:p] + UUbc_c * w[i]; * Hi2 is not defined for clusters of size 1; Hbeta[i] = Hi1; /* Halpha is set to 0 by default */ dfbetclsi = naive*U_c; Dfbetcls[i,] = dfbetclsi`; Dfbetobs[firstx[i],] = dfbetclsi`;/* dfbetobs = dfbetcls when n[i]=1*/ /* dfalpcls[i,] value is 0 by default */ dfalpclsi = dfalpcls[i,]`; dfalpobs[firstx[i],] = dfalpclsi`;/* dfalpobs = dfalpcls when n[i]=1*/ /*observation level cook's d*/ Dfobsboth = dfbetclsi//dfalpclsi; cookdobs[firstx[i]] = Dfobsboth`*robust*Dfobsboth/(p+q); cookbetobs[firstx[i]] = Dfbetcls[i,]*robust[1:p,1:p]*dfbetclsi/p; cookalpobs[firstx[i]] = Dfalpcls[i,]*robust[p+1:p+q,p+1:p+q]*dfalpclsi/q; levbetobs[firstx[i]] = Hbeta[i]; end; /* if n[i] = 1 */ else do; U_c = J(p+q, 1, 0); Z_c = Z[firstz[i]:lastz[i],]; gamma_c = Z_c * alpha; * commands for beta; C = CREATEC(X_c, mu_c); B = CREATEB(mu_c, gamma_c, n[i]); A = CREATEA(mu_c, y_c); INVB=Findinv(B); CtinvB = C`*invB; Hi1=C*naive*CtinvB; Hbeta[i] = trace(Hi1); /* cluster leverage for beta */ * commands to compute generalized inverse - beta; ai1 = invB; mm1 = C*sqe1; ai1A=ai1*A; ai1m1=ai1*mm1; *print 'INVBIG(ai1A, ai1m1, mm1, A, 1, p)'; *print ai1A ai1m1 mm1 A p; run INVBIG(ai1A,ai1m1,mm1,A,1,p); U_c[1:p] = C`*ai1A; * commands for alpha; VEE = J(NCHOOSE2(n[i]), 1, 0); R = J(NCHOOSE2(n[i]), 1, 0); * DB = J(NCHOOSE2(n[i]), p, 0); l = 1; do j = 1 to n[i] - 1; do k = j + 1 to n[i]; %if %upcase(&makevone) = YES %then %do; VEE[l] = 1; %end; %else %do; VEE[l] = 1 + ((1-2*mu_c[j])*(1-2*mu_c[k])*gamma_c[l]) / sqrt(mu_c[j]*(1-mu_c[j])*mu_c[k]*(1-mu_c[k])) - gamma_c[l]**2; %end; * DB[l,] = GETROWB(mu_c, gamma_c[l], j, k, X_c); R[l] = ((y_c[j] - mu_c[j])*(y_c[k] - mu_c[k])) / sqrt(mu_c[j]*(1-mu_c[j])*mu_c[k]*(1-mu_c[k])) - gamma_c[l]; l = l + 1; end; end; Zctvee = Z_c`#(1/VEE`); Hi2=Z_c*naivealp*Zctvee; Halpha[i] = trace(Hi2); /* cluster leverage for alpha */ * commands to compute generalized inverse - alpha; mm2 = Z_c*sqe2; ai2R = R/VEE; ai2m2 = mm2#(1/VEE); run INVBIG(ai2R,ai2m2,mm2,R,1,q); U_c[p+1:p+q] = Z_c` *ai2R; UUbc_c = U_c * U_c`; UUbc = UUbc + UUbc_c * w[i]; /* cluster deletion diagnostics */ dfbetclsi = naive*U_c[1:p]; dfalpclsi = naivealp*U_c[p+1:p+q]; Dfbetcls[i,] = dfbetclsi`; Dfalpcls[i,] = dfalpclsi`; /*for observation level diagnostics*/ %IF %length(&OBSOUT)^= 0 %then %DO; run DFOBS(p,q,n, nobs, i, B,invB, C, naive, A,firstx,robust,VEE,Z_c, naivealp,R,clusterid, clustern,dfbetobs,cookbetobs,dfalpobs,cookalpobs,cookdobs); levbetobs[firstx[i]:lastx[i]] = vecdiag(Hi1); %END; end; /* loop if cluster > 1 */ Dfboth = dfbetclsi//dfalpclsi; cookdcls[i] = Dfboth`*Findinv(robust)*Dfboth/(p+q); cookbetcls[i] = Dfbetcls[i,]*Findinv(robust[1:p,1:p])*dfbetclsi/p; cookalpcls[i] = Dfalpcls[i,]*Findinv(robust[p+1:p+q,p+1:p+q])*dfalpclsi/q; * alternative way of calculating bias-corrected covariance estimator; biascov = biascov + Dfboth*Dfboth`; end; /* do i=1 to KK */ Bias = inustar * UUbc * inustar; * print bias; * print biascov; /* Request Output Datasets for diagnostics (and covariance matrices ?) */ %IF (%length(&CLSOUT) ^= 0) %THEN %DO; id = (unique(id))`; if type(id)="N" then id=char(id); C1 = {"I" "NI" "Hbeta" "Halpha" "cookdcls" "cookbetcls" "cookalpcls" &xvar &zvar}; xvar = dfbetcls; zvar = dfalpcls; i = (1:KK)`; clout = i||n||Hbeta||Halpha||cookdcls||cookbetcls||cookalpcls||xvar||zvar; create &CLSOUT from clout [rowname= id colname=c1]; append from clout [rowname= id]; print clout [rowname= id colname=c1]; %END; %IF (%length(&OBSOUT) ^= 0) %THEN %DO; obs = (1:nobs)`; if type(obs)="N" then obs=char(obs); clobsout = clusterid||clustern||levbetobs||cookdobs||cookbetobs||cookalpobs||dfbetobs||dfalpobs; C2 = {CLUSTER NI levbetobs cookdobs cookbetobs cookalpobs &xvar &zvar}; create &OBSOUT from clobsout [rowname= obs colname=c2]; append from clobsout [rowname= obs]; * print clobsout [rowname=obs colname=c2]; print "Observation Deletion Diagnostics located in &OBSOUT"; %END; %IF (%length(&PROBOUT) ^= 0) %THEN %DO; prob= 1/(1+exp(-X*beta)); stdres = (y - prob)/sqrt(prob#(1-prob)); probout= y || X || prob || stdres; C3={Y &xvar Pred_prob Std_res}; create &PROBOUT from probout [rowname=obs colname=c3]; append from probout [rowname=obs]; print "Predicted Probabilities are located in &PROBOUT"; %END; finish MAKEVAR; /************************************************************************************** MODULE: DFOBS Calculates observation level deletion diagnostics for the mean and covariance parameters. OUTPUT deletion diagnostics for alpha and beta parameters, approximated by the one step iteration. Location specified by user. ************************************************************************************** ************************************************************************************** NOTES: i indexes cluster n[i]=cluster size KK = number of clusters p= number of mean parameters (beta) q= number of covariance parameters (alpha) naive = (D' Vinv D)inv = inxwx B = covariance matrix for beta estimating equation, var(Y) X_c = X covariates for cluster obs_i = within cluster variable = vector of 1:n[i] A = y-u C = d u / d Beta = d u / d eta X = Linv X (=D from preisser notes) B = V W = Vinv naivealp = (S'Tinv S)inv = M_2 inv mu: Vector of n_i marginal means n: Sample size (scalar) for cluster i gamma: Vector of rho_jk between outcome j and k for cluster i VEE: working variances for alpha estimating equatinons = L` T L R: residual vector for alpha estimating equations = E 2 DB: -E(d(corr)/dbeta) (i.e. Q) matrix ***********************************************************************************/ start DFOBS(p,q,n,nobs,i,B,invB,C,naive,A,firstx,robust,VEE,Z_c,naivealp,R,clusterid, clustern,dfbetobs,cookbetobs,dfalpobs,cookalpobs,cookdobs); do ii = 1 to n[i]; obs_i = 1:n[i]; *vector; comp = loc(obs_i^= ii); /*for beta*/ /**************************************/ vii = B[comp, ii]; wii = invB[comp, ii]; Wti = invB[comp, comp]-(wii*wii`)/invB[ii, ii]; WV = wti*vii; Ctilda = C[ii, ]`-C[comp, ]`*WV; Htilda = Ctilda`*naive*Ctilda; Stilda = B[ii, ii]-vii`*WV; Atilda = A[ii]-A[comp]`*WV; hdobs = Ctilda*Atilda/(Stilda-Htilda); dfbetii = naive*hdobs; dfbetobs[firstx[i]- 1 + ii, ] = dfbetii`; /*Cluster information*/ clusterid[firstx[i]- 1 + ii] =i; clustern[firstx[i]- 1 + ii] =n[i]; /**************************************/ /* Cook's D normalized by robust variance estimate */ *cbetobs = (hdobs`*robust[1:p,1:p]*hdobs)/p; cbetobs = (dfbetii`*Findinv(robust[1:p,1:p])*dfbetii)/p; cookbetobs[firstx[i] -1 + ii] = cbetobs; /*for alpha*//* if n[i]=2 special treatment is needed because there is only one row in z*/ /**************************************/ dimobs = n[i]-1; if n[i] > 2 then do; obs = J(dimobs,1,0); comp = J(NCHOOSE2(dimobs),1,0); end; else if n[i] =2 then do; comp = .; obs = 1;end; count=0; ncomp=1; nii=1; /*routine to make list of observations to be removed*/ do obsj = 1 to n[i] - 1; do obsk =obsj+1 to n[i]; count=count+1; if ( ii^=obsj & ii^=obsk) then do; comp[ncomp]=count; ncomp=ncomp+1; end; else do; obs[nii]=count; nii=nii+1; end; end; end;/*end routine*/ VEEDIAG=diag(VEE); if n[i] > 2 then do; veeDCOMPinv = diag(1/VEE[comp]); veeveeinv = VEEDIAG[obs,comp]*veeDCOMPinv; Ztilda = Z_c[obs,]-veeveeinv*Z_c[comp,]; VEEtilda = VEEDIAG[obs,obs]-veeveeinv*VEEDIAG[comp,obs]; H2tilda = Ztilda*naivealp*Ztilda`; Rtilda = R[obs]-veeveeinv*R[comp]; end; /* if n[i] > 2 */ else if n[i] = 2 then do; Ztilda = Z_c[obs,]; VEEtilda = VEEDIAG[obs,obs]; H2tilda = Ztilda*naivealp*Ztilda`; Rtilda = R[obs]; end; /* if n[i] = 2 */ ginvv = VEEtilda - H2tilda; hadobs = Ztilda`*ginv(ginvv)*Rtilda; dfalphaii = naivealp*hadobs; Dfalpobs[ firstx[i] -1 + ii, ] = Dfalphaii`; * calpobs = (hadobs`*robust[p+1:p+q,p+1:p+q]*hadobs)/q; calpobs = (dfalphaii`*Findinv(robust[p+1:p+q,p+1:p+q])*dfalphaii)/q; cookalpobs[firstx[i] -1 + ii] = calpobs; Dfboth = dfbetii//dfalphaii; cookdobs[firstx[i]-1+ii] = Dfboth`*Findinv(robust)*Dfboth/(p+q); end; finish DFOBS; /**************************************************************************************/ run GRABDATA(y, X, id, n, Z, w, maxiter, epsilon, nobs); run FITPRENTICE(beta, alpha, robust, naive, bias, niter, converge, y, X, Z, n, w, maxiter, epsilon, id, nobs, startbeta, startalpha); if converge = 1 then run RESULTS(beta, alpha, robust, naive, bias, niter, n, w); else if converge = 0 then print'ALGORITHM DID NOT CONVERGE'; /*************************************************************************************/ %mend PRENTICEcorrJSP; /*************************************************************************************/