/* file: geedia_100.mac last change: 07/01/03 * correction made 07/09/03 - Division by phi added to Cook's D * minor modifications by BQ, November 2003 GEE WITH DIAGNOSTICS - Version 1.00 SAS/IML MACRO FOR LONGITUDINAL DATA ANLAYSIS AND GEE DELETION DIAGNOSTICS BY JOHN PREISSER - THE UNIVERSITY OF NORTH CAROLINA-CHAPEL HILL Purpose: Fit GEE to population averaged models with option to request cluster-deletion and observation-deletion diagnostics, described in Preisser & Qaqish (Biometrika 1996:83,551-562). The current version of the macro applies GEE with either independence or exchangeable correlation structures only. Future versions of the macro will include other working correlation structures. Parameters specified: Default value %GEE ( DATA = SAS dataset, { _LAST_ } YVAR = y-variable, { required, Y } XVAR = x-variables, { required, X } ID = id-variable, { required, ID } TIME = within cluster variable { } LINK = link function, { required, 1 } VARI = mean-variance relation, { required, 1 } N = binomial denominator variable, { required only for binomial} CORR = correlation structure, { required, 1 } M = dependence, { 1 } R = given correlation matrix, { I } SCALE= scale parameter, { } BETA = initial estimate of beta, { glim} OFFS = offset variable, { } NCOVOUT = output dataset of beta, naive se, and naive covariance matrix, { } RCOVOUT = output dataset of beta, robust se, and robust covariance matrix, { } OBSOUT = output dataset of observation diagnostics, { } CLSOUT = output dataset of cluster diagnostics, { } ITER = maximum iterations, { 20 } MONITOR = print out iterations { NO } CRIT = convergence criterion { 0.01 } ) Notes: 1. If a data set is not assigned the macro will use the last working SAS data set. 2. Must give one and only one y variable. 3. Must explicitly include the intercept variable (if desired) among the x-variables. 4. The ID variable pertains to cluster identification. 5. Link function must be one of the following: 1=identity, 2=log, 3=logit, 4=reciprocal. 6. VARI function must be one of the following: 1=constant (normal), 2=poisson, 3=binomial, 4=inv. gamma. 7. If N is not given for binomial data, the default is N=1. 8. For binary data, scale is fixed at 1. Otherwise it is estimated or a fixed value may be given with the SCALE= option. 9. The scale parameter is assumed constant across all observations. 10. NCOVOUT = provides sas output data set with covariate name in first column, beta estimate in second column, model-based (naive) standard error in third column, and model-based covariance matrix in remaining columns. 11. RCOVOUT = provides sas output data set with covariate name in first column, beta estimate in second column, empirical sandwich (robust) standard error in third column, and empirical sandwich covariance matrix in remaining columns. 12. OBSOUT provides sas output data set for observation level regression diagnostics including Cooks distance, DFBETA and leverage. 13. CLSOUT provides sas output data set for cluster level regression diagnostics Cooks distance, DFBETA and leverage. 14. MONITOR=YES will print out details of each iteration. REQUIRED MACRO SPECIFICATIONS To run the macro, the user is required to provide a sas data set with response variable (YVAR), a list of independent variables (XVAR), cluster identifier variable (ID), link (LINK) and variance (VARI) function, and working correlation structure (CORR). Currently, there are only two choices for CORR (1 - independence, 4 - exchangeable). Options TIME and M are not yet active. A future version of this macro will use them in specifying other working correlation structures. OPTIONAL SPECIFICATIONS DEN (denominator) is required for binomial data, but if not specified is assumed to equal one (used for binary data). Scale is assumed to equal 1 for binary data, but is otherwise estimated. Optionally, it may be set to equal 1 (or some other value) with the SCALE option. The BETA option may be used to specify starting values for the regression coefficient estimates, otherwise glim estimates are used as starting values. The option OFFS specifies a sas variable containing offsets (these are used for example in poisson regression with unequal exposure periods). The user can control the maximum number of iterations allowed (ITER), the convergence criteria (CRIT), and can print out each iteration (MONITOR). LINK FUNCTION The following choices of link (LINK) function are available: 1 - Identity 2 - Logarithm 3 - Logit 4 - Reciprocal VARIANCE FUNCTION The following choices of variance (VARI) function are available: 1 - Gaussian 2 - Poisson 3 - Binomial 4 - Gamma REGRESSION DIAGNOSTICS This macro provides computational formulae for case-deletion regression diagnostics (Preisser & Qaqish, 1996). These diagnostics are generalizations of Cook's distance, DFBETA and leverage for linear regression, and their counterparts for generalized linear models. They are an approximation to the difference in the estimated regression coefficients that one would obtain upon deleting either one observation or one cluster. The diagnostics are sometimes called "one-step" diagnostics because they are equal to the procedure that upon convergence of the iteratively reweighted least squares algorithm to the GEE solution, applies one more iteration after deletion of an observation (or cluster). The difference in the regression coefficients one would obtain from such a procedure is equivalent to the value of the diagnostic. Because, however, of computational formula for the diagnostics, no additional iterations are required in the computing algorithm to obtain the full set of observation-deletion and cluster-deletion diagnostics. Diagnostics are not provided automatically by the software, but may be requested with optional statements declared in the macro call. To request observation-deletion diagnostics, one set for each observation, use the statement OBSOUT. For example, OBSOUT=infobs will create a sas data set "infobs" that will contain the influence diagnostics for the observations, including cook's distance, DFBETA, observation leverage, cluster size of the cluster to which the observation belongs, fitted value, raw residual, and standardized residual (raw residual divided by the variance function). The statement CLSOUT is used to obtain cluster-deletion diagnostics. For example, CLSOUT=infcls will create a sas data set "infcls" that will contain the influence diagnostics for the clusters, including cook's distance, DFBETA, cluster leverage, cluster size, and a quadratic summary of the standardized residual vector of the cluster (the usefulness of this last statistic is yet to be investigated). The DFBETA produced by the macro, either for observations or clusters, are not standardized. However, the user may compute standardized versions using either the model-based or the empirical sandwich standard errors. These can be obtained by first requesting an output data set, using the NCOVOUT or RCOVOUT options, respectively. These output data sets are also useful for constructing contrasts and hypothesis tests, for example, using SAS/IML. ***********************************************************************/ %MACRO GEE ( DATA =_last_, YVAR =y, XVAR =x, ID =id, TIME =, LINK =1, VARI =1, N =, CORR =1, M =1, R =, SCALE=, BETA =, OFFS =, NCOVOUT=, RCOVOUT=, OBSOUT =, CLSOUT =, ITER =50, MONITOR=NO, CRIT =0.01); OPTION nocenter nodate nonumber; *****************************************************************; PROC IML worksize = 30000; RESET noprint noname fuzz; /* define link and derivative */ %IF &link = 1 %THEN %DO; %IF (%length(&offs) = 0) %THEN %DO; start linkid(lp,den,offset); return(lp); finish; %END; %ELSE %DO; start linkid(lp,den,offset); return(lp+offset); finish; %END; start deri(ui,den,nobs); return(J(nobs,1,1)); finish; %END; %ELSE %IF &link = 2 %THEN %DO; %IF (%length(&offs) = 0) %THEN %DO; start linkid(lp,den,offset); return(exp(lp)); finish; %END; %ELSE %DO; start linkid(lp,den,offset); return(exp(lp+offset)); finish; %END; start deri(ui,den,nobs); return(ui); finish; %END; %ELSE %IF &link = 3 %THEN %DO; %IF (%length(&offs) = 0) %THEN %DO; start linkid(lp,den,offset); ui = exp(lp); return((den#ui)/(1+ui)); finish; %END; %ELSE %DO; start linkid(lp,den,offset); ui = exp(lp+offset); return((den#ui)/(1+ui)); finish; %END; start deri(ui,den,nobs); return(den#ui#(1-ui)); finish; %END; %ELSE %IF &link = 4 %THEN %DO; %IF (%length(&offs) = 0) %THEN %DO; start linkid(lp,den,offset); return(1/lp); finish; %END; %ELSE %DO; start linkid(lp,den,offset); return(1/(lp+offset)); finish; %END; start deri(ui,den,nobs); return(-(ui#ui)); finish; %END; /* define variance function and deviance function */ %IF &vari = 1 %THEN %DO; start varfun(ui,den,nobs); return(J(nobs,1,1)); finish; start devi(ui,den,resid,y); return(sum((resid)##2)); finish; %END; %ELSE %IF &vari = 2 %THEN %DO; start varfun(ui,den,nobs); return(ui); finish; start devi(ui,den,resid,y); return(2#sum(y#log((y+.0001)/ui)-resid)); finish; %END; %ELSE %IF &vari = 3 %THEN %DO; start varfun(ui,den,nobs); return(ui#(1-ui/den)); finish; start devi(ui,den,resid,y); abino = y#log((y+.0001)/ui); bbino = (den-y)#log((den-y+.0001)/(den-ui)); return(2#sum(abino+bbino)); finish; %END; %ELSE %IF &vari = 4 %THEN %DO; start varfun(ui,den,nobs); return(ui#ui); finish; start devi(ui,den,resid,y); agam = -log((y+.0001)/ui); bgam = resid/ui; return(2#sum(agam+bgam)); finish; %END; /* Define (A*D) inverse vector */ %IF &link = &vari %THEN %DO; start canon(vu,di); return(sqrt(vu)); finish; %END; %ELSE %DO; start canon(vu,di); return((1/sqrt(vu))#di); finish; %END; *****************************************************************; start INFO(offset,den); R1 = {'Data File:' &DATA}; PRINT / 'GEE with deletion diagnostics ( Ver - 1.00 )', '============================================',,, R1; R1 = {'Outcome variable:'}; R2 = {'Covariates:'}; labely = { &YVAR }; PRINT labely [ROWNAME = R1], {&XVAR} [ROWNAME = R2]; %IF (%length(&OFFS)^= 0) %THEN %DO; labeloff = { &OFFS }; read all var labeloff into offset; R1 = {'Offset:'}; PRINT labeloff [ROWNAME = R1]; %END; %ELSE %DO; offset = 0; %END; _1 = {'(Identity)'}; _2 = {'(Logarithm)'}; _3 = {'(Logit)'}; _4 = {'(Reciprocal)'}; R1 = {'Link:'}; %IF (&link<1 | &link>4) %THEN %DO; PRINT &link [ROWNAME = R1 FORMAT = 2.0] {'(Invalid Option !!!)'}; %END; %ELSE %DO; PRINT &link [ROWNAME = R1 FORMAT = 2.0] _&LINK ; %END; _1 = {'(Gaussian)'}; _2 = {'(Poisson)'}; _3 = {'(Binomial)'}; _4 = {'(Gamma)'}; R1 = {'Variance:'}; %IF (&vari<1 | &vari>4) %THEN %DO; PRINT &vari [ROWNAME = R1 FORMAT = 2.0] {'(Invalid Option !!!)'}; %END; %ELSE %DO; PRINT &vari [ROWNAME = R1 FORMAT = 2.0] _&VARI ; %END; %IF &VARI = 3 %THEN %DO; %IF (%length(&N)^= 0) %THEN %DO; labeln = { &N }; read all var labeln into den; R1 = {'Denominator'}; PRINT labeln [ROWNAME = R1]; %END; %ELSE %DO; den = 1; %END; %END; %ELSE %DO; den = 1; %END; R1 = {'Correlation:'}; %IF &corr = 1 %THEN %DO; %IF %length(&R) = 0 %THEN %DO; print &corr [ROWNAME = R1 FORMAT = 2.0] {'(Independent)'}; %END; %ELSE %DO; r = { &R }; a = int(sqrt(ncol(r))); R = shape(r,a,a); PRINT &corr [ROWNAME = R1 FORMAT = 2.0] {'(R given):'}, r [FORMAT = 4.2]; %END; %END; %ELSE %IF &corr = 2 %THEN %DO; R2 = {'(Stationary'}; print &corr [ROWNAME = R1 FORMAT = 2.0] &m [ROWNAME = R2 FORMAT = 2.0] {'- dependent)'}; %END; %ELSE %IF &corr = 3 %THEN %DO; R2 = {'(NonStationary'}; print &corr [ROWNAME = R1 FORMAT = 2.0] &m [ROWNAME = R2 FORMAT = 2.0] {'- dependent)'}; %END; %ELSE %IF &corr = 4 %THEN %DO; PRINT &corr [ROWNAME = R1 FORMAT = 2.0] {'(Exchangeable)'}; %END; %ELSE %IF &corr = 5 %THEN %DO; R2 = {'(AR -'}; print &corr [ROWNAME = R1 FORMAT = 2.0] &m [ROWNAME = R2 FORMAT = 2.0] {')'}; %END; %ELSE %IF &corr = 6 %THEN %DO; print &corr [ROWNAME = R1 FORMAT = 2.0] {'(Unspecified)'}; %END; %ELSE %DO; PRINT &CORR [ROWNAME = R1 FORMAT = 2.0] {'(Invalid Option !!!)'}; %END; finish; /* INFO */ *****************************************************************; start desc(ni, k, maxn, minn, nobs); use &DATA; SETIN &DATA NOBS nobs; summary var {&YVAR &XVAR } class {&ID} stat {mean} opt {noprint save}; close &DATA; k = nrow(_nobs_); maxn = MAX(_nobs_); minn = MIN(_nobs_); ni = _nobs_; R1 = {'Total number of records read:'}; PRINT nobs [ROWNAME = R1 FORMAT = 8.0]; return; /* skip printing descriptive stats - BQ */ summary var {&YVAR &XVAR } stat {mean} opt {noprint save}; close &DATA; create c var{&YVAR &XVAR}; append; list; free &yvar &xvar; read var {&YVAR &XVAR} into ovmean; close c; create e var{&YVAR &XVAR}; append; free &yvar &xvar; summary var {&XVAR &YVAR } stat {mean} opt {noprint save}; close e; create f var{&YVAR &XVAR}; append; list; free &yvar &xvar; read var {&YVAR &XVAR} into clmean; close f; ovmean = ovmean//clmean; C1 = { &YVAR &XVAR }; R1 = {'Total number of clusters:'}; PRINT k [ROWNAME = R1 FORMAT = 5.0]; R1 = {'Maximum and minimum cluster size:'}; R2 = {'and'}; PRINT maxn [ROWNAME = R1 FORMAT = 5.0] minn [ROWNAME = R2 FORMAT = 5.0]; R1 = {'Observations: ','Cluster Means:'}; PRINT 'Averages of Outcome variable and Covariates (over all)',, ovmean [ROWNAME = R1 COLNAME = C1] , ; finish; /* desc */ *****************************************************************; start Firstb(beta,xwx,ad,yvar,xvar,den,p,labelx,nobs); %IF &VARI = 3|&link = 3 %THEN %DO; ui = choose(mod(yvar,1) = 0,(1-yvar)#0.1 + yvar#0.9,yvar); %END; %ELSE %IF &VARI = 2|&link = 2 %THEN %DO; ui = choose(yvar<= 0,.01,yvar); %END; %ELSE %IF &VARI = 4|&link = 4 %THEN %DO; ui = choose(yvar = 0,.01,yvar); %END; %ELSE %DO; ui = yvar; %END; %IF &link = 1 %THEN %DO; lp = ui; %END; %ELSE %IF &link = 2 %THEN %DO; lp = log(ui); %END; %ELSE %IF &link = 3 %THEN %DO; lp = log(ui/(den-ui)); %END; %ELSE %IF &link = 4 %THEN %DO; lp = 1/ui; %END; %IF (%length(&beta) = 0) %THEN %DO; vu = varfun(ui,den,nobs); run glim(beta,xwx,ad,yvar,xvar,den,offset,lp,ui,vu,nobs); %END; %ELSE %DO; beta = { &BETA }; beta = SHAPE(beta,p,1,0); vu = varfun(ui,den,nobs); /* %IF &link^= &vari %THEN %DO; di = deri(ui,den); %END; */ di = deri(ui,den,nobs); ad = canon(vu,di); adx = xvar#ad#ad; xwx = xvar`*adx; %END; %IF %upcase((&monitor) = yes) %THEN %DO; PRINT / 'Initial estimate of regression coefficients:', labelx beta; %END; finish; /* firstb */ *****************************************************************; start CONSCHK(ok,r0,T); /* Consistency Check */ %IF &CORR = 1 %THEN %DO; %IF %length(&R) = 0 %THEN %DO; r0 = I(T); %END; %ELSE %DO; r = { &R }; a = sqrt(ncol(r)); b = int(a); if a^= b then do; print "Given correlation matrix not square"; ok = 0; end; else do; r0 = shape(r,a,a); tr0 = r0`; if any(r0^= tr0) then do; print "Given correlation matrix not symmetric"; ok = 0; end; else if ncol(R0)^= T then do; print "Dimension of given correlation matrix is not compatible with data"; ok = 0; end; else do; end; end; %END; %END; %ELSE %IF &CORR = 2|&CORR = 3 %THEN %DO; If T<= &m then do; print "Value of M for dependence is too large"; ok = 0; end; %END; %ELSE %IF &corr = 5&(%length(&TIME)^= 0) %THEN %DO; print "Time variable not allowed as an option for AR-1"; ok = 0; %END; %ELSE %DO; ok = 1; %END; finish; /* conschk */ *****************************************************************; start init; ok = 1; %IF %length(&YVAR) = 0 %THEN %DO; print "Y variable has not been specified"; ok = 0; %END; %IF %length(&XVAR) = 0 %THEN %DO; print "Covariates have not been specified"; ok = 0; %END; %IF %length(&ID) = 0 %THEN %DO; print "Cluster ID has not been specified"; ok = 0; %END; RUN info(offset,den); RUN desc(ni,k,maxn,minn,nobs); %IF (%length(&TIME) = 0) %THEN %DO; time = 0; T = maxn; %END; %ELSE %DO; read all var {&TIME} into time; T = ncol(unique(time)); %END; RUN conschk(ok,r0,T); finish; /* init */ *****************************************************************; start FINDINV(A); B = half(A); /* intentional: will bomb if A is not +ve semidefinite */ C = ginv(B); AINV = C*C`; return(AINV); finish; /* findinv */ *****************************************************************; start glim(newb,xwx,w,y,x,den,offset,lp,ui,vu,nobs); /* values of linear predictor (lp), mean (ui) and variance function (vu) were computed in estr module */ di = deri(ui,den,nobs); z = lp+(1/di)#(y-ui); w = canon(vu,di); wx = x#w#w; wz = z#w#w; xwx = x`*wx; xwz = x`*wz; inxwx = findinv(xwx); newb = inxwx*xwz; * newb = solve(xwx,xwz); finish; /* GLIM */ *****************************************************************; start ESTB(newb,F,ad,y,x,den,time,R,ni,K,nobs,p,lp,ui,vu); /* values of linear predictor (lp), mean (ui) and variance function (vu) were computed in estphi module */ di = deri(ui,den,nobs); z = lp+(1/di)#(y-ui); ad = canon(vu,di); nx = 0; F = J(p,p,0); G = J(p,1,0); do h = 1 to K; nj = ni[h]; if nj = 1 then rj = 1; %IF (%length(&TIME) = 0) %THEN %DO; else if time = 0 then rj = invcorr(R[1:nj,1:nj]); %END; %ELSE %DO; else do; timej = time[nx+1:nx+nj]; rj = invcorr(R[timej,timej]); end; %END; xj = x[nx+1:nx+nj,]; zj = z[nx+1:nx+nj]; adj = ad[nx+1:nx+nj]; adxj = xj#adj; adzj = zj#adj; xwxj = adxj`*rj*adxj; xwzj = adxj`*rj*adzj; F = F+XWXj; G = G+XWZj; nx = nx+nj; end; newb = solve(F,G); finish; /* Estb */ *****************************************************************; start estrphi(R,alp,phi,pearchi,devi,lp,ui,vu,oldb,oldr,F,ad,y,x, den,offset,nobs,p,fixscal,ni,K,T,time,maxn,nsqsum); lp = x*oldb; ui = linkid(lp,den,offset); resid = y-ui; vu = varfun(ui,den,nobs); rit = resid/sqrt(vu); /* standardized residuals: input to estr */ pearchi = SSQ(rit); /* psuedo pearson chi-square */ devi = devi(ui,den,resid,y); /* pseudo deviance */ hf = half(F); hv = ginv(hf); nx = 0; sumg2 = 0; nsumg2 = 0; sumrit2 = 0; do hh = 1 to k; nj = ni[hh]; sumrit2 = sumrit2+(sum(rit[nx+1:nx+nj])##2); /* determine rj */ if nj = 1 then rj = 1; %IF (%length(&TIME) = 0) %THEN %DO; else if time = 0 then rj = invcorr(oldR[1:nj,1:nj]); %END; %ELSE %DO; else do; timej = time[nx+1:nx+nj]; rj = invcorr(oldR[timej,timej]); end; %END; c = X[nx+1:nx+nj,]*hv; q = c*c`; adjt = ad[nx+1:nx+nj]`; G = rj#adjt; H = q#adjt; g2 = G[+,]*H[,+]; sumg2 = sumg2+g2; nsumg2 = nsumg2+(nj#g2); nx = nx+nj; end; /* end do loop over clusters */ %IF &corr = 1 %THEN %DO; if fixscal = 1 then phi = 1; else do; %IF (%length(&scale) = 0) %THEN %DO; phi = pearchi/(nobs-p); %END; %ELSE %DO; phi = &scale; %END; end; alp = 0; %END; %ELSE %IF &corr = 4 %THEN %DO; nx = 0; top = 0; bot = 0; do h = 1 to K; nj = ni[h]; ej = rit[nx+1:nx+nj]; top = top + (sum(ej*ej`)-ssq(ej))/2; bot = bot+nj#(nj-1)/2; nx = nx+nj; end; if fixscal = 1 then phi = 1; else do; %IF (%length(&scale) = 0) %THEN %DO; phi = pearchi/(nobs-p); %END; %ELSE %DO; phi = &scale; %END; end; alp = (top/(bot-p))/phi; %END; %IF &corr = 4 %THEN %DO; if alp<= 1/(1-maxn) then do; print, 'estimated alpha is out of range: est. alpha = ', alp; alp = .99/(1-maxn); print, 'alpha is set to ' alp; end; else if alp >= 1 then do; print, 'estimated alpha is out of range: est. alpha = ', alp; alp = .99; print, 'alpha is set to .99'; end; %END; r = xch (T, alp); %IF %upcase((&monitor) = yes) %THEN %DO; print / 'Scale is ' phi, 'Pseudo deviance ' devi, 'Pseudo chi-square ' pearchi, 'Working exchangeable correlation: ' alp; * 'Working correlaton matrix: ',, R; %END; finish; /* estrphi */ *****************************************************************; start xch (n, rho); * returns n x n EXCH correlation matrix; * r = correlation; auto = {1} || j(1, n-1, rho); return (toeplitz(auto)); finish; *****************************************************************; start xchinv (n, rho); * returns the inverse of n x n EXCH correlation matrix; * rho = correlation; c = 1/(1-rho); d = -rho/(1+rho*(n-1)); auto = (c + c*d) || j(1, n-1, c*d); return (toeplitz(auto)); finish; *****************************************************************; /* find inverse of correlation matrix, n > 1 assumed */ start invcorr(a); %IF &corr = 4 %THEN %DO; /* exchangeable */ ainv = xchinv(ncol(a), a[1,2]); %END; %ELSE %DO; ainv = findinv(a); %END; return(ainv); finish; *****************************************************************; start varian(R,alp,phi,lp,ui,vu,oldb,oldr,F,ad,y,x, labelx,den,offset,nobs,p,fixscal,ni,K,T,time,maxn,nsqsum); lp = x*oldb; ui = linkid(lp,den,offset); di = deri(ui,den,nobs); ei = y-ui; vu = varfun(ui,den,nobs); ad = canon(vu,di); sei = ei/sqrt(vu); /* standardized residuals: input to estr */ pearchi = SSQ(sei); /* psuedo pearson chi-square */ devi = devi(ui,den,ei,y); /* pseudo deviance */ hf = half(F); hv = ginv(hf); nx = 0; sumg2 = 0; nsumg2 = 0; sumrit2 = 0; xwx = J(p,p,0); m1 = J(p,p,0); /* estimate correlation, scale, and XWX again */ do hh = 1 to k; nj = ni[hh]; sumrit2 = sumrit2+(sum(sei[nx+1:nx+nj])##2); /* determine rj */ if nj = 1 then rj = 1; %IF (%length(&TIME) = 0) %THEN %DO; else if (time = 0) then rj = invcorr(oldR[1:nj,1:nj]); %END; %ELSE %DO; else do; timej = time[nx+1:nx+nj]; rj = invcorr(oldR[timej,timej]); end; %END; xj = x[nx+1:nx+nj,]; adj = ad[nx+1:nx+nj]; adxj = xj#adj; XWX = XWX+adxj`*rj*adxj; seij = sei[nx+1:nx+nj]; mj = adxj`*rj*seij; m1 = m1+mj*mj`; c = xj*hv; q = c*c`; G = rj#adj`; H = q#adj`; g2 = G[+,]*H[,+]; sumg2 = sumg2+g2; nsumg2 = nsumg2+(nj#g2); nx = nx+nj; end; /* end do loop over clusters */ %IF &corr = 1 %THEN %DO; if fixscal = 1 then phi = 1; else do; %IF (%length(&scale) = 0) %THEN %DO; phi = pearchi/(nobs-p); %END; %ELSE %DO; phi = &scale; %END; end; alp = 0; %END; %ELSE %IF &corr = 4 %THEN %DO; nx = 0; top = 0; bot = 0; do h = 1 to K; nj = ni[h]; ej = sei[nx+1:nx+nj]; top = top + (sum(ej*ej`)-ssq(ej))/2; bot = bot+nj#(nj-1)/2; nx = nx+nj; end; if fixscal = 1 then phi = 1; else do; %IF (%length(&scale) = 0) %THEN %DO; phi = pearchi/(nobs-p); %END; %ELSE %DO; phi = &scale; %END; end; alp = (top/(bot-p))/phi; %END; %IF &corr = 4 %THEN %DO; if alp<= 1/(1-maxn) then do; print, 'estimated alpha is out of range: est. alpha = ', alp; alp = .99/(1-maxn); print, 'alpha is set to ' alp; end; else if alp >= 1 then do; print, 'estimated alpha is out of range: est. alpha = ', alp; alp = .99; print, 'alpha is set to .99'; end; else do; end; %END; r = xch (T, alp); %IF %upcase((&monitor) = yes) %THEN %DO; print / 'Scale is ' phi, 'Pseudo deviance ' devi, 'Pseudo chi-square ' pearchi, 'Working exchangeable correlation: ' alp; * 'Working correlaton matrix: ',, R; %END; /* calculate naive and robust variance estimates */ inxwx = FINDINV(XWX); nvar = inxwx#phi; rvar = inxwx*m1*inxwx; /***************************************/ /** calculate influence diagnostics **/ /***************************************/ dfbetcls = J(k,p,0); /* dfbeta for a cluster */ cookdcls = J(k,1,0); /* cookd for a cluster */ gcls = J(k,1,0); /* lack of fit for a cluster */ trqwcls = J(k,1,0); /* trace of qw for a cluster */ %IF %length(&OBSOUT)^= 0 %then %DO; i = J(nobs,1,0); /* cluster indicator starting with 1 */ ij = J(nobs,1,0); /* observation indicator starting with 1 */ nij = J(nobs,1,0); /* cluster size for observation data set */ dfbetobs = J(nobs,p,0); /* dfbeta for an observation */ cookdobs = J(nobs,1,0); /* cookd for an observation */ qwobs = J(nobs,1,0); /* diagonal element of qw for an observation */ %END; nx = 0; ijx = 0; do hh = 1 to k; nj = ni[hh]; xj = x[nx+1:nx+nj,]; adj = ad[nx+1:nx+nj]; %IF (%length(&TIME) = 0) %THEN %DO; timej = 1:nj; %END; %ELSE %DO; timej = time[nx+1:nx+nj]; %END; if nj = 1 then do; rj = 1; wj = adj*adj; vj = 1/wj; end; else do; rj = invcorr(R[timej,timej]); wj = diag(adj)*rj*diag(adj); vj = (diag(1/adj))*R[timej,timej]*(diag(1/adj)); end; qj = xj*inxwx*xj`; qw = qj*wj; trqwcls[hh] = trace(qw); eicls = ei[nx+1:nx+nj]; DE = eicls/di[nx+1:nx+nj]; projinv = ginv(I(nj)-qw); invarei = wj*projinv; hd = xj`*invarei*DE; difbetj = inxwx*hd; Dfbetcls[hh,] = difbetj`; ckj = (hd`*difbetj)/(p#phi); cookdcls[hh] = ckj; gi = eicls`*invarei*eicls/phi; gcls[hh] = gi; %IF %length(&OBSOUT)^= 0 %then %DO; if nj> 1 then do; do ii = 1 to nj; ij[nx+ii] = ijx+1; nij[nx+ii] = nj; i[nx+ii] = hh; comp = timej[loc(timej^= ii)]; vii = vj[comp,ii]; wii = wj[comp,ii]; Wti = wj[comp,comp]-(wii*wii`)/wj[ii,ii]; WV = wti*vii; xtilt = xj[ii,]`-xj[comp,]`*WV; qtil = xtilt`*inxwx*xtilt; stil = vj[ii,ii]-vii`*WV; DEtil = DE[ii]-DE[comp]`*WV; hdobs = xtilt*DEtil/(stil-qtil); difbetii = inxwx*hdobs; Dfbetobs[nx+ii,] = Difbetii`; cookdobs[nx+ii] = (hdobs`*difbetii)/(p#phi); qwobs[nx+ii] = qw[ii,ii]; ijx = ijx+1; end; end; else do; ij[nx+1] = ijx+1; nij[nx+1] = nj; i[nx+1] = hh; qwobs[nx+1] = trqwcls[hh]; dfbetobs[nx+1,] = difbetj`; cookdobs[nx+1] = cookdcls[hh]; ijx = ijx+1; end; %END; nx = nx+nj; end; %IF %upcase((&monitor)^= yes) %THEN %DO; print 'Working exchangeable correlation is ' alp; *print 'Working correlation matrix: ', R; %END; /* print 'naive variance estimate is',, nvar; print 'robust variance estimate is',, rvar; */ nvardiag = vecdiag(nvar); rvardiag = vecdiag(rvar); nvardiag = choose(nvardiag<0,0,nvardiag); rvardiag = choose(rvardiag<0,0,rvardiag); ns = sqrt(nvardiag); rs = sqrt(rvardiag); rbeta = oldb/rs; %IF (%length(&NCOVOUT) ^= 0) %THEN %DO; nbetvar = oldb||ns||nvar; variable = { &XVAR }; C1 = { 'estimate' 'naive_se' &XVAR }; create &NCOVOUT from nbetvar [rowname = variable colname = C1]; setout &NCOVOUT; append from nbetvar [rowname = variable colname = C1]; %END; %IF (%length(&RCOVOUT) ^= 0) %THEN %DO; rbetvar = oldb||rs||rvar; variable = { &XVAR }; C1 = { 'estimate' 'robust_se' &XVAR }; create &RCOVOUT from rbetvar [rowname = variable colname = C1]; setout &RCOVOUT; append from rbetvar [rowname = variable colname = C1]; %END; C1 = { 'Estimate' }; C2 = { 's.e.-Naive' }; C3 = { 'z-Naive' }; C4 = { 's.e.-Robust'}; C5 = { 'z-robust'}; Print/ 'Estimate, s.e. and z-score:',, labelx oldb [colname = C1 Format = 12.4] ns [colname = C2 Format = 12.4] rs [colname = C4 Format = 12.4] rbeta [colname = C5 Format = 9.3],; print 'Scale parameter is ' phi, 'Pseudo deviance ' devi, 'Pseudo chi-square ' pearchi; USE &DATA; SETIN &DATA; read all var {&ID} into id; id = char(id); %IF %length(&TIME) = 0 %THEN %DO; %END; %ELSE %DO; time = char(time); id = id||time; %END; %IF (%length(&OBSOUT) ^= 0) %THEN %DO; C1 = { I IJ NI FIT RES SRES QWOBS COOKDOBS &XVAR }; xvar = dfbetobs; obout1 = i||ij||nij||ui||ei||sei||qwobs||cookdobs||xvar; create &OBSOUT from obout1 [rowname = id colname = c1] ; setout &OBSOUT; append from obout1 [rowname = id colname = c1]; %END; id = (unique(id))`; %IF (%length(&CLSOUT) ^= 0) %THEN %DO; I = (1:k)`; C1 = { I NI TRQWCLS COOKDCLS GCLS &XVAR }; xvar = dfbetcls; clout2 = i||ni||trqwcls||cookdcls||gcls||xvar; create &CLSOUT from clout2 [rowname = id colname = c1] ; setout &CLSOUT; append from clout2 [rowname = id colname = c1]; %END; C1 = {id}; C2 = {ni}; c4 = {trqwcls}; c5 = {cookdcls}; c6 = {gcls}; c7 = {dfbetcls}; * print/ 'Leverage, residuals, and influence for clusters',, id [colname = C1] ni [colname = C2 format = 6.0] trqwcls [colname = C4 format = 8.4] cookdcls [colname = C5 format = 8.4] gcls [colname = C6 format = 10.4] dfbetcls [colname = C7 format = 8.4],; Print ,, "(c) J. S. Preisser, 2003" , "Department of Biostatistics " , "University of North Carolina " , "Chapel Hill, N.C."; finish; /* varian */ *****************************************************************; start main; /** Main iteration **/ USE &DATA; SETIN &DATA; run init; if (^ok) then do; print "Macro halted"; return; end; USE &DATA; SETIN &DATA; m = { &M }; labelx = { &XVAR }`; read all var {&YVAR} into yvar; read all var labelx into xvar; read all var {&id} into id; p = NROW(labelx); run firstb(beta,F,ad,yvar,xvar,den,p,labelx,nobs); pearchi = -5; crit = 1; /* oldr = I(T); */ oldr = xch (T, 0.7); fixscal = 0; nsqsum = sum(ni##2); /* evaluated once for estrphi module */ %IF &VARI = 3 %THEN %DO; ncfix = ncol(unique(yvar)); if ncfix = 2&(max(yvar) = 1)&(min(yvar) = 0)&(den = 1) then do; print 'binary responses: scale parameter is fixed at 1'; fixscal = 1; end; %END; DO iter = 1 to &ITER WHILE(crit>&CRIT); prevpear = pearchi; free R phi pearchi devi lp ui vu; run estrphi(R,alp,phi,pearchi,devi,lp,ui,vu,beta,oldr,F,ad,yvar,xvar, den,offset,nobs,p,fixscal,ni,K,T,time,maxn,nsqsum); %IF %upcase((&monitor) = yes) %THEN %DO; R1 = (' Iteration:'); Print iter [ROWNAME = R1 FORMAT = 3.0]; %END; free beta F w; %IF &corr = 1&(%length(&R) = 0) %THEN %DO; run glim(beta,F,ad,yvar,xvar,den,offset,lp,ui,vu,nobs); %END; %ELSE %DO; run estb(beta,F,ad,yvar,xvar,den, time,R,ni,K,nobs,p,lp,ui,vu); %END; %IF %upcase((&monitor) = yes) %THEN %DO; print / 'Estimate:',, labelx beta [Format = 11.3],; %END; crit = ABS(prevpear-pearchi); oldr = R; end; iter = iter-1; IF (iter>= &ITER) THEN PRINT / {'No Convergence after'} {&ITER} [FORMAT = 3.0] {'iterations.'}; ELSE PRINT ' '/ {'Convergence after'} iter [FORMAT = 3.0] {'iteration(s).'}; run varian(R,alp,phi,lp,ui,vu,beta,oldr,F,ad,yvar,xvar, labelx,den,offset,nobs,p,fixscal,ni,K,T,time,maxn,nsqsum); Finish; /* main */ *****************************************************************; run main; %MEND GEE;