/**************************************************************************************

geecorr1.03.macro

Modified by John Preisser and Bing Lu based on Richard Conrad Zink's GEECORR SAS macro 
for binary data, logit link for marginal mean model, and identity link for linear correlation
model based upon GEE procedure of Prentice (1988), Biometrics 44, 1033-1048.   
The procedure provides parameter (beta) estimates for a model for the logit of the mean 
in terms of covariates (X) in addition to parameter (alpha) estimates for a linear model 
(identity link) for pairwise within-cluster correlations as a function of covariates (Z). 
Model-based and empirical (robust) standard errors are provided for the beta estimates, and
empirical standard errors are provided for the alpha estimates.

Version 1.02 of the macro has the following including extensions to alpha::
1. Bias-corrected covariance estimators (alpha and beta) extending Mancl and DeRouen (2001) as described in
Lu, Preisser et al. (submitted)
2. Regression diagnostics for clusters and observations (alpha and beta) extending Preisser and Qaqish (1996)
by Preisser and Perin (submitted)

This version (1.03) of the macro has the following:
1. Correction to standardization of Cooks Distance (inverse taken of covariance matrix);
2. added leverage (h_it) and residual (r_it) to output data set for predicted probabilities (probout);
3. More detail printed for range violations.

*****  TECHNICAL NOTE ON THE ESTIMATION PROCEDURE  *****

The fitting algorithm used is not (17) of Prentice (1988) but rather the "more detailed
iterative procedure" referred to in that paper based upon (15).  Like (17), the estimation
equation for beta is
    beta_{s+1} = beta_s + A*U_{beta}
For alpha, however, the detailed procedure gives
    alpha_{s+1} = alpha_s + (B*U_{beta} + C*U_{alpha})
in contrast to (17) which fixes B=0.

Equation (15) provides a robust variance estimator based upon use of 
   \partial Z_i/ \partial beta  
in B.  As shown at the top of page 1040, this is a random quantity.  
Instead, this program uses the expected information matrix based 
   -E(\partial Z_i/ \partial beta)
The row of this matrix corresponding to the (j,k) pair in cluster i is
    rho_{ijk}/2 [(1-2*mu_{ij})x'_{ij} + (1-2*mu_{ik})x'_{ik}].

The macro uses these modifications in the fitting algorithm and variance estimators for
the estimating equations given in (14) of Prentice:
    U_{beta} = \sum_i D_i`*inv(V_i)*(Y_i - p_i) and
    U_{alpha} = \sum_i (E)`*diag(VEE_i)*(Z_{i} - \delta_{i})


*****  GENERAL COMMENTS ON USING THE SOFTWARE  *****

1. All inputs to macro are required (except makevone).  These are:


    xydata: The data set containing the outcome and marginal mean covariates 
      yvar: The 0/1 binary outcome in xydata
      xvar: A list of marginal mean covariates in xydata. Must all be numeric
        id: Cluster ID (consecutive integers) in xydata
     zdata: The data set containing the covariates for the marginal correlation regression
      zvar: A list of marginal correlation covariates in zdata. Must all be numeric 
     wdata: The data set containing cluster weights 
      wvar: The weight variable in wdata (numeric, i.e. cluster frequencies)

    clsout: output data set of cluster deletion diagnostics for beta and alpha
    obsout: output data set of observation deletion diagnostics for beta and alpha
   probout: output data set of predicted probabilities of Y=1 for each observation

   maxiter: Maximum number of iterations for Fisher Scoring
   epsilon: Tolerance for convergence
printrange: Print details of range violations? (Should initially check) (YES, NO)
            Range is always checked, however. Range is always checked and printed
            when variance computed.
    shrink: THETA or ALPHA. See Readme file for details.
  makevone: Option sets VEE = 1 if makevone=YES instead of w_ijk from Prentice (1988). 
            default.  See Sharples and Breslow (1992).

2. ID should go from 1 to K, where K is the number of clusters. Data should
   be sorted by ID before calling the macro. This goes for the W, Z, and XY
   datasets.  (IS 1 to K needed?)

3. ID is really only used to get sample size for each cluster before being 
   dropped. It isn't really needed as "i", which is the indicator for cluster
   in loops goes from 1 to K.

4. XYdata has 1 row for each observation so the matrix (vector) X (y) will be
   of dimension [\sum_{i=1}^K n_i] x p ([\sum_{i=1}^K n_i] x 1). Note 
   that p is number of beta parameters, and q is number of alpha parameters.

5. Zdata has one observation for all pairs (j,k) where j<k. Say cluster 1 has 5 obs,
   then the first 10 obs of Z should be for the pairs (1,2) (1,3)... (1,5) (2,3)...
   (3,5), (4,5). Then the next cluster occurs and so on. The matrix Z should be
   [\sum_{i=1}^K {n_i \choose 2}] x q. However, if there are clusters of size 1, these
   should not have rows within the Z matrix. For example, say I have 3 clusters of 
   size 4, 1, 3. Then the first 6 rows of Z (which will be 9 x q) will be for cluster 1, 
   and the last 3 will be for cluster 3. The BEGINEND function accounts for how to pick 
   off the y, X and Z for each cluster.

6. Wdata contains the frequency of a particular cluster. In some data sets, each 
   cluster may occur multiple times. In this case, the variable count represents how
   many times it occurs. In other data sets, each cluster is unique, so each value 
   is a "1".  (Make sure that you take the y, X AND Z into account before defining 
   a "wvar"). This dataset has K rows, 1 for each unique combo of the vector y, X AND Z.

7. The option makevone is based upon a suggestion of Sharples & Breslow (1992). If 
   makevone=YES then w_ijk = 1 instead of the usual formula found in Prentice (1988) 
   that is implemented when makevone=NO. No changes occur in the estimating equations 
   other than w_ijk = 1. 


*****  CONVERGENCE, POSITIVE-DEFINITIVENESS, AND RANGE VIOLATIONS  *****

A. Range violations can occur. See Prentice (1988). If this happens, the parameters are 
   shrunk back.  The software allows for a max of 20 shrinks for each iteration (max_modi 
   in FITORTH). Cluster and pair identifier as well as iteration and shrink number are printed 
   if printrange = YES. Warnings are always printed when computing the variance at the end 
   with convergence regardless of printrange value. Ranges are checked for all (j,k) pairs. 
   As soon as a violation is detected, shrinking occurs.  So only the first violation is printed. 
   Shrinking may fix the first violation, but others may occur later on down the line. It is 
   recommended to always check for range violations.

B. Code will inform user of lack of convergence, which may or may not occur with range problems.

C. Code checks for positive-definiteness of var(Y_i) for each cluster when the robust and naive 
   variance matrices are computed when convergence occurs. If var(Y_i) is non-PD, this is an 
   indication that a joint distribution does not exist for the particular cluster with the given 
   parameter estimates. Note that compliance with range restrictions is a necessary but not 
   sufficient condition for the existence of a joint distribution. But this is the best that 
   can be done presently.

D. Code has an option for "shrink". For both options for iteration 1, alpha is set to 0 if 
   out of range. From then on, THETA does theta = theta - (0.5)**(n_modi+1) * delta and ALPHA 
   does alpha = 0.95 * alpha. Could lead to slightly different solutions if range issues occur.


**************************************************************************************/

%macro PRENTICEcorrJSP(xydata=, yvar=, xvar=, id=, zdata=, zvar=, wdata=, wvar=, CLSOUT=, OBSOUT=, PROBOUT=, maxiter=, epsilon=, printrange=, shrink=, makevone=, startbeta=, startalpha=); 

   proc means noprint data = &xydata;
      var &yvar;
      by &id;
      output out = _sample(keep = n) n = n;
      run;

   proc iml;
   varnames={&xvar &zvar};

*   worksize=30000;  /* kilobytes */
*   symsize=256;   
/**************************************************************************************

MODULE: GRABDATA
Reads in the data for analysis

OUTPUT
      y: The 0/1 binary outcome 
      X: Marginal mean covariates
     id: Cluster identifier
      n: Vector of cluster sample sizes
   nobs: Scalar with the number of observations
      Z: Marginal correlation covariates
      w: Weights
maxiter: Max number of iterations
epsilon: Tolerence for convergence

**************************************************************************************/

   start GRABDATA(y, X, id, n, Z, w, maxiter, epsilon, nobs);
      use &xydata NOBS nobs;
         read all var{&yvar} into y;
         read all var{&xvar} into X;
         read all var{&id} into id;
         close &xydata;

      use _sample;
         read all var{n} into n;
         close _sample;

      use &zdata;
         read all var{&zvar} into Z;
         close &zdata;

      use &wdata;
         read all var{&wvar} into w;
         close &wdata;

      maxiter = &maxiter;
      epsilon = &epsilon;
      finish GRABDATA;

/**************************************************************************************

MODULE: BEGINEND
Creates two vectors that have the start and end points for each cluster

INPUT
n: Vector of cluster sample sizes

OUTPUT
first: Vector with starting row for cluster i
 last: Vector with ending row for cluster i

**************************************************************************************/

   start BEGINEND(first, last, n);
      last = cusum(n);
      first = last - n + 1;
      finish BEGINEND;

/**************************************************************************************

MODULE: FITPRENTICE
Performs Prentice (1988) Method

INPUT
      y: The 0/1 binary outcome
      X: Marginal mean covariates
      Z: Marginal correlation covariates
      n: Vector of cluster sample sizes
      w: Weights
maxiter: Max number of iterations
epsilon: Tolerence for convergence

OUTPUT
    beta: A p x 1 vector of marginal mean parameter estimates  
   alpha: A q x 1 vector of marginal correlation parameter estimates  
  robust: Robust covariance matrix for beta and alpha
   naive: Naive (Model-Based) covariance matrix for beta
   niter: Number of iterations required for convergence
converge: Did the algorithm converge (0 = no, 1 = yes)
   
**************************************************************************************/

   start FITPRENTICE(beta, alpha, robust, naive, bias, niter, converge, y, X, Z, n, w, maxiter, epsilon, id, nobs, startbeta, startalpha);

      p = ncol(X);
      q = ncol(Z);
      delta = J(p+q, 1, 2*epsilon);
      max_modi = 20;
      converge = 0;
      
      run INITVALS(beta, alpha, y, X, n, w, q, startbeta, startalpha);

      do niter = 1 to maxiter while (max(abs(delta)) > epsilon);
         n_modi = 0;
         do until(n_modi > max_modi | rangeflag = 0);
            run SCORE(U, UUtran,ustar, beta, alpha, y, X, Z, n, w, p, q, 0, rangeflag, niter);
            if rangeflag = 1 then do;
               %if %upcase(&shrink) = THETA %then %do;
                  if niter = 1 then alpha = J(q, 1, 0);
                     else do;
                        theta = theta - (0.5)**(n_modi+1) * delta;
                        beta = theta[1:p];
                        alpha = theta[p+1:p+q];
                        end;
                  %end; 
                  %else %if %upcase(&shrink) = ALPHA %then %do;
                     if niter = 1 then alpha = J(q, 1, 0);
                        else do;
                           alpha = 0.95 * alpha;
                           end;
                     %end;
               n_modi = n_modi + 1;
               %if %upcase(&printrange) = YES %then %do; 
                  print "Iteration and Shrink Number", niter n_modi;
                  %end;
               end;
            end;
         %if %upcase(&printrange) = YES %then %do; 
            if n_modi > max_modi then do;
               print "n_modi too great, more than 20 shrinks";
               goto theend;
               end;
            %end;
         theta = beta // alpha;
         delta = solve(Ustar, U);
         theta = theta + delta;
         beta = theta[1:p];
         alpha = theta[p+1:p+q];
         converge = (max(abs(delta)) <= epsilon); 
         end;
         
      niter = niter - 1;

      if converge = 1 then run MAKEVAR(robust, naive, bias, beta, alpha, y, X, Z, n, w, p, q, id, nobs, niter);
	

      theend:finish FITPRENTICE;

/**************************************************************************************

MODULE: INITVALS
Generates initial values for beta and alpha. Uses 0.1 for 
initial value for alphas. Beta uses approximate parms from a logistic
regression on all observations, treating them independently

INPUT
y: The 0/1 binary outcome
X: Marginal mean covariates
n: Vector of cluster sample sizes
w: Weights
q: Number of marginal correlation parameters

OUTPUT
 beta: Vector of marginal mean parameters
alpha: Vector of marginal correlation parameters

**************************************************************************************/

   start INITVALS(beta, alpha, y, X, n, w, q,startbeta,startalpha);
      alpha = J(q, 1, 0.01);

	/* User specified inital values */
	%IF %length(&startbeta)^= 0 %then %do;
		use &startbeta;
		read into startb;
		beta=startb`; 
	%end;
	%else %do;
	      run INITBETA(beta, y, X, n, w);
	%end;

	%IF %length(&startalpha)^= 0 %then %do;
	use &startalpha; 
	read into starta;
	alpha=starta`;
	%end;
	*print beta;
	*print alpha;

      finish INITVALS;

/**************************************************************************************

MODULE: INITBETA
Generates initial values for beta. Approximates logistic regression

INPUT
y: The 0/1 binary outcome
X: Marginal mean covariates
n: Vector of cluster sample sizes
w: Weights

OUTPUT
beta: Vector of marginal mean parameters

**************************************************************************************/

   start INITBETA(beta, y, X, n, w);
      run EXPANDW(obsw, n, w);
      z = y + y - 1;
      beta = solve(X` * (X # obsw), X` * (z # obsw));
      
      do i = 1 to 2;
         u = X * beta;
         u = 1 / (1 + exp(-u));
         v = u # (1 - u) # obsw;
         z = X` * ((y - u) # obsw);
         d = solve(X` * (X # v), z);
         beta = beta + d;
         end;
      finish INITBETA; 

/**************************************************************************************

MODULE: EXPANDW
Generates an observed weight vector

INPUT
n: Vector of cluster sample sizes
w: Weights

OUTPUT
obsw: Vector of observed weights

**************************************************************************************/

   start EXPANDW(obsw, n, w);
      obsw = J(sum(n), 1, 0);
      run BEGINEND(first, last, n);
      
      do i = 1 to nrow(n);
         obsw[first[i]:last[i]] = w[i]; 
         end;
      finish EXPANDW;

/**************************************************************************************

MODULE: NCHOOSE2
For a value n, returns n choose 2

INPUT
n: Vector of cluster sample sizes

OUTPUT
n choose 2

*************************************************************************************/
   start NCHOOSE2(n);
      return((n#n - n)/2);
      finish NCHOOSE2;

/************************************************************************************
MODULE: FINDINV
Calculate the inverse for symmetric and positive definite matrix
************************************************************************************/
   start FINDINV (A);
   AHALF=ROOT(A);
   GINV=ginv(AHALF);
   AINV=GINV*GINV`;
   return (AINV);
   finish;


/**************************************************************************************
MODULE: VEERDB
Embedded within SCORE MODULE to do the following:
(i) create VEE, R, and DB matrices
(ii) check to see if correlation is within range based on marginal means

INPUT
	gamma_c: correlation vector for cluster
	mu_c:	 mean vector for cluster
	X_c:	 Covariates for beta estimating equations
	flag:    Performs an Eigenanalysis of B to see if positive definite. Only
                 called when computing the variance at the end (0 = no, 1 = yes). Prints
                 warning for each cluster violation.
	n:	 vector of cluster sizes
	i:	 index for cluster
	p:	 dimension of X

OUTPUT
        VEE:	 working variances for alpha estimating equatinons
	R:	 residual vector for alpha estimating equations
	DB:	 -E(d(corr)/dbeta) (i.e. Q) matrix
	rangeflag: Checks to see if correlation is within range based on marginal means
           (0 = in range, 1 = out of range). See Prentice (1988).
**************************************************************************************/


start VEERDB(VEE,R,DB,rangeflag,mu_c,gamma_c,X_c,y_c,flag,n,i,p,niter);

VEE = J(NCHOOSE2(n[i]), 1, 0);
R = J(NCHOOSE2(n[i]), 1, 0);
DB = J(NCHOOSE2(n[i]), p, 0);

l = 1;
do j = 1 to n[i] - 1;
   do k = j + 1 to n[i];

   rho_upper_limit = min(sqrt((mu_c[j]*(1-mu_c[k]))/(mu_c[k]*(1-mu_c[j]))), 
                       sqrt((mu_c[k]*(1-mu_c[j]))/(mu_c[j]*(1-mu_c[k])))); 

   rho_lower_limit = max(-sqrt((mu_c[j]*mu_c[k])/((1-mu_c[j])*(1-mu_c[k]))), 
                       -sqrt(((1-mu_c[j])*(1-mu_c[k]))/(mu_c[j]*mu_c[k]))); 

  * RANGE CHECKS ;
   if ((gamma_c[l] >= rho_upper_limit) | (gamma_c[l] <= rho_lower_limit))
        & flag = 0 then do;
       rangeflag = 1;

       %if %upcase(&printrange) = YES %then %do; 
           print "Range Violation Detected for Cluster and Pair:", niter i j k;

	   CovarJ = X_c[j,];
	   CovarK = X_c[k,];
  	   print covarj covark;
           corrjk = gamma_c[l];
           margmeanj = mu_c[j];
           margmeank = mu_c[k];
           print rho_lower_limit corrjk rho_upper_limit margmeanj margmeank;
 
       %end;

*       return; /* this statement breaks the loop when there's a range violation*/

   end;

   if ((gamma_c[l] >= rho_upper_limit) | (gamma_c[l] <= rho_lower_limit))
        & flag = 1 then do;
       print "Last Update pushes Parms Out of Range.";
       print "Range Violation Detected for Cluster and Pair:", niter i j k;

   end;
   
   %if %upcase(&makevone) = YES %then %do;
       VEE[l] = 1; 
*	print &makevone;
   %end;

   %else %do;
       VEE[l] = 1 + ((1-2*mu_c[j])*(1-2*mu_c[k])*gamma_c[l]) /
            sqrt(mu_c[j]*(1-mu_c[j])*mu_c[k]*(1-mu_c[k])) - gamma_c[l]**2;

   %end;
 


   DB[l,] = GETROWB(mu_c, gamma_c[l], j, k, X_c);
   R[l] = ((y_c[j] - mu_c[j])*(y_c[k] - mu_c[k])) / 
          sqrt(mu_c[j]*(1-mu_c[j])*mu_c[k]*(1-mu_c[k])) - gamma_c[l];

   l = l + 1;
   end;

end;

finish;


/**************************************************************************************
MODULE: SCORE
Generates the score matrix for each cluster and approximate information
to be used to estimate parameters and generate standard errors

INPUT
     beta: Vector of marginal mean parameters
    alpha: Vector of marginal correlation parameters
        y: The 0/1 binary outcome
        X: Marginal mean covariates
        Z: Marginal correlation covariates
        n: Vector of cluster sample sizes
        w: Weights
        p: Number of marginal mean parameters
        q: Number of marginal correlation parameters
     flag: Performs an Eigenanalysis of B to see if positive definite. Only
           called when computing the variance at the end (0 = no, 1 = yes). Prints
           warning for each cluster violation.
rangeflag: Checks to see if correlation is within range based on marginal means
           (0 = in range, 1 = out of range). See Prentice (1988).

OUTPUT
     U: Score vector
UUtran: Sum of U_i*U_i` across all clusters
 Ustar: Approximate information matrix

**************************************************************************************/

   start SCORE(U, UUtran, Ustar, beta, alpha, y, X, Z, n, w, p, q, flag, rangeflag, niter);
      U = J(p+q, 1, 0);
      UUtran = J(p+q, p+q, 0);
      Ustar = J(p+q, p+q, 0);

      run BEGINEND(firstx, lastx, n);
      run BEGINEND(firstz, lastz, NCHOOSE2(n));
      
      do i = 1 to nrow(n);
           X_c = X[firstx[i]:lastx[i],];
           y_c = y[firstx[i]:lastx[i]];
           if n[i] = 1 then Z_c = 0;
           else Z_c = Z[firstz[i]:lastz[i],];

           if n[i] = 1 then do;
                 mu_c = 1 / (1 + exp(-X_c * beta));
                 U_c = X_c` * (y_c - mu_c);
                 UUtran_c = U_c * U_c`;
                 Ustar_c = X_c` * (mu_c * (1 - mu_c)) * X_c;
              
                 U[1:p] = U[1:p] + U_c * w[i];
                 UUtran[1:p, 1:p] = UUtran[1:p, 1:p] + UUtran_c * w[i];
                 Ustar[1:p, 1:p] = Ustar[1:p, 1:p] + Ustar_c * w[i];
            end;
            else do;
                 U_c = J(p+q, 1, 0);
                 Ustar_c = J(p+q, p+q, 0);
                 mu_c = 1 / (1 + exp(-X_c * beta));
                 gamma_c = Z_c * alpha;

                 run VEERDB(VEE,R,DB,rangeflag,mu_c,gamma_c,X_c,y_c,flag,n,i,p,niter);
                 

                 C = CREATEC(X_c, mu_c);
                 B = CREATEB(mu_c, gamma_c, n[i]);
                 A = CREATEA(mu_c, y_c);

                 * Check for pos def of B;
                 if flag = 1 then do;
                    if min(eigval(B)) <= 0 then do;
                       print"Cluster:" i,
                            "Var(Y) is not Positive-Definite",
                            "Joint Distribution Does Not Exist";
                    end;
                 end;
                 
		 INVB=Findinv(B);
                 U_c[1:p] = C` * INVB * A;	*print VEE;
                 U_c[p+1:p+q] = Z_c` * (R # (1 / VEE));
                 UUtran_c = U_c * U_c`;
                 Ustar_c[1:p, 1:p]  = C` * INVB * C;
* when commented out its treat alpha and beta estimating equations as orthogonal;
                *Ustar_c[p+1:p+q, 1:p] = Z_c` * (DB # (1 / VEE));
                 Ustar_c[p+1:p+q, p+1:p+q] = Z_c` * (Z_c # (1 / VEE));

                 U = U + U_c * w[i];
                 UUtran = UUtran + UUtran_c * w[i];
                 Ustar = Ustar + Ustar_c * w[i];
                 end;
           end; 
      rangeflag = 0;

      finish SCORE;

/**************************************************************************************

MODULE: GETROWB
Generates a row of the -E(d(corr)/dbeta) (ie Q) matrix

INPUT
   mu: marginal means for cluster i
gamma: pairwise correlation for outcome j and k for cluster i
    j: indicator for mean 
    k: indicator for mean 
    X: covariate matrix for cluster i

OUTPUT
Row of -E(d(corr)/dbeta) (ie Q) matrix

**************************************************************************************/

   start GETROWB(mu, gamma, j, k, X);
      row = (gamma/2) * ((1-2*mu[j])*X[j,] + (1-2*mu[k])*X[k,]);
      return(row);
      finish GETROWB;

/**************************************************************************************

MODULE: GETPAIRWISE
Generates pairwise marginal means with marginal means and CORR

INPUT
  muj: marginal mean for outcome j for cluster i
  muk: marginal mean for outcome k for cluster i
gamma: linear predictor between outcome j and k for cluster i

OUTPUT
pairwise marginal means, mujk

**************************************************************************************/

   start GETPAIRWISE(muj, muk, gamma);
      mujk = sqrt(muj * muk * (1-muj) * (1-muk)) * gamma + muj * muk;
      return(mujk);
      finish GETPAIRWISE;

/**************************************************************************************

MODULE: CREATEA
Creates residual for beta estimating equation, (Y - mu)

INPUT
mu: Vector of n_i marginal means
 y: Outcome vector for ith cluster

OUTPUT
residuals for beta estimating equation

**************************************************************************************/

   start CREATEA(mu, y);
      return(y - mu);
      finish CREATEA;

/**************************************************************************************

MODULE: CREATEC
Creates derivative matrix for beta estimating equation, dmu/dbeta

INPUT
 X: Covariate matrix for cluster i
mu: Vector of n_i marginal means

OUTPUT
derivative matrix for beta estimating equation

**************************************************************************************/

   start CREATEC(X, mu);
      return(X # (mu # (1 - mu)));
      finish CREATEC;

/**************************************************************************************

MODULE: CREATEB
Creates covariance matrix for beta estimating equation, var(Y)

INPUT
   mu: Vector of n_i marginal means
    n: Sample size (scalar) for cluster i
gamma: Vector of rho_jk between outcome j and k for cluster i

OUTPUT
covariance matrix for beta estimating equation

**************************************************************************************/

   start CREATEB(mu, gamma, n);
      d = mu # (1 - mu);
      B = diag(d);
      l = 1;
      do j = 1 to n - 1;
         do k = j + 1 to n;
            B[j,k] = GETPAIRWISE(mu[j], mu[k], gamma[l]) - mu[j] * mu[k];
            l = l + 1;
            end;
         end;
      return(sqrsym(symsqr(B`)));
      finish CREATEB;

/**************************************************************************************

MODULE: RESULTS
Creates printed output to screen of parameters and other information

INPUT
  beta: Vector of marginal mean parameters
 alpha: Vector of marginal correlation Parameters  
robust: Robust covariance matrix for beta and alpha
 naive: Naive (Model-Based) covariance matrix for beta 
 niter: Number of iterations until convergence
     n: Vector of cluster sample sizes
     w: Weights

OUTPUT
To Screen

**************************************************************************************/

   start RESULTS(beta, alpha, robust, naive, bias, niter, n, w); 
      p = nrow(beta);
      q = nrow(alpha);

	

      print 'Prentice Method (1988)',,
            'Number of Clusters:' (sum(w)),
            'Maximum Cluster Size:' (max(n)),
            'Minimum Cluster Size:' (min(n)),
            'Number of Iterations:' niter[label=''],
            "Outcome Variable: &yvar";

      print '***** Robust and Bias-corrected Standard Errors *****';

      var = {&xvar}`;
      parm = beta;
      stderr = sqrt(vecdiag(robust[1:p, 1:p]));
      chisq = (beta / stderr) ## 2;
      pvalue = 1 - probchi(chisq, 1);

      bstd = sqrt(vecdiag(bias[1:p, 1:p]));
      bchi = (beta / bstd) ## 2;
      bpval = 1 - probchi(bchi, 1);

	

      print 'Marginal Mean Parameter Estimates',,
            var parm stderr chisq pvalue bstd bchi bpval;

      var = {&zvar}`;
      parm = alpha;
      stderr = sqrt(vecdiag(robust[p+1:p+q, p+1:p+q]));
      chisq = (alpha / stderr) ## 2;
      pvalue = 1 - probchi(chisq, 1);

      bstd = sqrt(vecdiag(bias[p+1:p+q, p+1:p+q]));
      bchi = (alpha / bstd) ## 2;
      bpval = 1 - probchi(bchi, 1);

      print 'Marginal Correlation Parameter Estimates',,
            var parm stderr chisq pvalue bstd bchi bpval;

      print '***** Naive (Model-Based) Standard Errors *****';

      var = {&xvar}`;
      parm = beta;
      stderr = sqrt(vecdiag(naive));
      chisq = (beta / stderr) ## 2;
      pvalue = 1 - probchi(chisq, 1);

      print 'Marginal Mean Parameter Estimates',,
            var parm stderr chisq pvalue;

      finish RESULTS;


/**************************************************************

/**************************************************************

MODULE: INVBIG
compute (A - mm`)^{-1}c without performing the inverse directly
INPUT
   ainvc: inverse of matrix A times vector c
   ainvm: inverse of matrix A times matrix (with low no. of columns) M
      M: matrix of eigen column vectors m1,m2, ..
      c: vector 
  start: of do loop
  end:   of do loop, rank of X
**************************************************************/

Start INVBIG(ainvc,ainvm,m,c,start,end);
do i=start to end;
   b = ainvm[,i];
   bt = b`;
   btm = bt*m;
   btmi = btm[,i];
   gam = 1 - btmi;
   bg = b/gam;
   ainvc = ainvc + bg*(bt*c);
   if i< end then ainvm = ainvm  + bg*btm;  
end;
finish INVBIG;
      
/**************************************************************************************

MODULE: MAKEVAR
Creates covariance matrix of beta and alpha.

INPUT
  beta: Vector of marginal mean parameters
 alpha: Vector of marginal correlation parameters
     y: The 0/1 binary outcome
     X: Marginal mean covariates
     Z: Marginal correlation covariates
     n: Vector of cluster sample sizes
     w: Weights
     p: Number of marginal mean parameters
     q: Number of marginal correlation parameters

OUTPUT
robust: Robust covariance matrix for beta and alpha
 naive: Naive (Model-Based) covariance matrix for beta

**************************************************************************************/

   start MAKEVAR(robust, naive, bias, beta, alpha, y, X, Z, n, w, p, q, id,nobs,niter);
      run SCORE(U, UUtran, Ustar, beta, alpha, y, X, Z, n, w, p, q, 1, 0, niter);
      inustar = FINDINV(Ustar);
      robust = inustar * UUtran * inustar;
      naive = FINDINV(Ustar[1:p,1:p]);   /* if B=0, this is upper left matrix of inustar */ 
      naivealp = FINDINV(Ustar[p+1:p+q,p+1:p+q]);  /* for use in cluster diagnostics */

* new commands to compute INV(I - H1);
      Call Eigen(evals1,evecs1,naive);
      sqrevals1 = sqrt(evals1);
      sqe1 = evecs1*diag(sqrevals1);

* new commands to compute INV(I - H2);
      Call Eigen(evals2,evecs2,naivealp);
      sqrevals2 = sqrt(evals2);
      sqe2 = evecs2*diag(sqrevals2);

     /****************************************************
     * Bias-corrected variance and influence diagnostics *
     ****************************************************/
      U = J(p+q, 1, 0);
      UUbc = J(p+q, p+q, 0);
/* alternative way */
      biascov = J(p+q, p+q, 0);

      run BEGINEND(firstx, lastx, n);
      run BEGINEND(firstz, lastz, NCHOOSE2(n));

      KK = nrow(n);      

      dfbetcls = J(KK,p,0);  /* dfbeta for a cluster */
      dfalpcls = J(KK,q,0);  /* dfalpha for a cluster */
      Hbeta    = J(KK,1,0); /* cluster leverage for beta */
      Halpha   = J(KK,1,0); /* cluster leverage for alpha */
      cookdcls = J(KK,1,0); /* cooks distance overall */ 
      cookbetcls = J(KK,1,0); /* cooks distance for beta */
      cookalpcls = J(KK,1,0);  /* cooks distance for alpha */

	  /*for observation level deletion diagnostics*/
	  Dfbetobs=J(nobs,p,0);
	  Dfalpobs = J(nobs,q,0);
	  cookdobs = J(nobs,1,0);
	  CookBetObs = J(nobs,1,0);
	  CookAlpObs = J(nobs,1,0);
	  Clusterid = J(nobs,1,0);
	  ClusterN = J(nobs,1,0);
          Levbetobs = J(nobs,1,0); /* observation leverage */


      do i = 1 to KK; /*cluster*/
           X_c = X[firstx[i]:lastx[i],];
           y_c = y[firstx[i]:lastx[i]];
           mu_c = 1 / (1 + exp(-X_c * beta));
 
           if n[i] = 1 then do;
                 B = mu_c * (1 - mu_c);
                 C = X_c #B;
                 Hi1=C*naive*X_c`;
                 U_c = X_c` * (y_c - mu_c)/(1 - Hi1);
                 UUbc_c = U_c * U_c`;
              
                 U[1:p] = U[1:p] + U_c * w[i];
                 UUbc[1:p, 1:p] = UUbc[1:p, 1:p] + UUbc_c * w[i];

                * Hi2 is not defined for clusters of size 1;
                 Hbeta[i] = Hi1;

                 /* Halpha is set to 0 by default */
                 dfbetclsi = naive*U_c; 
                 Dfbetcls[i,] = dfbetclsi`;
		 Dfbetobs[firstx[i],] = dfbetclsi`;/* dfbetobs = dfbetcls when n[i]=1*/
				 /* dfalpcls[i,] value is 0 by default */
                 dfalpclsi = dfalpcls[i,]`;
		 dfalpobs[firstx[i],] = dfalpclsi`;/* dfalpobs = dfalpcls when n[i]=1*/

				 /*observation level cook's d*/

		         Dfobsboth = dfbetclsi//dfalpclsi;
        		 cookdobs[firstx[i]] =   Dfobsboth`*robust*Dfobsboth/(p+q);
          		 cookbetobs[firstx[i]] = Dfbetcls[i,]*robust[1:p,1:p]*dfbetclsi/p;
          		 cookalpobs[firstx[i]] = Dfalpcls[i,]*robust[p+1:p+q,p+1:p+q]*dfalpclsi/q;
                         levbetobs[firstx[i]] = Hbeta[i];
 
           end;  /* if n[i] = 1 */

           else do;
              U_c = J(p+q, 1, 0);
              Z_c = Z[firstz[i]:lastz[i],];
              gamma_c = Z_c * alpha;

* commands for beta;
              C = CREATEC(X_c, mu_c);
              B = CREATEB(mu_c, gamma_c, n[i]);
              A = CREATEA(mu_c, y_c);
              INVB=Findinv(B);
              CtinvB = C`*invB; 
              Hi1=C*naive*CtinvB;
              Hbeta[i] = trace(Hi1);  /* cluster leverage for beta */

* commands to compute generalized inverse - beta;
              ai1 = invB; 
              mm1 = C*sqe1; 
	      ai1A=ai1*A; 
       	      ai1m1=ai1*mm1;   
			  *print 'INVBIG(ai1A, ai1m1, mm1, A, 1, p)';
			  *print ai1A ai1m1 mm1 A p;
              run INVBIG(ai1A,ai1m1,mm1,A,1,p); 
              U_c[1:p] = C`*ai1A; 
 
* commands for alpha;
              VEE = J(NCHOOSE2(n[i]), 1, 0);
              R = J(NCHOOSE2(n[i]), 1, 0);
*              DB = J(NCHOOSE2(n[i]), p, 0);
                 
              l = 1;
              do j = 1 to n[i] - 1;
                 do k = j + 1 to n[i];
				  	%if %upcase(&makevone) = YES %then %do;
       						VEE[l] = 1; 
		   			    %end;
					%else %do;
					VEE[l] = 1 + ((1-2*mu_c[j])*(1-2*mu_c[k])*gamma_c[l]) /
                       sqrt(mu_c[j]*(1-mu_c[j])*mu_c[k]*(1-mu_c[k])) - gamma_c[l]**2;
					%end;
*                    DB[l,] = GETROWB(mu_c, gamma_c[l], j, k, X_c);
                    R[l] = ((y_c[j] - mu_c[j])*(y_c[k] - mu_c[k])) / 
                       sqrt(mu_c[j]*(1-mu_c[j])*mu_c[k]*(1-mu_c[k])) - gamma_c[l];
                    l = l + 1;
                 end;
              end;

			
              Zctvee = Z_c`#(1/VEE`);
	      Hi2=Z_c*naivealp*Zctvee;
              Halpha[i] = trace(Hi2);  /* cluster leverage for alpha */

* commands to compute generalized inverse - alpha;
              mm2 = Z_c*sqe2; 
              ai2R = R/VEE;  
              ai2m2 = mm2#(1/VEE);  
              run INVBIG(ai2R,ai2m2,mm2,R,1,q);  
              U_c[p+1:p+q] = Z_c` *ai2R;  

              UUbc_c = U_c * U_c`;
              UUbc = UUbc + UUbc_c * w[i];

              /* cluster deletion diagnostics */
 
              dfbetclsi = naive*U_c[1:p]; 
              dfalpclsi = naivealp*U_c[p+1:p+q];
              Dfbetcls[i,] = dfbetclsi`;
              Dfalpcls[i,] = dfalpclsi`;

			  /*for observation level diagnostics*/

	%IF %length(&OBSOUT)^= 0 %then %DO;
	   	 run DFOBS(p,q,n, nobs, i, B,invB, C, naive, A,firstx,robust,VEE,Z_c,
	        	naivealp,R,clusterid, clustern,dfbetobs,cookbetobs,dfalpobs,cookalpobs,cookdobs);
                 levbetobs[firstx[i]:lastx[i]] = vecdiag(Hi1);

       %END;


          end; /* loop if cluster > 1 */

          Dfboth = dfbetclsi//dfalpclsi;
          cookdcls[i] =   Dfboth`*Findinv(robust)*Dfboth/(p+q);
          cookbetcls[i] = Dfbetcls[i,]*Findinv(robust[1:p,1:p])*dfbetclsi/p;
          cookalpcls[i] = Dfalpcls[i,]*Findinv(robust[p+1:p+q,p+1:p+q])*dfalpclsi/q;

* alternative way of calculating bias-corrected covariance estimator;
          biascov = biascov + Dfboth*Dfboth`;

      end;  /* do i=1 to KK */
      Bias = inustar * UUbc * inustar;

*      print bias;
*      print biascov;


/* Request Output Datasets for diagnostics (and covariance matrices ?) */

     %IF (%length(&CLSOUT) ^= 0) %THEN %DO;
         id = (unique(id))`;
         if type(id)="N" then id=char(id);
         C1 = {"I" "NI" "Hbeta" "Halpha" "cookdcls" "cookbetcls" "cookalpcls" &xvar &zvar};
         xvar = dfbetcls;
         zvar = dfalpcls;
         i = (1:KK)`;
         clout = i||n||Hbeta||Halpha||cookdcls||cookbetcls||cookalpcls||xvar||zvar;       
         create &CLSOUT from clout [rowname= id colname=c1];
         append from clout [rowname= id];
		 print clout [rowname= id colname=c1];
      %END;

    %IF (%length(&OBSOUT) ^= 0) %THEN %DO;
	 obs = (1:nobs)`;
	 if type(obs)="N" then obs=char(obs);
         clobsout = clusterid||clustern||levbetobs||cookdobs||cookbetobs||cookalpobs||dfbetobs||dfalpobs; 
         C2 = {CLUSTER NI levbetobs cookdobs cookbetobs cookalpobs &xvar &zvar};
         create &OBSOUT from clobsout [rowname= obs colname=c2];
         append from clobsout [rowname= obs];
*	 print clobsout [rowname=obs colname=c2];
	 print "Observation Deletion Diagnostics located in &OBSOUT";
    %END;


    %IF (%length(&PROBOUT) ^= 0) %THEN %DO;

	prob= 1/(1+exp(-X*beta));
        stdres = (y - prob)/sqrt(prob#(1-prob));
	probout= y || X || prob || stdres;
	C3={Y &xvar Pred_prob Std_res};
        create &PROBOUT from probout [rowname=obs colname=c3];
	append from probout [rowname=obs];
	print "Predicted Probabilities are located in &PROBOUT";

   %END;


finish MAKEVAR;


/**************************************************************************************
MODULE: DFOBS
Calculates observation level deletion diagnostics 
for the mean and covariance parameters.

OUTPUT
deletion diagnostics for alpha and beta parameters, 
approximated by the one step iteration.  Location specified by user.

**************************************************************************************
**************************************************************************************

NOTES:
i indexes cluster
n[i]=cluster size
KK = number of clusters
p= number of mean parameters (beta)
q= number of covariance parameters (alpha)
naive = (D' Vinv D)inv = inxwx

B = covariance matrix for beta estimating equation, var(Y)

X_c = X covariates for cluster
obs_i = within cluster variable = vector of 1:n[i]

A = y-u
C = d u / d Beta = d u / d eta X = Linv X (=D from preisser notes)
B = V
W = Vinv

naivealp = (S'Tinv S)inv = M_2 inv
mu: Vector of n_i marginal means
n: Sample size (scalar) for cluster i
gamma: Vector of rho_jk between outcome j and k for cluster i
VEE:	 working variances for alpha estimating equatinons = L` T L
R:	 residual vector for alpha estimating equations = E 2
DB:	 -E(d(corr)/dbeta) (i.e. Q) matrix 

***********************************************************************************/



start DFOBS(p,q,n,nobs,i,B,invB,C,naive,A,firstx,robust,VEE,Z_c,naivealp,R,clusterid,
clustern,dfbetobs,cookbetobs,dfalpobs,cookalpobs,cookdobs);

do ii = 1 to n[i];

	obs_i = 1:n[i];		*vector;
	comp = loc(obs_i^= ii);

	/*for beta*/
        /**************************************/

	vii = B[comp, ii];
	wii = invB[comp, ii];

	Wti = invB[comp, comp]-(wii*wii`)/invB[ii, ii]; 
	WV = wti*vii;
	
	Ctilda = C[ii, ]`-C[comp, ]`*WV; 
	Htilda = Ctilda`*naive*Ctilda;   
	Stilda = B[ii, ii]-vii`*WV;      
	Atilda = A[ii]-A[comp]`*WV;      

	hdobs = Ctilda*Atilda/(Stilda-Htilda);

	dfbetii = naive*hdobs;
	dfbetobs[firstx[i]- 1 + ii, ] = dfbetii`;

	/*Cluster information*/
	clusterid[firstx[i]- 1 + ii] =i;
	clustern[firstx[i]- 1 + ii] =n[i];

	/**************************************/

/* Cook's D normalized by robust variance estimate  */
*cbetobs = (hdobs`*robust[1:p,1:p]*hdobs)/p; 
cbetobs = (dfbetii`*Findinv(robust[1:p,1:p])*dfbetii)/p; 
cookbetobs[firstx[i] -1 + ii] = cbetobs;


	/*for alpha*//* if n[i]=2 special treatment is needed because there is only one row in z*/
        /**************************************/
	dimobs = n[i]-1;

	if n[i] > 2 then do;
		obs =  J(dimobs,1,0);
		comp = J(NCHOOSE2(dimobs),1,0); end;
	else if n[i] =2 then do; comp = .; obs = 1;end;	  					

	count=0;
	ncomp=1;
	nii=1;

/*routine to make list of observations to be removed*/								
	do obsj = 1 to n[i] - 1;
		do obsk =obsj+1 to n[i];
			count=count+1;
				if ( ii^=obsj & ii^=obsk) then do;
					comp[ncomp]=count;
					ncomp=ncomp+1;
				end;
				else do; 	obs[nii]=count;
						nii=nii+1;
				end;
	end;	end;/*end routine*/	
						

		VEEDIAG=diag(VEE);
		if n[i] > 2  then do;
				veeDCOMPinv = diag(1/VEE[comp]);
				veeveeinv = VEEDIAG[obs,comp]*veeDCOMPinv;	

				Ztilda = Z_c[obs,]-veeveeinv*Z_c[comp,];
				VEEtilda = VEEDIAG[obs,obs]-veeveeinv*VEEDIAG[comp,obs];
				H2tilda = Ztilda*naivealp*Ztilda`;
				Rtilda = R[obs]-veeveeinv*R[comp];
		end; /*  if n[i] > 2  */
		else if n[i] = 2 then do;
				Ztilda = Z_c[obs,];
				VEEtilda = VEEDIAG[obs,obs];
				H2tilda = Ztilda*naivealp*Ztilda`;
				Rtilda = R[obs];
		end; /*  if n[i] = 2  */

	ginvv = VEEtilda - H2tilda;
	hadobs = Ztilda`*ginv(ginvv)*Rtilda;

	dfalphaii = naivealp*hadobs; 
	Dfalpobs[ firstx[i] -1 + ii, ] = Dfalphaii`;

*	calpobs = (hadobs`*robust[p+1:p+q,p+1:p+q]*hadobs)/q; 
	calpobs = (dfalphaii`*Findinv(robust[p+1:p+q,p+1:p+q])*dfalphaii)/q; 
	cookalpobs[firstx[i] -1 + ii] = calpobs;

	Dfboth = dfbetii//dfalphaii;
	cookdobs[firstx[i]-1+ii] =   Dfboth`*Findinv(robust)*Dfboth/(p+q);

end;

finish DFOBS;


/**************************************************************************************/

   run GRABDATA(y, X, id, n, Z, w, maxiter, epsilon, nobs); 
   run FITPRENTICE(beta, alpha, robust, naive, bias, niter, converge, y, X, Z, n, w, maxiter, epsilon, id, nobs, startbeta, startalpha);
   if converge = 1 then run RESULTS(beta, alpha, robust, naive, bias, niter, n, w);
      else if converge = 0 then print'ALGORITHM DID NOT CONVERGE';

/*************************************************************************************/



   %mend PRENTICEcorrJSP;
   
/*************************************************************************************/