/* ldr.sas : lossless data reduction for clustering, discriminant analysis, etc.

Use for high-dimensional data, i.e. when the number of samples (observations,
or items to be clustered) is far smaller than the number of features (variables).

Please cite:

Qaqish, B. F., O'Brien, J. J., Hibbard, J. C., Clowers, K. J. (2017).
Accelerating high-dimensional clustering with lossless data reduction.
Bioinformatics, 33(18), 2867-2872.
DOI: 10.1093/bioinformatics/btx328

*********************************************************************************************/

proc iml;

start ldr (x);
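  * x is p x n: n points (columns) in p dimensions (rows), p > n-1;
  * find coordinates of the n points in the (n-1)-dimensional hyperplane that contains them;
  n = ncol(x);
  call qr(q, r, pivot, lindep, x - x[,n]); * QR, with the origin moved to the last point (an arbitrary choice);
  r = r[,pivot];
  * Q is orthogonal, so the returned coordinates preserve all pairwise Euclidean distances between the points;
* print x, r ;
  return(r[1:(n-1),]);  * return n points (columns) in n-1 dimensions (rows);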
finish;
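
* Illustrative sketch, not part of the original program: a toy call to ldr()
  with 3 points in 5 dimensions. The names x0, y0 and dims are made up for
  this example. The result should have n-1 = 2 rows and n = 3 columns.;
x0 = {1 2 3,
      4 5 6,
      7 8 10,
      1 0 1,
      2 2 0};                    * 5 x 3: rows are features, columns are points;
y0 = ldr(x0);
dims = nrow(y0) || ncol(y0);     * row and column counts of the reduced data;
print dims;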

**********************************************************************************************;
*
 Example showing how to use the function ldr() to speed up the clustering 
 of n samples based on p features, p >> n.
 Basically, cluster ldr(X) instead of X. Yes, it is that easy!
;

n = 100;           * samples, examples, observations;
p = 20000;       * features, variables, attributes;
seed = 314159;
X = j(p, n, seed);         * p x n matrix of seeds. Rows are features, columns are samples;
X = rannor(X);             * replace each entry with a standard normal draw;
  
* Do this;
tinyX = ldr(X);      *  tinyX is (n-1) * n;
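
* Optional check, a sketch under stated assumptions: the reduction is meant
  to preserve pairwise Euclidean distances between samples. This assumes a
  SAS/IML release that provides the DISTANCE function. The names dX, dTiny
  and maxDiff are illustrative. maxDiff should be near zero, up to rounding
  error.;
dX      = distance(T(X));       * n x n distances between samples, full data;
dTiny   = distance(T(tinyX));   * the same distances from the reduced data;
maxDiff = max(abs(dX - dTiny));
print maxDiff;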

* Then run the clustering on tinyX instead of on X.
* You will have to transpose it first, since in X and tinyX above observations are in columns, not rows.
* This also speeds up discriminant analysis, regression, and similar multivariate procedures.
* For bootstrapping or resampling, draw columns from tinyX instead of from X.
* Same results. Huge speedup.
;
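
* Illustrative sketch, not part of the original program, of the two uses
  described above. It assumes SAS/STAT is licensed for PROC CLUSTER. The
  names idx, bootX, tinyXt, the data set tiny and the output tree are made up
  for this example.;

* Bootstrap: resample columns (samples) of tinyX with replacement;
call randseed(seed);
idx   = sample(1:n, n);        * n column indices drawn with replacement;
bootX = tinyX[, idx];          * (n-1) x n bootstrap replicate;

* Export tinyX with observations in rows, then leave IML;
tinyXt = T(tinyX);             * n x (n-1);
create tiny from tinyXt;       * variables are named COL1, COL2, and so on;
append from tinyXt;
close tiny;
quit;

* Cluster the reduced data. No STD option here, because standardizing the
  rotated coordinates would change the distances.;
proc cluster data=tiny method=average outtree=tree noprint;
  var col: ;
run;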