* ldr.sas : lossless data reduction for clustering, discriminant analysis, etc.

  Use for high-dimensional data, i.e. if the number of samples or observations
  or items to be clustered is far less than the number of features or variables.

  Please cite:
  Qaqish, B. F., O'Brien, J. J., Hibbard, J. C., Clowers, K. J. (2017).
  Accelerating high-dimensional clustering with lossless data reduction.
  BIOINFORMATICS, Volume: 33, Issue: 18, Pages: 2867-2872.
  DOI: 10.1093/bioinformatics/btx328
*********************************************************************************************;

proc iml;

start ldr (x);
    * x is p * n, n points in p dimensions, p > n-1;
    * find coordinates of the n points in the hyperplane in n-1 dimensions;
    n = ncol(x);
    call qr(q, r, pivot, lindep, x - x[,n]); * QR. Set origin = the last point, arbitrary;
    r = r[,pivot];
    * print x, y, r ;
    return(r[1:(n-1),]); * return n points (columns) in n-1 dimensions (rows);
finish;

**********************************************************************************************;

* Example showing how to use the function ldr() to speed up the clustering
  of n samples based on p features, p >> n.
  Basically, cluster ldr(X) instead of X. Yes, it is that easy! ;

n = 100;        * samples, examples, observations;
p = 20000;      * features, variables, attributes;
seed = 314159;

X = j(p, n, seed); * rows are features, columns are samples;
X = rannor(X);

* Do this;
tinyX = ldr(X); * tinyX is (n-1) * n;

* Then run the clustering on tinyX instead of on X.
* You'll have to transpose it since in X and tinyX above, observations are in columns, not rows.
* Also speeds up discriminant analysis, regression, and similar multivariate procedures.
* Bootstrap or resample columns from tinyX instead of X.
* Same results. Huge speedup.
;
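
* Optional sanity check, added here as an illustrative sketch and not part of the published code.
  It compares the pairwise Euclidean distances between samples before and after ldr().
  Assumptions: the DISTANCE function is available in your SAS/IML release, and
  X and tinyX from the example above are still in memory.
  DISTANCE works on rows, so both matrices are transposed first.
  The names dX, dTiny and maxdiff are introduced only for this check.
  If the reduction is lossless, maxdiff should be zero up to floating-point rounding. ;

dX    = distance(t(X));         * n * n distances computed from the full p-dimensional data;
dTiny = distance(t(tinyX));     * n * n distances computed from the reduced (n-1)-dimensional data;
maxdiff = max(abs(dX - dTiny)); * largest absolute discrepancy between the two distance matrices;
print maxdiff;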