| Matrix_eQTL_main {MatrixEQTL} | R Documentation | 
Matrix_eQTL_engine function tests association of every row of the snps dataset with every row of the gene dataset using a linear regression model defined by the useModel parameter (see below).
The testing procedure accounts for extra covariates in cvrt parameter.
The errorCovariance parameter can be set to the error variance-covariance matrix to account for heteroskedastic and/or correlated errors.
Associations significant at pvOutputThreshold (pvOutputThreshold.cis) levels are saved to output_file_name (output_file_name.cis), with corresponding estimates of effect size (slope coefficient), test statistics, p-values, and q-values (false discovery rate).
Matrix eQTL can perform separate analysis for local (cis) and distant (trans) eQTLs.For such analysis one has to set the cis-analysis specific parameters pvOutputThreshold.cis > 0, cisDist, snpspos and genepos in the call of Matrix_eQTL_main function.A gene-SNP pair is considered local if the distance between them is less or equal to cisDist.The genomic location of genes and SNPs is defined by data frames snpspos and genepos.Depending on p-value thresholds pvOutputThreshold and pvOutputThreshold.cis Matrix eQTL runs in one of three different modes:
 Set pvOutputThreshold > 0 and pvOutputThreshold.cis = 0 (or use Matrix_eQTL_engine) to perform eQTL analysis without using gene/SNP locations. Associations significant at the pvOutputThreshold level are be recorded in output_file_name and in the returned object.
 Set pvOutputThreshold = 0 and pvOutputThreshold.cis > 0 to perform eQTL analysis for local gene-SNP pairs only. Local associations significant at pvOutputThreshold.cis level will be recorded in output_file_name.cis and in the returned object.
 Set pvOutputThreshold > 0 and pvOutputThreshold.cis > 0 to perform eQTL analysis with separate p-value thresholds for local and distant eQTLs. Distant and local associations significant at corresponding thresholds are recorded in output_file_name and output_file_name.cis respectively and in the returned object. In this case the false discovery rate is calculated separately for these two sets of eQTLs.
Matrix_eQTL_engine is a wrapper for Matrix_eQTL_main for eQTL analysis without regard to gene/SNP location and provided for compatibility with the previous versions of the package.
The parameter pvalue.hist allows to record information sufficient to create a histogram or QQ-plot of all the p-values (see plot).
Matrix_eQTL_main( snps, 
                  gene, 
                  cvrt = SlicedData$new(), 
                  output_file_name = "", 
                  pvOutputThreshold = 1e-5,
                  useModel = modelLINEAR, 
                  errorCovariance = numeric(), 
                  verbose = TRUE, 
                  output_file_name.cis = "", 
                  pvOutputThreshold.cis = 0,
                  snpspos = NULL, 
                  genepos = NULL,
                  cisDist = 1e6,
                  pvalue.hist = FALSE,
                  min.pv.by.genesnp = FALSE,
                  noFDRsaveMemory = FALSE)
Matrix_eQTL_engine(snps, 
                   gene, 
                   cvrt = SlicedData$new(), 
                   output_file_name, 
                   pvOutputThreshold = 1e-5, 
                   useModel = modelLINEAR, 
                   errorCovariance = numeric(), 
                   verbose = TRUE,
                   pvalue.hist = FALSE,
                   min.pv.by.genesnp = FALSE,
                   noFDRsaveMemory = FALSE)
| snps | 
 | |||||||||||||||||
| gene | 
 | |||||||||||||||||
| cvrt | 
 | |||||||||||||||||
| output_file_name | 
 | |||||||||||||||||
| output_file_name.cis | 
 | |||||||||||||||||
| pvOutputThreshold | 
 | |||||||||||||||||
| pvOutputThreshold.cis | 
 | |||||||||||||||||
| useModel | 
 
 | |||||||||||||||||
| errorCovariance | 
 | |||||||||||||||||
| verbose | 
 | |||||||||||||||||
| snpspos | 
 
 | |||||||||||||||||
| genepos | 
 
 | |||||||||||||||||
| cisDist | 
 | |||||||||||||||||
| pvalue.hist | 
 | |||||||||||||||||
| min.pv.by.genesnp | 
 | |||||||||||||||||
| noFDRsaveMemory | 
 | 
Note that the columns of gene, snps, and cvrt must match.If they do not match in the input files, use ColumnSubsample method to subset and/or reorder them.
The detected eQTLs are saved in output_file_name and/or output_file_name.cis if they are not NULL.The method also returns a list with a summary of the performed analysis.
| param | Keeps all input parameters and also records the number of degrees of freedom for the full model. | 
| time.in.sec | Time difference between the start and the end of the analysis (in seconds). | 
| all | Information about all detected eQTLs. | 
| cis | Information about detected local eQTLs. | 
| trans | Information about detected distant eQTLs. | 
The elements all, cis, and trans may contain the following components
ntestsTotal number of tests performed. This is used for FDR calculation.
eqtlsData frame with recorded significant associations. Not available if noFDRsaveMemory=FALSE
neqtlsNumber of significant associations recorded.
hist.binsHistogram bins used for recording p-value distribution. See pvalue.hist parameter.
hist.countsNumber of p-value that fell in each histogram bin. See pvalue.hist parameter.
min.pv.snpsVector with the best p-value for each SNP. See min.pv.by.genesnp parameter.
min.pv.geneVector with the best p-value for each gene. See min.pv.by.genesnp parameter.
Andrey Shabalin ashabalin@vcu.edu
The package website: http://www.bios.unc.edu/research/genomic_software/Matrix_eQTL/
The code below is the sample code for eQTL analysis NOT using gene/SNP locations.
See MatrixEQTL_cis_code for sample code for eQTL analysis that separates local and distant tests.
# Matrix eQTL by Andrey A. Shabalin
# http://www.bios.unc.edu/research/genomic_software/Matrix_eQTL/
# 
# Be sure to use an up to date version of R and Matrix eQTL.
# source("Matrix_eQTL_R/Matrix_eQTL_engine.r");
library(MatrixEQTL)
## Location of the package with the data files.
base.dir = find.package('MatrixEQTL');
## Settings
# Linear model to use, modelANOVA, modelLINEAR, or modelLINEAR_CROSS
useModel = modelLINEAR; # modelANOVA, modelLINEAR, or modelLINEAR_CROSS
# Genotype file name
SNP_file_name = paste(base.dir, "/data/SNP.txt", sep="");
# Gene expression file name
expression_file_name = paste(base.dir, "/data/GE.txt", sep="");
# Covariates file name
# Set to character() for no covariates
covariates_file_name = paste(base.dir, "/data/Covariates.txt", sep="");
# Output file name
output_file_name = tempfile();
# Only associations significant at this level will be saved
pvOutputThreshold = 1e-2;
# Error covariance matrix
# Set to numeric() for identity.
errorCovariance = numeric();
# errorCovariance = read.table("Sample_Data/errorCovariance.txt");
## Load genotype data
snps = SlicedData$new();
snps$fileDelimiter = "\t";      # the TAB character
snps$fileOmitCharacters = "NA"; # denote missing values;
snps$fileSkipRows = 1;          # one row of column labels
snps$fileSkipColumns = 1;       # one column of row labels
snps$fileSliceSize = 2000;      # read file in slices of 2,000 rows
snps$LoadFile(SNP_file_name);
## Load gene expression data
gene = SlicedData$new();
gene$fileDelimiter = "\t";      # the TAB character
gene$fileOmitCharacters = "NA"; # denote missing values;
gene$fileSkipRows = 1;          # one row of column labels
gene$fileSkipColumns = 1;       # one column of row labels
gene$fileSliceSize = 2000;      # read file in slices of 2,000 rows
gene$LoadFile(expression_file_name);
## Load covariates
cvrt = SlicedData$new();
cvrt$fileDelimiter = "\t";      # the TAB character
cvrt$fileOmitCharacters = "NA"; # denote missing values;
cvrt$fileSkipRows = 1;          # one row of column labels
cvrt$fileSkipColumns = 1;       # one column of row labels
if(length(covariates_file_name)>0) {
  cvrt$LoadFile(covariates_file_name);
}
## Run the analysis
me = Matrix_eQTL_engine(
    snps = snps,
    gene = gene,
    cvrt = cvrt,
    output_file_name = output_file_name,
    pvOutputThreshold = pvOutputThreshold,
    useModel = useModel, 
    errorCovariance = errorCovariance, 
    verbose = TRUE,
    pvalue.hist = TRUE,
    min.pv.by.genesnp = FALSE,
    noFDRsaveMemory = FALSE);
		
unlink(output_file_name);
## Results:
cat('Analysis done in: ', me$time.in.sec, ' seconds', '\n');
cat('Detected eQTLs:', '\n');
show(me$all$eqtls)
## Plot the histogram of all p-values
plot(me)