Title: | True Discovery Guarantee by Sum-Based Tests |
---|---|
Description: | It allows to quickly perform closed testing by sum-based global tests, and construct lower confidence bounds for the TDP, simultaneously over all subsets of hypotheses. As main features, it produces permutation-based simultaneous lower confidence bounds for the proportion of active voxels in clusters for fMRI data, differentially expressed genes in pathways for gene expression data, and significant effects for multiverse analysis. Details may be found in Vesely at al. (2023) < doi:10.1093/jrsssb/qkad019> and Tian at al. (2022) <doi:10.1111/sjos.12614>. |
Authors: | Anna Vesely and Xu Chen |
Maintainer: | Anna Vesely <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.1.1 |
Built: | 2025-02-22 05:55:54 UTC |
Source: | https://github.com/annavesely/sumsome |
It provides true discovery guarantees, using sum-based global statistics (sum of t-scores, p-value combinations, etc.). As main features, it produces permutation-based simultaneous lower confidence bounds for the proportion of active voxels in clusters for fMRI data, differentially expressed genes in pathways for gene expression data, and significant effects for multiverse analysis.
Anna Vesely and Xu Chen.
Maintainer: Anna Vesely <[email protected]>
Goeman J. J. and Solari A. (2011). Multiple testing for exploratory research. Statistical Science, doi: 10.1214/1-STS356.
Tian J., Chen X., Katsevich E., Goeman J. J. and Ramdas A. (2022). Large-scale simultaneous inference under dependence. Scandinavian Journal of Statistics, doi: 10.1111/sjos.12614.
Vesely A., Finos L., and Goeman J. J. (2023). Permutation-based true discovery guarantee by sum tests. Journal of the Royal Statistical Society, Series B (Statistical Methodology), doi: 10.1093/jrsssb/qkad019.
fMRI cluster analysis: brainScores
, brainPvals
, brainClusters
, brainAnalysis
Gene expression pathway analysis: geneScores
, genePvals
, geneAnalysis
Multiverse analysis: pimaAnalysis
General setting: sumStats
and sumPvals
(permutations), sumStatsPar
and sumPvalsPar
(parametric)
This function uses permutation t-statistics/p-values to determine a true discovery guarantee for fMRI cluster analysis. It computes confidence bounds for the number of true discoveries and the true discovery proportion within each cluster. The bounds are simultaneous over all sets, and remain valid under post-hoc selection.
brainAnalysis(sumBrain, clusters = NULL, nMax = 50, silent = FALSE)
brainAnalysis(sumBrain, clusters = NULL, nMax = 50, silent = FALSE)
sumBrain |
an object of class sumBrain, as returned by the functions |
clusters |
3D numeric array of cluster indices, or character for a Nifti file name. If NULL, the whole brain is considered. |
nMax |
maximum number of iterations per cluster. |
silent |
logical, |
brainAnalysis
returns a list containing summary
(data frame) and
TDPmap
(3D numeric array of the true discovery proportions).
The data frame summary
contains, for each cluster,
size
: size
TD
: lower (1-alpha
)-confidence bound for the number of true discoveries
maxTD
: maximum value of TD
that could be found under convergence of the algorithm
TDP
: lower (1-alpha
)-confidence bound for the true discovery proportion
maxTD
: maximum value of TDP
that could be found under convergence of the algorithm
dim1
, dim2
, dim3
: coordinates of the center of mass.
Anna Vesely.
Goeman J. J. and Solari A. (2011). Multiple testing for exploratory research. Statistical Science, doi: 10.1214/11-STS356.
Vesely A., Finos L., and Goeman J. J. (2023). Permutation-based true discovery guarantee by sum tests. Journal of the Royal Statistical Society, Series B (Statistical Methodology), doi: 10.1093/jrsssb/qkad019.
Permutation statistics for brain imaging: brainScores
, brainPvals
Suprathreshold clusters: brainClusters
# simulate 20 copes with dimensions 10x10x10 set.seed(42) copes <- list() for(i in seq(20)){copes[[i]] <- array(rnorm(10^3, mean = -10, sd = 30), dim=c(10,10,10))} # cluster map where t scores are grater than 2.8, in absolute value thr <- 2.8 cl <- brainClusters(copes = copes, thr = thr) # create object of class sumBrain res <- brainScores(copes = copes, alpha = 0.2, seed = 42, truncFrom = thr) res summary(res) # confidence bound for the number of true discoveries and the TDP within clusters out <- brainAnalysis(res, clusters = cl$clusters) out$summary
# simulate 20 copes with dimensions 10x10x10 set.seed(42) copes <- list() for(i in seq(20)){copes[[i]] <- array(rnorm(10^3, mean = -10, sd = 30), dim=c(10,10,10))} # cluster map where t scores are grater than 2.8, in absolute value thr <- 2.8 cl <- brainClusters(copes = copes, thr = thr) # create object of class sumBrain res <- brainScores(copes = copes, alpha = 0.2, seed = 42, truncFrom = thr) res summary(res) # confidence bound for the number of true discoveries and the TDP within clusters out <- brainAnalysis(res, clusters = cl$clusters) out$summary
This function determines spatially connected clusters, where t-scores are more extreme than a given threshold.
brainClusters(copes, mask = NULL, thr = 3.2, alternative = "two.sided", silent = FALSE)
brainClusters(copes, mask = NULL, thr = 3.2, alternative = "two.sided", silent = FALSE)
copes |
list of 3D numeric arrays (contrasts maps for each subject). |
mask |
3D logical array, where |
thr |
threshold. |
alternative |
direction of the alternative hypothesis ( |
silent |
logical, |
brainClusters
returns a 3D numeric array, with integer values corresponding to clusters,
and 0 to other voxels.
Anna Vesely.
Permutation statistics for brain imaging: brainScores
, brainPvals
True discovery guarantee for cluster analysis: brainAnalysis
# simulate 20 copes with dimensions 10x10x10 set.seed(42) copes <- list() for(i in seq(20)){copes[[i]] <- array(rnorm(10^3, mean = -10, sd = 30), dim=c(10,10,10))} # cluster map where t scores are grater than 2.8, in absolute value thr <- 2.8 cl <- brainClusters(copes = copes, thr = thr) # create object of class sumBrain res <- brainScores(copes = copes, alpha = 0.2, seed = 42, truncFrom = thr) res summary(res) # confidence bound for the number of true discoveries and the TDP within clusters out <- brainAnalysis(res, clusters = cl$clusters) out$summary
# simulate 20 copes with dimensions 10x10x10 set.seed(42) copes <- list() for(i in seq(20)){copes[[i]] <- array(rnorm(10^3, mean = -10, sd = 30), dim=c(10,10,10))} # cluster map where t scores are grater than 2.8, in absolute value thr <- 2.8 cl <- brainClusters(copes = copes, thr = thr) # create object of class sumBrain res <- brainScores(copes = copes, alpha = 0.2, seed = 42, truncFrom = thr) res summary(res) # confidence bound for the number of true discoveries and the TDP within clusters out <- brainAnalysis(res, clusters = cl$clusters) out$summary
This function computes p-value combinations for different permutations of brain imaging data. A voxel's p-value is calculated by performing the one-sample t test for the null hypothesis that its mean contrast over the different subjects is zero.
brainPvals(copes, mask = NULL, alternative = "two.sided", alpha = 0.05, B = 200, seed = NULL, truncFrom = NULL, truncTo = 0.5, type = "vovk.wang", r = 0, rand = FALSE)
brainPvals(copes, mask = NULL, alternative = "two.sided", alpha = 0.05, B = 200, seed = NULL, truncFrom = NULL, truncTo = 0.5, type = "vovk.wang", r = 0, rand = FALSE)
copes |
list of 3D numeric arrays (contrasts maps for each subject). |
mask |
3D logical array, where |
alternative |
direction of the alternative hypothesis ( |
alpha |
significance level. |
B |
number of permutations, including the identity. |
seed |
seed. |
truncFrom |
truncation parameter: values greater than |
truncTo |
truncation parameter: truncated values are set to |
type |
p-value combination among |
r |
parameter for Vovk and Wang's p-value combination. |
rand |
logical, |
A p-value p
is transformed as following.
Edgington: p
(Edgington, 1972)
Fisher: -2log(p)
(Fisher, 1925)
Pearson: 2log(1-p)
(Pearson, 1933)
Liptak: qnorm(1-p)
(Liptak, 1958; Stouffer et al., 1949)
Cauchy: tan[(0.5-p)pi]
with pi=3.142
(Liu and Xie, 2020)
Harmonic mean: 1/p
(Wilson, 2019)
Vovk and Wang: p^r
(log(p)
for r
=0) (Vovk and Wang, 2020)
An error message is returned if the transformation produces infinite values.
For Vovk and Wang, r=0
corresponds to Fisher, and r=-1
to the harmonic mean.
Truncation parameters should be such that truncTo
is not smaller than truncFrom
.
As Pearson's and Liptak's transformations produce infinite values in 1, for such methods
truncTo
should be strictly smaller than 1.
The significance level alpha
should be in the interval [1/B
, 1).
brainPvals
returns an object of class sumBrain
, containing
statistics
: numeric matrix of p-values, where columns correspond to voxels inside the brain, and rows to permutations.
The first permutation is the identity
mask
: 3D logical array, where TRUE
values correspond to voxels inside the brain
alpha
: significance level
truncFrom
: transformed first truncation parameter
truncTo
: transformed second truncation parameter
Anna Vesely.
Goeman J. J. and Solari A. (2011). Multiple testing for exploratory research. Statistical Science, doi: 10.1214/11-STS356.
Vesely A., Finos L., and Goeman J. J. (2023). Permutation-based true discovery guarantee by sum tests. Journal of the Royal Statistical Society, Series B (Statistical Methodology), doi: 10.1093/jrsssb/qkad019.
Permutation statistics for brain imaging using t scores: brainScores
True discovery guarantee for cluster analysis: brainAnalysis
Suprathreshold clusters: brainClusters
# simulate 20 copes with dimensions 10x10x10 set.seed(42) copes <- list() for(i in seq(20)){copes[[i]] <- array(rnorm(10^3, mean = -10, sd = 30), dim=c(10,10,10))} # cluster map where t scores are grater than 2.8, in absolute value thr <- 2.8 cl <- brainClusters(copes = copes, thr = thr) # create object of class sumBrain (combination: Cauchy) res <- brainPvals(copes = copes, alpha = 0.2, seed = 42, type = "cauchy") res summary(res) # confidence bound for the number of true discoveries and the TDP within clusters out <- brainAnalysis(res, clusters = cl$clusters) out$summary
# simulate 20 copes with dimensions 10x10x10 set.seed(42) copes <- list() for(i in seq(20)){copes[[i]] <- array(rnorm(10^3, mean = -10, sd = 30), dim=c(10,10,10))} # cluster map where t scores are grater than 2.8, in absolute value thr <- 2.8 cl <- brainClusters(copes = copes, thr = thr) # create object of class sumBrain (combination: Cauchy) res <- brainPvals(copes = copes, alpha = 0.2, seed = 42, type = "cauchy") res summary(res) # confidence bound for the number of true discoveries and the TDP within clusters out <- brainAnalysis(res, clusters = cl$clusters) out$summary
This function computes t-scores for different permutations of brain imaging data. A voxel's score is calculated by performing the one-sample t test for the null hypothesis that its mean contrast over the different subjects is zero.
brainScores(copes, mask = NULL, alternative = "two.sided", alpha = 0.05, B = 200, seed = NULL, truncFrom = 3.2, truncTo = 0, squares = FALSE)
brainScores(copes, mask = NULL, alternative = "two.sided", alpha = 0.05, B = 200, seed = NULL, truncFrom = 3.2, truncTo = 0, squares = FALSE)
copes |
list of 3D numeric arrays (contrasts maps for each subject). |
mask |
3D logical array, where |
alternative |
direction of the alternative hypothesis ( |
alpha |
significance level. |
B |
number of permutations, including the identity. |
seed |
seed. |
truncFrom |
truncation parameter: values less extreme than |
truncTo |
truncation parameter: truncated values are set to |
squares |
logical, |
Truncation parameters should be such that truncTo
is not more extreme than truncFrom
.
The significance level alpha
should be in the interval [1/B
, 1).
brainScores
returns an object of class sumBrain
, containing
statistics
: numeric matrix of t-scores, where columns correspond to voxels inside the brain, and rows to permutations.
The first permutation is the identity
mask
: 3D logical array, where TRUE
values correspond to voxels inside the brain
alpha
: significance level
truncFrom
: transformed first truncation parameter
truncTo
: transformed second truncation parameter
Anna Vesely.
Goeman J. J. and Solari A. (2011). Multiple testing for exploratory research. Statistical Science, doi: 10.1214/11-STS356.
Vesely A., Finos L., and Goeman J. J. (2023). Permutation-based true discovery guarantee by sum tests. Journal of the Royal Statistical Society, Series B (Statistical Methodology), doi: 10.1093/jrsssb/qkad019.
Permutation statistics for brain imaging using p-values: brainPvals
True discovery guarantee for cluster analysis: brainAnalysis
Suprathreshold clusters: brainClusters
# simulate 20 copes with dimensions 10x10x10 set.seed(42) copes <- list() for(i in seq(20)){copes[[i]] <- array(rnorm(10^3, mean = -10, sd = 30), dim=c(10,10,10))} # cluster map where t scores are grater than 2.8, in absolute value thr <- 2.8 cl <- brainClusters(copes = copes, thr = thr) # create object of class sumBrain res <- brainScores(copes = copes, alpha = 0.2, seed = 42, truncFrom = thr) res summary(res) # confidence bound for the number of true discoveries and the TDP within clusters out <- brainAnalysis(res, clusters = cl$clusters) out$summary
# simulate 20 copes with dimensions 10x10x10 set.seed(42) copes <- list() for(i in seq(20)){copes[[i]] <- array(rnorm(10^3, mean = -10, sd = 30), dim=c(10,10,10))} # cluster map where t scores are grater than 2.8, in absolute value thr <- 2.8 cl <- brainClusters(copes = copes, thr = thr) # create object of class sumBrain res <- brainScores(copes = copes, alpha = 0.2, seed = 42, truncFrom = thr) res summary(res) # confidence bound for the number of true discoveries and the TDP within clusters out <- brainAnalysis(res, clusters = cl$clusters) out$summary
This function uses permutation t-statistics/p-values to determine a true discovery guarantee for gene pathway analysis. It computes confidence bounds for the number of true discoveries and the true discovery proportion within each cluster. The bounds are simultaneous over all sets, and remain valid under post-hoc selection.
geneAnalysis(sumGene, pathways = NULL, nMax = 50, silent = FALSE)
geneAnalysis(sumGene, pathways = NULL, nMax = 50, silent = FALSE)
sumGene |
an object of class sumGene, as returned by the functions |
pathways |
list of character vectors containing gene names (one vector per pathway). If NULL, the whole gene set is considered. |
nMax |
maximum number of iterations per cluster. |
silent |
logical, |
geneAnalysis
returns a data frame containing, for each pathway,
size
: size
TD
: lower (1-alpha
)-confidence bound for the number of true discoveries
maxTD
: maximum value of TD
that could be found under convergence of the algorithm
TDP
: lower (1-alpha
)-confidence bound for the true discovery proportion
maxTD
: maximum value of TDP
that could be found under convergence of the algorithm.
Anna Vesely.
Goeman J. J. and Solari A. (2011). Multiple testing for exploratory research. Statistical Science, doi: 10.1214/11-STS356.
Vesely A., Finos L., and Goeman J. J. (2023). Permutation-based true discovery guarantee by sum tests. Journal of the Royal Statistical Society, Series B (Statistical Methodology), doi: 10.1093/jrsssb/qkad019.
Permutation statistics for gene expression: geneScores
, genePvals
# simulate 20 samples of 100 genes set.seed(42) expr <- matrix(c(rnorm(1000, mean = 0, sd = 10), rnorm(1000, mean = 13, sd = 10)), ncol = 20) rownames(expr) <- seq(100) labels <- rep(c(1,2), each = 10) # simulate pathways pathways <- lapply(seq(3), FUN = function(x) sample(rownames(expr), 3*x)) # create object of class sumGene res <- geneScores(expr = expr, labels = labels, alpha = 0.2, seed = 42) res summary(res) # confidence bound for the number of true discoveries and the TDP within pathways out <- geneAnalysis(res, pathways = pathways) out
# simulate 20 samples of 100 genes set.seed(42) expr <- matrix(c(rnorm(1000, mean = 0, sd = 10), rnorm(1000, mean = 13, sd = 10)), ncol = 20) rownames(expr) <- seq(100) labels <- rep(c(1,2), each = 10) # simulate pathways pathways <- lapply(seq(3), FUN = function(x) sample(rownames(expr), 3*x)) # create object of class sumGene res <- geneScores(expr = expr, labels = labels, alpha = 0.2, seed = 42) res summary(res) # confidence bound for the number of true discoveries and the TDP within pathways out <- geneAnalysis(res, pathways = pathways) out
This function computes p-value combinations for different permutations of gene expression data. A gene's p-value is calculated by performing the two-sample t test for the null hypothesis that the mean expression value is the same between two populations.
genePvals(expr, labels, alternative = "two.sided", alpha = 0.05, B = 200, seed = NULL, truncFrom = NULL, truncTo = 0.5, type = "vovk.wang", r = 0, rand = FALSE)
genePvals(expr, labels, alternative = "two.sided", alpha = 0.05, B = 200, seed = NULL, truncFrom = NULL, truncTo = 0.5, type = "vovk.wang", r = 0, rand = FALSE)
expr |
matrix where rows correspond to genes, and columns to samples. |
labels |
numeric/character vector with two levels, denoting the population of each sample. |
alternative |
direction of the alternative hypothesis ( |
alpha |
significance level. |
B |
number of permutations, including the identity. |
seed |
seed. |
truncFrom |
truncation parameter: values greater than |
truncTo |
truncation parameter: truncated values are set to |
type |
p-value combination among |
r |
parameter for Vovk and Wang's p-value combination. |
rand |
logical, |
A p-value p
is transformed as following.
Edgington: p
(Edgington, 1972)
Fisher: -2log(p)
(Fisher, 1925)
Pearson: 2log(1-p)
(Pearson, 1933)
Liptak: qnorm(1-p)
(Liptak, 1958; Stouffer et al., 1949)
Cauchy: tan[(0.5-p)pi]
with pi=3.142
(Liu and Xie, 2020)
Harmonic mean: 1/p
(Wilson, 2019)
Vovk and Wang: p^r
(log(p)
for r
=0) (Vovk and Wang, 2020)
An error message is returned if the transformation produces infinite values.
For Vovk and Wang, r=0
corresponds to Fisher, and r=-1
to the harmonic mean.
Truncation parameters should be such that truncTo
is not smaller than truncFrom
.
As Pearson's and Liptak's transformations produce infinite values in 1, for such methods
truncTo
should be strictly smaller than 1.
The significance level alpha
should be in the interval [1/B
, 1).
genePvals
returns an object of class sumGene
, containing
statistics
: numeric matrix of p-values, where columns correspond to genes, and rows to permutations.
The first permutation is the identity
alpha
: significance level
truncFrom
: transformed first truncation parameter
truncTo
: transformed second truncation parameter
Anna Vesely.
Goeman J. J. and Solari A. (2011). Multiple testing for exploratory research. Statistical Science, doi: 10.1214/11-STS356.
Vesely A., Finos L., and Goeman J. J. (2023). Permutation-based true discovery guarantee by sum tests. Journal of the Royal Statistical Society, Series B (Statistical Methodology), doi: 10.1093/jrsssb/qkad019.
Permutation statistics for gene expression using t scores: geneScores
True discovery guarantee for cluster analysis: geneAnalysis
# simulate 20 samples of 100 genes set.seed(42) expr <- matrix(c(rnorm(1000, mean = 0, sd = 10), rnorm(1000, mean = 13, sd = 10)), ncol = 20) rownames(expr) <- seq(100) labels <- rep(c(1,2), each = 10) # simulate pathways pathways <- lapply(seq(3), FUN = function(x) sample(rownames(expr), 3*x)) # create object of class sumGene res <- genePvals(expr = expr, labels = labels, alpha = 0.2, seed = 42, type = "liptak") res summary(res) # confidence bound for the number of true discoveries and the TDP within pathways out <- geneAnalysis(res, pathways = pathways) out
# simulate 20 samples of 100 genes set.seed(42) expr <- matrix(c(rnorm(1000, mean = 0, sd = 10), rnorm(1000, mean = 13, sd = 10)), ncol = 20) rownames(expr) <- seq(100) labels <- rep(c(1,2), each = 10) # simulate pathways pathways <- lapply(seq(3), FUN = function(x) sample(rownames(expr), 3*x)) # create object of class sumGene res <- genePvals(expr = expr, labels = labels, alpha = 0.2, seed = 42, type = "liptak") res summary(res) # confidence bound for the number of true discoveries and the TDP within pathways out <- geneAnalysis(res, pathways = pathways) out
This function computes t-scores for different permutations of gene expression data. A gene's score is calculated by performing the two-sample t test for the null hypothesis that the mean expression value is the same between two populations.
geneScores(expr, labels, alternative = "two.sided", alpha = 0.05, B = 200, seed = NULL, truncFrom = 3.2, truncTo = 0, squares = FALSE)
geneScores(expr, labels, alternative = "two.sided", alpha = 0.05, B = 200, seed = NULL, truncFrom = 3.2, truncTo = 0, squares = FALSE)
expr |
matrix where rows correspond to genes, and columns to samples. |
labels |
numeric/character vector with two levels, denoting the population of each sample. |
alternative |
direction of the alternative hypothesis ( |
alpha |
significance level. |
B |
number of permutations, including the identity. |
seed |
seed. |
truncFrom |
truncation parameter: values less extreme than |
truncTo |
truncation parameter: truncated values are set to |
squares |
logical, |
Truncation parameters should be such that truncTo
is not more extreme than truncFrom
.
The significance level alpha
should be in the interval [1/B
, 1).
geneScores
returns an object of class sumGene
, containing
statistics
: numeric matrix of scores, where columns correspond to genes, and rows to permutations.
The first permutation is the identity
alpha
: significance level
truncFrom
: transformed first truncation parameter
truncTo
: transformed second truncation parameter
Anna Vesely.
Goeman J. J. and Solari A. (2011). Multiple testing for exploratory research. Statistical Science, doi: 10.1214/11-STS356.
Vesely A., Finos L., and Goeman J. J. (2023). Permutation-based true discovery guarantee by sum tests. Journal of the Royal Statistical Society, Series B (Statistical Methodology), doi: 10.1093/jrsssb/qkad019.
Permutation statistics for gene expression using p-values: genePvals
True discovery guarantee for cluster analysis: geneAnalysis
# simulate 20 samples of 100 genes set.seed(42) expr <- matrix(c(rnorm(1000, mean = 0, sd = 10), rnorm(1000, mean = 13, sd = 10)), ncol = 20) rownames(expr) <- seq(100) labels <- rep(c(1,2), each = 10) # simulate pathways pathways <- lapply(seq(3), FUN = function(x) sample(rownames(expr), 3*x)) # create object of class sumGene res <- geneScores(expr = expr, labels = labels, alpha = 0.2, seed = 42) res summary(res) # confidence bound for the number of true discoveries and the TDP within pathways out <- geneAnalysis(res, pathways = pathways) out
# simulate 20 samples of 100 genes set.seed(42) expr <- matrix(c(rnorm(1000, mean = 0, sd = 10), rnorm(1000, mean = 13, sd = 10)), ncol = 20) rownames(expr) <- seq(100) labels <- rep(c(1,2), each = 10) # simulate pathways pathways <- lapply(seq(3), FUN = function(x) sample(rownames(expr), 3*x)) # create object of class sumGene res <- geneScores(expr = expr, labels = labels, alpha = 0.2, seed = 42) res summary(res) # confidence bound for the number of true discoveries and the TDP within pathways out <- geneAnalysis(res, pathways = pathways) out
This function uses permutation statistics/p-values to determine a true discovery guarantee for multiverse analysis, when studying one or more parameters of interest within a multiverse of models. It computes confidence bounds for the number of true discoveries and the true discovery proportion overall or within different groups. The bounds are simultaneous over all sets, and remain valid under post-hoc selection.
pimaAnalysis(obj, by = NULL, type = "sum", r = 0, alpha = 0.05, ...)
pimaAnalysis(obj, by = NULL, type = "sum", r = 0, alpha = 0.05, ...)
obj |
an object of class |
by |
name of grouping element among |
type |
combining function: |
r |
parameter for Vovk and Wang's p-value combination. |
alpha |
significance level. |
... |
further parameters of |
In the default by = NULL
, the procedure computes lower confidence bounds for the
number/proportion of significant effects (non-null coefficients) among all.
Other inputs of the argument by
return analogous bounds, defined by coefficient ("Coeff"
) or by model ("Model"
).
While the bounds are simultaneous over all possible groupings,
the combining function type
should be fixed in advance.
If truncation parameters are not specified among the further parameters, statistics/p-values are not truncated.
More generically, obj
can be any list containing:
Tspace
: data frame of statistics, where columns correspond to variables,
and rows to data transformations (e.g. permutations). The first transformation is the identity.
summary_table
: summary data frame where rows correspond to variables.
In this framework, the grouping element by
is the name of a column of summary_table
.
pimaAnalysis
returns a data frame containing a summary for each subset:
size
: number of considered coefficients
TD
: lower (1-alpha
)-confidence bound for the number of significant effects
TDP
: lower (1-alpha
)-confidence bound for the proportion of significant effects
Girardi P., Vesely A., Lakens D., Altoè G., Pastore M., Calcagnì A., and Finos L. (2024). Post-selection Inference in Multiverse Analysis (PIMA): An Inferential Framework Based on the Sign Flipping Score Test. Psychometrika, doi: 10.1007/s11336-024-09973-6.
Vesely A., Finos L., and Goeman J. J. (2023). Permutation-based true discovery guarantee by sum tests. Journal of the Royal Statistical Society, Series B (Statistical Methodology), doi: 10.1093/jrsssb/qkad019.
True discovery guarantees: sumStats
, sumPvals
# generate matrix of statistics for 2 coefficients X and Z within 3 models G <- simData(prop = 0.6, m = 6, B = 50, alpha = 0.4, p = FALSE, seed = 42) colnames(G) <- rep(c("X","Z"),3) # summary table summary_table <- data.frame( Model = rep(c("mod1","mod2","mod3"), each=2), Coeff = colnames(G) ) # list of Tspace and summary_table obj <- list(Tspace = as.data.frame(G), summary_table = summary_table) # significant effects overall (sum of test statistics) pimaAnalysis(obj, alpha = 0.4) # significant effects by coefficient (sum of test statistics) pimaAnalysis(obj, by = "Coeff", alpha = 0.4) # significant effects by model (Fisher's combination of p-values) pimaAnalysis(obj, by = "Model", type = "fisher", alpha = 0.4)
# generate matrix of statistics for 2 coefficients X and Z within 3 models G <- simData(prop = 0.6, m = 6, B = 50, alpha = 0.4, p = FALSE, seed = 42) colnames(G) <- rep(c("X","Z"),3) # summary table summary_table <- data.frame( Model = rep(c("mod1","mod2","mod3"), each=2), Coeff = colnames(G) ) # list of Tspace and summary_table obj <- list(Tspace = as.data.frame(G), summary_table = summary_table) # significant effects overall (sum of test statistics) pimaAnalysis(obj, alpha = 0.4) # significant effects by coefficient (sum of test statistics) pimaAnalysis(obj, by = "Coeff", alpha = 0.4) # significant effects by model (Fisher's combination of p-values) pimaAnalysis(obj, by = "Model", type = "fisher", alpha = 0.4)
This function simulates a matrix of permutation statistics, by performing a t test on normal data.
simData(prop, m, B = 200, rho = 0, n = 50, alpha = 0.05, pw = 0.8, p = TRUE, seed = NULL)
simData(prop, m, B = 200, rho = 0, n = 50, alpha = 0.05, pw = 0.8, p = TRUE, seed = NULL)
prop |
proportion of non-null hypotheses. |
m |
total number of variables. |
B |
number of permutations, including the identity. |
rho |
level of equicorrelation between pairs of variables. |
n |
number of observations. |
alpha |
significance level. |
pw |
power of the t test. |
p |
logical, |
seed |
seed. |
The function applies the one-sample two-sided t test to a matrix of simulated data,
for B
data permutations.
Data is obtained by simulating n
independent observations from a multivariate normal distribution,
where a proportion prop
of the variables has non-null mean.
This mean is such that the one-sample t test with significance level alpha
has power equal to pw
.
Each pair of distinct variables has equicorrelation rho
.
simData
returns a matrix where the B
rows correspond to permutations (the first is the identity),
and the m
columns correspond to variables.
The matrix contains p-values if p
is TRUE
, and t-scores otherwise.
The first columns (a proportion prop
) correspond to non-null hypotheses.
Anna Vesely.
True discovery guarantee: sumStats
, sumPvals
# generate matrix of p-values for 5 variables and 10 permutations G <- simData(prop = 0.6, m = 5, B = 10, alpha = 0.4, seed = 42) # subset of interest (variables 1 and 2) S <- c(1,2) # create object of class sumObj # combination: harmonic mean (Vovk and Wang with r = -1) res <- sumPvals(G, S, alpha = 0.4, r = -1) res summary(res) # lower confidence bound for the number of true discoveries in S discoveries(res) # lower confidence bound for the true discovery proportion in S tdp(res) # upper confidence bound for the false discovery proportion in S fdp(res)
# generate matrix of p-values for 5 variables and 10 permutations G <- simData(prop = 0.6, m = 5, B = 10, alpha = 0.4, seed = 42) # subset of interest (variables 1 and 2) S <- c(1,2) # create object of class sumObj # combination: harmonic mean (Vovk and Wang with r = -1) res <- sumPvals(G, S, alpha = 0.4, r = -1) res summary(res) # lower confidence bound for the number of true discoveries in S discoveries(res) # lower confidence bound for the true discovery proportion in S tdp(res) # upper confidence bound for the false discovery proportion in S fdp(res)
This function uses permutation p-values to determine confidence bounds for the number of true discoveries, the true discovery proportion and the false discovery proportion within a set of interest. The bounds are simultaneous over all sets, and remain valid under post-hoc selection.
sumPvals(G, S = NULL, alpha = 0.05, truncFrom = NULL, truncTo = 0.5, type = "vovk.wang", r = 0, nMax = 50)
sumPvals(G, S = NULL, alpha = 0.05, truncFrom = NULL, truncTo = 0.5, type = "vovk.wang", r = 0, nMax = 50)
G |
numeric matrix of p-values, where columns correspond to variables, and rows to data transformations (e.g. permutations). The first transformation is the identity. |
S |
vector of indices for the variables of interest (if not specified, all variables). |
alpha |
significance level. |
truncFrom |
truncation parameter: values greater than |
truncTo |
truncation parameter: truncated values are set to |
type |
p-value combination among |
r |
parameter for Vovk and Wang's p-value combination. |
nMax |
maximum number of iterations. |
A p-value p
is transformed as following.
Edgington: p
(Edgington, 1972)
Fisher: -2log(p)
(Fisher, 1925)
Pearson: 2log(1-p)
(Pearson, 1933)
Liptak: qnorm(1-p)
(Liptak, 1958; Stouffer et al., 1949)
Cauchy: tan[(0.5-p)pi]
with pi=3.142
(Liu and Xie, 2020)
Harmonic mean: 1/p
(Wilson, 2019)
Vovk and Wang: p^r
(log(p)
for r
=0) (Vovk and Wang, 2020)
An error message is returned if the transformation produces infinite values.
For Vovk and Wang, r=0
corresponds to Fisher, and r=-1
to the harmonic mean.
Truncation parameters should be such that truncTo
is not smaller than truncFrom
.
As Pearson's and Liptak's transformations produce infinite values in 1, for such methods
truncTo
should be strictly smaller than 1.
The significance level alpha
should be in the interval [1/B
, 1), where
B
is the number of data transformations (rows in G
).
sumPvals
returns an object of class sumObj
, containing
total
: total number of variables (columns in G
)
size
: size of S
alpha
: significance level
TD
: lower (1-alpha
)-confidence bound for the number of true discoveries in S
maxTD
: maximum value of TD
that could be found under convergence of the algorithm
iterations
: number of iterations of the algorithm
Anna Vesely.
Goeman J. J. and Solari A. (2011). Multiple testing for exploratory research. Statistical Science, doi: 10.1214/11-STS356.
Vesely A., Finos L., and Goeman J. J. (2023). Permutation-based true discovery guarantee by sum tests. Journal of the Royal Statistical Society, Series B (Statistical Methodology), doi: 10.1093/jrsssb/qkad019.
True discovery guarantee using generic statistics: sumStats
Access a sumObj
object: discoveries
, tdp
, fdp
# generate matrix of p-values for 5 variables and 10 permutations G <- simData(prop = 0.6, m = 5, B = 10, alpha = 0.4, seed = 42) # subset of interest (variables 1 and 2) S <- c(1,2) # create object of class sumObj # combination: harmonic mean (Vovk and Wang with r = -1) res <- sumPvals(G, S, alpha = 0.4, r = -1) res summary(res) # lower confidence bound for the number of true discoveries in S discoveries(res) # lower confidence bound for the true discovery proportion in S tdp(res) # upper confidence bound for the false discovery proportion in S fdp(res)
# generate matrix of p-values for 5 variables and 10 permutations G <- simData(prop = 0.6, m = 5, B = 10, alpha = 0.4, seed = 42) # subset of interest (variables 1 and 2) S <- c(1,2) # create object of class sumObj # combination: harmonic mean (Vovk and Wang with r = -1) res <- sumPvals(G, S, alpha = 0.4, r = -1) res summary(res) # lower confidence bound for the number of true discoveries in S discoveries(res) # lower confidence bound for the true discovery proportion in S tdp(res) # upper confidence bound for the false discovery proportion in S fdp(res)
This function uses p-values to determine confidence bounds for the number of true discoveries, the true discovery proportion and the false discovery proportion within a set of interest. The bounds are simultaneous over all sets, and remain valid under post-hoc selection.
sumPvalsPar(g, S = NULL, alpha = 0.05, type = "vovk.wang", r = 0, independence = NULL)
sumPvalsPar(g, S = NULL, alpha = 0.05, type = "vovk.wang", r = 0, independence = NULL)
g |
numeric vector of p-values. |
S |
vector of indices for the variables of interest (if not specified, all variables). |
alpha |
significance level. |
type |
p-value combination among |
r |
parameter for Vovk and Wang's p-value combination. |
independence |
logical, |
A p-value p
is transformed as following.
Fisher: -2log(p)
(Fisher, 1925)
Pearson: 2log(1-p)
(Pearson, 1933)
Liptak: qnorm(1-p)
(Liptak, 1958; Stouffer et al., 1949)
Cauchy: tan[(0.5-p)pi]
with pi=3.142
(Liu and Xie, 2020)
Harmonic mean: 1/p
(Wilson, 2019)
Vovk and Wang: p^r
(log(p)
for r
=0) (Vovk and Wang, 2020)
An error message is returned if the transformation produces infinite values.
For Vovk and Wang, r=-Inf
corresponds to the minimum p-value, r=Inf
to the maximum p-value,
r=0
to Fisher, and r=-1
to the harmonic mean.
Under independence, for Vovk and Wang the test is defined only
for r=0
and r=1
. Under general dependence, the test is defined only for
Fisher, the harmonic mean and Vovk and Wang.
For combinations that are not implemented, if the vector of critical values is known
the method can be applied through sumStatsPar
.
Please contact us to implement other known vectors of critical values that do not currently appear.
sumPvalsPar
returns an object of class sumObj
, containing
total
: total number of variables (length of g
)
size
: size of S
alpha
: significance level
TD
: lower (1-alpha
)-confidence bound for the number of true discoveries in S
maxTD
: maximum value of TD
that could be found under convergence of the algorithm
iterations
: number of iterations of the algorithm (NULL
)
Xu Chen.
Goeman J. J. and Solari A. (2011). Multiple testing for exploratory research. Statistical Science, doi: 10.1214/11-STS356.
Tian J., Chen X., Katsevich E., Goeman J. J. and Ramdas A. (2022). Large-scale simultaneous inference under dependence. Scandinavian Journal of Statistics, doi: 10.1111/sjos.12614.
True discovery guarantee using generic statistics (parametric): sumStatsPar
Access a sumObj
object: discoveries
, tdp
, fdp
# generate vector of p-values for 5 variables g <- as.vector(simData(prop = 0.6, m = 5, B = 1, alpha = 0.4, seed = 42)) # subset of interest (variables 1 and 2) S <- c(1,2) # create object of class sumObj # combination: harmonic mean under general dependence res <- sumPvalsPar(g, S, alpha = 0.4, type = "harmonic", independence = FALSE) res summary(res) # lower confidence bound for the number of true discoveries in S discoveries(res) # lower confidence bound for the true discovery proportion in S tdp(res) # upper confidence bound for the false discovery proportion in S fdp(res)
# generate vector of p-values for 5 variables g <- as.vector(simData(prop = 0.6, m = 5, B = 1, alpha = 0.4, seed = 42)) # subset of interest (variables 1 and 2) S <- c(1,2) # create object of class sumObj # combination: harmonic mean under general dependence res <- sumPvalsPar(g, S, alpha = 0.4, type = "harmonic", independence = FALSE) res summary(res) # lower confidence bound for the number of true discoveries in S discoveries(res) # lower confidence bound for the true discovery proportion in S tdp(res) # upper confidence bound for the false discovery proportion in S fdp(res)
This function uses generic permutation statistics to determine confidence bounds for the number of true discoveries, the true discovery proportion and the false discovery proportion within a set of interest. The bounds are simultaneous over all sets, and remain valid under post-hoc selection.
sumStats(G, S = NULL, alternative = "greater", alpha = 0.05, truncFrom = NULL, truncTo = NULL, nMax = 50)
sumStats(G, S = NULL, alternative = "greater", alpha = 0.05, truncFrom = NULL, truncTo = NULL, nMax = 50)
G |
numeric matrix of statistics, where columns correspond to variables, and rows to data transformations (e.g. permutations). The first transformation is the identity. |
S |
vector of indices for the variables of interest (if not specified, all variables). |
alternative |
direction of the alternative hypothesis ( |
alpha |
significance level. |
truncFrom |
truncation parameter: values less extreme than |
truncTo |
truncation parameter: truncated values are set to |
nMax |
maximum number of iterations. |
Truncation parameters should be such that truncTo
is not more extreme than truncFrom
.
The significance level alpha
should be in the interval [1/B
, 1), where
B
is the number of data transformations (rows in G
).
sumStats
returns an object of class sumObj
, containing
total
: total number of variables (columns in G
)
size
: size of S
alpha
: significance level
TD
: lower (1-alpha
)-confidence bound for the number of true discoveries in S
maxTD
: maximum value of TD
that could be found under convergence of the algorithm
iterations
: number of iterations of the algorithm
Anna Vesely.
Goeman J. J. and Solari A. (2011). Multiple testing for exploratory research. Statistical Science, doi: 10.1214/11-STS356.
Vesely A., Finos L., and Goeman J. J. (2023). Permutation-based true discovery guarantee by sum tests. Journal of the Royal Statistical Society, Series B (Statistical Methodology), doi: 10.1093/jrsssb/qkad019.
True discovery guarantee using p-values: sumPvals
Access a sumObj
object: discoveries
, tdp
, fdp
# generate matrix of t-scores for 5 variables and 10 permutations G <- simData(prop = 0.6, m = 5, B = 10, alpha = 0.4, p = FALSE, seed = 42) # subset of interest (variables 1 and 2) S <- c(1,2) # create object of class sumObj res <- sumStats(G, S, alpha = 0.4, truncFrom = 0.7, truncTo = 0) res summary(res) # lower confidence bound for the number of true discoveries in S discoveries(res) # lower confidence bound for the true discovery proportion in S tdp(res) # upper confidence bound for the false discovery proportion in S fdp(res)
# generate matrix of t-scores for 5 variables and 10 permutations G <- simData(prop = 0.6, m = 5, B = 10, alpha = 0.4, p = FALSE, seed = 42) # subset of interest (variables 1 and 2) S <- c(1,2) # create object of class sumObj res <- sumStats(G, S, alpha = 0.4, truncFrom = 0.7, truncTo = 0) res summary(res) # lower confidence bound for the number of true discoveries in S discoveries(res) # lower confidence bound for the true discovery proportion in S tdp(res) # upper confidence bound for the false discovery proportion in S fdp(res)
This function uses generic statistics and a suitable vector of critical values to determine confidence bounds for the number of true discoveries, the true discovery proportion and the false discovery proportion within a set of interest. The bounds are simultaneous over all sets, and remain valid under post-hoc selection.
sumStatsPar(g, S = NULL, alpha = 0.05, cvs)
sumStatsPar(g, S = NULL, alpha = 0.05, cvs)
g |
numeric vector of statistics. |
S |
vector of indices for the variables of interest (if not specified, all variables). |
alpha |
significance level. |
cvs |
numeric vector of critical values for summed statistics considering |
sumStatsPar
returns an object of class sumObj
, containing
total
: total number of variables (length of g
)
size
: size of S
alpha
: significance level
TD
: lower (1-alpha
)-confidence bound for the number of true discoveries in S
maxTD
: maximum value of TD
that could be found under convergence of the algorithm
iterations
: number of iterations of the algorithm (NULL
)
Xu Chen.
Goeman J. J. and Solari A. (2011). Multiple testing for exploratory research. Statistical Science, doi: 10.1214/11-STS356.
Tian J., Chen X., Katsevich E., Goeman J. J. and Ramdas A. (2022). Large-scale simultaneous inference under dependence. Scandinavian Journal of Statistics, doi: 10.1111/sjos.12614.
True discovery guarantee using p-values (parametric): sumPvalsPar
Access a sumObj
object: discoveries
, tdp
, fdp
# generate vector of statistics for 5 variables (Fisher transformation of p-values) g <- as.vector(simData(prop = 0.6, m = 5, B = 1, alpha = 0.4, seed = 42)) g <- -2 * log(g) # subset of interest (variables 1 and 2) S <- c(1,2) # vector of critical values cvs <- qchisq(p = 0.4, df = 2 * seq(5), lower.tail=FALSE) # create object of class sumObj res <- sumStatsPar(g, S, alpha = 0.4, cvs = cvs) res summary(res) # lower confidence bound for the number of true discoveries in S discoveries(res) # lower confidence bound for the true discovery proportion in S tdp(res) # upper confidence bound for the false discovery proportion in S fdp(res)
# generate vector of statistics for 5 variables (Fisher transformation of p-values) g <- as.vector(simData(prop = 0.6, m = 5, B = 1, alpha = 0.4, seed = 42)) g <- -2 * log(g) # subset of interest (variables 1 and 2) S <- c(1,2) # vector of critical values cvs <- qchisq(p = 0.4, df = 2 * seq(5), lower.tail=FALSE) # create object of class sumObj res <- sumStatsPar(g, S, alpha = 0.4, cvs = cvs) res summary(res) # lower confidence bound for the number of true discoveries in S discoveries(res) # lower confidence bound for the true discovery proportion in S tdp(res) # upper confidence bound for the false discovery proportion in S fdp(res)
These functions determine a lower confidence bound for the number of true discoveries, a lower confidence bound for the true discovery proportion (TDP), and an upper confidence bound for the false discovery proportion (FDP) within a set of interest. The bounds remain valid under post-hoc selection.
discoveries(object) tdp(object) fdp(object)
discoveries(object) tdp(object) fdp(object)
object |
an object of class |
discoveries
, tdp
and fdp
return a (1-alpha
)-confidence bound for the corresponding quantity in the subset.
Anna Vesely.
Create a sumObj
object: sumStats
, sumPvals
# generate matrix of p-values for 5 variables and 10 permutations G <- simData(prop = 0.6, m = 5, B = 10, alpha = 0.4, seed = 42) # subset of interest (variables 1 and 2) S <- c(1,2) # create object of class sumObj # combination: harmonic mean (Vovk and Wang with r = -1) res <- sumPvals(G, S, alpha = 0.4, r = -1) res summary(res) # lower confidence bound for the number of true discoveries in S discoveries(res) # lower confidence bound for the true discovery proportion in S tdp(res) # upper confidence bound for the false discovery proportion in S fdp(res)
# generate matrix of p-values for 5 variables and 10 permutations G <- simData(prop = 0.6, m = 5, B = 10, alpha = 0.4, seed = 42) # subset of interest (variables 1 and 2) S <- c(1,2) # create object of class sumObj # combination: harmonic mean (Vovk and Wang with r = -1) res <- sumPvals(G, S, alpha = 0.4, r = -1) res summary(res) # lower confidence bound for the number of true discoveries in S discoveries(res) # lower confidence bound for the true discovery proportion in S tdp(res) # upper confidence bound for the false discovery proportion in S fdp(res)