Title: | FDR Power |
---|---|
Description: | Computing Average and TPX Power under various BHFDR type sequential procedures. All of these procedures involve control of some summary of the distribution of the FDP, e.g. the proportion of discoveries which are false in a given experiment. The most widely known of these, the BH-FDR procedure, controls the FDR which is the mean of the FDP. A lesser known procedure, due to Lehmann and Romano, controls the FDX, or probability that the FDP exceeds a user provided threshold. This is less conservative than FWE control procedures but much more conservative than the BH-FDR proceudre. This package and the references supporting it introduce a new procedure for controlling the FDX which we call the BH-FDX procedure. This procedure iteratively identifies, given alpha and lower threshold delta, an alpha* less than alpha at which BH-FDR guarantees FDX control. This uses asymptotic approximation and is only slightly more conservative than the BH-FDR procedure. Likewise, we can think of the power in multiple testing experiments in terms of a summary of the distribution of the True Positive Proportion (TPP), the portion of tests truly non-null distributed that are called significant. The package will compute power, sample size or any other missing parameter required for power defined as (i) the mean of the TPP (average power) or (ii) the probability that the TPP exceeds a given value, lambda, (TPX power) via asymptotic approximation. All supplied theoretical results are also obtainable via simulation. The suggested approach is to narrow in on a design via the theoretical approaches and then make final adjustments/verify the results by simulation. The theoretical results are described in Izmirlian, G (2020) Statistics and Probability letters, "<doi:10.1016/j.spl.2020.108713>", and an applied paper describing the methodology with a simulation study is in preparation. See citation("pwrFDR"). |
Authors: | Grant Izmirlian [aut, cre] |
Maintainer: | Grant Izmirlian <[email protected]> |
License: | GPL (>= 2) |
Version: | 3.2.4 |
Built: | 2025-02-13 06:15:14 UTC |
Source: | https://github.com/cran/pwrFDR |
x %over% y = x/y when y!=0, equals 0 when y==0.
x %over% y
x %over% y
x , y
|
Numeric or complex vectors or objects that can be coerced to such. |
x/y when y!=0, otherwise 0.
Grant Izmirlian izmirlig at mail dot nih dot gov
Extracts the full argument list and call attribute from
an object of class pwr
, which is the result of a
call to pwrFDR
.
arg.vals(object)
arg.vals(object)
object |
An object of class 'pwr', which is the result of a call to |
A list with a call
component and one component for each of the
possible arguments, effect.size
, n.sample
,r.1
,
alpha
, N.tests
, lambda
, FDP.control.method
,
delta
, groups
, type
, grpj.per.grp1
,
method
and control
, with defaults filled in.
Grant Izmirlian <izmirlian at nih dot gov>
Izmirlian G. (2020) Strong consistency and asymptotic normality for quantities related to the Benjamini-Hochberg false discovery rate procedure. Statistics and Probability Letters; 108713, <doi:10.1016/j.spl.2020.108713>.
Izmirlian G. (2017) Average Power and -power in
Multiple Testing Scenarios when the Benjamini-Hochberg False
Discovery Rate Procedure is Used. <arXiv:1801.03989>
Jung S-H. (2005) Sample size for FDR-control in microarray data analysis. Bioinformatics; 21:3097-3104.
Liu P. and Hwang J-T. G. (2007) Quick calculation for sample size while controlling false discovery rate with application to microarray analysis. Bioinformatics; 23:739-746.
Lehmann E. L., Romano J. P.. Generalizations of the familywise error rate. Ann. Stat.. 2005;33(3):1138-1154.
Romano Joseph P., Shaikh Azeem M.. Stepup procedures for control of generalizations of the familywise error rate. Ann. Stat.. 2006;34(4):1850-1873.
rslt <- pwrFDR(effect.size = 0.79, n.sample = 46, r.1 = 2000/54675, alpha = 0.15, N.tests = 1000, FDP.control.method = "Auto") arg.vals(rslt)
rslt <- pwrFDR(effect.size = 0.79, n.sample = 46, r.1 = 2000/54675, alpha = 0.15, N.tests = 1000, FDP.control.method = "Auto") arg.vals(rslt)
backsolve.seFDPoalpha finds the missing argument, one of 'N.tests', 'r.1', 'n.sample' or 'effect size' giving the specified value of se[FDP]/alpha under the BH-FDR procedure.
backsolve.seTPPoavgpwr finds the missing argument, one of 'N.tests', 'r.1', 'n.sample' or 'effect size' giving the specified value of se[TPP]/average.power under the BH-FDR procedure.
backsolve.seFDPoalpha(seFDPoalpha, effect.size, n.sample, r.1, alpha, groups = 2, N.tests, type = "balanced", grpj.per.grp1 = 1, distopt = 1, rho, k.bs) backsolve.seTPPoavgpwr(seTPPoavgpwr, effect.size, n.sample, r.1, alpha, groups = 2, N.tests, type = "balanced", grpj.per.grp1 = 1, distopt = 1, rho, k.bs)
backsolve.seFDPoalpha(seFDPoalpha, effect.size, n.sample, r.1, alpha, groups = 2, N.tests, type = "balanced", grpj.per.grp1 = 1, distopt = 1, rho, k.bs) backsolve.seTPPoavgpwr(seTPPoavgpwr, effect.size, n.sample, r.1, alpha, groups = 2, N.tests, type = "balanced", grpj.per.grp1 = 1, distopt = 1, rho, k.bs)
seFDPoalpha |
In backsolve.seFDPoalpha, the user specified value of se[FDP]/alpha |
seTPPoavgpwr |
In backsolve.seTPPoavgpwr, the user specified value of se[TPP]/average.power |
effect.size |
The effect size (mean over standard deviation) for test statistics having non-zero means. Assumed to be a constant (in magnitude) over non-zero mean test statistics. |
n.sample |
The number of experimental replicates. Required for calculation of power |
r.1 |
The proportion of simultaneous tests that are non-centrally located |
alpha |
The false discovery rate (in the BH case) or the upper bound on the probability that the FDP exceeds delta (BHFDX and Romano case) |
groups |
The number of experimental groups to compare. Must be integral and >=1. The default value is 2. |
N.tests |
The number of simultaneous hypothesis tests. |
type |
A character string specifying, in the groups=2 case, whether the test is 'paired', 'balanced', or 'unbalanced' and in the case when groups >=3, whether the test is 'balanced' or 'unbalanced'. The default in all cases is 'balanced'. Left unspecified in the one sample (groups=1) case. |
grpj.per.grp1 |
Required when |
distopt |
Test statistic distribution in among null and alternatively distributed sub-populations. distopt=0 gives normal (2 groups), distop=1 gives t- (2 groups) and distopt=2 gives F- (2+ groups) |
rho |
This can be done under the assumption of tests that are correlated identically in pair within blocks of given size. |
k.bs |
When 'rho' is specified, the common block-size for correlated test statistics. |
A numeric vector having components
<missing argument> |
Value of missing argument giving required se[FDP]/alpha (backsolve.seFDPoalpha) or se[TPP]/average.power (backsolve.seTPPoavgpwr). |
average.power |
The average power at the given set of conditions |
se.VoR/se.ToM |
The standard error of the FDP (backsolve.seFDPoalpha) or standard error of the TPP (backsolve.seTPPoavgpwr). |
value |
Value returned by the solver. Should be near zero if a solution was found. |
Grant Izmirlian Jr <izmirlian at nih dot gov>
Izmirlian G. (2020) Strong consistency and asymptotic normality for quantities related to the Benjamini-Hochberg false discovery rate procedure. Statistics and Probability Letters; 108713, <doi:10.1016/j.spl.2020.108713>
Izmirlian G. (2017) Average Power and -power in
Multiple Testing Scenarios when the Benjamini-Hochberg False
Discovery Rate Procedure is Used. <arXiv:1801.03989>
Jung S-H. (2005) Sample size for FDR-control in microarray data analysis. Bioinformatics; 21:3097-3104.
Kluger D. M., Owen A. B. (2023) A central limit theorem for the Benjamini-Hochberg false discovery proportion under a factor model. Bernoulli; xx:xxx-xxx.
Liu P. and Hwang J-T. G. (2007) Quick calculation for sample size while controlling false discovery rate with application to microarray analysis. Bioinformatics; 23:739-746.
Lehmann E. L., Romano J. P.. Generalizations of the familywise error rate. Ann. Stat.. 2005;33(3):1138-1154.
Romano Joseph P., Shaikh Azeem M.. Stepup procedures for control of generalizations of the familywise error rate. Ann. Stat.. 2006;34(4):1850-1873.
backsolve.seFDPoalpha(seFDPoalpha=0.50, n.sample=50, alpha=0.05, effect.size=0.8, r.1=0.20) backsolve.seTPPoavgpwr(seTPPoavgpwr=0.20, n.sample=30, alpha=0.05, effect.size=0.8, r.1=0.20)
backsolve.seFDPoalpha(seFDPoalpha=0.50, n.sample=50, alpha=0.05, effect.size=0.8, r.1=0.20) backsolve.seTPPoavgpwr(seTPPoavgpwr=0.20, n.sample=30, alpha=0.05, effect.size=0.8, r.1=0.20)
Creates a generic call to print.TableMonster which in turn calls xtable
basic.tmPrint(x, special = NULL, simple = FALSE, dbg = FALSE, ...)
basic.tmPrint(x, special = NULL, simple = FALSE, dbg = FALSE, ...)
x |
Any data.frame object. Here, the result of a call to pwrFDR. |
special |
Special arguments to print.TableMonster. See package documentation. |
simple |
The simplest use case |
dbg |
Set to a value >= 1 for debugging |
... |
Other arguments |
The value returned is an invisible version of the argument ‘x’.
Grant Izmirlian
pwrFDR
to allow power/sample
size calculation for multiple tests of ROC curve based hypothesis.
See details.
In hypothesis tests of TPR_1 vs TPR_0 at fixed FPR, or
FPR_1 vs FPR_0 at fixed TPR, this computes the optimal number
of controls per case. Required by es.ROC
cc.ROC(FPR0, FPR1 = NULL, TPR0, TPR1 = NULL, b = NULL)
cc.ROC(FPR0, FPR1 = NULL, TPR0, TPR1 = NULL, b = NULL)
FPR0 |
When the TPR is fixed, the FPR under the null. Otherwise the fixed FPR. |
FPR1 |
When the TPR is fixed, the FPR under the alternative. Otherwise left blank. |
TPR0 |
When the FPR is fixed, the TPR under the null. Otherwise the fixed TPR. |
TPR1 |
When the FPR is fixed, the TPR under the alternative. Otherwise left blank. |
b |
Nominal slope of the ROC at FPR0. Taken to be 1 by default. |
The optimal number of controls per case.
Grant Izmirlian <izmirlian at nih dot gov>
Pepe M. S., Feng Z, Janes, H Bossuyt P. M. and Potter J. D. Pivotal evaluation of the accuracy of a biomarker used for classification or prediction. Supplement. J Natl Cancer Inst 2008;100: 1432–1438
cc.ROC(FPR0=0.15, TPR0=0.80, TPR1=0.90)
cc.ROC(FPR0=0.15, TPR0=0.80, TPR1=0.90)
Computes the complimentary CDF for the significant call proportion, R_m/m via asymptotic approximation. Included here mainly for pedagogic purposes.
cCDF.Rom(u, effect.size, n.sample, r.1, alpha, delta, groups = 2, N.tests, type = c("paired", "balanced", "unbalanced"), grpj.per.grp1 = NULL, FDP.control.method = "BHFDR", distopt, control=list(tol=1e-08,max.iter=c(1000,20),sim.level=2,low.power.stop=TRUE, FDP.meth.thresh=FDP.cntl.mth.thrsh.def,verb=FALSE))
cCDF.Rom(u, effect.size, n.sample, r.1, alpha, delta, groups = 2, N.tests, type = c("paired", "balanced", "unbalanced"), grpj.per.grp1 = NULL, FDP.control.method = "BHFDR", distopt, control=list(tol=1e-08,max.iter=c(1000,20),sim.level=2,low.power.stop=TRUE, FDP.meth.thresh=FDP.cntl.mth.thrsh.def,verb=FALSE))
u |
A sorted vector of values on the interval [0, 1] for which the cCDF of R_m/m should be computed. |
effect.size |
The effect size (mean over standard deviation) for test statistics having non-zero means. Assumed to be a constant (in magnitude) over non-zero mean test statistics. |
n.sample |
The number of experimental replicates. Required for calculation of power |
r.1 |
The proportion of simultaneous tests that are non-centrally located |
alpha |
The false discovery rate (in the BH case) or the upper bound on the probability that the FDP exceeds delta (Romano case) |
delta |
If the "FDP.control.method" is set to 'Romano' or 'BHFDX', then the
user can set the exceedance thresh-hold for the FDP tail probability
control |
groups |
The number of experimental groups to compare. Must be integral and >=1. The default value is 2. |
N.tests |
The number of simultaneous hypothesis tests. |
type |
A character string specifying, in the groups=2 case, whether the test is 'paired', 'balanced', or 'unbalanced' and in the case when groups >=3, whether the test is 'balanced' or 'unbalanced'. The default in all cases is 'balanced'. Left unspecified in the one sample (groups=1) case. |
grpj.per.grp1 |
Required when |
FDP.control.method |
A character string specifying how the false discovery proportion (FDP) is to be
controlled. You may specify the whole word or any shortened uniquely
identifying truncation. |
distopt |
Test statistic distribution in among null and alternatively distributed sub-populations. distopt=0 gives normal (2 groups), distop=1 gives t- (2 groups) and distopt=2 gives F- (2+ groups) |
control |
Optionally, a list with components with the following
components: |
An object of class cdf
which contains components
call |
The call which produced the result |
cCDF.Rom |
A data frame with columns |
Grant Izmirlian <izmirlian at nih dot gov>
Izmirlian G. (2020) Strong consistency and asymptotic normality for quantities related to the Benjamini-Hochberg false discovery rate procedure. Statistics and Probability Letters; 108713, <doi:10.1016/j.spl.2020.108713>.
Izmirlian G. (2017) Average Power and -power in
Multiple Testing Scenarios when the Benjamini-Hochberg False
Discovery Rate Procedure is Used. <arXiv:1801.03989>
Jung S-H. (2005) Sample size for FDR-control in microarray data analysis. Bioinformatics; 21:3097-3104.
Kluger D. M., Owen A. B. (2023) A central limit theorem for the Benjamini-Hochberg false discovery proportion under a factor model. Bernoulli; xx:xxx-xxx.
Liu P. and Hwang J-T. G. (2007) Quick calculation for sample size while controlling false discovery rate with application to microarray analysis. Bioinformatics; 23:739-746.
Lehmann E. L., Romano J. P.. Generalizations of the familywise error rate. Ann. Stat.. 2005;33(3):1138-1154.
Romano Joseph P., Shaikh Azeem M.. Stepup procedures for control of generalizations of the familywise error rate. Ann. Stat.. 2006;34(4):1850-1873.
library(pwrFDR) u <- seq(from=0,to=1,len=100000) rslt <- cCDF.Rom(u=u, effect.size=0.9, n.sample=70, r.1=0.05, alpha=0.15, N.tests=1000, FDP.control.method="Auto") ## plot the result with(rslt$cCDF.Rom, plot(u, cCDF.Rom, type="s")) ## compute the mean and median as a check DX <- function(x)c(x[1], diff(x)) .mean. <- with(rslt$cCDF.Rom, sum(cCDF.Rom*DX(u))) .median. <- with(rslt$cCDF.Rom, u[max(which(cCDF.Rom>0.5))])
library(pwrFDR) u <- seq(from=0,to=1,len=100000) rslt <- cCDF.Rom(u=u, effect.size=0.9, n.sample=70, r.1=0.05, alpha=0.15, N.tests=1000, FDP.control.method="Auto") ## plot the result with(rslt$cCDF.Rom, plot(u, cCDF.Rom, type="s")) ## compute the mean and median as a check DX <- function(x)c(x[1], diff(x)) .mean. <- with(rslt$cCDF.Rom, sum(cCDF.Rom*DX(u))) .median. <- with(rslt$cCDF.Rom, u[max(which(cCDF.Rom>0.5))])
Computes the complimentary CDF for the true positive proportion, T_m/M_m via asymptotic approximation. Included here mainly for pedagogic purposes.
cCDF.ToM(u, effect.size, n.sample, r.1, alpha, delta, groups = 2, N.tests, type = c("paired", "balanced", "unbalanced"), grpj.per.grp1 = NULL, FDP.control.method = "BHFDR", distopt, control=list(tol=1e-08,max.iter=c(1000,20),sim.level=2,low.power.stop=TRUE, FDP.meth.thresh=FDP.cntl.mth.thrsh.def,verb=FALSE))
cCDF.ToM(u, effect.size, n.sample, r.1, alpha, delta, groups = 2, N.tests, type = c("paired", "balanced", "unbalanced"), grpj.per.grp1 = NULL, FDP.control.method = "BHFDR", distopt, control=list(tol=1e-08,max.iter=c(1000,20),sim.level=2,low.power.stop=TRUE, FDP.meth.thresh=FDP.cntl.mth.thrsh.def,verb=FALSE))
u |
A sorted vector of values on the interval [0, 1] for which the cCDF of T_m/M_m should be computed. |
effect.size |
The effect size (mean over standard deviation) for test statistics having non-zero means. Assumed to be a constant (in magnitude) over non-zero mean test statistics. |
n.sample |
The number of experimental replicates. Required for calculation of power |
r.1 |
The proportion of simultaneous tests that are non-centrally located |
alpha |
The false discovery rate (in the BH case) or the upper bound on the probability that the FDP exceeds delta (Romano case) |
delta |
If the "FDP.control.method" is set to 'Romano' or 'BHFDX', then the
user can set the exceedance thresh-hold for the FDP tail probability
control |
groups |
The number of experimental groups to compare. Must be integral and >=1. The default value is 2. |
N.tests |
The number of simultaneous hypothesis tests. |
type |
A character string specifying, in the groups=2 case, whether the test is 'paired', 'balanced', or 'unbalanced' and in the case when groups >=3, whether the test is 'balanced' or 'unbalanced'. The default in all cases is 'balanced'. Left unspecified in the one sample (groups=1) case. |
grpj.per.grp1 |
Required when |
FDP.control.method |
A character string specifying how the false discovery proportion (FDP) is to be
controlled. You may specify the whole word or any shortened uniquely
identifying truncation. |
distopt |
Test statistic distribution in among null and alternatively distributed sub-populations. distopt=0 gives normal (2 groups), distop=1 gives t- (2 groups) and distopt=2 gives F- (2+ groups) |
control |
Optionally, a list with components with the following
components: |
An object of class cdf
which contains components
call |
The call which produced the result |
cCDF.ToM |
A data frame with columns |
Grant Izmirlian <izmirlian at nih dot gov>
Izmirlian G. (2020) Strong consistency and asymptotic normality for quantities related to the Benjamini-Hochberg false discovery rate procedure. Statistics and Probability Letters; 108713, <doi:10.1016/j.spl.2020.108713>.
Izmirlian G. (2017) Average Power and -power in
Multiple Testing Scenarios when the Benjamini-Hochberg False
Discovery Rate Procedure is Used. <arXiv:1801.03989>
Jung S-H. (2005) Sample size for FDR-control in microarray data analysis. Bioinformatics; 21:3097-3104.
Kluger D. M., Owen A. B. (2023) A central limit theorem for the Benjamini-Hochberg false discovery proportion under a factor model. Bernoulli; xx:xxx-xxx.
Liu P. and Hwang J-T. G. (2007) Quick calculation for sample size while controlling false discovery rate with application to microarray analysis. Bioinformatics; 23:739-746.
Lehmann E. L., Romano J. P.. Generalizations of the familywise error rate. Ann. Stat.. 2005;33(3):1138-1154.
Romano Joseph P., Shaikh Azeem M.. Stepup procedures for control of generalizations of the familywise error rate. Ann. Stat.. 2006;34(4):1850-1873.
library(pwrFDR) u <- seq(from=0,to=1,len=100000) rslt <- cCDF.ToM(u=u, effect.size=0.9, n.sample=70, r.1=0.05, alpha=0.15, N.tests=1000, FDP.control.method="Auto") ## plot the result with(rslt$cCDF.ToM, plot(u, cCDF.ToM, type="s")) ## compute the mean and median as a check DX <- function(x)c(x[1], diff(x)) .mean. <- with(rslt$cCDF.ToM, sum(cCDF.ToM*DX(u))) .median. <- with(rslt$cCDF.ToM, u[max(which(cCDF.ToM>0.5))])
library(pwrFDR) u <- seq(from=0,to=1,len=100000) rslt <- cCDF.ToM(u=u, effect.size=0.9, n.sample=70, r.1=0.05, alpha=0.15, N.tests=1000, FDP.control.method="Auto") ## plot the result with(rslt$cCDF.ToM, plot(u, cCDF.ToM, type="s")) ## compute the mean and median as a check DX <- function(x)c(x[1], diff(x)) .mean. <- with(rslt$cCDF.ToM, sum(cCDF.ToM*DX(u))) .median. <- with(rslt$cCDF.ToM, u[max(which(cCDF.ToM>0.5))])
Computes the complimentary CDF for the false discovery proportion, V_m/R_m via asymptotic approximation. Included here mainly for pedagogic purposes.
cCDF.VoR(u, effect.size, n.sample, r.1, alpha, delta, groups = 2, N.tests, type = c("paired", "balanced", "unbalanced"), grpj.per.grp1 = NULL, FDP.control.method = "BHFDR", distopt, control=list(tol=1e-08,max.iter=c(1000,20),sim.level=2,low.power.stop=TRUE, FDP.meth.thresh=FDP.cntl.mth.thrsh.def,verb=FALSE))
cCDF.VoR(u, effect.size, n.sample, r.1, alpha, delta, groups = 2, N.tests, type = c("paired", "balanced", "unbalanced"), grpj.per.grp1 = NULL, FDP.control.method = "BHFDR", distopt, control=list(tol=1e-08,max.iter=c(1000,20),sim.level=2,low.power.stop=TRUE, FDP.meth.thresh=FDP.cntl.mth.thrsh.def,verb=FALSE))
u |
A sorted vector of values on the interval [0, 1] for which the cCDF of T_m/M_m should be computed. |
effect.size |
The effect size (mean over standard deviation) for test statistics having non-zero means. Assumed to be a constant (in magnitude) over non-zero mean test statistics. |
n.sample |
The number of experimental replicates. Required for calculation of power |
r.1 |
The proportion of simultaneous tests that are non-centrally located |
alpha |
The false discovery rate (in the BH case) or the upper bound on the probability that the FDP exceeds delta (Romano case) |
delta |
If the "FDP.control.method" is set to 'Romano' or 'BHFDX', then the
user can set the exceedance thresh-hold for the FDP tail probability
control |
groups |
The number of experimental groups to compare. Must be integral and >=1. The default value is 2. |
N.tests |
The number of simultaneous hypothesis tests. |
type |
A character string specifying, in the groups=2 case, whether the test is 'paired', 'balanced', or 'unbalanced' and in the case when groups >=3, whether the test is 'balanced' or 'unbalanced'. The default in all cases is 'balanced'. Left unspecified in the one sample (groups=1) case. |
grpj.per.grp1 |
Required when |
FDP.control.method |
A character string specifying how the false discovery proportion (FDP) is to be
controlled. You may specify the whole word or any shortened uniquely
identifying truncation. |
distopt |
Test statistic distribution in among null and alternatively distributed sub-populations. distopt=0 gives normal (2 groups), distop=1 gives t- (2 groups) and distopt=2 gives F- (2+ groups) |
control |
Optionally, a list with components with the following
components: |
An object of class cdf
which contains components
call |
The call which produced the result |
cCDF.VoR |
A data frame with columns |
Grant Izmirlian <izmirlian at nih dot gov>
Izmirlian G. (2020) Strong consistency and asymptotic normality for quantities related to the Benjamini-Hochberg false discovery rate procedure. Statistics and Probability Letters; 108713, <doi:10.1016/j.spl.2020.108713>.
Izmirlian G. (2017) Average Power and -power in
Multiple Testing Scenarios when the Benjamini-Hochberg False
Discovery Rate Procedure is Used. <arXiv:1801.03989>
Jung S-H. (2005) Sample size for FDR-control in microarray data analysis. Bioinformatics; 21:3097-3104.
Kluger D. M., Owen A. B. (2023) A central limit theorem for the Benjamini-Hochberg false discovery proportion under a factor model. Bernoulli; xx:xxx-xxx.
Liu P. and Hwang J-T. G. (2007) Quick calculation for sample size while controlling false discovery rate with application to microarray analysis. Bioinformatics; 23:739-746.
Lehmann E. L., Romano J. P.. Generalizations of the familywise error rate. Ann. Stat.. 2005;33(3):1138-1154.
Romano Joseph P., Shaikh Azeem M.. Stepup procedures for control of generalizations of the familywise error rate. Ann. Stat.. 2006;34(4):1850-1873.
library(pwrFDR) u <- seq(from=0,to=1,len=100000) rslt <- cCDF.VoR(u=u, effect.size=0.9, n.sample=70, r.1=0.05, alpha=0.15, N.tests=1000, FDP.control.method="Auto") ## plot the result with(rslt$cCDF.VoR, plot(u, cCDF.VoR, type="s")) ## compute the mean and median as a check DX <- function(x)c(x[1], diff(x)) .mean. <- with(rslt$cCDF.VoR, sum(cCDF.VoR*DX(u))) .median. <- with(rslt$cCDF.VoR, u[max(which(cCDF.VoR>0.5))])
library(pwrFDR) u <- seq(from=0,to=1,len=100000) rslt <- cCDF.VoR(u=u, effect.size=0.9, n.sample=70, r.1=0.05, alpha=0.15, N.tests=1000, FDP.control.method="Auto") ## plot the result with(rslt$cCDF.VoR, plot(u, cCDF.VoR, type="s")) ## compute the mean and median as a check DX <- function(x)c(x[1], diff(x)) .mean. <- with(rslt$cCDF.VoR, sum(cCDF.VoR*DX(u))) .median. <- with(rslt$cCDF.VoR, u[max(which(cCDF.VoR>0.5))])
Computes the CDF of the pooled population p-values under the mixture model, e.g. the p-values are i.i.d. with CDF a mixture between a uniform (CDF in the null distributed population) and a concave function (CDF in the non-null distributed population).
CDF.Pval(u, effect.size, n.sample, r.1, groups=2, type="balanced", grpj.per.grp1=1, distopt, control)
CDF.Pval(u, effect.size, n.sample, r.1, groups=2, type="balanced", grpj.per.grp1=1, distopt, control)
u |
Argument of the CDF. Result will be Pr( P_i <= u ) |
effect.size |
The effect size (mean over standard deviation) for test statistics having non-zero means. Assumed to be a constant (in magnitude) over non-zero mean test statistics. |
n.sample |
The number of experimental replicates. |
r.1 |
The proportion of all test statistics that are distributed under HA. |
groups |
The number of experimental groups to compare. Default value is 2. |
type |
A character string specifying, in the groups=2 case, whether the test is 'paired', 'balanced', or 'unbalanced' and in the case when groups >=3, whether the test is 'balanced' or 'unbalanced'. The default in all cases is 'balanced'. Left unspecified in the one sample (groups=1) case. |
grpj.per.grp1 |
Required when |
distopt |
Test statistic distribution in among null and alternatively distributed sub-populations. distopt=0 gives normal (2 groups), distop=1 gives t- (2 groups) and distopt=2 gives F- (2+ groups) |
control |
Optionally, a list with components with the following components: 'groups', used when distop=3 (F-dist), specifying number of groups. 'tol' is a convergence criterion used in iterative methods which is set to 1e-8 by default 'max.iter' is an iteration limit, set to 1000 by default |
Computes the CDF of the pooled population p-values under the mixture model, e.g. the p-values are i.i.d. with CDF a mixture between a uniform (CDF in the null distributed population) and a concave function (CDF in the non-null distributed population). If Fc_0 is the cCDF of a test statistic under H0 and Fc_A is the cCDF of a test statistic under HA then the CDF of the P-values is
G(u) = (1-r) u + r Fc_A(Fc_0^{-1}(u))
The limiting positve call fraction, lim_m V_m/m = gamma (a.s.) is the solution to the equation
G( gamma alpha) = gamma
where alpha is the nominal FDR.
A list with components
call |
The call which produced the result |
u |
The argument that was passed to the function |
CDF.Pval |
The value of the CDF |
Grant Izmirlian <izmirlian at nih dot gov>
Izmirlian G. (2020) Strong consistency and asymptotic normality for quantities related to the Benjamini-Hochberg false discovery rate procedure. Statistics and Probability Letters; 108713, <doi:10.1016/j.spl.2020.108713>.
Izmirlian G. (2017) Average Power and -power in
Multiple Testing Scenarios when the Benjamini-Hochberg False
Discovery Rate Procedure is Used. arXiv:1801.03989
Genovese, C. and L. Wasserman. (2004) A stochastic process approach to false discovery control. Annals of Statistics. 32 (3), 1035-1061.
## First calculate an average power for a given set of parameters rslt.avgp <- pwrFDR(effect.size=0.79, n.sample=46, r.1=2000/54675, alpha=0.15) ## Now verify that G( gamma alpha ) = gamma gma <- rslt.avgp$gamma alpha <- rslt.avgp$call$alpha G.gma.a <- CDF.Pval(u=gma*alpha, r.1=2000/54675, effect.size=0.79, n.sample=46)$CDF.Pval$CDF.Pval c(G.of.gamma.alpha=G.gma.a, gamma=gma)
## First calculate an average power for a given set of parameters rslt.avgp <- pwrFDR(effect.size=0.79, n.sample=46, r.1=2000/54675, alpha=0.15) ## Now verify that G( gamma alpha ) = gamma gma <- rslt.avgp$gamma alpha <- rslt.avgp$call$alpha G.gma.a <- CDF.Pval(u=gma*alpha, r.1=2000/54675, effect.size=0.79, n.sample=46)$CDF.Pval$CDF.Pval c(G.of.gamma.alpha=G.gma.a, gamma=gma)
Calculates the fixed point for the Romano procedure, e.g. finds u which solves u = G( psi(u, d) a) where G is the common p-value CDF, and psi(u, d) = u d/(1 - (1-a) u). Essentially an internal function and included at the user level for pedagogic purposes.
CDF.Pval.apsi.eq.u(effect.size, n.sample, r.1, alpha, delta, groups, type, grpj.per.grp1, distopt, control)
CDF.Pval.apsi.eq.u(effect.size, n.sample, r.1, alpha, delta, groups, type, grpj.per.grp1, distopt, control)
effect.size |
The effect size (mean over standard deviation) for test statistics having non-zero means. Assumed to be a constant (in magnitude) over non-zero mean test statistics. |
n.sample |
The number of experimental replicates. Required for calculation of power |
r.1 |
The proportion of simultaneous tests that are non-centrally located |
alpha |
The upper bound on the probability that the FDP exceeds delta. |
delta |
The exceedance thresh-hold for the FDP tail probability control method
(BHFDX or Romano) |
groups |
The number of experimental groups to compare. Must be integral and >=1. The default value is 2. |
type |
A character string specifying, in the groups=2 case, whether the test is 'paired', 'balanced', or 'unbalanced' and in the case when groups >=3, whether the test is 'balanced' or 'unbalanced'. The default in all cases is 'balanced'. Left unspecified in the one sample (groups=1) case. |
grpj.per.grp1 |
Required when |
distopt |
Test statistic distribution in among null and alternatively distributed sub-populations. distopt=0 gives normal (2 groups), distop=1 gives t- (2 groups) and distopt=2 gives F- (2+ groups) |
control |
Optionally, a list with components with the following
components: |
An object of class cdf
which contains components
call |
The call which produced the result |
gamma |
The fixed point for the Romano method. |
Grant Izmirlian <izmirlian at nih dot gov>
Izmirlian G. (2020) Strong consistency and asymptotic normality for quantities related to the Benjamini-Hochberg false discovery rate procedure. Statistics and Probability Letters; 108713, <doi:10.1016/j.spl.2020.108713>.
Izmirlian G. (2017) Average Power and -power in
Multiple Testing Scenarios when the Benjamini-Hochberg False
Discovery Rate Procedure is Used. <arXiv:1801.03989>
Jung S-H. (2005) Sample size for FDR-control in microarray data analysis. Bioinformatics; 21:3097-3104.
Liu P. and Hwang J-T. G. (2007) Quick calculation for sample size while controlling false discovery rate with application to microarray analysis. Bioinformatics; 23:739-746.
Lehmann E. L., Romano J. P.. Generalizations of the familywise error rate. Ann. Stat.. 2005;33(3):1138-1154.
Romano Joseph P., Shaikh Azeem M.. Stepup procedures for control of generalizations of the familywise error rate. Ann. Stat.. 2006;34(4):1850-1873.
## An example showing that the Romano method is more conservative than the BHFDX method ## which is in turn more conservative than the BH-FDR method based upon ordering of the ## significant call proportions, R_m/m ## First find alpha.star for the BH-CLT method at level alpha=0.15 a.st.BHFDX <-controlFDP(effect.size=0.8,r.1=0.05,N.tests=1000,n.sample=70,alpha=0.15)$alpha.star ## now find the significant call fraction under the BH-FDR method at level alpha=0.15 gamma.BHFDR <- CDF.Pval.au.eq.u(effect.size = 0.8, n.sample = 70, r.1 = 0.05, alpha=0.15) ## now find the significant call fraction under the Romano method at level alpha=0.15 gamma.romano <- CDF.Pval.apsi.eq.u(effect.size = 0.8, n.sample = 70, r.1 = 0.05, alpha=0.15) ## now find the significant call fraction under the BH-CLT method at level alpha=0.15 gamma.BHFDX <- CDF.Pval.au.eq.u(effect.size = 0.8, n.sample = 70, r.1 = 0.05, alpha=a.st.BHFDX)
## An example showing that the Romano method is more conservative than the BHFDX method ## which is in turn more conservative than the BH-FDR method based upon ordering of the ## significant call proportions, R_m/m ## First find alpha.star for the BH-CLT method at level alpha=0.15 a.st.BHFDX <-controlFDP(effect.size=0.8,r.1=0.05,N.tests=1000,n.sample=70,alpha=0.15)$alpha.star ## now find the significant call fraction under the BH-FDR method at level alpha=0.15 gamma.BHFDR <- CDF.Pval.au.eq.u(effect.size = 0.8, n.sample = 70, r.1 = 0.05, alpha=0.15) ## now find the significant call fraction under the Romano method at level alpha=0.15 gamma.romano <- CDF.Pval.apsi.eq.u(effect.size = 0.8, n.sample = 70, r.1 = 0.05, alpha=0.15) ## now find the significant call fraction under the BH-CLT method at level alpha=0.15 gamma.BHFDX <- CDF.Pval.au.eq.u(effect.size = 0.8, n.sample = 70, r.1 = 0.05, alpha=a.st.BHFDX)
Function which solves the implicit equation u = G( u alpha) where G is the pooled P-value CDF and alpha is the FDR
CDF.Pval.au.eq.u(effect.size, n.sample, r.1, alpha, groups, type, grpj.per.grp1, distopt, control)
CDF.Pval.au.eq.u(effect.size, n.sample, r.1, alpha, groups, type, grpj.per.grp1, distopt, control)
effect.size |
The per statistic effect size |
n.sample |
The per statistic sample size |
r.1 |
The proportion of Statistics distributed according to the alternative distribution |
alpha |
The false discovery rate. |
groups |
Number of experimental groups from which the test statistic is calculated |
type |
A character string specifying, in the groups=2 case, whether the test is 'paired', 'balanced', or 'unbalanced' and in the case when groups >=3, whether the test is 'balanced' or 'unbalanced'. The default in all cases is 'balanced'. Left unspecified in the one sample (groups=1) case. |
grpj.per.grp1 |
Required when |
distopt |
Test statistic distribution in among null and alternatively distributed sub-populations. distopt=0 gives normal (2 groups), distop=1 gives t- (2 groups) and distopt=2 gives F- (2+ groups) |
control |
Optionally, a list with components with the following components: 'groups', used when distop=3 (F-dist), specifying number of groups. 'max.iter' is an iteration limit, set to 1000 by default |
A list with a single component,
gamma |
The solution of the implicit equation u = G( u alpha), where G is the pooled P-value CDF. This represents the infinite tests limiting proportion of hypothesis tests that are called significant by the BH-FDR procedure at alpha. |
Grant Izmirlian <izmirlian at nih dot gov>
Izmirlian G. (2020) Strong consistency and asymptotic normality for quantities related to the Benjamini-Hochberg false discovery rate procedure. Statistics and Probability Letters; 108713, <doi:10.1016/j.spl.2020.108713>.
Izmirlian G. (2017) Average Power and -power in
Multiple Testing Scenarios when the Benjamini-Hochberg False
Discovery Rate Procedure is Used. <arXiv:1801.03989>
Jung S-H. (2005) Sample size for FDR-control in microarray data analysis. Bioinformatics; 21:3097-3104.
Liu P. and Hwang J-T. G. (2007) Quick calculation for sample size while controlling false discovery rate with application to microarray analysis. Bioinformatics; 23:739-746.
## An example showing that the Romano method is more conservative than the BHFDX method ## which is in turn more conservative than the BH-FDR method based upon ordering of the ## significant call proportions, R_m/m ## First find alpha.star for the BH-CLT method at level alpha=0.15 a.st.BHFDX <-controlFDP(effect.size=0.8,r.1=0.05,N.tests=1000,n.sample=70,alpha=0.15)$alpha.star ## now find the significant call fraction under the BH-FDR method at level alpha=0.15 gamma.BHFDR <- CDF.Pval.au.eq.u(effect.size = 0.8, n.sample = 70, r.1 = 0.05, alpha=0.15) ## now find the significant call fraction under the Romano method at level alpha=0.15 gamma.romano <- CDF.Pval.apsi.eq.u(effect.size = 0.8, n.sample = 70, r.1 = 0.05, alpha=0.15) ## now find the significant call fraction under the BH-CLT method at level alpha=0.15 gamma.BHFDX <- CDF.Pval.au.eq.u(effect.size = 0.8, n.sample = 70, r.1 = 0.05, alpha=a.st.BHFDX)
## An example showing that the Romano method is more conservative than the BHFDX method ## which is in turn more conservative than the BH-FDR method based upon ordering of the ## significant call proportions, R_m/m ## First find alpha.star for the BH-CLT method at level alpha=0.15 a.st.BHFDX <-controlFDP(effect.size=0.8,r.1=0.05,N.tests=1000,n.sample=70,alpha=0.15)$alpha.star ## now find the significant call fraction under the BH-FDR method at level alpha=0.15 gamma.BHFDR <- CDF.Pval.au.eq.u(effect.size = 0.8, n.sample = 70, r.1 = 0.05, alpha=0.15) ## now find the significant call fraction under the Romano method at level alpha=0.15 gamma.romano <- CDF.Pval.apsi.eq.u(effect.size = 0.8, n.sample = 70, r.1 = 0.05, alpha=0.15) ## now find the significant call fraction under the BH-CLT method at level alpha=0.15 gamma.BHFDX <- CDF.Pval.au.eq.u(effect.size = 0.8, n.sample = 70, r.1 = 0.05, alpha=a.st.BHFDX)
Computes the CDF of p-values for test statistics distribted under HA.
CDF.Pval.HA(u, effect.size, n.sample, r.1, groups = 2, type="balanced", grpj.per.grp1=1, distopt, control)
CDF.Pval.HA(u, effect.size, n.sample, r.1, groups = 2, type="balanced", grpj.per.grp1=1, distopt, control)
u |
Argument of the CDF. Result will be Pr( P_i <= u ) |
effect.size |
The effect size (mean over standard deviation) for test statistics having non-zero means. Assumed to be a constant (in magnitude) over non-zero mean test statistics. |
n.sample |
The number of experimental replicates. |
r.1 |
The proportion of all test statistics that are distributed under HA. |
groups |
The number of experimental groups to compare. Default value is 2. |
type |
A character string specifying, in the groups=2 case, whether the test is 'paired', 'balanced', or 'unbalanced' and in the case when groups >=3, whether the test is 'balanced' or 'unbalanced'. The default in all cases is 'balanced'. Left unspecified in the one sample (groups=1) case. |
grpj.per.grp1 |
Required when |
distopt |
Test statistic distribution in among null and alternatively distributed sub-populations. distopt=0 gives normal (2 groups), distop=1 gives t- (2 groups) and distopt=2 gives F- (2+ groups) |
control |
Optionally, a list with components with the following components: 'groups', used when distop=3 (F-dist), specifying number of groups. 'tol' is a convergence criterion used in iterative methods which is set to 1e-8 by default 'max.iter' is an iteration limit, set to 20 for function iteration and 1000 for all others by default 'distop', specifying the distribution family of the central and non-centrally located sub-populations. =1 gives normal (2 groups) =2 gives t- (2 groups) and =3 gives F- (2+ groups) |
Computes the CDF of p-values for test statistics distribted under HA. If Fc_0 is the cCDF of a test statistic under H0 and Fc_A is the cCDF of a test statistic under HA then the CDF of a P-value for a test statistic distributed under HA is
G_A(u) = Fc_A(Fc_0^{-1}(u))
The limiting true positive fraction is the infinite simultaneous tests average power,
lim_m T_m/M_m = average.power (a.s.),
which is used to approximate the average power for finite 'm', is G_1 at gamma alpha:
G_1( gamma alpha) = average.pwer
where alpha is the nominal FDR and gamma = lim_m R_m/m (a.s.) is the limiting positive call fraction.
A list with components
call |
The call which produced the result |
u |
The argument that was passed to the function |
CDF.Pval.HA |
The value of the CDF |
Grant Izmirlian <izmirlian at nih dot gov>
Izmirlian G. (2020) Strong consistency and asymptotic normality for quantities related to the Benjamini-Hochberg false discovery rate procedure. Statistics and Probability Letters; 108713, <doi:10.1016/j.spl.2020.108713>.
Izmirlian G. (2017) Average Power and -power in
Multiple Testing Scenarios when the Benjamini-Hochberg False
Discovery Rate Procedure is Used. <arXiv:1801.03989>
Genovese, C. and L. Wasserman. (2004) A stochastic process approach to false discovery control. Annals of Statistics. 32 (3), 1035-1061.
## First calculate an average power for a given set of parameters rslt.avgp <- pwrFDR(effect.size=0.79, n.sample=42, r.1=0.05, alpha=0.15) ## Now verify that G_A( gamma f ) = average.power gma <- rslt.avgp$gamma alpha <- rslt.avgp$call$alpha GA.gma.alpha <- CDF.Pval.HA(u=gma*alpha, r.1=0.05, effect.size=0.79, n.sample=42) c(G.gm.alpha=GA.gma.alpha$CDF.Pval.HA$CDF.Pval.HA, average.power=rslt.avgp$average.power)
## First calculate an average power for a given set of parameters rslt.avgp <- pwrFDR(effect.size=0.79, n.sample=42, r.1=0.05, alpha=0.15) ## Now verify that G_A( gamma f ) = average.power gma <- rslt.avgp$gamma alpha <- rslt.avgp$call$alpha GA.gma.alpha <- CDF.Pval.HA(u=gma*alpha, r.1=0.05, effect.size=0.79, n.sample=42) c(G.gm.alpha=GA.gma.alpha$CDF.Pval.HA$CDF.Pval.HA, average.power=rslt.avgp$average.power)
Helper function for the BHFDX FDP control method. Calculates a reduced FDR required to bound the the false discovery proportion in probability using asymptotic approximation.
controlFDP(effect.size, n.sample, r.1, alpha, delta, groups = 2, N.tests, type, grpj.per.grp1, corr.struct, control, formula, data, distopt)
controlFDP(effect.size, n.sample, r.1, alpha, delta, groups = 2, N.tests, type, grpj.per.grp1, corr.struct, control, formula, data, distopt)
effect.size |
The effect size (mean over standard deviation) for test statistics having non-zero means. Assumed to be a constant (in magnitude) over non-zero mean test statistics. |
n.sample |
The number of experimental replicates. |
r.1 |
The proportion of simultaneous tests that are non-centrally located |
alpha |
The upper bound on the probability that the FDP exceeds delta. |
delta |
The exceedance thresh-hold for the FDP tail probability control method
(BHFDX or Romano) |
groups |
The number of experimental groups to compare. Default value is 2. |
N.tests |
The number of simultaneous hypothesis tests. |
type |
A character string specifying, in the groups=2 case, whether the test is 'paired', 'balanced', or 'unbalanced' and in the case when groups >=3, whether the test is 'balanced' or 'unbalanced'. The default in all cases is 'balanced'. Left unspecified in the one sample (groups=1) case. |
grpj.per.grp1 |
Required when |
corr.struct |
Specifies a block correlation structure between test
statistics which is used in both the simulation routine, and in the
computations based upon asymptotic approximation, e.g. the AFDX
control and the ATPP method. Its form is specified via the following
named elements. |
control |
Optionally, a list with components with the following components:
'groups', used when distop=3 (F-dist), specifying number of groups. |
formula |
Optionally, the function can be used to _estimate_ f*
from a given dataset of sorted p-values. In this case we specify
formula, which is a formula of the form |
data |
The name of the dataset. |
distopt |
Test statistic distribution in among null and alternatively distributed sub-populations. distopt=0 gives normal (2 groups), distop=1 gives t- (2 groups) and distopt=2 gives F- (2+ groups) |
Uses a CLT for the FDP to calculate a reduced alpha required to bound the the false discovery rate in probability...e.g. finds alpha* so that when the BH-FDR procedure is controlled at alpha*, we ensure that
Pr( V_m/R_m > (1-r) alpha ) < (1-r) alpha
where 'alpha' is the original false discovery rate and 'r' is the proportion of non-null distributed test statistics.
alpha.star |
The reduced alpha required to bound the FDP in probability |
obj |
Objective value at 'alpha.star'... should be close to 0 |
L.star |
The bound on the FDP, should be (1-r) f. See above. |
P.star |
The probability that the FDP is greater than L.star. See above. |
average.power |
Resulting average power. |
c.g |
The BH-FDR threshold on the scale of the test statistics. |
gamma |
The proportion of all 'm' tests declared significant. |
objective |
Result of optimization yielding the 'average.power'. |
err.III |
Mass on the wrong side of the threshold. |
sigma.rtm.SoM |
Asymptotic variance of the true positive fraction. |
Grant Izmirlian <izmirlian at nih dot gov>
Izmirlian G. (2020) Strong consistency and asymptotic normality for quantities related to the Benjamini-Hochberg false discovery rate procedure. Statistics and Probability Letters; 108713, <doi:10.1016/j.spl.2020.108713>
Izmirlian G. (2017) Average Power and -power in
Multiple Testing Scenarios when the Benjamini-Hochberg False
Discovery Rate Procedure is Used. <arXiv:1801.03989>
Kluger D. M., Owen A. B. (2023) A central limit theorem for the Benjamini-Hochberg false discovery proportion under a factor model. Bernoulli; xx:xxx-xxx.
## at alpha=0.15 and other parameters, it takes n.sample=46 replicates for ## average power > 80% pwr.46.15 <- pwrFDR(alpha=0.15, r.1=0.03, N.tests=1000, effect.size=0.79, n.sample=46) ## when there are 'only' N.tests=1000 simultaneous tests, the distribution of the ## false discovery fraction, FDP, is not so highly spiked at the alpha=0.15 ## You need to set the alpha down to alpha=0.0657 to ensure that Pr( T/J > 0.145 ) < 0.0657 fstr <- controlFDP(alpha=0.15, r.1=0.03, N.tests=1000, effect.size=0.8, n.sample=46) ## at all the above settings, with alpha=0.0657 at an n.sample of 46, we only have 69% ## average power. pwr.46.0657 <- pwrFDR(alpha=0.065747, r.1=0.03, N.tests=1000, effect.size=0.79, n.sample=46) ## it'll cost 7 more replicates to get the average power up over 80%. pwr.53.0657 <- pwrFDR(alpha=0.065747, r.1=0.03, N.tests=1000, effect.size=0.8, n.sample=53) ## it costs only 8.75% more to get it right!
## at alpha=0.15 and other parameters, it takes n.sample=46 replicates for ## average power > 80% pwr.46.15 <- pwrFDR(alpha=0.15, r.1=0.03, N.tests=1000, effect.size=0.79, n.sample=46) ## when there are 'only' N.tests=1000 simultaneous tests, the distribution of the ## false discovery fraction, FDP, is not so highly spiked at the alpha=0.15 ## You need to set the alpha down to alpha=0.0657 to ensure that Pr( T/J > 0.145 ) < 0.0657 fstr <- controlFDP(alpha=0.15, r.1=0.03, N.tests=1000, effect.size=0.8, n.sample=46) ## at all the above settings, with alpha=0.0657 at an n.sample of 46, we only have 69% ## average power. pwr.46.0657 <- pwrFDR(alpha=0.065747, r.1=0.03, N.tests=1000, effect.size=0.79, n.sample=46) ## it'll cost 7 more replicates to get the average power up over 80%. pwr.53.0657 <- pwrFDR(alpha=0.065747, r.1=0.03, N.tests=1000, effect.size=0.8, n.sample=53) ## it costs only 8.75% more to get it right!
Compute BH-FDR step up criterion, or Romano step-down criterion
criterion(alpha, delta, N.tests, FDP.control.method = c("BHFDR", "Romano"))
criterion(alpha, delta, N.tests, FDP.control.method = c("BHFDR", "Romano"))
alpha |
The false discovery rate (in the BH case) or the upper bound on the probability that the FDP exceeds lambda (Romano case) |
delta |
If the "FDP.control.method" is set to 'Romano' then the user can set the
exceedance thresh-hold for the FDP tail probability control
|
N.tests |
The number of simultaneous hypothesis tests. |
FDP.control.method |
A character string specifying how the false discovery proportion (FDP) is to be
controlled. You may specify the whole word or any shortened uniquely
identifying truncation. |
The step down or step up criterion, which is a vector of length N.tests
Grant Izmirlian <izmirlian at nih dot gov>
Benjamini Y, Hochberg Y. Controlling the false discovery rate - a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B Stat. Methodol. 1995; 57(1):289-300.
Romano J.P. and Shaikh A.M. On stepdown control of the false discovery proportion. IMS Lecture Notes–Monograph Series. 2006; 49:33-50. DOI: 10.1214/074921706000000383.
library(pwrFDR) crit.b <- criterion(N.tests=1000, alpha=0.15, FDP.control.method="BHFDR") crit.r <- criterion(N.tests=1000, alpha=0.15, FDP.control.method="Romano") crit.r.17 <- criterion(N.tests=1000, alpha=0.15, delta=0.17, FDP.control.method="Romano") matplot(1:1000, cbind(crit.b, crit.r, crit.r.17), type="l", lty=1, col=2:4)
library(pwrFDR) crit.b <- criterion(N.tests=1000, alpha=0.15, FDP.control.method="BHFDR") crit.r <- criterion(N.tests=1000, alpha=0.15, FDP.control.method="Romano") crit.r.17 <- criterion(N.tests=1000, alpha=0.15, delta=0.17, FDP.control.method="Romano") matplot(1:1000, cbind(crit.b, crit.r, crit.r.17), type="l", lty=1, col=2:4)
Objects created by the pwrFDR
function with option
method
=="simulation" are returned with an attribute named
detail
. This is its extractor function
detail(obj)
detail(obj)
obj |
An object created by the |
A list with components
reps |
A data frame of |
X |
A single simulation replicate of the |
Grant Izmirlian izmirlig at mail dot nih dot gov
The pwrFDR
package currently incorporates 3 distribution types,
normal, t and F. The first two of these are strictly for statistics formed
from two group comparison while the third is for statistics formed from the
omnibus test of any difference among an arbitrary number of groups >=2. The
structure is general and user expandable. One must specify the density,
CDF and quantile function for a given distribution and its parameters under
the null and under the alternative. These parameters must be expressions
to be evaluated inside the kernel of the power program, functions of the
arguments n.sample
, groups
and effect.size
. This is
not used directly by the user at all unless she (he) wants to add a
distribution type.
A data frame with 3 observations on the following 6 variables.
pars0
a list vector having components 'c(nd, p1, p2, ...)' where 'nd' is the distribution number starting with 0, and p1, p2, ..., are paramters of the distribution, which are functions of 'n.sample', 'groups' and 'effect.size' as mentioned above. These must be expressed as a call e.g. as.call(expression(c, nd, p1, p2, ...)) etc. 'pars0' are the parameters under the null.
pars1
a list vector. See directly above. Parameters under the alternative.
minv
a list vector with components given the values -Inf or 0, which will be used to decide if the two sided corrections are used or not.
ddist
a list vector with components set to functions, each one computing the probability density function corresponding to the particular distribution. A function of arguments 'x' and 'par'. See details below.
pdist
a list vector with components set to the functions, each one computing the cumulative distribution function corresponding to the particular distribution. A function of arguments 'x' and 'par'. See details below.
qdist
a list vector with components set to the functions, each one computing the quantile function (inverse cumulative distribution function) corresponding to the particular distribution. A function of arguments 'x' and 'par'. See details below.
rdist
a list vector with components set to the functions, each one capable of simulating a specified number of replicates corresponding to the particular distribution. A function of arguments 'n' and 'par'. See details below.
dists
is a data.frame with components pars0
, pars1
,
minv
, ddist
, pdist
, qdist
and rdist
.
For the three available distribution options, "normal", "t" and "f",
the components pars0
and pars1
take the following form:
1. pars0 | pars1 |
2. c(0,ncp=0,sd=1) | c(0,ncp=.NCP.,sd=1) |
3. c(1,ncp=0,ndf=.DF.) | c(1,ncp=.NCP.,ndf=.DF.) |
4. c(2,ncp=0,ndf1=groups-1,ndf2=.DF.) | c(2,ncp=.NCP.^2,ndf1=groups-1,ndf2=.DF.) |
The component minv
gives the minumum value of the support set of the
distribution. For the above named three available distribution options,
minv
is set to the values -Inf, -Inf and 0, respectively. The components
ddist
, pdist
, qdist
, and rdist
contain functions
defining the density, CDF, quantile, and simulator function, respectively. For
the above named three available distribution options, ddist
takes
the following form:
1. ddist |
2. function (x, par) dnorm(x, mean = par[2], sd = par[3]) |
3. function (x, par) dt(x, ncp = par[2], df = par[3]) |
4. function (x, par) df(x, ncp = par[2], df1 = par[3], df2 = par[4]) |
exept for rdist, which has arguments 'n' and 'par' |
The components pdist
and qdist
are nearly identical to the
component ddist
, but with pnorm, pt, pf and qnorm, qt, qf replacing
dnorm, dt and df, respectively.
The variables, .NCP.
and .DF.
named above are defined within the
functions in which ddist
is used based upon corresponding expressions,
NCP
and DF
. These expressions currently contain 3 component
expressions, one for each of the available test types, "paired", "balanced"
and "unbalanced".NCP
is currently defined:
1. NCP |
expression(n.sample^0.5*effect.size,(n.sample/groups)^0.5*effect.size, |
((n.sample-1)/(1+sum((n.sample-1)/(nii.sample-1))))^0.5*effect.size) |
and DF
is currently defined:
1. DF |
expression(n.sample - 1, groups * (n.sample - 1), |
groups^2*(n.sample-1)/(1+sum((n.sample-1)/(nii.sample-1)))) |
This isn't 'data' data, its a kind of a 'family' object.
Izmirlian G. (2020) Strong consistency and asymptotic normality for quantities related to the Benjamini-Hochberg false discovery rate procedure. Statistics and Probability Letters; <doi:10.1016/j.spl.2020.108713>
Izmirlian G. (2017) Average Power and -power in
Multiple Testing Scenarios when the Benjamini-Hochberg False
Discovery Rate Procedure is Used. <arXiv:1801.03989>
pwrFDR
to allow power/sample
size calculation for multiple tests of ROC curve based hypothesis.
See details.
In hypothesis tests of TPR_1 vs TPR_0 at fixed FPR, or
FPR_1 vs FPR_0 at fixed TPR, this computes the
equivalent Z-test effect size. This can then be passed
the effect.size
argument in a call to
pwrFDR
or controlFDP
es.ROC(FPR0, FPR1 = NULL, TPR0, TPR1 = NULL, b = NULL)
es.ROC(FPR0, FPR1 = NULL, TPR0, TPR1 = NULL, b = NULL)
FPR0 |
When the TPR is fixed, the FPR under the null. Otherwise the fixed FPR. |
FPR1 |
When the TPR is fixed, the FPR under the alternative. Otherwise left blank. |
TPR0 |
When the FPR is fixed, the TPR under the null. Otherwise the fixed TPR. |
TPR1 |
When the FPR is fixed, the TPR under the alternative. Otherwise left blank. |
b |
Nominal slope of the ROC at FPR0. Taken to be 1 by default. |
The equivalent Z-test effect size in a hypothesis test for difference in TPR at fixed FPR or difference in FPR at fixed TPR
Grant Izmirlian <izmirlian at nih dot gov>
Pepe M. S., Feng Z, Janes, H Bossuyt P. M. and Potter J. D. Pivotal evaluation of the accuracy of a biomarker used for classification or prediction. Supplement. J Natl Cancer Inst 2008;100: 1432–1438
es.ROC(FPR0=0.15, TPR0=0.80, TPR1=0.90)
es.ROC(FPR0=0.15, TPR0=0.80, TPR1=0.90)
Generates a tempfile name with an optional user specified prefix and suffix Result is a character string
gentempfilenm(prfx = "temp", sfx = ".txt")
gentempfilenm(prfx = "temp", sfx = ".txt")
prfx |
prefix for the file name, e.g. "temp" |
sfx |
suffix (file extension) for the file name, e.g. ".txt" |
a character string containing the randomly generated name of the tempfile.
Grant Izmirlian izmirlig at mail dot nih dot gov
A helper function– remove if zero. Included at the user level because it's useful for setting up batch jobs.
if.0.rm(x)
if.0.rm(x)
x |
A numeric vector. |
A numeric vector, equal to the input vector, x
, except with 0's
removed.
Grant Izmirlian <izmirlian at nih dot gov>
A helper function – substitute 'NA's with a specified 'x'. Included at the user level because it's useful for setting up batch jobs.
if.na.x(x, x0 = FALSE)
if.na.x(x, x0 = FALSE)
x |
A numeric or boolean vector. |
x0 |
Value with which to replace |
A numeric vector, equal to the input vector, x
, except with NA
's
replaced by the value, x0
, which the user suplied.
Grant Izmirlian <izmirlian at nih dot gov>
y
's with a specified 'z'.
A helper function – in a numeric vector, substitute values equal to 'y' with user specified 'z'. Included at the user level because it's useful for setting up batch jobs.
if.y.z(x, y = 0, z = 1)
if.y.z(x, y = 0, z = 1)
x |
A numeric, character or boolean vector |
y |
The valued to be swapped out |
z |
The value which replaces swapped out values |
A numeric, character or boolean vector, equal to the input vector,
x
, except with occurences y
replaced with the value z
Grant Izmirlian <izmirlian at nih dot gov>
Joins pwrFDR objects into a single table.
join.tbl(...)
join.tbl(...)
... |
obj1, obj2, ... each being the result of a call to pwrFDR. See the example below. |
The table of joined pwrFDR objects as a data.frame
Grant Izmirlian
rslt.avgp.r15 <- pwrFDR(effect.size = 0.79, average.power=0.80, r.1 = 0.15, alpha = 0.15) rslt.avgp.r10 <- update(rslt.avgp.r15, r.1 = 0.10) rslt.avgp.r05 <- update(rslt.avgp.r15, r.1 = 0.05) join.tbl(rslt.avgp.r15, rslt.avgp.r10, rslt.avgp.r05)
rslt.avgp.r15 <- pwrFDR(effect.size = 0.79, average.power=0.80, r.1 = 0.15, alpha = 0.15) rslt.avgp.r10 <- update(rslt.avgp.r15, r.1 = 0.10) rslt.avgp.r05 <- update(rslt.avgp.r15, r.1 = 0.05) join.tbl(rslt.avgp.r15, rslt.avgp.r10, rslt.avgp.r05)
Computes the logit transform for objects of type numeric and objects of class "pwr".
logit(mu)
logit(mu)
mu |
A real number on the interval [0, 1] |
A numeric equal to the logit of mu, a real number.
Grant Izmirlian <izmirlian at mail.nih.gov>
Computes the inverse logit transform for objects of type numeric and objects of class "pwr".
logitInv(eta)
logitInv(eta)
eta |
Any real number |
A numeric equal to the logit inverse of mu, a real number on the interval [0, 1]
Grant Izmirlian <izmirlian at mail.nih.gov>
A helper function– turns a missing column into 'NA's inside of a with statement. Included at the user level because its useful in setting up batch jobs, especially since the 'pwrFDR' return argument list varies depending on the manner called.
nna(x)
nna(x)
x |
A named numeric vector component of a data frame which may or may not be present. |
Either the values in the component x
of the data.frame or NA\'s of equal length
Grant Izmirlian <izmirlian at nih dot gov>
sim.1 <- pwrFDR(effect.size=0.8, n.sample=60, lambda=0.90, r.1=0.05, N.tests=450, alpha=0.15, method="sim", FDP.control.method="BHFDX") sim.2 <- pwrFDR(effect.size=0.8, n.sample=60, lambda=0.90, r.1=0.05, N.tests=450, alpha=0.15, method="sim", FDP.control.method="both", control=list(sim.level=2)) with(detail(sim.1)$reps, cbind(R.st/100, nna(R.R)/450)) with(detail(sim.2)$reps, cbind(R.st/100, nna(R.R)/450))
sim.1 <- pwrFDR(effect.size=0.8, n.sample=60, lambda=0.90, r.1=0.05, N.tests=450, alpha=0.15, method="sim", FDP.control.method="BHFDX") sim.2 <- pwrFDR(effect.size=0.8, n.sample=60, lambda=0.90, r.1=0.05, N.tests=450, alpha=0.15, method="sim", FDP.control.method="both", control=list(sim.level=2)) with(detail(sim.1)$reps, cbind(R.st/100, nna(R.R)/450)) with(detail(sim.2)$reps, cbind(R.st/100, nna(R.R)/450))
A binary operator shortcut for paste(x,y)
x %,% y
x %,% y
x |
a character string |
y |
a character string |
The concatenated character string
Grant Izmirlian [email protected]
library(pwrFDR) "var" %,% (1:10)
library(pwrFDR) "var" %,% (1:10)
This is a function for calculating two differing notions of power, or deriving sample sizes for specified requisite power in multiple testing experiments under a variety of methods for control of the distribution of the False Discovery Proportion (FDP). More specifically, one can choose to control the FDP distribution according to control of its (i) mean, e.g. the usual BH-FDR procedure, or via the probability that it exceeds a given value, delta, via (ii) the Romano procedure, or via (iii) my procedure based upon asymptotic approximation. Likewise, we can think of the power in multiple testing experiments in terms of a summary of the distribution of the True Positive Proportion (TPP). The package will compute power, sample size or any other missing parameter required for power based upon (i) the mean of the TPP which is the average power (ii) the probability that the TPP exceeds a given value, lambda, via my asymptotic approximation procedure. The theoretical results are described in Izmirlian, G. (2020), and an applied paper describing the methodology with a simulation study is in preparation.
pwrFDR(effect.size, n.sample, r.1, alpha, delta = NULL, groups = 2, N.tests, average.power, TPX.power, lambda, type = c("paired","balanced","unbalanced"), grpj.per.grp1=NULL, corr.struct=list(type=c("CS-Blocks","Toeplitz-Blocks"), block.size=NA, rho = NA), FDP.control.method = c("BHFDR","BHFDX","Romano","Auto","both","Holm","Hochberg", "Bonferroni"), distopt=1, method = c("Theoretical","simulation"), n.sim = 1000, temp.file, control=list(tol=1e-08, max.iter=c(1000, 20), sim.level=2, low.power.stop=TRUE, FDP.meth.thresh=FDP.cntl.mth.thrsh.def, ast.le.a=TRUE, verb=FALSE, show.footer=TRUE))
pwrFDR(effect.size, n.sample, r.1, alpha, delta = NULL, groups = 2, N.tests, average.power, TPX.power, lambda, type = c("paired","balanced","unbalanced"), grpj.per.grp1=NULL, corr.struct=list(type=c("CS-Blocks","Toeplitz-Blocks"), block.size=NA, rho = NA), FDP.control.method = c("BHFDR","BHFDX","Romano","Auto","both","Holm","Hochberg", "Bonferroni"), distopt=1, method = c("Theoretical","simulation"), n.sim = 1000, temp.file, control=list(tol=1e-08, max.iter=c(1000, 20), sim.level=2, low.power.stop=TRUE, FDP.meth.thresh=FDP.cntl.mth.thrsh.def, ast.le.a=TRUE, verb=FALSE, show.footer=TRUE))
effect.size |
The effect size (mean over standard deviation) for test statistics having non-zero means. Assumed to be a constant (in magnitude) over non-zero mean test statistics. |
n.sample |
The number of experimental replicates. Required for calculation of power |
r.1 |
The proportion of simultaneous tests that are non-centrally located |
alpha |
The false discovery rate (in the BH case) or the upper bound on the probability that the FDP exceeds delta (BHFDX and Romano case) |
delta |
If the "FDP.control.method" is set to 'Romano' or 'BHFDX', then the
user can set the exceedance thresh-hold for the FDP tail probability
control |
groups |
The number of experimental groups to compare. Must be integral and >=1. The default value is 2. |
N.tests |
The number of simultaneous hypothesis tests. |
average.power |
The desired average power. Sample size calculation requires specification of either 'average.power' or 'TPX.power'. |
TPX.power |
The desired tp-power (see details for explanation). Sample size calculation requires specification of either 'average.power' or 'TPX.power'. |
lambda |
The tp-power threshold, required when calculating the tp-power (see details for explanation) or when calculating the sample size required for tp-power. |
type |
A character string specifying, in the groups=2 case, whether the test is 'paired', 'balanced', or 'unbalanced' and in the case when groups >=3, whether the test is 'balanced' or 'unbalanced'. The default in all cases is 'balanced'. Left unspecified in the one sample (groups=1) case. |
grpj.per.grp1 |
Required when |
corr.struct |
Specifies a block correlation structure between test
statistics which is used in both the simulation routine, and in the
computations based upon asymptotic approximation, e.g. the AFDX
control and the ATPP method. Its form is specified via the following
named elements. |
FDP.control.method |
A character string specifying how the false discovery proportion (FDP) is to be
controlled. You may specify the whole word or any shortened uniquely
identifying truncation. |
distopt |
Test statistic distribution in among null and alternatively distributed sub-populations. distopt=0 gives normal (2 groups), distop=1 gives t- (2 groups) and distopt=2 gives F- (2+ groups) |
method |
Specify the method whereby the average power is calculated.
You may specify the whole word or any unqiuely indentifying
truncation. |
control |
Optionally, a list with components with the following
components: |
n.sim |
If 'simulation' method is chosen you may specify number of simulations. Default is 1000. |
temp.file |
If 'simulation' method is chosen you may specify a tempfile where the current simulation replicate is updated. Very usefull for batch runs. You can use the included utility 'gentempfilenm' |
This function will compute one of a variety of ensemble powers under a given choice of FDP control methods. The underlying model assumes that the m simultaneous test statistics are i.i.d., each being formed from k samples which can be paired (k=2), balanced or unbalanced (k>=2), k=1,2,..., and distributed according to one of the available relevent distribution types (see above). The location parameter for each of the statistical tests is either 0 (null hypothesis) or a specified constant effect size (alternative hypothesis), with the identity of these two possibilities in each of the m cases being an i.i.d. unmeasured latent bernouli variable with density r.1, the mixing proportion. The m simultaneous statistical tests partition into those which are distributed according to the alternative, numbering M_m, and those distributed according to the NULL, numbering m-M_m. Once a selected thresholding method is applied, the m statistics can also be partitioned into those which are called significant, numbering R_m, and those which are not, numbering m-R_m. Each of the test statistics is thus given two labels, alternative hypothesis membership and whether a significant call was made. Of the R_m significant calls, T_m are true positives and V_m are false positives. This results in the following table.
1. | rej H0 | acc H0 | row Total | |
2. | H0 is FALSE | T_m | M.1-T | M_m |
3. | H0 is TRUE | R_m-T_m | (m-M_m)-(R_m-T_m) | m - M_m |
4. | col Total | R | m-R_m | m |
The ratio of the false positive count to the significant call count, V_m/R_m, is called the False Discovery Proportion (FDP). Thresholding methods which result in the most reproducibility seek to control the FDP distribution. The most well known is the Benjamini-Hochberg False Discovery Rate (BH-FDR) procedure. It guarantees that the FDR, which is the expected FDP, will be less than a stipulated alpha
While it is true that for large m, the distribution of the FDP,
V_m/M_m will become spiked at its mean, , in
many commonly occuring situations, there will still be
non-negligible dispersion in the distribution of the FDP. For
this reason, any validity promised by the BH-FDR procedure does
not actually apply on a case to case basis, and individual FDP's
may differ non-negligibly from the FDR. For this reason, the
function supplies two other methods of FDP control in addition to
FDP.control.method="BHFDR". These two alternate methods,
FDP.control.method="Romano" and FDP.control.method="BHFDX"
guarantee control of the tail probability of the FDP
distribution:
The lower bound is left arbitrary for greater
flexibility,
being the default. There is
also an automatic option, FDP.control.method="Auto", which lets
the function decide which of the three FDP control methods is
the most advisable in a given situation. The two tail
probability control options are preferred when the standard
error of the FDP exceeds a cutoff given in the default
'control' settings:
se[V_m/R_m] / alpha > FDP.cntl.mth.thrsh.def[1]
The default is 10%. When the standard error to alpha ratio is
10% or less then the BHFDR, being the least conservative, is
preferred. When the se to alpha ratio is 10% or more, then
Romano and BHFDX are decided between, with the BHFDX (asymptotic
approximation) being less conservative than Romano and therefor
preferred if the CLT approximation is adequate. This will be the
case provided is large enough,
FDP.cntl.mth.thrsh.def[2]. The default is 50.
The concept of ensemble power for the purposes of this function,
concern the distribution of the true positive proportion (TPP),
. The most well known is the average power, which
is the expected value of the TPP, which is called the true
positve rate (TPR):
For large m, the distribution of the TPP will be spiked at its mean, which is the asymptotic average power. This is used in the function in the average power computation. As was the case for the FDP, there are many commonly occuring situations when the distribution of the TPP will still be non-negligibly dispersed. For this reason, we provide an alternate notion of power which is based upon the tail probability of the TPP distribution:
This is computed via asymptotic approximation and also requires
that be large enough:
. The user decides
when the tp-power is to be preferred. A good check is to look
at the ratio of the se[TPP] to average power ratio which is the
sigma.rtm.TPP/average.power/N.tests^0.5 If this ratio is
unacceptibly large (10% or so) than the tp-power is preferred.
For the "FixedPoint" method (default) and for any specified choice of FDP.control method, the function can be used in the following ways:
1. Specify 'n.sample', 'effect.size', 'r.1' and 'alpha'. Calculates 'average power'
2. Specify 'n.sample', 'effect.size', 'r.1', 'alpha' and 'lambda'. 'N.tests' is also required. The function wil calculate the 'TPX.power' in addition to the 'average power'.
3. Specify the 'average.power' or the pair 'TPX.power' and 'lambda'. Specify all but one of the parameters, 'n.sample', 'effect.size', 'r.1' and 'alpha'. The function will calculate the value of the missing parameter required for the specified 'average power' or 'tp-power'. Note: a solution is guaranteed for missing 'n.sample' and missing 'effect.size', but not necessarily for missing 'r.1' or 'alpha'.
The Holm, Hochberg and Bonferroni procedures for controlling the FWER under the assumption of independent test statistics are also provided for sake of completeness.
An object of class "pwr" with with components including:
call |
The call which produced the result |
average.power |
Resulting average power. |
TPX.power |
When 'lambda' is specified, the tp-power is also computed |
L.eq |
The lambda at which the tp-power and average-power are equal. |
n.sample |
If 'n.sample' is missing from the argument list, then the sample size required for the specified average- or lambda- power. |
alpha.star |
If 'FDP.control.method' was set to "BHFDX" or it resulted from the "Auto" setting, the alpha at which the probability that the FDP exceeds alpha.star is less than or equal to the originally specified alpha. |
c.g |
The FDP control method threshold on the scale of the test statistics. |
gamma |
The proportion of all 'm' tests declared significant. |
objective |
Result of optimization yielding the average or tp- power. |
err.III |
Mass on the wrong side of the threshold. |
sigma.rtm.ToM |
Asymptotic standard deviation of the true positive fraction. |
Auto |
If 'FDP.control.method' was set to "Auto", this returns the resulting choice (a string) which was made internally. |
se.by.a |
The ratio of the standard error of the FDP to alpha, the nominal FDR, which gives an indication of the dispersion of its distribution relative to the nominal FDP. Used by the "Auto" specification. |
gma.Ntsts |
the effective denominator in the CLT asymptotic approximation to the distribution of the FDP, which equals the positive proportion, 'gamma', times the number of simultaneous tests, 'm'. |
detail |
The extractor function, |
Grant Izmirlian <izmirlian at nih dot gov>
Izmirlian G. (2020) Strong consistency and asymptotic normality for quantities related to the Benjamini-Hochberg false discovery rate procedure. Statistics and Probability Letters; 108713, <doi:10.1016/j.spl.2020.108713>
Izmirlian G. (2017) Average Power and -power in
Multiple Testing Scenarios when the Benjamini-Hochberg False
Discovery Rate Procedure is Used. <arXiv:1801.03989>
Jung S-H. (2005) Sample size for FDR-control in microarray data analysis. Bioinformatics; 21:3097-3104.
Kluger D. M., Owen A. B. (2023) A central limit theorem for the Benjamini-Hochberg false discovery proportion under a factor model. Bernoulli; xx:xxx-xxx.
Liu P. and Hwang J-T. G. (2007) Quick calculation for sample size while controlling false discovery rate with application to microarray analysis. Bioinformatics; 23:739-746.
Lehmann E. L., Romano J. P.. Generalizations of the familywise error rate. Ann. Stat.. 2005;33(3):1138-1154.
Romano Joseph P., Shaikh Azeem M.. Stepup procedures for control of generalizations of the familywise error rate. Ann. Stat.. 2006;34(4):1850-1873.
## Example 1a: average power rslt.avgp <- pwrFDR(effect.size=0.79, n.sample=46, r.1=2000/54675, alpha=0.15) rslt.avgp ## Example 1b: average power, FDP.control.method set to "Auto", N.tests=1000 rslt.avgp.auto <- pwrFDR(effect.size = 0.79, n.sample = 46, r.1 = 2000/54675, alpha = 0.15, N.tests = 1000, FDP.control.method = "Auto") rslt.avgp.auto ## Example 1c: average power, FDP.control.method set to "Auto", N.tests=2000 rslt.avgp.auto <- update(rslt.avgp.auto, N.tests = 2000) rslt.avgp.auto ## Example 1d: tp-power rslt.lpwr <- pwrFDR(effect.size=0.79, n.sample=46, r.1=2000/54675, alpha=0.15, lambda=0.80, N.tests=54675) rslt.lpwr ## Example 1e: sample size required for given average power rslt.ss.avgp <- pwrFDR(effect.size=0.79, average.power=0.82, r.1=2000/54675, alpha=0.15) rslt.ss.avgp ## Example 1f: sample size required for given tp-power rslt.ss.lpwr <- pwrFDR(effect.size=0.79, TPX.power=0.82, lambda=0.80, r.1=2000/54675, alpha=0.15, N.tests=54675) rslt.ss.lpwr ## Example 1g: simulation rslt.sim <- update(rslt.avgp, method="sim", n.sim=500, N.tests=1000) rslt.sim ## Example 1h: simulation rslt.sim <- update(rslt.avgp, method="sim", FDP.control.method="both", n.sim=500, N.tests=1000) rslt.sim ## Example 2: methods for adding, subtracting, multiplying, dividing, exp, log, ## logit and inverse logit rslt.avgp - rslt.sim logit(rslt.avgp) ## etc ## Example 3: Compare the asymptotic distribution of T/M with kernel ## density estimate from simulated data pdf <- with(detail(rslt.sim)$reps, density(T/M1)) med <- with(detail(rslt.sim)$reps, median(T/M1)) avg <- rslt.sim$average.power sd <- rslt.sim$se.ToM rng.x <- range(pdf$x) rng.y <- range(c(pdf$y, dnorm(pdf$x, mean=avg, sd=sd))) plot(rng.x, rng.y, xlab="u", ylab="PDF for T/M", type="n") with(pdf, lines(x, y)) lines(rep(rslt.sim$average.power, 2), rng.y, lty=2) lines(pdf$x, dnorm(pdf$x, mean=avg, sd=sd), lty=3)
## Example 1a: average power rslt.avgp <- pwrFDR(effect.size=0.79, n.sample=46, r.1=2000/54675, alpha=0.15) rslt.avgp ## Example 1b: average power, FDP.control.method set to "Auto", N.tests=1000 rslt.avgp.auto <- pwrFDR(effect.size = 0.79, n.sample = 46, r.1 = 2000/54675, alpha = 0.15, N.tests = 1000, FDP.control.method = "Auto") rslt.avgp.auto ## Example 1c: average power, FDP.control.method set to "Auto", N.tests=2000 rslt.avgp.auto <- update(rslt.avgp.auto, N.tests = 2000) rslt.avgp.auto ## Example 1d: tp-power rslt.lpwr <- pwrFDR(effect.size=0.79, n.sample=46, r.1=2000/54675, alpha=0.15, lambda=0.80, N.tests=54675) rslt.lpwr ## Example 1e: sample size required for given average power rslt.ss.avgp <- pwrFDR(effect.size=0.79, average.power=0.82, r.1=2000/54675, alpha=0.15) rslt.ss.avgp ## Example 1f: sample size required for given tp-power rslt.ss.lpwr <- pwrFDR(effect.size=0.79, TPX.power=0.82, lambda=0.80, r.1=2000/54675, alpha=0.15, N.tests=54675) rslt.ss.lpwr ## Example 1g: simulation rslt.sim <- update(rslt.avgp, method="sim", n.sim=500, N.tests=1000) rslt.sim ## Example 1h: simulation rslt.sim <- update(rslt.avgp, method="sim", FDP.control.method="both", n.sim=500, N.tests=1000) rslt.sim ## Example 2: methods for adding, subtracting, multiplying, dividing, exp, log, ## logit and inverse logit rslt.avgp - rslt.sim logit(rslt.avgp) ## etc ## Example 3: Compare the asymptotic distribution of T/M with kernel ## density estimate from simulated data pdf <- with(detail(rslt.sim)$reps, density(T/M1)) med <- with(detail(rslt.sim)$reps, median(T/M1)) avg <- rslt.sim$average.power sd <- rslt.sim$se.ToM rng.x <- range(pdf$x) rng.y <- range(c(pdf$y, dnorm(pdf$x, mean=avg, sd=sd))) plot(rng.x, rng.y, xlab="u", ylab="PDF for T/M", type="n") with(pdf, lines(x, y)) lines(rep(rslt.sim$average.power, 2), rng.y, lty=2) lines(pdf$x, dnorm(pdf$x, mean=avg, sd=sd), lty=3)
pwrFDR
on a grid.
Function for evaluating pwrFDR
on a factorial design of
possible parameters.
pwrFDR.grid(effect.size, n.sample, r.1, alpha, delta, groups, N.tests, average.power, TPX.power, lambda, type, grpj.per.grp1, corr.struct, FDP.control.method, distopt, control)
pwrFDR.grid(effect.size, n.sample, r.1, alpha, delta, groups, N.tests, average.power, TPX.power, lambda, type, grpj.per.grp1, corr.struct, FDP.control.method, distopt, control)
effect.size |
A vector of effect sizes to be looped over. The effect size (mean over standard deviation) for test statistics having non-zero means. Assumed to be a constant (in magnitude) over non-zero mean test statistics. |
n.sample |
A vector of sample sizes to be looped over. The sample size is the number of experimental replicates. Required for calculation of power |
r.1 |
A vector of mixing proportions to be looped over. The mixing proportion is the proportion of simultaneous tests that are non-centrally located |
alpha |
The false discovery rate (in the BH case) or the upper bound on the probability that the FDP exceeds delta (BHFDX and Romano case) |
delta |
If the "FDP.control.method" is set to 'Romano' or 'BHFDX', then this
optional argument can be set to the exceedance thresh-hold in
defining the FDP-tp: |
groups |
The number of experimental groups to compare. Must be integral and >=1. The default value is 2. |
N.tests |
The number of simultaneous hypothesis tests. |
average.power |
The desired average power. Calculation of sample size, effect size mixing proportion or alpha requires specification of either 'average.power' or 'TPX.power'. |
TPX.power |
The desired tp-power (see |
lambda |
The tp-power threshold, required when calculating the tp-power
(see |
type |
A character string specifying, in the groups=2 case, whether the test is 'paired', 'balanced', or 'unbalanced' and in the case when groups >=3, whether the test is 'balanced' or 'unbalanced'. The default in all cases is 'balanced'. Left unspecified in the one sample (groups=1) case. |
grpj.per.grp1 |
Required when |
corr.struct |
Specifies a block correlation structure between test
statistics which is used in both the simulation routine, and in the
computations based upon asymptotic approximation, e.g. the AFDX
control and the ATPP method. Its form is specified via the following
named elements. |
FDP.control.method |
A character string specifying how the false discovery proportion (FDP) is to be
controlled. You may specify the whole word or any shortened uniquely
identifying truncation. |
distopt |
Test statistic distribution in among null and alternatively distributed sub-populations. distopt=0 gives normal (2 groups), distop=1 gives t- (2 groups) and distopt=2 gives F- (2+ groups) |
control |
Optionally, a list with components with the following
components: |
Arguments may be specified as vectors of possible values or can be set to a single constant value.
A list having two components:
conditions |
A data.frame with one column for each argument listing the distinct settings for all parameters. |
results |
A list with components objects of class |
Grant Izmirlian <izmirlian at nih dot gov>
Izmirlian G. (2020) Strong consistency and asymptotic normality for quantities related to the Benjamini-Hochberg false discovery rate procedure. Statistics and Probability Letters; 108713, <doi:10.1016/j.spl.2020.108713>
Izmirlian G. (2017) Average Power and -power in
Multiple Testing Scenarios when the Benjamini-Hochberg False
Discovery Rate Procedure is Used. <arXiv:1801.03989>
Jung S-H. (2005) Sample size for FDR-control in microarray data analysis. Bioinformatics; 21:3097-3104.
Kluger D. M., Owen A. B. (2023) A central limit theorem for the Benjamini-Hochberg false discovery proportion under a factor model. Bernoulli; xx:xxx-xxx.
Liu P. and Hwang J-T. G. (2007) Quick calculation for sample size while controlling false discovery rate with application to microarray analysis. Bioinformatics; 23:739-746.
Lehmann E. L., Romano J. P.. Generalizations of the familywise error rate. Ann. Stat.. 2005;33(3):1138-1154.
Romano Joseph P., Shaikh Azeem M.. Stepup procedures for control of generalizations of the familywise error rate. Ann. Stat.. 2006;34(4):1850-1873.
tst <- pwrFDR.grid(effect.size=c(0.6,0.9), n.sample=c(50,60,70), r.1=0.4+0.2*(0:1), alpha=0.05+0.05*(0:3), N.tests=1000, FDP.control.method="Auto")
tst <- pwrFDR.grid(effect.size=c(0.6,0.9), n.sample=c(50,60,70), r.1=0.4+0.2*(0:1), alpha=0.05+0.05*(0:3), N.tests=1000, FDP.control.method="Auto")
A function which extracts the asymptotic standard deviation for the
postive call proportion, R_m/m, under the selected FDP control method
from the supplied pwr
object, which is the result of a call to
the main function, pwrFDR.
sd.rtm.Rom(object)
sd.rtm.Rom(object)
object |
An object of class, |
The siginificant call proportion (SCP), R_m/m, under the selected FDP control method, is directly related to the ensemble power, which in turn, is determined by the effect size for tests distributed under the alternative, the sample size, the proportion of tests which are distributed according to the alternative and the size, alpha, in the selected FDP control method. Its asymptotic standard error, e.g. the asymptotic standard deviation over the square root of the number of simultaneous tests, m, gives an indication of the range of values one can expect for the significant call proportion. The standard deviations of the ratios R_m/m, T_m/M_m, and V_m/R_m are used internally in control of the distribution of V_m/R_m for the BHFDX FDP control method, and in calculation of the tail probability power for T_m/M_m.
Returns the asymptotic standard deviation of the significant call proportion, sd[R_m/m], as an un-named numeric.
Grant Izmirlian <izmirlig at mail dot nih dot gov>
Izmirlian G. (2020) Strong consistency and asymptotic normality for quantities related to the Benjamini-Hochberg false discovery rate procedure. Statistics and Probability Letters; 108713, <doi:10.1016/j.spl.2020.108713>.
Izmirlian G. (2017) Average Power and -power in
Multiple Testing Scenarios when the Benjamini-Hochberg False
Discovery Rate Procedure is Used. <arXiv:1801.03989>
Kluger D. M., Owen A. B. (2023) A central limit theorem for the Benjamini-Hochberg false discovery proportion under a factor model. Bernoulli; xx:xxx-xxx.
rslt.BHFDR <- pwrFDR(effect.size=0.79, n.sample=46, r.1=0.05, alpha=0.15) rslt.Auto.1 <- pwrFDR(effect.size=0.79, n.sample=46, r.1=0.05, alpha=0.15, N.tests=51, FDP.control.method="Auto") rslt.Auto.2 <- pwrFDR(effect.size=0.79, n.sample=46, r.1=0.05, alpha=0.15, N.tests=49, FDP.control.method="Auto") ## Asymptotic standard deviation of positive call proportion under BHFDR sdrtmRomBHFDR <- sd.rtm.Rom(rslt.BHFDR) ## Asymptotic standard deviation of positive call proportion under BHFDX sdrtmRomAuto1 <- sd.rtm.Rom(rslt.Auto.1) ## Asymptotic standard deviation of positive call proportionunder Romano sdrtmRomAuto2 <- sd.rtm.Rom(rslt.Auto.2)
rslt.BHFDR <- pwrFDR(effect.size=0.79, n.sample=46, r.1=0.05, alpha=0.15) rslt.Auto.1 <- pwrFDR(effect.size=0.79, n.sample=46, r.1=0.05, alpha=0.15, N.tests=51, FDP.control.method="Auto") rslt.Auto.2 <- pwrFDR(effect.size=0.79, n.sample=46, r.1=0.05, alpha=0.15, N.tests=49, FDP.control.method="Auto") ## Asymptotic standard deviation of positive call proportion under BHFDR sdrtmRomBHFDR <- sd.rtm.Rom(rslt.BHFDR) ## Asymptotic standard deviation of positive call proportion under BHFDX sdrtmRomAuto1 <- sd.rtm.Rom(rslt.Auto.1) ## Asymptotic standard deviation of positive call proportionunder Romano sdrtmRomAuto2 <- sd.rtm.Rom(rslt.Auto.2)
A function which extracts the asymptotic standard deviation for the
true positive proportion, T_m/M_m, under the selected FDP control method
from the supplied pwr
object, which is the result of a call to
the main function, pwrFDR.
sd.rtm.ToM(object)
sd.rtm.ToM(object)
object |
An object of class, |
The true positive proportion (TPP), T_m/M_m, is the proportion of all test statistics distributed according to the alternative that are declared significant by the selected FDP control method. Whereas the ensemble type I error in the multiple testing experiment is handled via control of the distribution of the FDP, V_m/R_m, the ensemble power is optimized via the distribution of the TPP. The most commonly used ensemble power is based upon the expected TPP, or true postive rate, E[TPP], which also called the average power. In situations of just adequate power or near adequate power, especially when testing less than 1000 simultaneous tests or so, the distribution of the TPP will be non-negligiby dispersed and this means that the TPP in a given multiple testing experiment for which sample size was based on the average power will likely not be close to the promised average power. For this reason, it is preferable to use a concept of ensemble power which is based upon the excedance probability for the TPP, or tail probability of the TPP (tp-TPP).
This package uses asymptotic approximation to derive the tp-TPP ensemble power under any one of the avaialbe FDP control methods, BHFDR, BHFDX or Romano.
Returns the asymptotic standard deviation of the true postive proportion, sd[T_m/M_m], as an un-named numeric.
Grant Izmirlian <izmirlig at mail dot nih dot gov>
Izmirlian G. (2020) Strong consistency and asymptotic normality for quantities related to the Benjamini-Hochberg false discovery rate procedure. Statistics and Probability Letters; 108713, <doi:10.1016/j.spl.2020.108713>.
Izmirlian G. (2017) Average Power and -power in
Multiple Testing Scenarios when the Benjamini-Hochberg False
Discovery Rate Procedure is Used. <arXiv:1801.03989>
Kluger D. M., Owen A. B. (2023) A central limit theorem for the Benjamini-Hochberg false discovery proportion under a factor model. Bernoulli; xx:xxx-xxx.
rslt.BHFDR <- pwrFDR(effect.size=0.79, n.sample=46, r.1=0.05, alpha=0.15) rslt.Auto.1 <- pwrFDR(effect.size=0.79, n.sample=46, r.1=0.05, alpha=0.15, N.tests=51, FDP.control.method="Auto") rslt.Auto.2 <- pwrFDR(effect.size=0.79, n.sample=46, r.1=0.05, alpha=0.15, N.tests=49, FDP.control.method="Auto") ## Asymptotic standard deviation under BHFDR sdrtmToMBHFDR <- sd.rtm.ToM(rslt.BHFDR) ## Asymptotic standard deviation under BHFDX sdrtmToMAuto1 <- sd.rtm.ToM(rslt.Auto.1) ## Asymptotic standard deviation under Romano sdrtmToMAuto2 <- sd.rtm.ToM(rslt.Auto.2)
rslt.BHFDR <- pwrFDR(effect.size=0.79, n.sample=46, r.1=0.05, alpha=0.15) rslt.Auto.1 <- pwrFDR(effect.size=0.79, n.sample=46, r.1=0.05, alpha=0.15, N.tests=51, FDP.control.method="Auto") rslt.Auto.2 <- pwrFDR(effect.size=0.79, n.sample=46, r.1=0.05, alpha=0.15, N.tests=49, FDP.control.method="Auto") ## Asymptotic standard deviation under BHFDR sdrtmToMBHFDR <- sd.rtm.ToM(rslt.BHFDR) ## Asymptotic standard deviation under BHFDX sdrtmToMAuto1 <- sd.rtm.ToM(rslt.Auto.1) ## Asymptotic standard deviation under Romano sdrtmToMAuto2 <- sd.rtm.ToM(rslt.Auto.2)
A function which extracts the asymptotic standard deviation for the
false discovery proportion, V_m/R_m, under the selected FDP control method
from the supplied pwr
object, which is the result of a call to
the main function, pwrFDR.
sd.rtm.VoR(object)
sd.rtm.VoR(object)
object |
An object of class, |
The false discovery proportion (FDP), V_m/R_m, under the selected FDP control method, is the proportion of null distributed test statistics that were deemed significant calls by the FDP control method. The most well known of available FDP methods is the Benjamini-Hochberg False Discovery Rate (BH-FDR) procedure. It ensures that the expected value of the FDP will be less than alpha, E[FDP] < alpha. The other two included FDP control methods, "Romano" and "BHFDX", control the probability that the FDP exceeds a given value, delta:
In most cases, the choice is appropriate but
is a distinct parameter to allow greater flexibility.
The choice "Auto" will select the most appropriate choice from the
three, BHFDR, BHFDX and Romano. If the asymptotic standard error,
sd.rtm.VoR/m^0.5 is greater than a control parameter (default value
10%), then one of the choices "BHFDX" or "Romano" will be made. As
the "Romano" FDP control method is more conservative, there is a
preference for the "BHFDX" method, which can be used if the number
of simultaneous tests, m, is larger than 50. All of this is
handled internally within the function
pwrFDR
. These
extractor functions exist to allow the user 'under the hood'.
Returns the asymptotic standard deviation of the false discovery proportion, sd[V_m/R_m], as an un-named numeric.
Grant Izmirlian <izmirlig at mail dot nih dot gov>
Izmirlian G. (2020) Strong consistency and asymptotic normality for quantities related to the Benjamini-Hochberg false discovery rate procedure. Statistics and Probability Letters; 108713, <doi:10.1016/j.spl.2020.108713>.
Izmirlian G. (2017) Average Power and -power in
Multiple Testing Scenarios when the Benjamini-Hochberg False
Discovery Rate Procedure is Used. <arXiv:1801.03989>
Kluger D. M., Owen A. B. (2023) A central limit theorem for the Benjamini-Hochberg false discovery proportion under a factor model. Bernoulli; xx:xxx-xxx.
rslt.BHFDR <- pwrFDR(effect.size=0.79, n.sample=46, r.1=0.05, alpha=0.15) rslt.Auto.1 <- pwrFDR(effect.size=0.79, n.sample=46, r.1=0.05, alpha=0.15, N.tests=51, FDP.control.method="Auto") rslt.Auto.2 <- pwrFDR(effect.size=0.79, n.sample=46, r.1=0.05, alpha=0.15, N.tests=49, FDP.control.method="Auto") ## Asymptotic standard deviation under BHFDR sdrtmVoRBHFDR <- sd.rtm.VoR(rslt.BHFDR) ## Asymptotic standard deviation under BHFDX sdrtmVoRAuto1 <- sd.rtm.VoR(rslt.Auto.1) ## Asymptotic standard deviation under Romano sdrtmVoRAuto2 <- sd.rtm.VoR(rslt.Auto.2)
rslt.BHFDR <- pwrFDR(effect.size=0.79, n.sample=46, r.1=0.05, alpha=0.15) rslt.Auto.1 <- pwrFDR(effect.size=0.79, n.sample=46, r.1=0.05, alpha=0.15, N.tests=51, FDP.control.method="Auto") rslt.Auto.2 <- pwrFDR(effect.size=0.79, n.sample=46, r.1=0.05, alpha=0.15, N.tests=49, FDP.control.method="Auto") ## Asymptotic standard deviation under BHFDR sdrtmVoRBHFDR <- sd.rtm.VoR(rslt.BHFDR) ## Asymptotic standard deviation under BHFDX sdrtmVoRAuto1 <- sd.rtm.VoR(rslt.Auto.1) ## Asymptotic standard deviation under Romano sdrtmVoRAuto2 <- sd.rtm.VoR(rslt.Auto.2)