| Title: | Library of Research Designs |
|---|---|
| Description: | A simple interface to build designs using the package 'DeclareDesign'. In one line of code, users can specify the parameters of individual designs and diagnose their properties. The designers can also be used to compare performance of a given design across a range of combinations of parameters, such as effect size, sample size, and assignment probabilities. |
| Authors: | Graeme Blair [aut], Jasper Cooper [aut, cre], Alexander Coppock [aut], Macartan Humphreys [aut], Clara Bicalho [aut], Neal Fultz [aut], Lily Medina [aut] |
| Maintainer: | Jasper Cooper <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.10 |
| Built: | 2025-02-07 21:22:12 UTC |
| Source: | https://github.com/declaredesign/designlibrary |
Generates string of assignment of value to argument
assignment_string(arg_name, arg_values)
arg_name |
A string. Label of assignment object. |
arg_values |
A list. Values to be assigned to the argument. Can be character, logical or numeric of any length. |
Builds a design with one instrument, one binary explanatory variable, and one outcome.
binary_iv_designer( N = 100, type_probs = c(1/3, 1/3, 1/3, 0), assignment_probs = c(0.5, 0.5, 0.5, 0.5), a_Y = 1, b_Y = 0, d_Y = 0, outcome_sd = 1, a = c(1, 0, 0, 0) * a_Y, b = rep(b_Y, 4), d = rep(d_Y, 4), args_to_fix = NULL )
N |
An integer. Sample size. |
type_probs |
A vector of four numbers in [0,1]. Probability of each complier type (always-taker, never-taker, complier, defier). |
assignment_probs |
A vector of four numbers in [0,1]. Probability of assignment to encouragement (Z) for each complier type (always-taker, never-taker, complier, defier). Under random assignment these are normally identical since complier status is not known to researchers in advance. |
a_Y |
A real number. Constant in Y equation. Assumed constant across types. Overridden by |
b_Y |
A real number. Effect of X on Y equation. Assumed constant across types. Overridden by |
d_Y |
A real number. Effect of Z on Y. Assumed constant across types. Overridden by |
outcome_sd |
A real number. The standard deviation of the outcome. |
a |
A vector of four numbers. Constant in Y equation for each complier type (always-taker, never-taker, complier, defier). |
b |
A vector of four numbers. Slope on X in Y equation for each complier type (always-taker, never-taker, complier, defier). |
d |
A vector of four numbers. Slope on Z in Y equation for each complier type (non zero implies violation of exclusion restriction). |
args_to_fix |
A character vector. Names of arguments to be fixed in design. |
A researcher is interested in the effect of binary X on outcome Y. The relationship is confounded because units that are more likely to be assigned to X=1 have higher Y outcomes. A potential instrument Z is examined, which plausibly causes X. The instrument can be used to assess the effect of X on Y for units whose value of X depends on Z, provided that Z does not negatively affect X for any case, affects X positively for some cases, and affects Y only through X.
See vignette online for more details on estimands.
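The identification argument above can be illustrated with a minimal base-R simulation. This is a sketch, not the package's internal code: it assumes the four types respond to Z in the standard way (always-takers X = 1, never-takers X = 0, compliers X = Z, defiers X = 1 - Z) and that Y equals a type-specific constant plus a type-specific slope on X plus normal noise. All variable names are illustrative.

```r
set.seed(42)
N <- 100000

# Types drawn with the default type_probs (no defiers)
types <- c("always", "never", "complier", "defier")
type <- sample(types, N, replace = TRUE, prob = c(1/3, 1/3, 1/3, 0))

# Randomly assigned instrument and the resulting treatment uptake
Z <- rbinom(N, 1, 0.5)
X <- ifelse(type == "always", 1,
     ifelse(type == "never", 0,
     ifelse(type == "complier", Z, 1 - Z)))

# Type-specific constant (confounding: always-takers have higher Y)
# and type-specific effect of X on Y
a <- c(always = 1, never = 0, complier = 0, defier = 0)
b <- c(always = 0.1, never = 0.2, complier = 0.3, defier = 0.4)
Y <- unname(a[type] + b[type] * X) + rnorm(N)

# Naive comparison of treated and untreated units is confounded
naive <- mean(Y[X == 1]) - mean(Y[X == 0])

# Wald (IV) estimator: reduced form divided by first stage
wald <- (mean(Y[Z == 1]) - mean(Y[Z == 0])) /
        (mean(X[Z == 1]) - mean(X[Z == 0]))

c(naive = naive, wald = wald)
```

Under these assumptions the Wald estimate should sit close to the complier slope (0.3) rather than the population average of the slopes (0.2), while the naive difference is pulled upward by the always-takers' higher constant.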
A simple instrumental variables design with binary instrument, treatment, and outcome variables.
# Generate a simple iv design: iv identifies late not ate
binary_iv_design_1 <- binary_iv_designer(N = 1000, b = c(.1, .2, .3, .4))
## Not run:
diagnose_design(binary_iv_design_1)
## End(Not run)

# Generates a simple iv design with violation of monotonicity
binary_iv_design_2 <- binary_iv_designer(type_probs = c(.1, .1, .6, .2), b_Y = .5)
## Not run:
diagnose_design(binary_iv_design_2)
## End(Not run)

# Generates a simple iv design with violation of exclusion restriction
binary_iv_design_3 <- binary_iv_designer(d_Y = .5, b_Y = .5)
## Not run:
diagnose_design(binary_iv_design_3)
## End(Not run)

# Generates a simple iv design with violation of randomization
binary_iv_design_4 <- binary_iv_designer(N = 1000, assignment_probs = c(.2, .3, .7, .5), b_Y = .5)
## Not run:
diagnose_design(binary_iv_design_4)
## End(Not run)

# Generates a simple iv design with violation of first stage
binary_iv_design_5 <- binary_iv_designer(type_probs = c(.5, .5, 0, 0), b_Y = .5)
## Not run:
diagnose_design(binary_iv_design_5)
## End(Not run)
Builds a two-arm design with blocks and clusters.
block_cluster_two_arm_designer( N = NULL, N_blocks = 1, N_clusters_in_block = ifelse(is.null(N), 100, round(N/N_blocks)), N_i_in_cluster = ifelse(is.null(N), 1, round(N/mean(N_blocks * N_clusters_in_block))), sd = 1, sd_block = 0.5773 * sd, sd_cluster = max(0, (sd^2 - sd_block^2)/2)^0.5, sd_i_0 = max(0, sd^2 - sd_block^2 - sd_cluster^2)^0.5, sd_i_1 = sd_i_0, rho = 1, assignment_probs = 0.5, control_mean = 0, ate = 0, treatment_mean = control_mean + ate, verbose = TRUE, args_to_fix = NULL )
N |
An integer. Total number of units. Usually not specified as |
N_blocks |
An integer. Number of blocks. Defaults to 1 for no blocks. |
N_clusters_in_block |
An integer or vector of integers of length |
N_i_in_cluster |
An integer or vector of integers of length |
sd |
A nonnegative number. Overall standard deviation (combining individual level, cluster level, and block level shocks). Defaults to 1. Overridden if incompatible with other user-specified shocks. |
sd_block |
A nonnegative number. Standard deviation of block level shocks. |
sd_cluster |
A nonnegative number. Standard deviation of cluster level shock. |
sd_i_0 |
A nonnegative number. Standard deviation of individual level shock in control. If not specified, and when possible given |
sd_i_1 |
A nonnegative number. Standard deviation of individual level shock in treatment. Defaults to |
rho |
A number in [-1,1]. Correlation in individual shock between potential outcomes for treatment and control. |
assignment_probs |
A number or vector of numbers in (0,1). Treatment assignment probability for each block (specified in order of |
control_mean |
A number. Average outcome in control. |
ate |
A number. Average treatment effect. Alternative to specifying |
treatment_mean |
A number. Average outcome in treatment. If |
verbose |
Logical. If TRUE, prints intra-cluster correlation implied by design parameters. |
args_to_fix |
A character vector. Names of arguments to be fixed in design. |
Units are assigned to treatment using complete block cluster random assignment. Treatment effects can be specified either by providing control_mean and treatment_mean or by specifying an ate. Estimation uses differences in means accounting for blocks and clusters.
In the usual case N is not provided by the user but is determined by N_blocks, N_clusters_in_block, and N_i_in_cluster (when these are integers, N is the product of these three numbers).
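The relationship between N and the three size arguments can be traced with a small helper. This is only a sketch: the function name implied_sizes is hypothetical, and its body copies the default expressions shown in the usage signature of block_cluster_two_arm_designer for the case where all sizes are single integers.

```r
# Hypothetical helper that mirrors the default expressions in the usage signature
implied_sizes <- function(N = NULL, N_blocks = 1) {
  N_clusters_in_block <- ifelse(is.null(N), 100, round(N / N_blocks))
  N_i_in_cluster <- ifelse(is.null(N), 1,
                           round(N / mean(N_blocks * N_clusters_in_block)))
  c(N_clusters_in_block = N_clusters_in_block,
    N_i_in_cluster = N_i_in_cluster,
    # With integer sizes, the total is the product of the three numbers
    N = N_blocks * N_clusters_in_block * N_i_in_cluster)
}

implied_sizes()                      # all defaults
implied_sizes(N = 24, N_blocks = 3)  # N and number of blocks given
```

With N = 24 and three blocks, the defaults imply eight clusters per block with one unit each, so the product recovers the requested 24 units.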
Normal shocks can be specified at the individual, cluster, and block levels. If individual level shocks are not specified and cluster and block level variances sum to less than 1, then individual level shocks are set such that total variance in outcomes equals 1.
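As a rough check on how the default shocks fit together, the default expressions from the usage signature can be evaluated directly. The sketch below uses sd_total as an illustrative stand-in for the sd argument (to avoid masking the base function sd) and is not taken from the package source.

```r
sd_total <- 1

# Default expressions as they appear in the usage signature
sd_block   <- 0.5773 * sd_total
sd_cluster <- max(0, (sd_total^2 - sd_block^2) / 2)^0.5
sd_i_0     <- max(0, sd_total^2 - sd_block^2 - sd_cluster^2)^0.5

# Variance contributed by each level
variances <- c(block = sd_block^2,
               cluster = sd_cluster^2,
               individual = sd_i_0^2)

round(variances, 3)
sum(variances)
```

Under the defaults each level contributes roughly one third of the total variance, and the three components sum to sd_total^2.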
Key limitations: The designer assumes covariance between potential outcomes at the individual level only.
See vignette online.
A block cluster two-arm design.
# Generate a design using default arguments:
block_cluster_two_arm_design <- block_cluster_two_arm_designer()

# A design with uneven numbers of clusters per block and units per cluster:
block_cluster_uneven <- block_cluster_two_arm_designer(
  N_blocks = 3, N_clusters_in_block = 2:4, N_i_in_cluster = 1:9)

# A design in which number of clusters or cluster size is not specified
# but N and block size are:
block_cluster_guess <- block_cluster_two_arm_designer(N = 24, N_blocks = 3)
Builds a cluster sampling design for an ordinal outcome variable for a population with N_blocks strata, each with N_clusters_in_block clusters, each of which contains N_i_in_cluster units. The sampling strategy involves sampling n_clusters_in_block clusters in each stratum, and then sampling n_i_in_cluster units in each cluster. Outcomes within clusters have intra-cluster correlation approximately equal to icc.
cluster_sampling_designer( N_blocks = 1, N_clusters_in_block = 1000, N_i_in_cluster = 50, n_clusters_in_block = 100, n_i_in_cluster = 10, icc = 0.2, args_to_fix = NULL )
N_blocks |
An integer. Number of blocks (strata). Defaults to 1 for no blocks. |
N_clusters_in_block |
An integer or vector of integers of length |
N_i_in_cluster |
An integer or vector of integers of length |
n_clusters_in_block |
An integer. Number of clusters to sample in each block (stratum). |
n_i_in_cluster |
An integer. Number of units to sample in each cluster. |
icc |
A number in [0,1]. Intra-cluster Correlation Coefficient (ICC). |
args_to_fix |
A character vector. Names of arguments to be fixed in design. |
Key limitations: The design assumes a fixed number of clusters drawn in each stratum and a fixed number of individuals drawn from each cluster.
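The two-stage sampling logic, with the same number of clusters drawn in every stratum and the same number of units drawn in every sampled cluster, can be sketched in base R. This is an illustration under assumed sizes, not the package's implementation; the population frame and object names are made up for the example.

```r
set.seed(1)

# Population structure (illustrative sizes)
N_blocks <- 2
N_clusters_in_block <- 20
N_i_in_cluster <- 10

# Sampling parameters
n_clusters_in_block <- 5
n_i_in_cluster <- 3

# One row per unit in the population
population <- expand.grid(unit = seq_len(N_i_in_cluster),
                          cluster = seq_len(N_clusters_in_block),
                          block = seq_len(N_blocks))

sampled <- do.call(rbind, lapply(seq_len(N_blocks), function(b) {
  # Stage 1: draw the same number of clusters in every block
  clusters <- sample(N_clusters_in_block, n_clusters_in_block)
  do.call(rbind, lapply(clusters, function(cl) {
    # Stage 2: draw the same number of units in every sampled cluster
    units <- sample(N_i_in_cluster, n_i_in_cluster)
    population[population$block == b &
                 population$cluster == cl &
                 population$unit %in% units, ]
  }))
}))

nrow(sampled)
```

The sample size is then N_blocks * n_clusters_in_block * n_i_in_cluster, here 30 units out of a population of 400.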
See vignette online.
A stratified cluster sampling design.
# To make a design using default arguments:
cluster_sampling_design <- cluster_sampling_designer()

# A design with varying block size and varying cluster size
cluster_sampling_design <- cluster_sampling_designer(
  N_blocks = 2, N_clusters_in_block = 6:7, N_i_in_cluster = 3:15,
  n_clusters_in_block = 5, n_i_in_cluster = 2)
Substitute approach
code_fixer(design_expr, list_fixed_str, eval_envir)
design_expr |
A string. The text of the expression in which you wish to substitute symbols for their set values. |
list_fixed_str |
A string. The string of code that generates a named list of arguments that will be substituted in the evaluated |
eval_envir |
The evaluation environment. Defaults to environment in which design arguments are already evaluated. |
Generates clean code string that reproduces design
construct_design_code( designer, args, args_to_fix = NULL, arguments_as_values = FALSE, exclude_args = NULL )
designer |
Designer function. |
args |
Named list of arguments to be passed to designer function. |
args_to_fix |
Vector of strings. Designer arguments to fix in design code. |
arguments_as_values |
Logical. Whether to replace argument names with their values. |
exclude_args |
Vector of strings. Name of arguments to be excluded from argument definition at top of design code. |
A 2^k factorial designer with k factors assigned with independent probabilities. Results in 2^k treatment combinations, each with independent, normally distributed shocks. Estimands are average effects and average interactions of given conditions, averaged over other conditions. Estimation uses regression of demeaned variables with propensity weights.
factorial_designer( N = 256, k = 3, outcome_means = rep(0, 2^k), sd = 1, outcome_sds = rep(sd, 2^k), assignment_probs = rep(0.5, k), outcome_name = "Y", treatment_names = NULL, args_to_fix = NULL )
N |
An integer. Size of sample. |
k |
An integer. The number of factors in the design. |
outcome_means |
A numeric vector of length |
sd |
A nonnegative number. Standard deviation for outcomes when all outcomes have identical standard deviations. For outcome-specific standard deviations use |
outcome_sds |
A non negative numeric vector of length |
assignment_probs |
A numeric vector of length |
outcome_name |
A character. Name of outcome variable (defaults to "Y"). Must be provided without spacing inside the function |
treatment_names |
A character vector of length |
args_to_fix |
A character vector. Names of arguments to be fixed in design. By default |
factorial_designer creates a factorial design with 2^k treatment combinations resulting from k factors, each with two conditions (c(0,1)). The order of the vector arguments outcome_means and outcome_sds must follow the one returned by expand.grid(rep(list(c(0,1)), k)), where each of the columns is a treatment.
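To see the ordering that expand.grid imposes, it can help to print the grid next to a candidate vector of means. The sketch below uses k = 2 and made-up means; the column labels T1 and T2 are illustrative.

```r
k <- 2

# Each row is one treatment combination; the first factor varies fastest
conditions <- expand.grid(rep(list(c(0, 1)), k))
names(conditions) <- c("T1", "T2")

# Made-up means, listed in the row order of the grid
outcome_means <- c(0, 1, 2, 5)

cbind(conditions, outcome_means)
```

Here the second element of outcome_means belongs to the condition with only the first factor switched on, and the third to the condition with only the second factor switched on.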
Estimands are defined for each combination of treatment assignment as linear combinations of potential outcomes, typically weighted averages of differences. Note that the weighting for the estimand does not reflect treatment assignment probabilities but rather weights each possible condition equally.
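For example, in a design with three factors (A, B, and C), the treatment effect of A (TE_A), averaged over conditions defined by B and C, is given by:
$$TE_A = \frac{1}{4}\left[(Y_{111} - Y_{011}) + (Y_{101} - Y_{001}) + (Y_{110} - Y_{010}) + (Y_{100} - Y_{000})\right]$$
The "average interaction of A and B", that is, the average effect (for a single unit) of A on the effect of B across conditions defined by C, is:
$$TE_{AB} = \frac{1}{2}\left[(Y_{111} - Y_{011}) - (Y_{101} - Y_{001})\right] + \frac{1}{2}\left[(Y_{110} - Y_{010}) - (Y_{100} - Y_{000})\right]$$
And the triple interaction, that is, the effect of C on the effect of B on the effect of A, is:
$$TE_{ABC} = \left[(Y_{111} - Y_{011}) - (Y_{101} - Y_{001})\right] - \left[(Y_{110} - Y_{010}) - (Y_{100} - Y_{000})\right]$$
where $Y_{abc}$ is short for the potential outcome of Y when A is a, B is b, and C is c.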
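For a three-factor design, the equally weighted estimands described above can be computed directly from a vector of potential outcome means. The sketch below is not the package's code: it uses a contrast-coding shortcut (each factor recoded to -1/+1) as one convenient way to form the weighted differences, and it reuses the outcome schedule Y = -.25 + .75*A - .25*B - .25*C + A*B*C from the examples for this designer.

```r
# All 2^3 conditions, first factor varying fastest
X <- expand.grid(rep(list(c(0, 1)), 3))
names(X) <- c("A", "B", "C")

# Potential outcome mean in each condition
Y <- -.25 + .75 * X$A - .25 * X$B - .25 * X$C + X$A * X$B * X$C

# Recode each factor to -1 (off) / +1 (on) to build the contrasts
s <- 2 * as.matrix(X) - 1

average_outcome <- mean(Y)
TE_A   <- sum(s[, "A"] * Y) / 4                         # average effect of A
TE_B   <- sum(s[, "B"] * Y) / 4                         # average effect of B
TE_AB  <- sum(s[, "A"] * s[, "B"] * Y) / 2              # average A-by-B interaction
TE_ABC <- sum(s[, "A"] * s[, "B"] * s[, "C"] * Y)       # triple interaction

c(average_outcome = average_outcome, TE_A = TE_A, TE_B = TE_B,
  TE_AB = TE_AB, TE_ABC = TE_ABC)
```

For this schedule the average outcome is 0, the average effect of A is 1, the average effect of B is 0, the A-by-B interaction is 0.5, and the triple interaction is 1.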
Estimates draw from a regression in which all treatments are demeaned and weighted by the inverse probability of being in the condition they are in. Note that in this demeaned regression the constant captures the average outcome across all conditions — not the outcome when all units are in the control condition. The coefficient on T1 captures the average effect of T1 across other conditions—not the effect of T1 when other conditions are at 0. And so on.
A factorial design.
# A factorial design using default arguments
factorial_design <- factorial_designer()

# A 2 x 2 x 2 factorial design with unequal probabilities of assignment to
# each treatment condition. In this case the estimator weights up by the
# conditional probabilities of assignment.
factorial_design_2 <- factorial_designer(k = 3,
                                         assignment_probs = c(1/2, 1/4, 1/8),
                                         outcome_means = c(0,0,0,0,0,0,0,4))
## Not run:
diagnose_design(factorial_design_2)
## End(Not run)

# Mapping from outcomes to estimands
# The mapping between the potential outcomes schedule and the estimands of
# interest is not always easy. To help with intuition consider a 2^3
# factorial design. You might like to think of a data generating process as
# a collection of marginal effects and interaction effects mapping from
# treatments to outcomes.
# For instance: Y = -.25 + .75*X1 - .25*X2 -.25*X3 + X1*X2*X3
# The vector of implied potential outcome means as a function of conditions
# could then be generated like this:
X <- expand.grid(rep(list(c(0,1)), 3))
outcome_means <- -.25 + X[,1]*3/4 - X[,2]/4 - X[,3]/4 + X[,1]*X[,2]*X[,3]
outcomes <- cbind(X, outcome_means)
colnames(outcomes) <- c("X1", "X2", "X3", "mean")
outcomes

# Examination of the outcomes in this table reveals that there is an
# average outcome of 0 (over all conditions), an average effect of treatment
# X1 of 1, average effects for X2 and X3 of 0, the two way interactions
# are .5 (averaged over conditions of the third treatment) and the triple
# interaction is 1.
# These are exactly the estimands calculated by the designer and returned in
# diagnosis.
factorial_design_3 <- factorial_designer(k = 3,
                                         outcome_means = outcome_means,
                                         outcome_sds = rep(.01, 8))
## Not run:
library(DeclareDesign)
diagnose_design(factorial_design_3, sims = 10)
## End(Not run)
Get the code from a design
get_design_code(design)
design |
A design that has code as an attribute. |
This is a version of match.call which also includes default arguments.
match.call.defaults( definition = sys.function(sys.parent()), call = sys.call(sys.parent()), expand.dots = TRUE, envir = parent.frame(2L) )
definition |
a function, by default the function from which match.call is called. See details. |
call |
an unevaluated call to the function specified by definition, as generated by call. |
expand.dots |
logical. Should arguments matching |
envir |
an environment, from which the |
An object of class call.
Neal Fultz
foo <- function(x=NULL,y=NULL,z=4, dots=TRUE, ...) { match.call.defaults(expand.dots=dots) }
A mediation analysis design that examines the effect of treatment (Z) on mediator (M) and the effect of mediator (M) on outcome (Y) (given Z=0) as well as direct effect of treatment (Z) on outcome (Y) (given M=0). Analysis is implemented using an interacted regression model. Note this model is not guaranteed to be unbiased despite randomization of Z because of possible violations of sequential ignorability.
mediation_analysis_designer( N = 200, a = 1, b = 0.4, c = 0, d = 0.5, rho = 0, args_to_fix = NULL )
N |
An integer. Size of sample. |
a |
A number. Parameter governing effect of treatment (Z) on mediator (M). |
b |
A number. Effect of mediator (M) on outcome (Y) when Z = 0. |
c |
A number. Interaction between mediator (M) and (Z) for outcome (Y). |
d |
A number. Direct effect of treatment (Z) on outcome (Y), when M = 0. |
rho |
A number in [-1,1]. Correlation between mediator (M) and outcome (Y) error terms. Non zero correlation implies a violation of sequential ignorability. |
args_to_fix |
A character vector. Names of arguments to be fixed in design. |
See vignette online.
A mediation analysis design.
# Generate a mediation analysis design using default arguments:
mediation_1 <- mediation_analysis_designer()
draw_estimands(mediation_1)
## Not run:
diagnose_design(mediation_1, sims = 1000)
## End(Not run)

# A design with a violation of sequential ignorability and heterogeneous effects:
mediation_2 <- mediation_analysis_designer(a = 1, rho = .5, c = 1, d = .75)
draw_estimands(mediation_2)
## Not run:
diagnose_design(mediation_2, sims = 1000)
## End(Not run)
Creates a design with m_arms experimental arms, each assigned with equal probability.
multi_arm_designer( N = 30, m_arms = 3, outcome_means = rep(0, m_arms), sd_i = 1, outcome_sds = rep(0, m_arms), conditions = 1:m_arms, args_to_fix = NULL )
N |
An integer. Sample size. |
m_arms |
An integer. Number of arms. |
outcome_means |
A numeric vector of length |
sd_i |
A nonnegative scalar. Standard deviation of individual-level shock (common across arms). |
outcome_sds |
A nonnegative numeric vector of length |
conditions |
A vector of length |
args_to_fix |
A character vector. Names of arguments to be fixed in design. By default, |
See vignette online.
A function that returns a design.
# To make a design using default arguments:
design <- multi_arm_designer()

# A design with different means and standard deviations in each arm
design <- multi_arm_designer(outcome_means = c(0, 0.5, 2),
                             outcome_sds = c(1, 0.1, 0.5))

# A four-arm design in which outcome_means and outcome_sds are fixed
design <- multi_arm_designer(N = 80, m_arms = 4, outcome_means = 1:4,
                             args_to_fix = c("outcome_means", "outcome_sds"))
Produces a design in which an outcome Y is observed pre- and post-treatment. The design allows for individual post-treatment outcomes to be correlated with pre-treatment outcomes and for at-random missingness in the observation of post-treatment outcomes.
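One way to picture the data-generating process described here is a base-R simulation with correlated period shocks and at-random missingness. This is a sketch under stated assumptions, not the package's code: it assumes simple random assignment with probability 0.5, that the post-treatment outcome is the treatment effect plus the period 2 shock, and that attrition_rate is the probability that a post-treatment outcome goes unobserved.

```r
set.seed(7)
N <- 100000
ate <- 0.25
sd_1 <- 1
sd_2 <- 1
rho <- 0.5
attrition_rate <- 0.1

# Period shocks with correlation rho and standard deviations sd_1, sd_2
u_1 <- rnorm(N, sd = sd_1)
u_2 <- rho * (sd_2 / sd_1) * u_1 + sqrt(1 - rho^2) * rnorm(N, sd = sd_2)

# Random assignment (assumed probability 0.5) and outcomes
Z <- rbinom(N, 1, 0.5)
Y_pre <- u_1
Y_post <- ate * Z + u_2

# At-random missingness in the post-treatment outcome (assumed interpretation)
observed <- rbinom(N, 1, 1 - attrition_rate) == 1

shock_cor <- cor(u_1, u_2)
share_observed <- mean(observed)
estimate <- mean(Y_post[observed & Z == 1]) - mean(Y_post[observed & Z == 0])

c(shock_cor = shock_cor, share_observed = share_observed, estimate = estimate)
```

Because missingness is unrelated to treatment or outcomes in this sketch, the difference in means among observed units should stay close to ate.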
pretest_posttest_designer( N = 100, ate = 0.25, sd_1 = 1, sd_2 = 1, rho = 0.5, attrition_rate = 0.1, args_to_fix = NULL )
N |
An integer. Size of sample. |
ate |
A number. Average treatment effect. |
sd_1 |
Nonnegative number. Standard deviation of period 1 shocks. |
sd_2 |
Nonnegative number. Standard deviation of period 2 shocks. |
rho |
A number in [-1,1]. Correlation in outcomes between pre- and post-test. |
attrition_rate |
A number in [0,1]. Proportion of respondents in pre-test data that do not appear in post-test data. |
args_to_fix |
A character vector. Names of arguments to be fixed in design. |
See vignette online.
A pretest-posttest design.
# Generate a pre-test post-test design using default arguments:
pretest_posttest_design <- pretest_posttest_designer()
Builds a design in which two pieces of evidence are sought and used to update about whether X caused Y using Bayes' rule.
process_tracing_designer( N = 100, prob_X = 0.5, process_proportions = c(0.25, 0.25, 0.25, 0.25), prior_H = 0.5, p_E1_H = 0.8, p_E1_not_H = 0.2, p_E2_H = 0.3, p_E2_not_H = 0, cor_E1E2_H = 0, cor_E1E2_not_H = 0, label_E1 = "Straw in the Wind", label_E2 = "Smoking Gun", args_to_fix = NULL )
N |
An integer. Size of population of cases from which a single case is selected. |
prob_X |
A number in [0,1]. Probability that X = 1 for a given case (equal throughout population of cases). |
process_proportions |
A vector of numbers in [0,1] that sums to 1. Simplex denoting the proportion of cases in the population in which, respectively: 1) X causes Y; 2) Y occurs regardless of X; 3) X causes the absence of Y; 4) Y is absent regardless of X. |
prior_H |
A number in [0,1]. Prior probability that X causes Y in a given case in which X and Y are both present. |
p_E1_H |
A number in [0,1]. Probability of observing first piece of evidence given hypothesis that X caused Y is true. |
p_E1_not_H |
A number in [0,1]. Probability of observing first piece of evidence given hypothesis that X caused Y is not true. |
p_E2_H |
A number in [0,1]. Probability of observing second piece of evidence given hypothesis that X caused Y is true. |
p_E2_not_H |
A number in [0,1]. Probability of observing second piece of evidence given hypothesis that X caused Y is not true. |
cor_E1E2_H |
A number in [-1,1]. Correlation between first and second pieces of evidence given hypothesis that X caused Y is true. |
cor_E1E2_not_H |
A number in [-1,1]. Correlation between first and second pieces of evidence given hypothesis that X caused Y is not true. |
label_E1 |
A string. Label for the first piece of evidence (e.g., "Straw in the Wind"). |
label_E2 |
A string. Label for the second piece of evidence (e.g., "Smoking Gun"). |
args_to_fix |
A character vector. Names of arguments to be fixed in design. |
The model posits a population of N cases, each of which does or does not exhibit the presence of some outcome, Y. With probability prob_X, each case also exhibits the presence of some potential cause, X. The outcome Y can be realized through four distinct causal relations, distributed through the population of cases according to process_proportions. First, the presence of X might cause Y. Second, Y might be present irrespective of X. Third, the presence of X might cause the absence of Y (equivalently, the absence of X might cause Y). Fourth, Y might be absent irrespective of X.
Our inquiry is a "cause of effects" question. We wish to know whether a specific case was one in which the presence (absence) of X caused the presence (absence) of Y.
Our data strategy consists of selecting one case at random in which both X and Y are present. As part of the data strategy we seek two pieces of evidence in favor of or against the hypothesized causal relationship, H, in which X causes Y.
The first (second) piece of evidence is observed with probability p_E1_H (p_E2_H) when H is true, and with probability p_E1_not_H (p_E2_not_H) when H is false.
Conditional on H being true (false), the correlation between the two pieces of evidence is given by cor_E1E2_H (cor_E1E2_not_H).
The researcher uses Bayes’ rule to update about the probability that X caused Y given the evidence. In other words, they form a posterior inference, Pr(H|E). We specify four answer strategies for forming this inference. The first simply ignores the evidence and is equivalent to stating a prior belief without doing any causal process tracing. The second conditions inferences only on the first piece of evidence, and the third only on the second piece of evidence. The fourth strategy conditions posterior inferences on both pieces of evidence simultaneously.
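The updating step can be written out as a small function. This is an illustrative sketch rather than the package's answer strategy: the function name posterior is hypothetical, and the last call chains two updates, which is only valid when the two pieces of evidence are independent conditional on H and on not-H (that is, when cor_E1E2_H and cor_E1E2_not_H are both 0).

```r
# Posterior probability of H after seeing (or failing to see) one piece of evidence
posterior <- function(prior, p_E_H, p_E_not_H, observed = TRUE) {
  like_H     <- if (observed) p_E_H else 1 - p_E_H
  like_not_H <- if (observed) p_E_not_H else 1 - p_E_not_H
  like_H * prior / (like_H * prior + like_not_H * (1 - prior))
}

# Default parameters: prior_H = 0.5, E1 is the straw in the wind, E2 the smoking gun
post_E1     <- posterior(0.5, p_E_H = 0.8, p_E_not_H = 0.2)                 # E1 observed
post_E2     <- posterior(0.5, p_E_H = 0.3, p_E_not_H = 0)                   # E2 observed
post_not_E2 <- posterior(0.5, p_E_H = 0.3, p_E_not_H = 0, observed = FALSE) # E2 not observed

# Both pieces of evidence, assuming conditional independence:
# update on E1 first, then use that posterior as the prior for E2
post_E1_not_E2 <- posterior(post_E1, p_E_H = 0.3, p_E_not_H = 0, observed = FALSE)

c(post_E1 = post_E1, post_E2 = post_E2,
  post_not_E2 = post_not_E2, post_E1_not_E2 = post_E1_not_E2)
```

With the default values, observing the straw in the wind moves the posterior from 0.5 to 0.8, while observing the smoking gun moves it to 1 because that evidence is never observed when H is false.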
We specify as diagnosands for this design the bias, RMSE, mean(estimand), mean(estimate) and sd(estimate).
A process-tracing design.
# Generate a process-tracing design using default arguments:
pt_1 <- process_tracing_designer()
draw_estimands(pt_1)
draw_estimates(pt_1)
draw_data(pt_1)
## Not run:
diagnose_design(pt_1, sims = 1000)
## End(Not run)

# A design in which the smoking gun and straw-in-the-wind are correlated
pt_2 <- process_tracing_designer(cor_E1E2_H = .32)
## Not run:
diagnose_design(pt_2, sims = 1000)
## End(Not run)

# A design with two doubly-decisive tests pointing in opposite directions
pt_3 <- process_tracing_designer(p_E1_H = .80, p_E1_not_H = .05,
                                 label_E1 = "Doubly-Decisive: H",
                                 p_E2_H = .05, p_E2_not_H = .80,
                                 label_E2 = "Doubly-Decisive: Not H")
draw_estimates(pt_3)
## Not run:
diagnose_design(pt_3, sims = 1000)
## End(Not run)
Produces a (forced) randomized response design that measures the share of individuals with a given trait (prevalence_rate) in a population of size N. Probability of forced response ("Yes") is given by prob_forced_yes, and rate at which individuals with trait lie is given by withholding_rate.
randomized_response_designer( N = 1000, prob_forced_yes = 0.6, prevalence_rate = 0.1, withholding_rate = 0.5, args_to_fix = NULL )
N |
An integer. Size of sample. |
prob_forced_yes |
A number in [0,1]. Probability of a forced yes. |
prevalence_rate |
A number in [0,1]. Probability that individual has the sensitive trait. |
withholding_rate |
A number in [0,1]. Probability that an individual with the sensitive trait hides it. |
args_to_fix |
A character vector. Names of arguments to be fixed in design. |
randomized_response_designer employs a specific variation of randomized response designs in which respondents are required to report a fixed answer to the sensitive question with a given probability (see Blair, Imai, and Zhou (2015) for alternative applications and estimation strategies).
See vignette online.
A randomized response design.
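The logic of a forced-response design can be sketched in base R without the package. This is an illustration under stated assumptions, not the designer's actual code: it assumes that respondents who are not forced to say "Yes" answer truthfully, and that withholding only affects a direct question. The variable names other than the designer's arguments are hypothetical.

```r
# Base R sketch of a forced randomized response design (assumptions noted above).
set.seed(1)
N <- 1000
prob_forced_yes <- 0.6
prevalence_rate <- 0.1
withholding_rate <- 0.5

# Sensitive trait, and whether a trait holder would hide it if asked directly
trait <- rbinom(N, 1, prevalence_rate)
withholder <- rbinom(N, 1, trait * withholding_rate)
direct_answer <- trait - withholder

# Randomizing device: forced "Yes" with probability prob_forced_yes,
# otherwise (by assumption) a truthful answer
forced <- rbinom(N, 1, prob_forced_yes)
Y <- ifelse(forced == 1, 1, trait)

# Since E[Y] = p + (1 - p) * prevalence, invert to recover the prevalence
rr_estimate <- (mean(Y) - prob_forced_yes) / (1 - prob_forced_yes)

# The direct question understates prevalence whenever some respondents withhold
direct_estimate <- mean(direct_answer)
```

Comparing rr_estimate and direct_estimate across repeated draws shows the trade-off the design targets: the randomized response estimate is noisier, while the direct question is biased downward by withholding.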
# Generate a randomized response design using default arguments:
randomized_response_design <- randomized_response_designer()
Builds a design with a sample from a population of size N. The average treatment effect local to the cutpoint is equal to tau. It allows for specification of the order of the polynomial regression (poly_reg_order), the cutoff value on the running variable (cutoff), and the size of the bandwidth around the cutoff (bandwidth). By providing a vector of numbers to control_coefs and treatment_coefs, users can also specify polynomial regression coefficients that generate the expected control and treatment potential outcomes given the running variable.
regression_discontinuity_designer( N = 1000, tau = 0.15, outcome_sd = 0.1, cutoff = 0.5, bandwidth = 0.5, control_coefs = c(0.5, 0.5), treatment_coefs = c(-5, 1), poly_reg_order = 4, args_to_fix = NULL )
N |
An integer. Size of population to sample from. |
tau |
A number. Difference in potential outcomes functions at the threshold. |
outcome_sd |
A positive number. The standard deviation of the outcome. |
cutoff |
A number in (0,1). Threshold on running variable beyond which units are treated. |
bandwidth |
A number. The value of the bandwidth on both sides of the threshold from which to include units. |
control_coefs |
A vector of numbers. Coefficients for polynomial regression function that generates control potential outcomes. Order of polynomial is equal to length. |
treatment_coefs |
A vector of numbers. Coefficients for polynomial regression function that generates treatment potential outcomes. Order of polynomial is equal to length. |
poly_reg_order |
Integer greater than or equal to 1. Order of the polynomial regression used to estimate the jump at the cutoff. |
args_to_fix |
A character vector. Names of arguments to be fixed in design. |
See vignette online.
A regression discontinuity design.
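The way the coefficient vectors and tau shape the potential outcomes can be sketched in base R. This is a rough illustration, not the package's implementation: it assumes the running variable is uniform on (0,1) and centered at the cutoff, that the polynomials have no intercept, and it uses lm() with raw polynomial terms as a stand-in estimator. The helper po_function is hypothetical.

```r
# Base R sketch of a regression discontinuity data-generating process.
set.seed(42)
N <- 1000; tau <- 0.15; outcome_sd <- 0.1
cutoff <- 0.5; bandwidth <- 0.5
control_coefs <- c(0.5, 0.5); treatment_coefs <- c(-5, 1)
poly_reg_order <- 4

# Polynomial in X of order length(coefs), without intercept, shifted by `shift`
po_function <- function(X, coefs, shift) {
  powers <- outer(X, seq_along(coefs), `^`)
  as.vector(powers %*% coefs) + shift
}

# Running variable centered at the cutoff (assumed uniform before centering)
X <- runif(N) - cutoff
noise <- rnorm(N, 0, outcome_sd)
Z <- as.numeric(X > 0)

# Potential outcomes differ by tau exactly at the threshold (X = 0)
Y_Z_0 <- po_function(X, control_coefs, 0) + noise
Y_Z_1 <- po_function(X, treatment_coefs, tau) + noise
Y <- ifelse(Z == 1, Y_Z_1, Y_Z_0)

# Keep units within the bandwidth, then fit separate polynomials on each side;
# with raw polynomials the coefficient on Z is the estimated jump at X = 0
keep <- abs(X) < bandwidth
fit <- lm(Y ~ Z * poly(X, poly_reg_order, raw = TRUE), subset = keep)
rd_estimate <- unname(coef(fit)["Z"])
```

Because the polynomial terms vanish at X = 0, the gap between the two expected potential outcome functions at the threshold equals tau regardless of the coefficient vectors.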
# Generate a regression discontinuity design using default arguments:
regression_discontinuity_design <- regression_discontinuity_designer()
Generates character string for non-fixed arguments in a designer using substitution approach.
return_args(args, fixes)
args |
Function arguments. |
fixes |
Function arguments that are fixed (i.e., already evaluated in body of function) |
Builds a design with N_groups groups each containing N_i_group individuals. Potential outcomes exhibit spillovers: if any individual in a group receives treatment, the effect is spread equally among members of the group.
spillover_designer( N_groups = 80, N_i_group = 3, sd_i = 0.2, gamma = 2, args_to_fix = NULL )
N_groups |
An integer. Number of groups. |
N_i_group |
Number of units in each group. Can be scalar or vector of length |
sd_i |
A nonnegative number. Standard deviation of individual-level shock. |
gamma |
A number. Parameter that controls whether spillovers within groups substitute or complement each other. See 'Details'. |
args_to_fix |
A character vector. Names of arguments to be fixed in design. |
Parameter gamma controls interactions between spillover effects. For gamma = 1, for every $1 given to a member of a group, each member receives $1*N_i_group no matter how many others are already treated. For gamma > 1 (< 1), for every $1 given to a member of a group, each member receives an amount that depends negatively (positively) on the number already treated.
The default estimand is the average difference across subjects between no one treated and only that subject treated.
A simple spillover design.
# Generate a simple spillover design using default arguments:
spillover_design <- spillover_designer()
Takes substring between matched strings. Avoids dependency on stringr package.
str_within(string, pattern = "^(structure\\()|(, \\.Names)")
string |
A string. String from which substring is extracted. |
pattern |
A regular expression that matches the beginning and end of a substring. |
Substitute text from expressions in design code
sub_expr_text(code, ...)
code |
List containing design code. |
... |
List of expressions to be substituted for their text. |
Code with expression text.
Creates a two-arm design for settings in which the estimand of interest is conditional on a post-treatment outcome (the effect on Y given R) or the data are conditionally observed (Y given R). See 'Details' for more information on the data generating process.
two_arm_attrition_designer( N = 100, a_R = 0, b_R = 1, a_Y = 0, b_Y = 1, rho = 0, args_to_fix = NULL )
N |
An integer. Size of sample. |
a_R |
A number. Constant in equation relating treatment to responses. |
b_R |
A number. Slope coefficient in equation relating treatment to responses. |
a_Y |
A number. Constant in equation relating treatment to outcome. |
b_Y |
A number. Slope coefficient in equation relating treatment to outcome. |
rho |
A number in [0,1]. Correlation between shocks in equations for R and Y. |
args_to_fix |
A character vector. Names of arguments to be fixed in design. |
The data generating process is of the form:
R ~ (a_R + b_R*Z > u_R)
Y ~ (a_Y + b_Y*Z > u_Y)
where u_R and u_Y are jointly normally distributed with correlation rho.
A post-treatment design.
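The two equations above can be simulated directly in base R. The sketch below is illustrative only: it assumes standard normal shocks, builds the correlation by mixing independent normals, uses complete random assignment of half the units, and treats Y as unobserved when R = 0. The names Y_obs and est_given_R are hypothetical.

```r
# Base R sketch of the attrition data-generating process.
set.seed(7)
N <- 100; a_R <- 0; b_R <- 1; a_Y <- 0; b_Y <- 1; rho <- 0.3

# Jointly normal shocks with correlation rho (standard normal margins assumed)
u_Y <- rnorm(N)
u_R <- rho * u_Y + sqrt(1 - rho^2) * rnorm(N)

# Complete random assignment: exactly half the units treated (assumption)
Z <- sample(rep(0:1, length.out = N))

# Response indicator and outcome, following the equations above
R <- as.numeric(a_R + b_R * Z > u_R)
Y <- as.numeric(a_Y + b_Y * Z > u_Y)

# Outcomes are only observed for units that respond
Y_obs <- ifelse(R == 1, Y, NA)

# Difference in means among responders only
est_given_R <- mean(Y_obs[Z == 1], na.rm = TRUE) -
  mean(Y_obs[Z == 0], na.rm = TRUE)
```

When rho is nonzero, the shocks that drive response also drive the outcome, so the set of responders is selected on Y and the responder-only comparison need not recover the unconditional effect.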
# To make a design using default arguments (missing completely at random):
two_arm_attrition_design <- two_arm_attrition_designer()
## Not run:
diagnose_design(two_arm_attrition_design)
## End(Not run)

# Attrition can produce bias even for unconditional ATE even when not
# associated with treatment
## Not run:
diagnose_design(two_arm_attrition_designer(b_R = 0, rho = .3))
## End(Not run)

# Here the linear estimate using R=1 data is unbiased for
# "ATE on Y (Given R)" with b_R = 0 but not when b_R = 1
## Not run:
diagnose_design(redesign(two_arm_attrition_design, b_R = 0:1, rho = .2))
## End(Not run)
Builds a design with one treatment and one control arm. Treatment effects can be specified either by providing control_mean and treatment_mean or by specifying a control_mean and ate. Non-random assignment is specified by a possible correlation, rho_WZ, between W and a latent variable that determines the probability of Z. Nonignorability is specified by a possible correlation, rho_WY, between W and outcome Y.
two_arm_covariate_designer( N = 100, prob = 0.5, control_mean = 0, sd = 1, ate = 1, h = 0, treatment_mean = control_mean + ate, rho_WY = 0, rho_WZ = 0, args_to_fix = NULL )
N |
An integer. Sample size. |
prob |
A number in [0,1]. Probability of assignment to treatment. |
control_mean |
A number. Average outcome in control. |
sd |
A positive number. Standard deviation of shock on Y. |
ate |
A number. Average treatment effect. |
h |
A number. Controls heterogeneous treatment effects by W. Defaults to 0. |
treatment_mean |
A number. Average outcome in treatment. Overrides |
rho_WY |
A number in [-1,1]. Correlation between W and Y. |
rho_WZ |
A number in [-1,1]. Correlation between W and Z. |
args_to_fix |
A character vector. Names of arguments to be fixed in design. |
Units are assigned to treatment using complete random assignment. Potential outcomes are normally distributed according to the mean and sd arguments.
See vignette online.
A simple two-arm design with covariate W.
# Generate a simple two-arm design using default arguments
two_arm_covariate_design <- two_arm_covariate_designer()

# Design with no confounding but a prognostic covariate
prognostic <- two_arm_covariate_designer(N = 40, ate = .2, rho_WY = .9, h = .5)
## Not run:
diagnose_design(prognostic)
## End(Not run)

# Design with confounding
confounding <- two_arm_covariate_designer(N = 40, ate = 0, rho_WZ = .9,
                                          rho_WY = .9, h = .5)
## Not run:
diagnose_design(confounding, sims = 2000)
## End(Not run)

# Curse of power: A biased design may be more likely to mislead the larger it is
curses <- expand_design(two_arm_covariate_designer, N = c(50, 500, 5000),
                        ate = 0, rho_WZ = .2, rho_WY = .2)
## Not run:
diagnoses <- diagnose_design(curses)
subset(diagnoses$diagnosands_df, estimator == "No controls")[,c("N", "power")]
## End(Not run)
Builds a design with one treatment and one control arm. Treatment effects can be specified either by providing control_mean and treatment_mean or by specifying a control_mean and ate.
two_arm_designer( N = 100, assignment_prob = 0.5, control_mean = 0, control_sd = 1, ate = 1, treatment_mean = control_mean + ate, treatment_sd = control_sd, rho = 1, args_to_fix = NULL )
N |
An integer. Sample size. |
assignment_prob |
A number in [0,1]. Probability of assignment to treatment. |
control_mean |
A number. Average outcome in control. |
control_sd |
A positive number. Standard deviation in control. |
ate |
A number. Average treatment effect. |
treatment_mean |
A number. Average outcome in treatment. Overrides |
treatment_sd |
A nonnegative number. Standard deviation in treatment. By default equals |
rho |
A number in [-1,1]. Correlation between treatment and control outcomes. |
args_to_fix |
A character vector. Names of arguments to be fixed in design. |
Units are assigned to treatment using complete random assignment. Potential outcomes are normally distributed according to the mean and sd arguments.
A simple two-arm design.
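How the mean, sd, and rho arguments combine can be sketched in base R. This is an approximation under assumptions, not the designer's code: correlated potential outcomes are built by mixing standard normal shocks, and complete random assignment treats round(N * assignment_prob) units. The names u_0, u_1, and dim_estimate are hypothetical.

```r
# Base R sketch of a two-arm design with correlated potential outcomes.
set.seed(123)
N <- 100; assignment_prob <- 0.5
control_mean <- 0; control_sd <- 1
ate <- 1; treatment_mean <- control_mean + ate
treatment_sd <- control_sd; rho <- 1

# Standard normal shocks with correlation rho between the two arms
u_0 <- rnorm(N)
u_1 <- rho * u_0 + sqrt(1 - rho^2) * rnorm(N)

# Potential outcomes are normal with the requested means and sds
Y_Z_0 <- control_mean + control_sd * u_0
Y_Z_1 <- treatment_mean + treatment_sd * u_1

# Complete random assignment: a fixed number of treated units
n_treated <- round(N * assignment_prob)
Z <- sample(rep(c(1, 0), times = c(n_treated, N - n_treated)))

# Reveal outcomes, then compare the estimand with a difference in means
Y <- ifelse(Z == 1, Y_Z_1, Y_Z_0)
true_ate <- mean(Y_Z_1 - Y_Z_0)
dim_estimate <- mean(Y[Z == 1]) - mean(Y[Z == 0])
```

With the default rho = 1 and equal standard deviations, every unit has the same treatment effect, so true_ate equals ate; lowering rho introduces heterogeneous effects around the same average.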
# Generate a simple two-arm design using default arguments
two_arm_design <- two_arm_designer()
Builds a two-by-two factorial design in which assignments to each factor are independent of each other.
two_by_two_designer( N = 100, prob_A = 0.5, prob_B = 0.5, weight_A = 0.5, weight_B = 0.5, outcome_means = rep(0, 4), mean_A0B0 = outcome_means[1], mean_A0B1 = outcome_means[2], mean_A1B0 = outcome_means[3], mean_A1B1 = outcome_means[4], sd_i = 1, outcome_sds = rep(0, 4), args_to_fix = NULL )
N |
An integer. Size of sample. |
prob_A |
A number in [0,1]. Probability of assignment to treatment A. |
prob_B |
A number in [0,1]. Probability of assignment to treatment B. |
weight_A |
A number. Weight placed on A=1 condition in definition of "average effect of B" estimand. |
weight_B |
A number. Weight placed on B=1 condition in definition of "average effect of A" estimand. |
outcome_means |
A vector of length 4. Average outcome in each A,B condition, in order AB = 00, 01, 10, 11. Values overridden by mean_A0B0, mean_A0B1, mean_A1B0, mean_A1B1, if provided. |
mean_A0B0 |
A number. Mean outcome in A=0, B=0 condition. |
mean_A0B1 |
A number. Mean outcome in A=0, B=1 condition. |
mean_A1B0 |
A number. Mean outcome in A=1, B=0 condition. |
mean_A1B1 |
A number. Mean outcome in A=1, B=1 condition. |
sd_i |
A nonnegative scalar. Standard deviation of individual-level shock (common across arms). |
outcome_sds |
A nonnegative vector of length 4. Standard deviation of (additional) unit level shock in each condition, in order AB = 00, 01, 10, 11. |
args_to_fix |
A character vector. Names of arguments to be fixed in design. |
Three types of estimand are declared. First, weighted averages of the average treatment effects of each treatment, given the two conditions of the other treatments. Second and third, the difference in treatment effects of each treatment, given the conditions of the other treatment.
Units are assigned to treatment using complete random assignment. Potential outcomes follow a normal distribution.
Treatment A is assigned first and then Treatment B within blocks defined by treatment A. Thus, if there are 6 units, 3 are guaranteed to receive treatment A but the number receiving treatment B is stochastic.
See multi_arm_designer for a factorial design with non-independent assignments.
A two-by-two factorial design.
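The weighted-average estimands can be worked through by hand from the four cell means. The helper below is a hypothetical illustration, not a package function: it assumes the complementary weight (one minus the stated weight) is placed on the other factor's control condition, and it reports a single interaction value because the difference in A effects across B equals the difference in B effects across A.

```r
# Hypothetical helper computing two-by-two estimands from cell means,
# given in the order AB = 00, 01, 10, 11.
two_by_two_estimands <- function(outcome_means, weight_A = 0.5, weight_B = 0.5) {
  m00 <- outcome_means[1]; m01 <- outcome_means[2]
  m10 <- outcome_means[3]; m11 <- outcome_means[4]
  c(
    # Effect of A, averaging over B with weight_B on the B = 1 condition
    ate_A = weight_B * (m11 - m01) + (1 - weight_B) * (m10 - m00),
    # Effect of B, averaging over A with weight_A on the A = 1 condition
    ate_B = weight_A * (m11 - m10) + (1 - weight_A) * (m01 - m00),
    # Difference in the effect of A between B = 1 and B = 0
    interaction = (m11 - m01) - (m10 - m00)
  )
}

# Pure interaction: an effect appears only when both treatments are given
est_default <- two_by_two_estimands(c(0, 0, 0, 1))
# ate_A = 0.5, ate_B = 0.5, interaction = 1

# Weights chosen to match assignment probabilities of .8 and .2
est_weighted <- two_by_two_estimands(c(0, 0, 0, 1), weight_A = 0.8, weight_B = 0.2)
# ate_A = 0.2, ate_B = 0.8, interaction = 1
```

The second call shows why the weights matter when effects interact: the same cell means imply different "average effects" depending on how much weight the estimand places on each condition of the other factor.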
design <- two_by_two_designer(outcome_means = c(0,0,0,1))

# A design biased for the specified estimands:
design <- two_by_two_designer(outcome_means = c(0,0,0,1), prob_A = .8, prob_B = .2)
## Not run:
diagnose_design(design)
## End(Not run)

# A design with estimands that "match" the assignment:
design <- two_by_two_designer(outcome_means = c(0,0,0,1),
                              prob_A = .8, prob_B = .2,
                              weight_A = .8, weight_B = .2)
## Not run:
diagnose_design(design)
## End(Not run)

# Compare power with and without interactions, given same average effects in each arm
designs <- redesign(two_by_two_designer(),
                    outcome_means = list(c(0,0,0,1), c(0,.5,.5,1)))
## Not run:
diagnose_design(designs)
## End(Not run)