Title: | Fast Estimators for Design-Based Inference |
---|---|
Description: | Fast procedures for small set of commonly-used, design-appropriate estimators with robust standard errors and confidence intervals. Includes estimators for linear regression, instrumental variables regression, difference-in-means, Horvitz-Thompson estimation, and regression improving precision of experimental estimates by interacting treatment with centered pre-treatment covariates introduced by Lin (2013) <doi:10.1214/12-AOAS583>. |
Authors: | Graeme Blair [aut, cre], Jasper Cooper [aut], Alexander Coppock [aut], Macartan Humphreys [aut], Luke Sonnet [aut], Neal Fultz [ctb], Lily Medina [ctb], Russell Lenth [ctb] |
Maintainer: | Graeme Blair <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.0.2 |
Built: | 2024-10-12 06:12:07 UTC |
Source: | https://github.com/declaredesign/estimatr |
A dataset containing the data to replicate: Lin, Winston. 2013. "Agnostic notes on regression adjustments to experimental data: Reexamining Freedman's critique." The Annals of Applied Statistics. Stat. 7(1): 295-318. doi:10.1214/12-AOAS583. https://projecteuclid.org/euclid.aoas/1365527200.
alo_star_men
alo_star_men
A data frame with educational treatments and outcomes:
high school GPA
financial incentives and support treatment
support only treatment
college GPA year 1
college GPA year 2
This data was originally taken from the following paper, subset to men who showed up to college, were in one of the arms with the support condition, and had GPA data for their first year in college.
Angrist, Joshua, Daniel Lang, and Philip Oreopoulos. 2009. "Incentives and Services for College Achievement: Evidence from a Randomized Trial." American Economic Journal: Applied Economics 1(1): 136-63. https://www.aeaweb.org/articles?id=10.1257/app.1.1.136
https://www.aeaweb.org/articles?id=10.1257/app.1.1.136
Build lm_robust object from lm fit
commarobust(model, se_type = NULL, clusters = NULL, ci = TRUE, alpha = 0.05)
commarobust(model, se_type = NULL, clusters = NULL, ci = TRUE, alpha = 0.05)
model |
an lm model object |
se_type |
The sort of standard error sought. If |
clusters |
A vector corresponding to the clusters in the data. |
ci |
logical. Whether to compute and return p-values and confidence intervals, TRUE by default. |
alpha |
The significance level, 0.05 by default. |
an lm_robust
object.
lmo <- lm(mpg ~ hp, data = mtcars) # Default HC2 commarobust(lmo) commarobust(lmo, se_type = "HC3") commarobust(lmo, se_type = "stata", clusters = mtcars$carb)
lmo <- lm(mpg ~ hp, data = mtcars) # Default HC2 commarobust(lmo) commarobust(lmo, se_type = "HC3") commarobust(lmo, se_type = "stata", clusters = mtcars$carb)
Builds condition probability matrices for Horvitz-Thompson estimation from randomizr declaration
declaration_to_condition_pr_mat( ra_declaration, condition1 = NULL, condition2 = NULL, prob_matrix = NULL )
declaration_to_condition_pr_mat( ra_declaration, condition1 = NULL, condition2 = NULL, prob_matrix = NULL )
ra_declaration |
An object of class |
condition1 |
The name of the first condition, often the control group. If |
condition2 |
The name of the second condition, often the treatment group. If |
prob_matrix |
An optional probability matrix to override the one in
|
This function takes a "ra_declaration"
, generated
by the declare_ra
function in randomizr and
returns a 2n*2n matrix that can be used to fully specify the design for
horvitz_thompson
estimation. This is done by passing this
matrix to the condition_pr_mat
argument of
horvitz_thompson
.
Currently, this function can learn the condition probability matrix for a wide variety of randomizations: simple, complete, simple clustered, complete clustered, blocked, block-clustered.
A condition probability matrix is made up of four submatrices, each of which corresponds to the joint and marginal probability that each observation is in one of the two treatment conditions.
The upper-left quadrant is an n*n matrix. On the diagonal is the marginal probability of being in condition 1, often control, for every unit (Pr(Z_i = Condition1) where Z represents the vector of treatment conditions). The off-diagonal elements are the joint probabilities of each unit being in condition 1 with each other unit, Pr(Z_i = Condition1, Z_j = Condition1) where i indexes the rows and j indexes the columns.
The upper-right quadrant is also an n*n matrix. On the diagonal is the joint probability of a unit being in condition 1 and condition 2, often the treatment, and thus is always 0. The off-diagonal elements are the joint probability of unit i being in condition 1 and unit j being in condition 2, Pr(Z_i = Condition1, Z_j = Condition2).
The lower-left quadrant is also an n*n matrix. On the diagonal is the joint probability of a unit being in condition 1 and condition 2, and thus is always 0. The off-diagonal elements are the joint probability of unit i being in condition 2 and unit j being in condition 1, Pr(Z_i = Condition2, Z_j = Condition1).
The lower-right quadrant is an n*n matrix. On the diagonal is the marginal probability of being in condition 2, often treatment, for every unit (Pr(Z_i = Condition2)). The off-diagonal elements are the joint probability of each unit being in condition 2 together, Pr(Z_i = Condition2, Z_j = Condition2).
a numeric 2n*2n matrix of marginal and joint condition treatment
probabilities to be passed to the condition_pr_mat
argument of
horvitz_thompson
. See details.
permutations_to_condition_pr_mat
# Learn condition probability matrix from complete blocked design library(randomizr) n <- 100 dat <- data.frame( blocks = sample(letters[1:10], size = n, replace = TRUE), y = rnorm(n) ) # Declare complete blocked randomization bl_declaration <- declare_ra(blocks = dat$blocks, prob = 0.4, simple = FALSE) # Get probabilities block_pr_mat <- declaration_to_condition_pr_mat(bl_declaration, 0, 1) # Do randomiztion dat$z <- conduct_ra(bl_declaration) horvitz_thompson(y ~ z, data = dat, condition_pr_mat = block_pr_mat) # When you pass a declaration to horvitz_thompson, this function is called # Equivalent to above call horvitz_thompson(y ~ z, data = dat, ra_declaration = bl_declaration)
# Learn condition probability matrix from complete blocked design library(randomizr) n <- 100 dat <- data.frame( blocks = sample(letters[1:10], size = n, replace = TRUE), y = rnorm(n) ) # Declare complete blocked randomization bl_declaration <- declare_ra(blocks = dat$blocks, prob = 0.4, simple = FALSE) # Get probabilities block_pr_mat <- declaration_to_condition_pr_mat(bl_declaration, 0, 1) # Do randomiztion dat$z <- conduct_ra(bl_declaration) horvitz_thompson(y ~ z, data = dat, condition_pr_mat = block_pr_mat) # When you pass a declaration to horvitz_thompson, this function is called # Equivalent to above call horvitz_thompson(y ~ z, data = dat, ra_declaration = bl_declaration)
Difference-in-means estimators that selects the appropriate point estimate, standard errors, and degrees of freedom for a variety of designs: unit randomized, cluster randomized, block randomized, block-cluster randomized, matched-pairs, and matched-pair cluster randomized designs
difference_in_means( formula, data, blocks, clusters, weights, subset, se_type = c("default", "none"), condition1 = NULL, condition2 = NULL, ci = TRUE, alpha = 0.05 )
difference_in_means( formula, data, blocks, clusters, weights, subset, se_type = c("default", "none"), condition1 = NULL, condition2 = NULL, ci = TRUE, alpha = 0.05 )
formula |
an object of class formula, as in |
data |
A |
blocks |
An optional bare (unquoted) name of the block variable. Use for blocked designs only. |
clusters |
An optional bare (unquoted) name of the variable that corresponds to the clusters in the data; used for cluster randomized designs. For blocked designs, clusters must nest within blocks. |
weights |
the bare (unquoted) names of the weights variable in the supplied data. |
subset |
An optional bare (unquoted) expression specifying a subset of observations to be used. |
se_type |
An optional string that can be one of |
condition1 |
value in the treatment vector of the condition
to be the control. Effects are
estimated with |
condition2 |
value in the treatment vector of the condition to be the
treatment. See |
ci |
logical. Whether to compute and return p-values and confidence intervals, TRUE by default. |
alpha |
The significance level, 0.05 by default. |
This function implements a difference-in-means estimator, with support for blocked, clustered, matched-pairs, block-clustered, and matched-pair clustered designs. One specifies their design by passing the blocks and clusters in their data and this function chooses which estimator is most appropriate.
If you pass only blocks
, if all blocks are of size two, we will
infer that the design is a matched-pairs design. If they are all size four
or larger, we will infer that it is a regular blocked design. If you pass
both blocks
and clusters
, we will similarly
infer whether it is a matched-pairs clustered design or a block-clustered
design the number of clusters per block. If the user passes only
clusters
, we will infer that the design was cluster-randomized. If
the user specifies neither the blocks
nor the clusters
,
a regular Welch's t-test will be performed.
Importantly, if the user specifies weights, the estimation is handed off
to lm_robust
with the appropriate robust standard errors
as weighted difference-in-means estimators are not implemented here.
More details of the about each of the estimators can be found in the
mathematical notes.
Returns an object of class "difference_in_means"
.
The post-estimation commands functions summary
and tidy
return results in a data.frame
. To get useful data out of the return,
you can use these data frames, you can use the resulting list directly, or
you can use the generic accessor functions coef
and
confint
.
An object of class "difference_in_means"
is a list containing at
least the following components:
coefficients |
the estimated difference in means |
std.error |
the estimated standard error |
statistic |
the t-statistic |
df |
the estimated degrees of freedom |
p.value |
the p-value from a two-sided t-test using |
conf.low |
the lower bound of the |
conf.high |
the upper bound of the |
term |
a character vector of coefficient names |
alpha |
the significance level specified by the user |
N |
the number of observations used |
outcome |
the name of the outcome variable |
design |
the name of the design learned from the arguments passed |
Gerber, Alan S, and Donald P Green. 2012. Field Experiments: Design, Analysis, and Interpretation. New York: W.W. Norton.
Imai, Kosuke, Gary King, Clayton Nall. 2009. "The Essential Role of Pair Matching in Cluster-Randomized Experiments, with Application to the Mexican Universal Health Insurance Evaluation." Statistical Science 24 (1). Institute of Mathematical Statistics: 29-53. doi:10.1214/08-STS274.
library(fabricatr) library(randomizr) # Get appropriate standard errors for unit-randomized designs # ---------- # 1. Unit randomized # ---------- dat <- fabricate( N = 100, Y = rnorm(100), Z_comp = complete_ra(N, prob = 0.4), ) table(dat$Z_comp) difference_in_means(Y ~ Z_comp, data = dat) # ---------- # 2. Cluster randomized # ---------- # Accurates estimates and standard errors for clustered designs dat$clust <- sample(20, size = nrow(dat), replace = TRUE) dat$Z_clust <- cluster_ra(dat$clust, prob = 0.6) table(dat$Z_clust, dat$clust) summary(difference_in_means(Y ~ Z_clust, clusters = clust, data = dat)) # ---------- # 3. Block randomized # ---------- dat$block <- rep(1:10, each = 10) dat$Z_block <- block_ra(dat$block, prob = 0.5) table(dat$Z_block, dat$block) difference_in_means(Y ~ Z_block, blocks = block, data = dat) # ---------- # 4. Block cluster randomized # ---------- # Learns this design if there are two clusters per block dat$small_clust <- rep(1:50, each = 2) dat$big_blocks <- rep(1:5, each = 10) dat$Z_blcl <- block_and_cluster_ra( blocks = dat$big_blocks, clusters = dat$small_clust ) difference_in_means( Y ~ Z_blcl, blocks = big_blocks, clusters = small_clust, data = dat ) # ---------- # 5. Matched-pairs # ---------- # Matched-pair estimates and standard errors are also accurate # Specified same as blocked design, function learns that # it is matched pair from size of blocks! dat$pairs <- rep(1:50, each = 2) dat$Z_pairs <- block_ra(dat$pairs, prob = 0.5) table(dat$pairs, dat$Z_pairs) difference_in_means(Y ~ Z_pairs, blocks = pairs, data = dat) # ---------- # 6. Matched-pair cluster randomized # ---------- # Learns this design if there are two clusters per block dat$small_clust <- rep(1:50, each = 2) dat$cluster_pairs <- rep(1:25, each = 4) table(dat$cluster_pairs, dat$small_clust) dat$Z_mpcl <- block_and_cluster_ra( blocks = dat$cluster_pairs, clusters = dat$small_clust ) difference_in_means( Y ~ Z_mpcl, blocks = cluster_pairs, clusters = small_clust, data = dat ) # ---------- # Other examples # ---------- # Also works with multi-valued treatments if users specify # comparison of interest dat$Z_multi <- simple_ra( nrow(dat), conditions = c("Treatment 2", "Treatment 1", "Control"), prob_each = c(0.4, 0.4, 0.2) ) # Only need to specify which condition is treated `condition2` and # which is control `condition1` difference_in_means( Y ~ Z_multi, condition1 = "Treatment 2", condition2 = "Control", data = dat ) difference_in_means( Y ~ Z_multi, condition1 = "Treatment 1", condition2 = "Control", data = dat ) # Specifying weights will result in estimation via lm_robust() dat$w <- runif(nrow(dat)) difference_in_means(Y ~ Z_comp, weights = w, data = dat) lm_robust(Y ~ Z_comp, weights = w, data = dat)
library(fabricatr) library(randomizr) # Get appropriate standard errors for unit-randomized designs # ---------- # 1. Unit randomized # ---------- dat <- fabricate( N = 100, Y = rnorm(100), Z_comp = complete_ra(N, prob = 0.4), ) table(dat$Z_comp) difference_in_means(Y ~ Z_comp, data = dat) # ---------- # 2. Cluster randomized # ---------- # Accurates estimates and standard errors for clustered designs dat$clust <- sample(20, size = nrow(dat), replace = TRUE) dat$Z_clust <- cluster_ra(dat$clust, prob = 0.6) table(dat$Z_clust, dat$clust) summary(difference_in_means(Y ~ Z_clust, clusters = clust, data = dat)) # ---------- # 3. Block randomized # ---------- dat$block <- rep(1:10, each = 10) dat$Z_block <- block_ra(dat$block, prob = 0.5) table(dat$Z_block, dat$block) difference_in_means(Y ~ Z_block, blocks = block, data = dat) # ---------- # 4. Block cluster randomized # ---------- # Learns this design if there are two clusters per block dat$small_clust <- rep(1:50, each = 2) dat$big_blocks <- rep(1:5, each = 10) dat$Z_blcl <- block_and_cluster_ra( blocks = dat$big_blocks, clusters = dat$small_clust ) difference_in_means( Y ~ Z_blcl, blocks = big_blocks, clusters = small_clust, data = dat ) # ---------- # 5. Matched-pairs # ---------- # Matched-pair estimates and standard errors are also accurate # Specified same as blocked design, function learns that # it is matched pair from size of blocks! dat$pairs <- rep(1:50, each = 2) dat$Z_pairs <- block_ra(dat$pairs, prob = 0.5) table(dat$pairs, dat$Z_pairs) difference_in_means(Y ~ Z_pairs, blocks = pairs, data = dat) # ---------- # 6. Matched-pair cluster randomized # ---------- # Learns this design if there are two clusters per block dat$small_clust <- rep(1:50, each = 2) dat$cluster_pairs <- rep(1:25, each = 4) table(dat$cluster_pairs, dat$small_clust) dat$Z_mpcl <- block_and_cluster_ra( blocks = dat$cluster_pairs, clusters = dat$small_clust ) difference_in_means( Y ~ Z_mpcl, blocks = cluster_pairs, clusters = small_clust, data = dat ) # ---------- # Other examples # ---------- # Also works with multi-valued treatments if users specify # comparison of interest dat$Z_multi <- simple_ra( nrow(dat), conditions = c("Treatment 2", "Treatment 1", "Control"), prob_each = c(0.4, 0.4, 0.2) ) # Only need to specify which condition is treated `condition2` and # which is control `condition1` difference_in_means( Y ~ Z_multi, condition1 = "Treatment 2", condition2 = "Control", data = dat ) difference_in_means( Y ~ Z_multi, condition1 = "Treatment 1", condition2 = "Control", data = dat ) # Specifying weights will result in estimation via lm_robust() dat$w <- runif(nrow(dat)) difference_in_means(Y ~ Z_comp, weights = w, data = dat) lm_robust(Y ~ Z_comp, weights = w, data = dat)
Fast procedures for small set of commonly-used, design-appropriate estimators with robust standard errors and confidence intervals. Includes estimators for linear regression, instrumental variables regression, difference-in-means, Horvitz-Thompson estimation, and regression improving precision of experimental estimates by interacting treatment with centered pre-treatment covariates introduced by Lin (2013) <doi:10.1214/12-AOAS583>.
Maintainer: Graeme Blair [email protected]
Authors:
Jasper Cooper [email protected]
Alexander Coppock [email protected]
Macartan Humphreys [email protected]
Luke Sonnet [email protected]
Other contributors:
Neal Fultz [email protected] [contributor]
Lily Medina [email protected] [contributor]
Russell Lenth [email protected] [contributor]
Useful links:
Report bugs at https://github.com/DeclareDesign/estimatr/issues
Glance at an estimatr object
## S3 method for class 'lm_robust' glance(x, ...) ## S3 method for class 'lh_robust' glance(x, ...) ## S3 method for class 'iv_robust' glance(x, ...) ## S3 method for class 'difference_in_means' glance(x, ...) ## S3 method for class 'horvitz_thompson' glance(x, ...)
## S3 method for class 'lm_robust' glance(x, ...) ## S3 method for class 'lh_robust' glance(x, ...) ## S3 method for class 'iv_robust' glance(x, ...) ## S3 method for class 'difference_in_means' glance(x, ...) ## S3 method for class 'horvitz_thompson' glance(x, ...)
x |
An object returned by one of the estimators |
... |
extra arguments (not used) |
For glance.lm_robust
, a data.frame with columns:
r.squared |
the
where |
adj.r.squared |
the |
se_type |
the standard error type specified by the user |
statistic |
the value of the F-statistic |
p.value |
p-value from the F test |
df.residual |
residual degrees of freedom |
nobs |
the number of observations used |
For glance.lh_robust
, we glance the lm_robust
component only. You can access the linear hypotheses as a data.frame directy from the lh
component of the lh_robust
object
For glance.iv_robust
, a data.frame with columns:
r.squared |
The |
adj.r.squared |
The |
df.residual |
residual degrees of freedom |
N |
the number of observations used |
se_type |
the standard error type specified by the user |
statistic |
the value of the F-statistic |
p.value |
p-value from the F test |
statistic.weakinst |
the value of the first stage F-statistic, useful for the weak instruments test; only reported if there is only one endogenous variable |
p.value.weakinst |
p-value from the first-stage F test, a test of weak instruments; only reported if there is only one endogenous variable |
statistic.endogeneity |
the value of the F-statistic for the test of endogeneity; often called the Wu-Hausman statistic, with robust standard errors, we employ the regression based test |
p.value.endogeneity |
p-value from the F-test for endogeneity |
statistic.overid |
the value of the chi-squared statistic for the test of instrument correlation with the error term; only reported with overidentification |
p.value.overid |
p-value from the chi-squared test; only reported with overidentification |
For glance.difference_in_means
, a data.frame with columns:
design |
the design used, and therefore the estimator used |
df |
the degrees of freedom |
nobs |
the number of observations used |
nblocks |
the number of blocks, if used |
nclusters |
the number of clusters, if used |
condition2 |
the second, "treatment", condition |
condition1 |
the first, "control", condition |
For glance.horvitz_thompson
, a data.frame with columns:
nobs |
the number of observations used |
se_type |
the type of standard error estimator used |
condition2 |
the second, "treatment", condition |
condition1 |
the first, "control", condition |
generics::glance()
, lm_robust()
, lm_lin()
, iv_robust()
, difference_in_means()
, horvitz_thompson()
Tidy an estimatr object
## S3 method for class 'lm_robust' tidy(x, conf.int = TRUE, conf.level = NULL, ...) ## S3 method for class 'iv_robust' tidy(x, conf.int = TRUE, conf.level = NULL, ...) ## S3 method for class 'difference_in_means' tidy(x, conf.int = TRUE, conf.level = NULL, ...) ## S3 method for class 'horvitz_thompson' tidy(x, conf.int = TRUE, conf.level = NULL, ...) ## S3 method for class 'lh_robust' tidy(x, conf.int = TRUE, conf.level = NULL, ...) ## S3 method for class 'lh' tidy(x, conf.int = TRUE, conf.level = NULL, ...)
## S3 method for class 'lm_robust' tidy(x, conf.int = TRUE, conf.level = NULL, ...) ## S3 method for class 'iv_robust' tidy(x, conf.int = TRUE, conf.level = NULL, ...) ## S3 method for class 'difference_in_means' tidy(x, conf.int = TRUE, conf.level = NULL, ...) ## S3 method for class 'horvitz_thompson' tidy(x, conf.int = TRUE, conf.level = NULL, ...) ## S3 method for class 'lh_robust' tidy(x, conf.int = TRUE, conf.level = NULL, ...) ## S3 method for class 'lh' tidy(x, conf.int = TRUE, conf.level = NULL, ...)
x |
An object returned by one of the estimators |
conf.int |
Logical indicating whether or not to include a confidence interval in the tidied output. Defaults to ‘TRUE’. |
conf.level |
The confidence level to use for the confidence interval if ‘conf.int = TRUE’. Must be strictly greater than 0 and less than 1. Defaults to 0.95, which corresponds to a 95 percent confidence interval. |
... |
extra arguments (not used) |
A data.frame with columns for coefficient names, estimates, standard errors, confidence intervals, p-values, degrees of freedom, and the name of the outcome variable
generics::tidy()
, lm_robust()
, iv_robust()
, difference_in_means()
, horvitz_thompson()
Prepares a "lm_robust"
or "iv_robust"
object for the texreg
package. This is largely a clone of the extract.lm
method.
extract.robust_default( model, include.ci = TRUE, include.rsquared = TRUE, include.adjrs = TRUE, include.nobs = TRUE, include.fstatistic = FALSE, include.rmse = TRUE, include.nclusts = TRUE, ... ) extract.lm_robust( model, include.ci = TRUE, include.rsquared = TRUE, include.adjrs = TRUE, include.nobs = TRUE, include.fstatistic = FALSE, include.rmse = TRUE, include.nclusts = TRUE, ... ) extract.iv_robust( model, include.ci = TRUE, include.rsquared = TRUE, include.adjrs = TRUE, include.nobs = TRUE, include.fstatistic = FALSE, include.rmse = TRUE, include.nclusts = TRUE, ... )
extract.robust_default( model, include.ci = TRUE, include.rsquared = TRUE, include.adjrs = TRUE, include.nobs = TRUE, include.fstatistic = FALSE, include.rmse = TRUE, include.nclusts = TRUE, ... ) extract.lm_robust( model, include.ci = TRUE, include.rsquared = TRUE, include.adjrs = TRUE, include.nobs = TRUE, include.fstatistic = FALSE, include.rmse = TRUE, include.nclusts = TRUE, ... ) extract.iv_robust( model, include.ci = TRUE, include.rsquared = TRUE, include.adjrs = TRUE, include.nobs = TRUE, include.fstatistic = FALSE, include.rmse = TRUE, include.nclusts = TRUE, ... )
model |
an object of class |
include.ci |
logical. Defaults to TRUE |
include.rsquared |
logical. Defaults to TRUE |
include.adjrs |
logical. Defaults to TRUE |
include.nobs |
logical. Defaults to TRUE |
include.fstatistic |
logical. Defaults to TRUE |
include.rmse |
logical. Defaults to TRUE |
include.nclusts |
logical. Defaults to TRUE if clusters in |
... |
unused |
Generate condition probability matrix given clusters and probabilities
gen_pr_matrix_cluster(clusters, treat_probs, simple)
gen_pr_matrix_cluster(clusters, treat_probs, simple)
clusters |
A vector of clusters |
treat_probs |
A vector of treatment (condition 2) probabilities |
simple |
A boolean for whether the assignment is a random sample assignment (TRUE, default) or complete random assignment (FALSE) |
a numeric 2n*2n matrix of marginal and joint condition treatment
probabilities to be passed to the condition_pr_mat
argument of
horvitz_thompson
.
declaration_to_condition_pr_mat
Horvitz-Thompson estimators that are unbiased for designs in which the randomization scheme is known
horvitz_thompson( formula, data, blocks, clusters, simple = NULL, condition_prs, condition_pr_mat = NULL, ra_declaration = NULL, subset, condition1 = NULL, condition2 = NULL, se_type = c("youngs", "constant", "none"), ci = TRUE, alpha = 0.05, return_condition_pr_mat = FALSE )
horvitz_thompson( formula, data, blocks, clusters, simple = NULL, condition_prs, condition_pr_mat = NULL, ra_declaration = NULL, subset, condition1 = NULL, condition2 = NULL, se_type = c("youngs", "constant", "none"), ci = TRUE, alpha = 0.05, return_condition_pr_mat = FALSE )
formula |
an object of class formula, as in |
data |
A data.frame. |
blocks |
An optional bare (unquoted) name of the block variable. Use for blocked designs only. See details. |
clusters |
An optional bare (unquoted) name of the variable that corresponds to the clusters in the data; used for cluster randomized designs. For blocked designs, clusters must be within blocks. |
simple |
logical, optional. Whether the randomization is simple
(TRUE) or complete (FALSE). This is ignored if |
condition_prs |
An optional bare (unquoted) name of the variable with the condition 2 (treatment) probabilities. See details. May also use a single number for the condition 2 probability if it is constant. |
condition_pr_mat |
An optional 2n * 2n matrix of marginal and joint probabilities of all units in condition1 and condition2. See details. |
ra_declaration |
An object of class |
subset |
An optional bare (unquoted) expression specifying a subset of observations to be used. |
condition1 |
value in the treatment vector of the condition
to be the control. Effects are
estimated with |
condition2 |
value in the treatment vector of the condition to be the
treatment. See |
se_type |
can be one of |
ci |
logical. Whether to compute and return p-values and confidence intervals, TRUE by default. |
alpha |
The significance level, 0.05 by default. |
return_condition_pr_mat |
logical. Whether to return the condition probability matrix. Returns NULL if the design is simple randomization, FALSE by default. |
This function implements the Horvitz-Thompson estimator for treatment effects for two-armed trials. This estimator is useful for estimating unbiased treatment effects given any randomization scheme as long as the randomization scheme is known.
In short, the Horvitz-Thompson estimator essentially reweights each unit by the probability of it being in its observed condition. Pivotal to the estimation of treatment effects using this estimator are the marginal condition probabilities (i.e., the probability that any one unit is in a particular treatment condition). Pivotal to the estimating the variance variance whenever the design is more complicated than simple randomization, are the joint condition probabilities (i.e., the probabilities that any two units have a particular set of treatment conditions, either the same or different). The estimator we provide here considers the case with two treatment conditions.
Users interested in more details can see the mathematical notes for more information and references, or see the references below.
There are three distinct ways that users can specify the design to the
function. The preferred way is to use
the declare_ra
function in the randomizr
package. This function takes several arguments, including blocks, clusters,
treatment probabilities, whether randomization is simple or not, and more.
Passing the outcome of that function, an object of class
"ra_declaration"
to the ra_declaration
argument in this function,
will lead to a call of the declaration_to_condition_pr_mat
function which generates the condition probability matrix needed to
estimate treatment effects and standard errors. We provide many examples
below of how this could be done.
The second way is to pass the names of vectors in your data
to
condition_prs
, blocks
, and clusters
. You can further
specify whether the randomization was simple or complete using the simple
argument. Note that if blocks
are specified the randomization is
always treated as complete. From these vectors, the function learns how to
build the condition probability matrix that is used in estimation.
In the case
where condition_prs
is specified, this function assumes those
probabilities are the marginal probability that each unit is in condition2
and then uses the other arguments (blocks
, clusters
,
simple
) to learn the rest of the design. If users do not pass
condition_prs
, this function learns the probability of being in
condition2 from the data. That is, none of these arguments are specified,
we assume that there was a simple randomization where the probability
of each unit being in condition2 was the average of all units in condition2.
Similarly, we learn the block-level probability of treatment within
blocks
by looking at the mean number of units in condition2 if
condition_prs
is not specified.
The third way is to pass a condition_pr_mat
directly. One can
see more about this object in the documentation for
declaration_to_condition_pr_mat
and
permutations_to_condition_pr_mat
. Essentially, this 2n * 2n
matrix allows users to specify marginal and joint marginal probabilities
of units being in conditions 1 and 2 of arbitrary complexity. Users should
only use this option if they are certain they know what they are doing.
Returns an object of class "horvitz_thompson"
.
The post-estimation commands functions summary
and tidy
return results in a data.frame
. To get useful data out of the return,
you can use these data frames, you can use the resulting list directly, or
you can use the generic accessor functions coef
and
confint
.
An object of class "horvitz_thompson"
is a list containing at
least the following components:
coefficients |
the estimated difference in totals |
std.error |
the estimated standard error |
statistic |
the z-statistic |
df |
the estimated degrees of freedom |
p.value |
the p-value from a two-sided z-test using |
conf.low |
the lower bound of the |
conf.high |
the upper bound of the |
term |
a character vector of coefficient names |
alpha |
the significance level specified by the user |
nobs |
the number of observations used |
outcome |
the name of the outcome variable |
condition_pr_mat |
the condition probability matrix if |
Aronow, Peter M, and Joel A Middleton. 2013. "A Class of Unbiased Estimators of the Average Treatment Effect in Randomized Experiments." Journal of Causal Inference 1 (1): 135-54. doi:10.1515/jci-2012-0009.
Aronow, Peter M, and Cyrus Samii. 2017. "Estimating Average Causal Effects Under Interference Between Units." Annals of Applied Statistics, forthcoming. https://arxiv.org/abs/1305.6156v3.
Middleton, Joel A, and Peter M Aronow. 2015. "Unbiased Estimation of the Average Treatment Effect in Cluster-Randomized Experiments." Statistics, Politics and Policy 6 (1-2): 39-75. doi:10.1515/spp-2013-0002.
# Set seed set.seed(42) # Simulate data n <- 10 dat <- data.frame(y = rnorm(n)) library(randomizr) #---------- # 1. Simple random assignment #---------- dat$p <- 0.5 dat$z <- rbinom(n, size = 1, prob = dat$p) # If you only pass condition_prs, we assume simple random sampling horvitz_thompson(y ~ z, data = dat, condition_prs = p) # Assume constant effects instead horvitz_thompson(y ~ z, data = dat, condition_prs = p, se_type = "constant") # Also can use randomizr to pass a declaration srs_declaration <- declare_ra(N = nrow(dat), prob = 0.5, simple = TRUE) horvitz_thompson(y ~ z, data = dat, ra_declaration = srs_declaration) #---------- # 2. Complete random assignment #---------- dat$z <- sample(rep(0:1, each = n/2)) # Can use a declaration crs_declaration <- declare_ra(N = nrow(dat), prob = 0.5, simple = FALSE) horvitz_thompson(y ~ z, data = dat, ra_declaration = crs_declaration) # Can precompute condition_pr_mat and pass it # (faster for multiple runs with same condition probability matrix) crs_pr_mat <- declaration_to_condition_pr_mat(crs_declaration) horvitz_thompson(y ~ z, data = dat, condition_pr_mat = crs_pr_mat) #---------- # 3. Clustered treatment, complete random assigment #----------- # Simulating data dat$cl <- rep(1:4, times = c(2, 2, 3, 3)) dat$prob <- 0.5 clust_crs_decl <- declare_ra(N = nrow(dat), clusters = dat$cl, prob = 0.5) dat$z <- conduct_ra(clust_crs_decl) # Easiest to specify using declaration ht_cl <- horvitz_thompson(y ~ z, data = dat, ra_declaration = clust_crs_decl) # Also can pass the condition probability and the clusters ht_cl_manual <- horvitz_thompson( y ~ z, data = dat, clusters = cl, condition_prs = prob, simple = FALSE ) ht_cl ht_cl_manual # Blocked estimators specified similarly #---------- # More complicated assignment #---------- # arbitrary permutation matrix possible_treats <- cbind( c(1, 1, 0, 1, 0, 0, 0, 1, 1, 0), c(0, 1, 1, 0, 1, 1, 0, 1, 0, 1), c(1, 0, 1, 1, 1, 1, 1, 0, 0, 0) ) arb_pr_mat <- permutations_to_condition_pr_mat(possible_treats) # Simulating a column to be realized treatment dat$z <- possible_treats[, sample(ncol(possible_treats), size = 1)] horvitz_thompson(y ~ z, data = dat, condition_pr_mat = arb_pr_mat)
# Set seed set.seed(42) # Simulate data n <- 10 dat <- data.frame(y = rnorm(n)) library(randomizr) #---------- # 1. Simple random assignment #---------- dat$p <- 0.5 dat$z <- rbinom(n, size = 1, prob = dat$p) # If you only pass condition_prs, we assume simple random sampling horvitz_thompson(y ~ z, data = dat, condition_prs = p) # Assume constant effects instead horvitz_thompson(y ~ z, data = dat, condition_prs = p, se_type = "constant") # Also can use randomizr to pass a declaration srs_declaration <- declare_ra(N = nrow(dat), prob = 0.5, simple = TRUE) horvitz_thompson(y ~ z, data = dat, ra_declaration = srs_declaration) #---------- # 2. Complete random assignment #---------- dat$z <- sample(rep(0:1, each = n/2)) # Can use a declaration crs_declaration <- declare_ra(N = nrow(dat), prob = 0.5, simple = FALSE) horvitz_thompson(y ~ z, data = dat, ra_declaration = crs_declaration) # Can precompute condition_pr_mat and pass it # (faster for multiple runs with same condition probability matrix) crs_pr_mat <- declaration_to_condition_pr_mat(crs_declaration) horvitz_thompson(y ~ z, data = dat, condition_pr_mat = crs_pr_mat) #---------- # 3. Clustered treatment, complete random assigment #----------- # Simulating data dat$cl <- rep(1:4, times = c(2, 2, 3, 3)) dat$prob <- 0.5 clust_crs_decl <- declare_ra(N = nrow(dat), clusters = dat$cl, prob = 0.5) dat$z <- conduct_ra(clust_crs_decl) # Easiest to specify using declaration ht_cl <- horvitz_thompson(y ~ z, data = dat, ra_declaration = clust_crs_decl) # Also can pass the condition probability and the clusters ht_cl_manual <- horvitz_thompson( y ~ z, data = dat, clusters = cl, condition_prs = prob, simple = FALSE ) ht_cl ht_cl_manual # Blocked estimators specified similarly #---------- # More complicated assignment #---------- # arbitrary permutation matrix possible_treats <- cbind( c(1, 1, 0, 1, 0, 0, 0, 1, 1, 0), c(0, 1, 1, 0, 1, 1, 0, 1, 0, 1), c(1, 0, 1, 1, 1, 1, 1, 0, 0, 0) ) arb_pr_mat <- permutations_to_condition_pr_mat(possible_treats) # Simulating a column to be realized treatment dat$z <- possible_treats[, sample(ncol(possible_treats), size = 1)] horvitz_thompson(y ~ z, data = dat, condition_pr_mat = arb_pr_mat)
This formula estimates an instrumental variables regression using two-stage least squares with a variety of options for robust standard errors
iv_robust( formula, data, weights, subset, clusters, fixed_effects, se_type = NULL, ci = TRUE, alpha = 0.05, diagnostics = FALSE, return_vcov = TRUE, try_cholesky = FALSE )
iv_robust( formula, data, weights, subset, clusters, fixed_effects, se_type = NULL, ci = TRUE, alpha = 0.05, diagnostics = FALSE, return_vcov = TRUE, try_cholesky = FALSE )
formula |
an object of class formula of the regression and the instruments.
For example, the formula |
data |
A |
weights |
the bare (unquoted) names of the weights variable in the supplied data. |
subset |
An optional bare (unquoted) expression specifying a subset of observations to be used. |
clusters |
An optional bare (unquoted) name of the variable that corresponds to the clusters in the data. |
fixed_effects |
An optional right-sided formula containing the fixed
effects that will be projected out of the data, such as |
se_type |
The sort of standard error sought. If |
ci |
logical. Whether to compute and return p-values and confidence intervals, TRUE by default. |
alpha |
The significance level, 0.05 by default. |
diagnostics |
logical. Whether to compute and return instrumental variable diagnostic statistics and tests. |
return_vcov |
logical. Whether to return the variance-covariance matrix for later usage, TRUE by default. |
try_cholesky |
logical. Whether to try using a Cholesky decomposition to solve least squares instead of a QR decomposition, FALSE by default. Using a Cholesky decomposition may result in speed gains, but should only be used if users are sure their model is full-rank (i.e., there is no perfect multi-collinearity) |
This function performs two-stage least squares estimation to fit
instrumental variables regression. The syntax is similar to that in
ivreg
from the AER
package. Regressors and instruments
should be specified in a two-part formula, such as
y ~ x1 + x2 | z1 + z2 + z3
, where x1
and x2
are
regressors and z1
, z2
, and z3
are instruments. Unlike
ivreg
, you must explicitly specify all exogenous regressors on
both sides of the bar.
The default variance estimators are the same as in lm_robust
.
Without clusters, we default to HC2
standard errors, and with clusters
we default to CR2
standard errors. 2SLS variance estimates are
computed using the same estimators as in lm_robust
, however the
design matrix used are the second-stage regressors, which includes the estimated
endogenous regressors, and the residuals used are the difference
between the outcome and a fit produced by the second-stage coefficients and the
first-stage (endogenous) regressors. More notes on this can be found at
the mathematical appendix.
If fixed_effects
are specified, both the outcome, regressors, and instruments
are centered using the method of alternating projections (Halperin 1962; Gaure 2013). Specifying
fixed effects in this way will result in large speed gains with standard error
estimators that do not need to invert the matrix of fixed effects. This means using
"classical", "HC0", "HC1", "CR0", or "stata" standard errors will be faster than other
standard error estimators. Be wary when specifying fixed effects that may result
in perfect fits for some observations or if there are intersecting groups across
multiple fixed effect variables (e.g. if you specify both "year" and "country" fixed effects
with an unbalanced panel where one year you only have data for one country).
If diagnostics
are requested, we compute and return three sets of diagnostics.
First, we return tests for weak instruments using first-stage F-statistics (diagnostic_first_stage_fstatistic
). Specifically,
the F-statistics reported compare the model regressing each endogeneous variable on both the
included exogenous variables and the instruments to a model where each endogenous variable is
regressed only on the included exogenous variables (without the instruments). A significant F-test
for weak instruments provides evidence against the null hypothesis that the instruments are weak.
Second, we return tests for the endogeneity of the endogenous variables, often called the Wu-Hausman
test (diagnostic_endogeneity_test
). We implement the regression test from Hausman (1978), which allows for robust variance estimation.
A significant endogeneity test provides evidence against the null that all the variables are exogenous.
Third, we return a test for the correlation between the instruments and the error term (diagnostic_overid_test
). We implement
the Wooldridge (1995) robust score test, which is identical to Sargan's (1958) test with classical
standard errors. This test is only reported if the model is overidentified (i.e. the number of
instruments is greater than the number of endogenous regressors), and if no weights are specified.
An object of class "iv_robust"
.
The post-estimation commands functions summary
and tidy
return results in a data.frame
. To get useful data out of the return,
you can use these data frames, you can use the resulting list directly, or
you can use the generic accessor functions coef
, vcov
,
confint
, and predict
.
An object of class "iv_robust"
is a list containing at least the
following components:
coefficients |
the estimated coefficients |
std.error |
the estimated standard errors |
statistic |
the t-statistic |
df |
the estimated degrees of freedom |
p.value |
the p-values from a two-sided t-test using |
conf.low |
the lower bound of the |
conf.high |
the upper bound of the |
term |
a character vector of coefficient names |
alpha |
the significance level specified by the user |
se_type |
the standard error type specified by the user |
res_var |
the residual variance |
nobs |
the number of observations used |
k |
the number of columns in the design matrix (includes linearly dependent columns!) |
rank |
the rank of the fitted model |
vcov |
the fitted variance covariance matrix |
r.squared |
the |
adj.r.squared |
the |
fstatistic |
a vector with the value of the second stage F-statistic with the numerator and denominator degrees of freedom |
firststage_fstatistic |
a vector with the value of the first stage F-statistic with the numerator and denominator degrees of freedom, useful for a test for weak instruments |
weighted |
whether or not weights were applied |
call |
the original function call |
fitted.values |
the matrix of predicted means |
We also return terms
with the second stage terms and terms_regressors
with the first stage terms, both of which used by predict
. If fixed_effects
are specified, then we return proj_fstatistic
, proj_r.squared
, and proj_adj.r.squared
, which are model fit statistics that are computed on the projected model (after demeaning the fixed effects).
We also return various diagnostics when `diagnostics` == TRUE
. These are stored in diagnostic_first_stage_fstatistic
, diagnostic_endogeneity_test
, and diagnostic_overid_test
. They have the test statistic, relevant degrees of freedom, and p.value in a named vector. See 'Details' for more. These are printed in a formatted table when the model object is passed to summary()
.
Gaure, Simon. 2013. "OLS with multiple high dimensional category variables." Computational Statistics & Data Analysis 66: 8-1. doi:10.1016/j.csda.2013.03.024
Halperin, I. 1962. "The product of projection operators." Acta Scientiarum Mathematicarum (Szeged) 23(1-2): 96-99.
library(fabricatr) dat <- fabricate( N = 40, Y = rpois(N, lambda = 4), Z = rbinom(N, 1, prob = 0.4), D = Z * rbinom(N, 1, prob = 0.8), X = rnorm(N), G = sample(letters[1:4], N, replace = TRUE) ) # Instrument for treatment `D` with encouragement `Z` tidy(iv_robust(Y ~ D + X | Z + X, data = dat)) # Instrument with Stata's `ivregress 2sls , small rob` HC1 variance tidy(iv_robust(Y ~ D | Z, data = dat, se_type = "stata")) # With clusters, we use CR2 errors by default dat$cl <- rep(letters[1:5], length.out = nrow(dat)) tidy(iv_robust(Y ~ D | Z, data = dat, clusters = cl)) # Again, easy to replicate Stata (again with `small` correction in Stata) tidy(iv_robust(Y ~ D | Z, data = dat, clusters = cl, se_type = "stata")) # We can also specify fixed effects, that will be taken as exogenous regressors # Speed gains with fixed effects are greatests with "stata" or "HC1" std.errors tidy(iv_robust(Y ~ D | Z, data = dat, fixed_effects = ~ G, se_type = "HC1"))
library(fabricatr) dat <- fabricate( N = 40, Y = rpois(N, lambda = 4), Z = rbinom(N, 1, prob = 0.4), D = Z * rbinom(N, 1, prob = 0.8), X = rnorm(N), G = sample(letters[1:4], N, replace = TRUE) ) # Instrument for treatment `D` with encouragement `Z` tidy(iv_robust(Y ~ D + X | Z + X, data = dat)) # Instrument with Stata's `ivregress 2sls , small rob` HC1 variance tidy(iv_robust(Y ~ D | Z, data = dat, se_type = "stata")) # With clusters, we use CR2 errors by default dat$cl <- rep(letters[1:5], length.out = nrow(dat)) tidy(iv_robust(Y ~ D | Z, data = dat, clusters = cl)) # Again, easy to replicate Stata (again with `small` correction in Stata) tidy(iv_robust(Y ~ D | Z, data = dat, clusters = cl, se_type = "stata")) # We can also specify fixed effects, that will be taken as exogenous regressors # Speed gains with fixed effects are greatests with "stata" or "HC1" std.errors tidy(iv_robust(Y ~ D | Z, data = dat, fixed_effects = ~ G, se_type = "HC1"))
This function fits a linear model with robust standard errors and performs linear hypothesis test.
lh_robust(..., data, linear_hypothesis)
lh_robust(..., data, linear_hypothesis)
... |
Other arguments to be passed to |
data |
A |
linear_hypothesis |
A character string or a matrix specifying combination, to be passed to the hypothesis.matrix argument of car::linearHypothesis
See |
This function is a wrapper for lm_robust
and for
linearHypothesis
. It first runs lm_robust
and
next passes "lm_robust"
object as an argument to linearHypothesis
.
An object of class "lh_robust"
containing the two following components:
lm_robust |
an object as returned by |
lh |
A data frame with most of its columns pulled from |
The only analyis directly performed by lh_robust
is a t-test
for the null hypothesis of no effects of the linear combination of coefficients as specified by the user.
All other output components are either extracted from linearHypothesis
or lm_robust
.
The original output returned by linearHypothesis
is added as an attribute under the "linear_hypothesis"
attribute.
library(fabricatr) dat <- fabricate( N = 40, y = rpois(N, lambda = 4), x = rnorm(N), z = rbinom(N, 1, prob = 0.4), clusterID = sample(1:4, 40, replace = TRUE) ) # Default variance estimator is HC2 robust standard errors lhro <- lh_robust(y ~ x + z, data = dat, linear_hypothesis = "z + 2x = 0") # The linear hypothesis argument can be specified equivalently as: lh_robust(y ~ x + z, data = dat, linear_hypothesis = "z = 2x") lh_robust(y ~ x + z, data = dat, linear_hypothesis = "2*x +1*z") lh_robust(y ~ x + z, data = dat, linear_hypothesis = "z + 2x = 0") # Also recovers other sorts of standard erorrs just as specified in \code{\link{lm_robust}} lh_robust(y ~ x + z, data = dat, linear_hypothesis = "z + 2x = 0", se_type = "classical") lh_robust(y ~ x + z, data = dat, linear_hypothesis = "z + 2x = 0", se_type = "HC1") # Can tidy() main output and subcomponents in to a data.frame lhro <- lh_robust(y ~ x + z, data = dat, linear_hypothesis = "z + 2x = 0") tidy(lhro ) tidy(lhro$lm_robust) tidy(lhro$lh) # Can use summary() to get more statistics on the main output and subcomponents. summary(lhro) summary(lhro$lm_robust) summary(lhro$lh)
library(fabricatr) dat <- fabricate( N = 40, y = rpois(N, lambda = 4), x = rnorm(N), z = rbinom(N, 1, prob = 0.4), clusterID = sample(1:4, 40, replace = TRUE) ) # Default variance estimator is HC2 robust standard errors lhro <- lh_robust(y ~ x + z, data = dat, linear_hypothesis = "z + 2x = 0") # The linear hypothesis argument can be specified equivalently as: lh_robust(y ~ x + z, data = dat, linear_hypothesis = "z = 2x") lh_robust(y ~ x + z, data = dat, linear_hypothesis = "2*x +1*z") lh_robust(y ~ x + z, data = dat, linear_hypothesis = "z + 2x = 0") # Also recovers other sorts of standard erorrs just as specified in \code{\link{lm_robust}} lh_robust(y ~ x + z, data = dat, linear_hypothesis = "z + 2x = 0", se_type = "classical") lh_robust(y ~ x + z, data = dat, linear_hypothesis = "z + 2x = 0", se_type = "HC1") # Can tidy() main output and subcomponents in to a data.frame lhro <- lh_robust(y ~ x + z, data = dat, linear_hypothesis = "z + 2x = 0") tidy(lhro ) tidy(lhro$lm_robust) tidy(lhro$lh) # Can use summary() to get more statistics on the main output and subcomponents. summary(lhro) summary(lhro$lm_robust) summary(lhro$lh)
This function is a wrapper for lm_robust
that
is useful for estimating treatment effects with pre-treatment covariate
data. This implements the method described by Lin (2013).
lm_lin( formula, covariates, data, weights, subset, clusters, se_type = NULL, ci = TRUE, alpha = 0.05, return_vcov = TRUE, try_cholesky = FALSE )
lm_lin( formula, covariates, data, weights, subset, clusters, se_type = NULL, ci = TRUE, alpha = 0.05, return_vcov = TRUE, try_cholesky = FALSE )
formula |
an object of class formula, as in |
covariates |
a right-sided formula with pre-treatment covariates on
the right hand side, such as |
data |
A |
weights |
the bare (unquoted) names of the weights variable in the supplied data. |
subset |
An optional bare (unquoted) expression specifying a subset of observations to be used. |
clusters |
An optional bare (unquoted) name of the variable that corresponds to the clusters in the data. |
se_type |
The sort of standard error sought. If |
ci |
logical. Whether to compute and return p-values and confidence intervals, TRUE by default. |
alpha |
The significance level, 0.05 by default. |
return_vcov |
logical. Whether to return the variance-covariance matrix for later usage, TRUE by default. |
try_cholesky |
logical. Whether to try using a Cholesky decomposition to solve least squares instead of a QR decomposition, FALSE by default. Using a Cholesky decomposition may result in speed gains, but should only be used if users are sure their model is full-rank (i.e., there is no perfect multi-collinearity) |
This function is simply a wrapper for lm_robust
and implements
the Lin estimator (see the reference below). This method
pre-processes the data by taking the covariates specified in the
`covariates`
argument, centering them by subtracting from each covariate
its mean, and interacting them with the treatment. If the treatment has
multiple values, a series of dummies for each value is created and each of
those is interacted with the demeaned covariates. More details can be found
in the
Getting Started vignette
and the
mathematical notes.
An object of class "lm_robust"
.
The post-estimation commands functions summary
and tidy
return results in a data.frame
. To get useful data out of the return,
you can use these data frames, you can use the resulting list directly, or
you can use the generic accessor functions coef
, vcov
,
confint
, and predict
. Marginal effects and uncertainty about
them can be gotten by passing this object to
margins
from the margins.
Users who want to print the results in TeX of HTML can use the
extract
function and the texreg package.
An object of class "lm_robust"
is a list containing at least the
following components:
coefficients |
the estimated coefficients |
std.error |
the estimated standard errors |
statistic |
the t-statistic |
df |
the estimated degrees of freedom |
p.value |
the p-values from a two-sided t-test using |
conf.low |
the lower bound of the |
conf.high |
the upper bound of the |
term |
a character vector of coefficient names |
alpha |
the significance level specified by the user |
se_type |
the standard error type specified by the user |
res_var |
the residual variance |
N |
the number of observations used |
k |
the number of columns in the design matrix (includes linearly dependent columns!) |
rank |
the rank of the fitted model |
vcov |
the fitted variance covariance matrix |
r.squared |
The
where |
adj.r.squared |
The |
weighted |
whether or not weights were applied |
call |
the original function call |
fitted.values |
the matrix of predicted means |
We also return terms
and contrasts
, used by predict
,
and scaled_center
(the means of each of the covariates used for centering them).
Freedman, David A. 2008. "On Regression Adjustments in Experiments with Several Treatments." The Annals of Applied Statistics. JSTOR, 176-96. doi:10.1214/07-AOAS143.
Lin, Winston. 2013. "Agnostic Notes on Regression Adjustments to Experimental Data: Reexamining Freedman's Critique." The Annals of Applied Statistics 7 (1). Institute of Mathematical Statistics: 295-318. doi:10.1214/12-AOAS583.
library(fabricatr) library(randomizr) dat <- fabricate( N = 40, x = rnorm(N, mean = 2.3), x2 = rpois(N, lambda = 2), x3 = runif(N), y0 = rnorm(N) + x, y1 = rnorm(N) + x + 0.35 ) dat$z <- complete_ra(N = nrow(dat)) dat$y <- ifelse(dat$z == 1, dat$y1, dat$y0) # Same specification as lm_robust() with one additional argument lmlin_out <- lm_lin(y ~ z, covariates = ~ x, data = dat) tidy(lmlin_out) # Works with multiple pre-treatment covariates lm_lin(y ~ z, covariates = ~ x + x2, data = dat) # Also centers data AFTER evaluating any functions in formula lmlin_out2 <- lm_lin(y ~ z, covariates = ~ x + log(x3), data = dat) lmlin_out2$scaled_center["log(x3)"] mean(log(dat$x3)) # Works easily with clusters dat$clusterID <- rep(1:20, each = 2) dat$z_clust <- cluster_ra(clusters = dat$clusterID) lm_lin(y ~ z_clust, covariates = ~ x, data = dat, clusters = clusterID) # Works with multi-valued treatments dat$z_multi <- sample(1:3, size = nrow(dat), replace = TRUE) lm_lin(y ~ z_multi, covariates = ~ x, data = dat) # Stratified estimator with blocks dat$blockID <- rep(1:5, each = 8) dat$z_block <- block_ra(blocks = dat$blockID) lm_lin(y ~ z_block, ~ factor(blockID), data = dat) ## Not run: # Can also use 'margins' package if you have it installed to get # marginal effects library(margins) lmlout <- lm_lin(y ~ z_block, ~ x, data = dat) summary(margins(lmlout)) # Can output results using 'texreg' library(texreg) texregobj <- extract(lmlout) ## End(Not run)
library(fabricatr) library(randomizr) dat <- fabricate( N = 40, x = rnorm(N, mean = 2.3), x2 = rpois(N, lambda = 2), x3 = runif(N), y0 = rnorm(N) + x, y1 = rnorm(N) + x + 0.35 ) dat$z <- complete_ra(N = nrow(dat)) dat$y <- ifelse(dat$z == 1, dat$y1, dat$y0) # Same specification as lm_robust() with one additional argument lmlin_out <- lm_lin(y ~ z, covariates = ~ x, data = dat) tidy(lmlin_out) # Works with multiple pre-treatment covariates lm_lin(y ~ z, covariates = ~ x + x2, data = dat) # Also centers data AFTER evaluating any functions in formula lmlin_out2 <- lm_lin(y ~ z, covariates = ~ x + log(x3), data = dat) lmlin_out2$scaled_center["log(x3)"] mean(log(dat$x3)) # Works easily with clusters dat$clusterID <- rep(1:20, each = 2) dat$z_clust <- cluster_ra(clusters = dat$clusterID) lm_lin(y ~ z_clust, covariates = ~ x, data = dat, clusters = clusterID) # Works with multi-valued treatments dat$z_multi <- sample(1:3, size = nrow(dat), replace = TRUE) lm_lin(y ~ z_multi, covariates = ~ x, data = dat) # Stratified estimator with blocks dat$blockID <- rep(1:5, each = 8) dat$z_block <- block_ra(blocks = dat$blockID) lm_lin(y ~ z_block, ~ factor(blockID), data = dat) ## Not run: # Can also use 'margins' package if you have it installed to get # marginal effects library(margins) lmlout <- lm_lin(y ~ z_block, ~ x, data = dat) summary(margins(lmlout)) # Can output results using 'texreg' library(texreg) texregobj <- extract(lmlout) ## End(Not run)
This formula fits a linear model, provides a variety of options for robust standard errors, and conducts coefficient tests
lm_robust( formula, data, weights, subset, clusters, fixed_effects, se_type = NULL, ci = TRUE, alpha = 0.05, return_vcov = TRUE, try_cholesky = FALSE )
lm_robust( formula, data, weights, subset, clusters, fixed_effects, se_type = NULL, ci = TRUE, alpha = 0.05, return_vcov = TRUE, try_cholesky = FALSE )
formula |
an object of class formula, as in |
data |
A |
weights |
the bare (unquoted) names of the weights variable in the supplied data. |
subset |
An optional bare (unquoted) expression specifying a subset of observations to be used. |
clusters |
An optional bare (unquoted) name of the variable that corresponds to the clusters in the data. |
fixed_effects |
An optional right-sided formula containing the fixed
effects that will be projected out of the data, such as |
se_type |
The sort of standard error sought. If |
ci |
logical. Whether to compute and return p-values and confidence intervals, TRUE by default. |
alpha |
The significance level, 0.05 by default. |
return_vcov |
logical. Whether to return the variance-covariance matrix for later usage, TRUE by default. |
try_cholesky |
logical. Whether to try using a Cholesky decomposition to solve least squares instead of a QR decomposition, FALSE by default. Using a Cholesky decomposition may result in speed gains, but should only be used if users are sure their model is full-rank (i.e., there is no perfect multi-collinearity) |
This function performs linear regression and provides a variety of standard
errors. It takes a formula and data much in the same was as lm
does, and all auxiliary variables, such as clusters and weights, can be
passed either as quoted names of columns, as bare column names, or
as a self-contained vector. Examples of usage can be seen below and in the
Getting Started vignette.
The mathematical notes in
this vignette
specify the exact estimators used by this function.
The default variance estimators have been chosen largely in accordance with the
procedures in
this manual.
The default for the case
without clusters is the HC2 estimator and the default with clusters is the
analogous CR2 estimator. Users can easily replicate Stata standard errors in
the clustered or non-clustered case by setting `se_type` = "stata"
.
The function estimates the coefficients and standard errors in C++, using
the RcppEigen
package. By default, we estimate the coefficients
using Column-Pivoting QR decomposition from the Eigen C++ library, although
users could get faster solutions by setting `try_cholesky` = TRUE
to
use a Cholesky decomposition instead. This will likely result in quicker
solutions, but the algorithm does not reliably detect when there are linear
dependencies in the model and may fail silently if they exist.
If `fixed_effects`
are specified, both the outcome and design matrix
are centered using the method of alternating projections (Halperin 1962; Gaure 2013). Specifying
fixed effects in this way will result in large speed gains with standard error
estimators that do not need to invert the matrix of fixed effects. This means using
"classical", "HC0", "HC1", "CR0", or "stata" standard errors will be faster than other
standard error estimators. Be wary when specifying fixed effects that may result
in perfect fits for some observations or if there are intersecting groups across
multiple fixed effect variables (e.g. if you specify both "year" and "country" fixed effects
with an unbalanced panel where one year you only have data for one country).
As with `lm()`
, multivariate regression (multiple outcomes) will only admit
observations into the estimation that have no missingness on any outcome.
An object of class "lm_robust"
.
The post-estimation commands functions summary
and tidy
return results in a data.frame
. To get useful data out of the return,
you can use these data frames, you can use the resulting list directly, or
you can use the generic accessor functions coef
, vcov
,
confint
, and predict
. Marginal effects and uncertainty about
them can be gotten by passing this object to
margins
from the margins,
or to emmeans
in the emmeans package.
Users who want to print the results in TeX of HTML can use the
extract
function and the texreg package.
If users specify a multivariate linear regression model (multiple outcomes), then some of the below components will be of higher dimension to accommodate the additional models.
An object of class "lm_robust"
is a list containing at least the
following components:
coefficients |
the estimated coefficients |
std.error |
the estimated standard errors |
statistic |
the t-statistic |
df |
the estimated degrees of freedom |
p.value |
the p-values from a two-sided t-test using |
conf.low |
the lower bound of the |
conf.high |
the upper bound of the |
term |
a character vector of coefficient names |
alpha |
the significance level specified by the user |
se_type |
the standard error type specified by the user |
res_var |
the residual variance |
N |
the number of observations used |
k |
the number of columns in the design matrix (includes linearly dependent columns!) |
rank |
the rank of the fitted model |
vcov |
the fitted variance covariance matrix |
r.squared |
The
where |
adj.r.squared |
The |
fstatistic |
a vector with the value of the F-statistic with the numerator and denominator degrees of freedom |
weighted |
whether or not weights were applied |
call |
the original function call |
fitted.values |
the matrix of predicted means |
We also return terms
and contrasts
, used by predict
. If fixed_effects
are specified, then we return proj_fstatistic
, proj_r.squared
, and proj_adj.r.squared
, which are model fit statistics that are computed on the projected model (after demeaning the fixed effects).
Abadie, Alberto, Susan Athey, Guido W Imbens, and Jeffrey Wooldridge. 2017. "A Class of Unbiased Estimators of the Average Treatment Effect in Randomized Experiments." arXiv Pre-Print. https://arxiv.org/abs/1710.02926v2.
Bell, Robert M, and Daniel F McCaffrey. 2002. "Bias Reduction in Standard Errors for Linear Regression with Multi-Stage Samples." Survey Methodology 28 (2): 169-82.
Gaure, Simon. 2013. "OLS with multiple high dimensional category variables." Computational Statistics & Data Analysis 66: 8-1. doi:10.1016/j.csda.2013.03.024
Halperin, I. 1962. "The product of projection operators." Acta Scientiarum Mathematicarum (Szeged) 23(1-2): 96-99.
MacKinnon, James, and Halbert White. 1985. "Some Heteroskedasticity-Consistent Covariance Matrix Estimators with Improved Finite Sample Properties." Journal of Econometrics 29 (3): 305-25. doi:10.1016/0304-4076(85)90158-7.
Pustejovsky, James E, and Elizabeth Tipton. 2016. "Small Sample Methods for Cluster-Robust Variance Estimation and Hypothesis Testing in Fixed Effects Models." Journal of Business & Economic Statistics. Taylor & Francis. doi:10.1080/07350015.2016.1247004.
Samii, Cyrus, and Peter M Aronow. 2012. "On Equivalencies Between Design-Based and Regression-Based Variance Estimators for Randomized Experiments." Statistics and Probability Letters 82 (2). doi:10.1016/j.spl.2011.10.024.
set.seed(15) library(fabricatr) dat <- fabricate( N = 40, y = rpois(N, lambda = 4), x = rnorm(N), z = rbinom(N, 1, prob = 0.4) ) # Default variance estimator is HC2 robust standard errors lmro <- lm_robust(y ~ x + z, data = dat) # Can tidy() the data in to a data.frame tidy(lmro) # Can use summary() to get more statistics summary(lmro) # Can also get coefficients three ways lmro$coefficients coef(lmro) tidy(lmro)$estimate # Can also get confidence intervals from object or with new 1 - `alpha` lmro$conf.low confint(lmro, level = 0.8) # Can recover classical standard errors lmclassic <- lm_robust(y ~ x + z, data = dat, se_type = "classical") tidy(lmclassic) # Can easily match Stata's robust standard errors lmstata <- lm_robust(y ~ x + z, data = dat, se_type = "stata") tidy(lmstata) # Easy to specify clusters for cluster-robust inference dat$clusterID <- sample(1:10, size = 40, replace = TRUE) lmclust <- lm_robust(y ~ x + z, data = dat, clusters = clusterID) tidy(lmclust) # Can also match Stata's clustered standard errors lm_robust( y ~ x + z, data = dat, clusters = clusterID, se_type = "stata" ) # Works just as LM does with functions in the formula dat$blockID <- rep(c("A", "B", "C", "D"), each = 10) lm_robust(y ~ x + z + factor(blockID), data = dat) # Weights are also easily specified dat$w <- runif(40) lm_robust( y ~ x + z, data = dat, weights = w, clusters = clusterID ) # Subsetting works just as in `lm()` lm_robust(y ~ x, data = dat, subset = z == 1) # One can also choose to set the significance level for different CIs lm_robust(y ~ x + z, data = dat, alpha = 0.1) # We can also specify fixed effects # Speed gains with fixed effects are greatest with "stata" or "HC1" std.errors tidy(lm_robust(y ~ z, data = dat, fixed_effects = ~ blockID, se_type = "HC1")) ## Not run: # Can also use 'margins' or 'emmeans' package if you have them installed # to get marginal effects library(margins) lmrout <- lm_robust(y ~ x + z, data = dat) summary(margins(lmrout)) # Can output results using 'texreg' library(texreg) texreg(lmrout) # Using emmeans to obtain covariate-adjusted means library(emmeans) fiber.rlm <- lm_robust(strength ~ diameter + machine, data = fiber) emmeans(fiber.rlm, "machine") ## End(Not run)
set.seed(15) library(fabricatr) dat <- fabricate( N = 40, y = rpois(N, lambda = 4), x = rnorm(N), z = rbinom(N, 1, prob = 0.4) ) # Default variance estimator is HC2 robust standard errors lmro <- lm_robust(y ~ x + z, data = dat) # Can tidy() the data in to a data.frame tidy(lmro) # Can use summary() to get more statistics summary(lmro) # Can also get coefficients three ways lmro$coefficients coef(lmro) tidy(lmro)$estimate # Can also get confidence intervals from object or with new 1 - `alpha` lmro$conf.low confint(lmro, level = 0.8) # Can recover classical standard errors lmclassic <- lm_robust(y ~ x + z, data = dat, se_type = "classical") tidy(lmclassic) # Can easily match Stata's robust standard errors lmstata <- lm_robust(y ~ x + z, data = dat, se_type = "stata") tidy(lmstata) # Easy to specify clusters for cluster-robust inference dat$clusterID <- sample(1:10, size = 40, replace = TRUE) lmclust <- lm_robust(y ~ x + z, data = dat, clusters = clusterID) tidy(lmclust) # Can also match Stata's clustered standard errors lm_robust( y ~ x + z, data = dat, clusters = clusterID, se_type = "stata" ) # Works just as LM does with functions in the formula dat$blockID <- rep(c("A", "B", "C", "D"), each = 10) lm_robust(y ~ x + z + factor(blockID), data = dat) # Weights are also easily specified dat$w <- runif(40) lm_robust( y ~ x + z, data = dat, weights = w, clusters = clusterID ) # Subsetting works just as in `lm()` lm_robust(y ~ x, data = dat, subset = z == 1) # One can also choose to set the significance level for different CIs lm_robust(y ~ x + z, data = dat, alpha = 0.1) # We can also specify fixed effects # Speed gains with fixed effects are greatest with "stata" or "HC1" std.errors tidy(lm_robust(y ~ z, data = dat, fixed_effects = ~ blockID, se_type = "HC1")) ## Not run: # Can also use 'margins' or 'emmeans' package if you have them installed # to get marginal effects library(margins) lmrout <- lm_robust(y ~ x + z, data = dat) summary(margins(lmrout)) # Can output results using 'texreg' library(texreg) texreg(lmrout) # Using emmeans to obtain covariate-adjusted means library(emmeans) fiber.rlm <- lm_robust(strength ~ diameter + machine, data = fiber) emmeans(fiber.rlm, "machine") ## End(Not run)
Internal method that creates linear fits
lm_robust_fit( y, X, yoriginal = NULL, Xoriginal = NULL, weights, cluster, fixed_effects = NULL, ci = TRUE, se_type, has_int, alpha = 0.05, return_vcov = TRUE, return_fit = TRUE, try_cholesky = FALSE, iv_stage = list(0) )
lm_robust_fit( y, X, yoriginal = NULL, Xoriginal = NULL, weights, cluster, fixed_effects = NULL, ci = TRUE, se_type, has_int, alpha = 0.05, return_vcov = TRUE, return_fit = TRUE, try_cholesky = FALSE, iv_stage = list(0) )
y |
numeric outcome vector |
X |
numeric design matrix |
yoriginal |
numeric outcome vector, unprojected if there are fixed effects |
Xoriginal |
numeric design matrix, unprojected if there are fixed effects. Any column named |
weights |
numeric weights vector |
cluster |
numeric cluster vector |
fixed_effects |
character matrix of fixed effect groups |
ci |
boolean that when T returns confidence intervals and p-values |
se_type |
character denoting which kind of SEs to return |
has_int |
logical, whether the model has an intercept, used for |
alpha |
numeric denoting the test size for confidence intervals |
return_vcov |
logical, whether to return the vcov matrix for later usage |
return_fit |
logical, whether to return fitted values |
try_cholesky |
logical, whether to try using a cholesky decomposition to solve LS instead of a QR decomposition |
iv_stage |
list of length two, the first element denotes the stage of 2SLS IV estimation, where 0 is used for OLS. The second element is only used for the second stage of 2SLS and has the first stage design matrix. For OLS, the default, |
Extra logging on na.omit handler
na.omit_detailed.data.frame(object)
na.omit_detailed.data.frame(object)
object |
a data.frame |
a normal omit
object, with the extra attribute why_omit
,
which contains the leftmost column containing an NA for each row that was dropped, by
column name, if any were dropped.
Builds condition probability matrices for Horvitz-Thompson estimation from permutation matrix
permutations_to_condition_pr_mat(permutations)
permutations_to_condition_pr_mat(permutations)
permutations |
A matrix where the rows are units and the columns are different treatment permutations; treated units must be represented with a 1 and control units with a 0 |
This function takes a matrix of permutations, for example from
the obtain_permutation_matrix
function in
randomizr or through simulation and returns a 2n*2n matrix that can
be used to fully specify the design for horvitz_thompson
estimation. You can read more about these matrices in the documentation for
the declaration_to_condition_pr_mat
function.
This is done by passing this matrix to the condition_pr_mat
argument
of
a numeric 2n*2n matrix of marginal and joint condition treatment
probabilities to be passed to the condition_pr_mat
argument of
horvitz_thompson
.
declare_ra
,
declaration_to_condition_pr_mat
# Complete randomization perms <- replicate(1000, sample(rep(0:1, each = 50))) comp_pr_mat <- permutations_to_condition_pr_mat(perms) # Arbitrary randomization possible_treats <- cbind( c(1, 1, 0, 1, 0, 0, 0, 1, 1, 0), c(0, 1, 1, 0, 1, 1, 0, 1, 0, 1), c(1, 0, 1, 1, 1, 1, 1, 0, 0, 0) ) arb_pr_mat <- permutations_to_condition_pr_mat(possible_treats) # Simulating a column to be realized treatment z <- possible_treats[, sample(ncol(possible_treats), size = 1)] y <- rnorm(nrow(possible_treats)) horvitz_thompson(y ~ z, condition_pr_mat = arb_pr_mat)
# Complete randomization perms <- replicate(1000, sample(rep(0:1, each = 50))) comp_pr_mat <- permutations_to_condition_pr_mat(perms) # Arbitrary randomization possible_treats <- cbind( c(1, 1, 0, 1, 0, 0, 0, 1, 1, 0), c(0, 1, 1, 0, 1, 1, 0, 1, 0, 1), c(1, 0, 1, 1, 1, 1, 1, 0, 0, 0) ) arb_pr_mat <- permutations_to_condition_pr_mat(possible_treats) # Simulating a column to be realized treatment z <- possible_treats[, sample(ncol(possible_treats), size = 1)] y <- rnorm(nrow(possible_treats)) horvitz_thompson(y ~ z, condition_pr_mat = arb_pr_mat)
lm_robust
objectPredict method for lm_robust
object
## S3 method for class 'lm_robust' predict( object, newdata, se.fit = FALSE, interval = c("none", "confidence", "prediction"), alpha = 0.05, na.action = na.pass, pred.var = NULL, weights, ... )
## S3 method for class 'lm_robust' predict( object, newdata, se.fit = FALSE, interval = c("none", "confidence", "prediction"), alpha = 0.05, na.action = na.pass, pred.var = NULL, weights, ... )
object |
an object of class 'lm_robust' |
newdata |
a data frame in which to look for variables with which to predict |
se.fit |
logical. Whether standard errors are required, default = FALSE |
interval |
type of interval calculation. Can be abbreviated, default = none |
alpha |
numeric denoting the test size for confidence intervals |
na.action |
function determining what should be done with missing values in newdata. The default is to predict NA. |
pred.var |
the variance(s) for future observations to be assumed for prediction intervals. |
weights |
variance weights for prediction. This can be a numeric vector or a bare (unquoted) name of the weights variable in the supplied newdata. |
... |
other arguments, unused |
Produces predicted values, obtained by evaluating the regression
function in the frame newdata
for fits from lm_robust
and
lm_lin
. If the logical se.fit is TRUE, standard errors of the
predictions are calculated. Setting intervals specifies computation of
confidence or prediction (tolerance) intervals at the specified level,
sometimes referred to as narrow vs. wide intervals.
The equation used for the standard error of a prediction given a row of
data is:
,
where is the estimated variance-covariance matrix from
lm_robust
.
The prediction intervals are for a single observation at each case in
newdata
with error variance(s) pred.var
. The the default is to assume
that future observations have the same error variance as those used for
fitting, which is gotten from the fit lm_robust
object. If
weights is supplied, the inverse of this is used as a scale factor. If the
fit was weighted, the default is to assume constant prediction variance,
with a warning.
# Set seed set.seed(42) # Simulate data n <- 10 dat <- data.frame(y = rnorm(n), x = rnorm(n)) # Fit lm lm_out <- lm_robust(y ~ x, data = dat) # Get predicted fits fits <- predict(lm_out, newdata = dat) # With standard errors and confidence intervals fits <- predict(lm_out, newdata = dat, se.fit = TRUE, interval = "confidence") # Use new data as well new_dat <- data.frame(x = runif(n, 5, 8)) predict(lm_out, newdata = new_dat) # You can also supply custom variance weights for prediction intervals new_dat$w <- runif(n) predict(lm_out, newdata = new_dat, weights = w, interval = "prediction")
# Set seed set.seed(42) # Simulate data n <- 10 dat <- data.frame(y = rnorm(n), x = rnorm(n)) # Fit lm lm_out <- lm_robust(y ~ x, data = dat) # Get predicted fits fits <- predict(lm_out, newdata = dat) # With standard errors and confidence intervals fits <- predict(lm_out, newdata = dat, se.fit = TRUE, interval = "confidence") # Use new data as well new_dat <- data.frame(x = runif(n, 5, 8)) predict(lm_out, newdata = new_dat) # You can also supply custom variance weights for prediction intervals new_dat$w <- runif(n) predict(lm_out, newdata = new_dat, weights = w, interval = "prediction")
Prepare model fits for stargazer
starprep( ..., stat = c("std.error", "statistic", "p.value", "ci", "df"), se_type = NULL, clusters = NULL, alpha = 0.05 )
starprep( ..., stat = c("std.error", "statistic", "p.value", "ci", "df"), se_type = NULL, clusters = NULL, alpha = 0.05 )
... |
a list of lm_robust or lm objects |
stat |
either "std.error" (the default), "statistic" (the t-statistic), "p.value", "ci", or "df" |
se_type |
(optional) if any of the objects are lm objects, what standard
errors should be used. Must only be one type and will be used for all lm
objects passed to starprep. See |
clusters |
(optional) if any of the objects are lm objects, what clusters
should be used, if clusters should be used. Must only be one vector and will
be used for all lm objects passed to starprep. See |
alpha |
(optional) if any of the objects are lm objects, what significance level should be used for the p-values or confidence intervals |
Used to help extract statistics from lists of model fits for stargazer.
Prefers lm_robust objects, but because stargazer
does not work with lm_robust
objects, starprep
can also take lm
objects and calls commarobust
to get
the preferred, robust statistics.
a list of vectors of extracted statistics for stargazers
library(stargazer) lm1 <- lm(mpg ~ hp, data = mtcars) lm2 <- lm(mpg ~ hp + wt, data = mtcars) # Use default "HC2" standard errors stargazer(lm1, lm2, se = starprep(lm1, lm2), p = starprep(lm1, lm2, stat = "p.value"), omit.stat = "f") # NB: We remove the F-stat because stargazer only can use original F-stat # which uses classical SEs # Use default "CR2" standard errors with clusters stargazer(lm1, lm2, se = starprep(lm1, lm2, clusters = mtcars$carb), p = starprep(lm1, lm2, clusters = mtcars$carb, stat = "p.value"), omit.stat = "f") # Can also specify significance levels and different standard errors stargazer(lm1, lm2, ci.custom = starprep(lm1, lm2, se_type = "HC3", alpha = 0.1, stat = "ci"), omit.stat = "f")
library(stargazer) lm1 <- lm(mpg ~ hp, data = mtcars) lm2 <- lm(mpg ~ hp + wt, data = mtcars) # Use default "HC2" standard errors stargazer(lm1, lm2, se = starprep(lm1, lm2), p = starprep(lm1, lm2, stat = "p.value"), omit.stat = "f") # NB: We remove the F-stat because stargazer only can use original F-stat # which uses classical SEs # Use default "CR2" standard errors with clusters stargazer(lm1, lm2, se = starprep(lm1, lm2, clusters = mtcars$carb), p = starprep(lm1, lm2, clusters = mtcars$carb, stat = "p.value"), omit.stat = "f") # Can also specify significance levels and different standard errors stargazer(lm1, lm2, ci.custom = starprep(lm1, lm2, se_type = "HC3", alpha = 0.1, stat = "ci"), omit.stat = "f")