Bias, Variance, and Doubly Robust Estimation: Testing The Promise of TMLE in Simulated Data
By Ken Koon Wong in r R tmle xgboost tidymodels future g computation bias variance
November 16, 2025
Finally understood TMLE’s “doubly robust” property through simulation. Works well when either outcome OR treatment model is correct. XGBoost + TMLE captured complex relationships without manual specification. But beware the confidence intervals - Frank Harrell was right to say “prepare to be disappointed!”

Motivations:
I’ve always heard about Targeted Maximum Likelihood Estimation (TMLE), and I’ve read Katherine Hoffman’s blog post several times. I printed her cheat sheet and went through it several times too. Each time I thought I understood it, the next time I found myself questioning my understanding. So, what better way to dig a tad deeper into the machinery behind it, and into why it is useful? Let’s go!
Just to set the context right, we’re going to estimate Average Treatment Effect (ATE) and use g-computation as a standard approach.
Objectives:
- What is TMLE?
- What Does Doubly Robust mean?
- What is Bias and Variance?
- Simulate Data
- Write a Function to Estimate
- Comparing methods and models
- Is there a traditional statistical method for this?
- But Does It Really Work?
- Acknowledgement
- Opportunities for improvement
- Lessons learnt
What is TMLE?
TMLE is a statistical method used for estimating causal effects in observational studies and clinical trials. It combines elements of machine learning and traditional statistical techniques to provide robust estimates of treatment effects while controlling for confounding variables. TMLE operates in two main steps: first, it estimates the outcome model and the treatment model, and then it uses these models to adjust the treatment effect estimate, targeting the parameter of interest directly. This approach is particularly useful in settings where standard methods may be biased or inefficient, as it allows for the incorporation of flexible machine learning algorithms to improve estimation accuracy. You will often hear this method described as Doubly Robust. What is so doubly robust about it?
What Does Doubly Robust mean?
Doubly Robust (DR) estimation refers to a statistical property of certain estimators that remain consistent if either the model for the treatment assignment (propensity score) or the model for the outcome is correctly specified, but not necessarily both. In other words, a doubly robust estimator provides two chances for obtaining a valid estimate of the causal effect: if one of the models is misspecified, as long as the other model is correctly specified, the estimator will still yield consistent results. This property is particularly advantageous in observational studies where there may be uncertainty about the correct specification of either model, enhancing the reliability of causal inferences drawn from the data. I didn’t quite understand this until we simulated the data to test this theory. It will, hopefully, be more clear when we go through the simulation. But, wait, what metrics should we use for this? Bias and variance!
What is Bias and Variance?
Bias and variance are two fundamental concepts in statistics and machine learning that describe different sources of error in predictive models. Bias refers to the systematic error that occurs when a model consistently overestimates or underestimates the true value of a parameter. High bias can lead to underfitting, where the model fails to capture the underlying patterns in the data. Variance, on the other hand, refers to the variability of model predictions for different training datasets. High variance can lead to overfitting, where the model captures noise in the training data rather than the true signal. The trade-off between bias and variance is a key consideration in model selection and evaluation, as it affects the overall accuracy and generalizability of predictive models.
The formula for bias is: $$ \text{Bias}(\hat{\theta}) = E[\hat{\theta}] - \theta $$ Where:
- \(\hat{\theta}\) is the estimator of the parameter \(\theta\)
- \(E[\hat{\theta}]\) is the expected value of the estimator
In pseudo-R code, it would look something like this:
predicted_theta <- vector(mode = "numeric", length = 1000)
for (i in 1:1000) {
  # bootstrap resample of the original training data
  training_data <- dplyr::slice_sample(original_training_data, n = nrow(original_training_data), replace = T)
  model <- glm(outcome ~ treatment + confounder, data = training_data)
  # predicted outcomes with treatment set to 1 and to 0 for everyone (g-computation)
  outcome_hat_1 <- predict(model, newdata = training_data |> mutate(treatment = 1))
  outcome_hat_0 <- predict(model, newdata = training_data |> mutate(treatment = 0))
  predicted_theta[i] <- mean(outcome_hat_1) - mean(outcome_hat_0)
}
# theta is the true value of the estimand
bias <- mean(predicted_theta) - theta
In my own language, bias is how close our estimate, on average, is to the true value.
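For a concrete feel, here is a classic toy example (my own addition, not part of the analysis below): the maximum likelihood estimator of a variance divides by n instead of n - 1, so on average it underestimates the true variance.
set.seed(1)
sigma2_true <- 4
est <- replicate(1000, {
  x <- rnorm(30, sd = sqrt(sigma2_true))
  mean((x - mean(x))^2)  # divides by n, not n - 1
})
mean(est) - sigma2_true  # negative bias, roughly -sigma2_true/30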
The formula for variance is: $$ \text{Var}(\hat{\theta}) = E[(\hat{\theta} - E[\hat{\theta}])^2] $$
Where:
- \(\hat{\theta}\) is the estimator of the parameter \(\theta\)
- \(E[\hat{\theta}]\) is the expected value of the estimator
In pseudo-R code, it would look something like this:
predicted_theta <- vector(mode = "numeric", length = 1000)
for (i in 1:1000) {
  training_data <- dplyr::slice_sample(original_training_data, n = nrow(original_training_data), replace = T)
  model <- glm(outcome ~ treatment + confounder, data = training_data)
  outcome_hat_1 <- predict(model, newdata = training_data |> mutate(treatment = 1))
  outcome_hat_0 <- predict(model, newdata = training_data |> mutate(treatment = 0))
  predicted_theta[i] <- mean(outcome_hat_1) - mean(outcome_hat_0)
}
variance <- mean((predicted_theta - mean(predicted_theta))^2)
# or, with var(), which uses n - 1 in the denominator
variance <- var(predicted_theta)
We will be using bias and variance to test the doubly robust theory. But first, let’s simulate some data!
Simulate Data
library(tidyverse)
set.seed(1)
n <- 10000
W1 <- rnorm(n)
W2 <- rnorm(n)
W3 <- rbinom(n, 1, 0.5)
W4 <- rnorm(n)
# TRUE propensity score model
A <- rbinom(n, 1, plogis(-0.5 + 0.8*W1 + 0.5*W2^2 + 0.3*W3 - 0.4*W1*W2 + 0.2*W4))
# TRUE outcome model
Y <- rbinom(n, 1, plogis(-1 + 0.2*A + 0.6*W1 - 0.4*W2^2 + 0.5*W3 + 0.3*W1*W3 + 0.2*W4^2))
# Calculate TRUE ATE
logit_Y1 <- -1 + 0.2 + 0.6*W1 - 0.4*W2^2 + 0.5*W3 + 0.3*W1*W3 + 0.2*W4^2
logit_Y0 <- -1 + 0 + 0.6*W1 - 0.4*W2^2 + 0.5*W3 + 0.3*W1*W3 + 0.2*W4^2
Y1_true <- plogis(logit_Y1)
Y0_true <- plogis(logit_Y0)
true_ATE <- mean(Y1_true - Y0_true)
df <- tibble(W1 = W1, W2 = W2, W3 = W3, W4 = W4, A = A, Y = Y,
true_ATE = true_ATE, Y1_true = Y1_true, Y0_true = Y0_true)
Alright, our true ATE here is 0.0373518. We’ll see whether the doubly robust method can still estimate this when either the outcome or the treatment model is misspecified.
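Before fitting any models, it is worth checking what a naive, unadjusted comparison gives us (my addition; this just compares raw outcome rates and ignores the confounders W1-W4):
naive_ate <- mean(df$Y[df$A == 1]) - mean(df$Y[df$A == 0])
naive_ate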
Write a Function to Estimate
Let’s look at the WRONG Outcome model
model <- glm(Y ~ A + W1 + W2 + W3 + W4, family = "binomial")
summary(model)
##
## Call:
## glm(formula = Y ~ A + W1 + W2 + W3 + W4, family = "binomial")
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.045765 0.041489 -25.206 <2e-16 ***
## A -0.050142 0.047732 -1.050 0.293
## W1 0.767386 0.026058 29.449 <2e-16 ***
## W2 -0.024726 0.022807 -1.084 0.278
## W3 0.561572 0.045658 12.300 <2e-16 ***
## W4 -0.003209 0.022382 -0.143 0.886
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 12737 on 9999 degrees of freedom
## Residual deviance: 11519 on 9994 degrees of freedom
## AIC: 11531
##
## Number of Fisher Scoring iterations: 4
Ouch. A quick look at the coefficient of A gives -0.0501418 (on the log-odds scale), the opposite sign of the true effect. Alright, let’s run g-computation and see if it returns a similarly wrong result.
g-computation function
g_comp <- function(model, data, ml = F) {
  if (ml == T) {
    # tidymodels fit: predicted probability of Y = 1 with A set to 1 and to 0 for everyone
    y1 <- predict(model, new_data = data |> mutate(A = as.factor(1)), type = "prob")[,2] |> pull()
    y0 <- predict(model, new_data = data |> mutate(A = as.factor(0)), type = "prob")[,2] |> pull()
  } else {
    # glm fit: same idea on the response scale
    y1 <- predict(model, newdata = data |> mutate(A = 1), type = "response")
    y0 <- predict(model, newdata = data |> mutate(A = 0), type = "response")
  }
  return(mean(y1 - y0))
}
g_comp(model,df)
## [1] -0.009823307
Yup, incorrect! Now what if we use the RIGHT Outcome model?
The correct outcome model
model <- glm(Y ~ A + W1 + I(W2^2) + W3 + W1:W2 + W4, family = "binomial")
g_comp(model,df)
## [1] 0.03576854
Wow! Look at that! If we correctly specify the outcome model, the estimate is actually VERY close to the true ATE!
The wrong treatment model
Now, what if we use IPW with the wrong treatment model? Can we still estimate the ATE?
ps_model <- glm(A ~ W1 + W2 + W3 + W4, family = "binomial")
ps <- ps_model$fitted.values
ps_final <- pmax(pmin(ps, 0.95), 0.05)
weights <- ifelse(A == 1, 1/ps_final, 1/(1-ps_final))
model <- glm(Y ~ A, family = "binomial", weights = weights)
g_comp(model,df)
## [1] -0.0095777
Wow, very wrong indeed! Now let’s look at the right treatment model.
The correct treatment model
ps_model <- glm(A ~ W1 + I(W2^2) + W3 + W1:W2 + W4, family = "binomial") #-0.5 + 0.8*W1 + 0.5*W2^2 + 0.3*W3 - 0.4*W1*W2 + 0.2*W4
ps <- ps_model$fitted.values
ps_final <- pmax(pmin(ps, 0.95), 0.05)
weights <- ifelse(A == 1, 1/ps_final, 1/(1-ps_final))
model <- glm(Y ~ A, family = "binomial", weights = weights)
g_comp(model,df)
## [1] 0.03874379
Not too shabby! Very close to our true ATE! To be honest, though, how on earth are we supposed to know beforehand the complex functional form to specify for either the treatment or the outcome model!?
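As an aside, the weighted marginal glm followed by g-computation above should be equivalent to simply taking the IPW-weighted difference in mean outcomes (a Hajek-style estimator). A quick sketch of that, reusing the weights from the correct treatment model above; it should match the result just shown up to convergence tolerance:
ipw_hajek <- weighted.mean(Y[A == 1], weights[A == 1]) -
  weighted.mean(Y[A == 0], weights[A == 0])
ipw_hajek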
Let’s Try ML xgboost
Let’s see if xgboost can tease out the outcome model without us specifying all these weird interactions and quadratic relationships.
library(tidymodels)
library(doParallel)
library(future)
workers <- parallel::detectCores(logical = FALSE) - 1
plan(multisession, workers = workers)
future::nbrOfWorkers()
# Set up parallel processing
all_cores <- parallel::detectCores(logical = FALSE)
cl <- makePSOCKcluster(all_cores - 1) # Leave 1 core free
registerDoParallel(cl)
df_ml <- df |>
select(Y,A,W1,W2,W3,W4) |>
mutate(Y = as.factor(Y),
A = as.factor(A))
# Define model specification
xgb_spec <- boost_tree(
trees = 1000,
tree_depth = tune(),
min_n = tune(),
loss_reduction = tune(),
# sample_size = tune(),
mtry = tune(),
learn_rate = tune()
) %>%
set_engine("xgboost") %>%
set_mode("classification")
# Create workflow
xgb_wf <- workflow() %>%
add_model(xgb_spec) %>%
add_formula(Y ~ .)
# Tuning grid
xgb_grid <- grid_space_filling(
tree_depth(),
min_n(),
loss_reduction(),
finalize(mtry(),df_ml),
learn_rate(),
size = 20
)
# Cross-validation and tuning
set.seed(1)
folds <- vfold_cv(df_ml, v = 5)
xgb_res <- tune_grid(
xgb_wf,
resamples = folds,
grid = xgb_grid,
control = control_grid(save_pred = TRUE,
parallel_over = "everything")
)
# Select best model
best_xgb <- select_best(xgb_res, metric = "roc_auc")
# Finalize and fit
final_xgb <- finalize_workflow(xgb_wf, best_xgb)
final_fit <- fit(final_xgb, data = df_ml)
# g-comp
g_comp(final_fit, df_ml, T)
## [1] 0.03447109
Wow, nice! Quite close to our true ATE, without specifying any interactions or quadratic relationships. Mind you, this dataset is quite large.
Now, let’s see whether we can use xgboost to build an accurate treatment model and plug its weights into our good, trusty glm.
# Recreate workflow
xgb_wf <- workflow() %>%
add_model(xgb_spec) %>%
add_formula(A ~ .)
# Tuning grid
xgb_grid <- grid_space_filling(
tree_depth(),
min_n(),
loss_reduction(),
finalize(mtry(),df_ml |> select(-Y)),
learn_rate(),
size = 20
)
# Cross-validation and tuning
set.seed(1)
folds <- vfold_cv(df_ml |> select(-Y), v = 5)
xgb_res <- tune_grid(
xgb_wf,
resamples = folds,
grid = xgb_grid,
control = control_grid(save_pred = TRUE,
parallel_over = "everything")
)
# Select best model
best_xgb <- select_best(xgb_res, metric = "roc_auc")
# Finalize and fit
final_xgb <- finalize_workflow(xgb_wf, best_xgb)
final_fit <- fit(final_xgb, data = df_ml |> select(-Y))
# calc ps
ps <- predict(final_fit, new_data = df_ml |> select(-Y), type = "prob")[,2] |> pull()
ps_final <- ps
weights <- ifelse(A == 1, 1/ps_final, 1/(1-ps_final))
# glm model
model <- glm(Y ~ A, family = "binomial", weights = weights)
g_comp(model, df_ml)
## [1] 0.04840169
Wow, this ATE is much closer to the true ATE than the one from the wrongly specified treatment model. Though it’s still quite biased, isn’t it? It sits some distance from the true ATE. But at least we know ML methods can probably handle these complex relationships.
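One diagnostic worth running before trusting any IPW-style estimate (my addition): check the distribution of the estimated propensity scores and weights, since values near 0 or 1 blow up the weights and the variance. The truncation used earlier for the glm propensity models is one common remedy.
summary(ps_final)
summary(weights)
# optional truncation, as done for the glm propensity models above:
# ps_final <- pmax(pmin(ps, 0.95), 0.05)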
Now let’s write functions to estimate bias and variance, since that’s our main question. Then we’ll walk through the TMLE procedure.
code
bias <- function(predicted_theta, theta) {
  return(mean(predicted_theta - theta))
}
variance <- function(predicted_theta) {
  return(var(predicted_theta))
}
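A quick usage check of the helpers with made-up numbers (not results from this post):
bias(c(0.030, 0.040, 0.035), theta = 0.0374)  # mean(estimates) - true value
variance(c(0.030, 0.040, 0.035))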
TMLE
Since we already know xgboost can recover the outcome model, let’s just use logistic regression here. Imagine we somehow got only the treatment model correct, but not the outcome model. Will TMLE be able to rescue the estimate?
Step 1. Create Outcome Model
model_outcome <- glm(Y ~ A + W1 + W2 + W3 + W4, family = "binomial") #wrong
model_outcome_all <- predict(model_outcome, newdata = df, type = "response")
model_outcome_1 <- predict(model_outcome, newdata = df |> mutate(A = 1), type = "response")
model_outcome_0 <- predict(model_outcome, newdata = df |> mutate(A = 0), type = "response")
Step 2. Create Treatment Model & Clever Covariate
model_treatment <- glm(A ~ W1 + I(W2^2) + W3 + W1:W2 + W4, family = "binomial") #-0.5 + 0.8*W1 + 0.5*W2^2 + 0.3*W3 - 0.4*W1*W2 + 0.2*W4
ps <- model_treatment$fitted.values
ps_final <- ps
# ps_final <- pmax(pmin(ps, 0.95), 0.05)
a_1 <- 1/(predict(model_treatment, df, type = "response"))
a_0 <- -1/(1 - predict(model_treatment, df, type = "response"))
clever_covariate <- ifelse(A == 1, 1/ps_final, -1/(1-ps_final))
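To make the clever covariate concrete: it is 1/(propensity score) for treated units and -1/(1 - propensity score) for untreated units, so it should simply equal a_1 for the treated and a_0 for the untreated. A quick check (my addition; it should print TRUE, up to numerical tolerance):
all.equal(clever_covariate, ifelse(A == 1, a_1, a_0))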
Step 3. Estimate Fluctuation Parameter
epsilon_model <- glm(Y ~ -1 + offset(qlogis(model_outcome$fitted.values)) + clever_covariate, family = "binomial")
summary(epsilon_model)
##
## Call:
## glm(formula = Y ~ -1 + offset(qlogis(model_outcome$fitted.values)) +
## clever_covariate, family = "binomial")
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## clever_covariate 0.041886 0.009675 4.329 1.5e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 11519 on 10000 degrees of freedom
## Residual deviance: 11497 on 9999 degrees of freedom
## AIC: 11499
##
## Number of Fisher Scoring iterations: 4
epsilon <- epsilon_model$coefficients
Step 4. Update Initial Outcomes
updated_outcome_1 <- plogis(qlogis(model_outcome_1)+epsilon*a_1)
updated_outcome_0 <- plogis(qlogis(model_outcome_0)+epsilon*a_0)
Step 5. Compute ATE
(ate <- mean(updated_outcome_1-updated_outcome_0))
## [1] 0.04068321
Wow! Specifying the wrong outcome model but the right treatment model still brings our ATE much closer to the true ATE! Remember, the wrongly specified outcome model on its own produced an ATE of about -0.009 here. Not too shabby!
Step 6. Estimate Standard Error
updated_outcome <- ifelse(A == 1, updated_outcome_1, updated_outcome_0)
se <- sqrt(var((Y-updated_outcome)*clever_covariate+updated_outcome_1-updated_outcome_0-ate)/n)
pval <- 2*(1-pnorm(abs(ate/se)))
print(paste0("ATE: ",round(ate,3), " [95%CI ", round(ate-1.96*se,3),"-",round(ate+1.96*se,3),", p=",round(pval,3),"]"))
## [1] "ATE: 0.041 [95%CI 0.021-0.06, p=0]"
There you have it! Our final estimate, standard error, and p-value. Thanks to khstats for the step-by-step guidance. Very helpful for reproducing the framework.
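To keep things tidy for later reuse, here is the whole six-step procedure wrapped into one function. This is a sketch of the manual procedure above (not the tmle package), assuming a binary A and Y and plain glm formulas for both models:
tmle_manual <- function(data, outcome_formula, treatment_formula) {
  # Step 1: initial outcome model
  q_fit <- glm(outcome_formula, data = data, family = "binomial")
  q_a <- predict(q_fit, newdata = data, type = "response")
  q_1 <- predict(q_fit, newdata = data |> mutate(A = 1), type = "response")
  q_0 <- predict(q_fit, newdata = data |> mutate(A = 0), type = "response")
  # Step 2: treatment model and clever covariate
  g_fit <- glm(treatment_formula, data = data, family = "binomial")
  ps <- predict(g_fit, newdata = data, type = "response")
  h_a <- ifelse(data$A == 1, 1 / ps, -1 / (1 - ps))
  # Step 3: fluctuation parameter
  epsilon <- coef(glm(data$Y ~ -1 + offset(qlogis(q_a)) + h_a, family = "binomial"))
  # Step 4: targeted update of the initial outcome predictions
  q_1_star <- plogis(qlogis(q_1) + epsilon / ps)
  q_0_star <- plogis(qlogis(q_0) - epsilon / (1 - ps))
  # Step 5: ATE; Step 6: influence-curve standard error
  ate <- mean(q_1_star - q_0_star)
  ic <- (data$Y - ifelse(data$A == 1, q_1_star, q_0_star)) * h_a + q_1_star - q_0_star - ate
  se <- sqrt(var(ic) / nrow(data))
  list(ate = ate, se = se, ci = ate + c(-1.96, 1.96) * se)
}
# should match the step-by-step result above:
# tmle_manual(df, Y ~ A + W1 + W2 + W3 + W4, A ~ W1 + I(W2^2) + W3 + W1:W2 + W4)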
Comparing Models
Now, let’s resample 1000 times with n = 6000 (based on a sample size calculation for this effect size with 80% power and alpha of 5%) and compare: 1) a correctly specified logistic regression outcome model, 2) an inverse probability weighting (IPW) approach with correctly specified treatment assignment probabilities, 3) an incorrectly specified logistic regression outcome model, and 4) a hyperparameter-tuned xgboost outcome model, and see how their biases and variances differ.
My messy code
n_sample <- 1000
df_ate <- tibble(model=character(),bias=numeric(),variance=numeric())
### correct outcome model specification, logistic regression
predicted_ate <- vector(mode = "numeric", length = n_sample)
for (i in 1:n_sample) {
set.seed(i)
data <- slice_sample(df, n = 6000, replace = T)
model <- glm(Y ~ A + W1 + I(W2^2) + W3 + W1:W2 + W4, data = data, family = "binomial")
ate <- g_comp(model,data)
predicted_ate[i] <- ate
}
df_ate <- df_ate |>
bind_rows(tibble(model="logreg_correct_outcome",bias=bias(predicted_ate,true_ATE),variance=variance(predicted_ate)))
### correct treatment model specification, logistic regression ipw
predicted_ate <- vector(mode = "numeric", length = n_sample)
for (i in 1:n_sample) {
set.seed(i)
data <- slice_sample(df, n = 6000, replace = T)
ps_model <- glm(A ~ W1 + I(W2^2) + W3 + W1:W2 + W4, data=data, family = "binomial") #-0.5 + 0.8*W1 + 0.5*W2^2 + 0.3*W3 - 0.4*W1*W2 + 0.2*W4
ps <- ps_model$fitted.values
ps_final <- pmax(pmin(ps, 0.95), 0.05)
weights <- ifelse(data$A == 1, 1/ps_final, 1/(1-ps_final))
model <- glm(Y ~ A, data=data, family = "binomial", weights = weights)
ate <- g_comp(model,data)
predicted_ate[i] <- ate
}
df_ate <- df_ate |>
bind_rows(tibble(model="logreg_treatment_outcome",bias=bias(predicted_ate,true_ATE),variance=variance(predicted_ate)))
### no interaction or quadratic relationship outcome model
predicted_ate <- vector(mode = "numeric", length = n_sample)
for (i in 1:n_sample) {
set.seed(i)
data <- slice_sample(df, n = 6000, replace = T)
model <- glm(Y ~ A + W1 + W2 + W3 + W4, data = data, family = "binomial")
ate <- g_comp(model,data)
predicted_ate[i] <- ate
}
df_ate <- df_ate |>
bind_rows(tibble(model="logreg_wrong_outcome",bias=bias(predicted_ate,true_ATE),variance=variance(predicted_ate)))
### xgboost
library(tidymodels)
library(future)
plan(multisession, workers = 6)
predicted_ate <- vector(mode = "numeric", length = n_sample)
for (i in 1:n_sample) {
set.seed(i)
data <- slice_sample(df, n = 6000, replace = T) |> select(Y,A,W1:W4) |> mutate(Y = as.factor(Y), A = as.factor(A))
train <- data # you really don't need to do this, but I was lazy to change the other ML code
xgb_spec <- boost_tree(
trees = tune(),
tree_depth = tune(),
min_n = tune(),
loss_reduction = tune(),
mtry = tune(),
learn_rate = tune()
) %>%
set_engine("xgboost") %>%
set_mode("classification")
# Create workflow
xgb_wf <- workflow() %>%
add_model(xgb_spec) %>%
add_formula(Y ~ .)
# Tuning grid
xgb_grid <- grid_space_filling(
trees(),
tree_depth(),
min_n(),
loss_reduction(),
finalize(mtry(),train),
learn_rate(),
size = 5
)
# Cross-validation and tuning
folds <- vfold_cv(train, v = 5)
xgb_res <- tune_grid(
xgb_wf,
resamples = folds,
grid = xgb_grid,
control = control_grid(save_pred = TRUE,
verbose = T)
)
# Select best model
best_xgb <- select_best(xgb_res, metric = "roc_auc")
# Finalize and fit
final_xgb <- finalize_workflow(xgb_wf, best_xgb)
final_fit <- fit(final_xgb, data = train)
# g-comp
ate <- g_comp(final_fit, train, T)
predicted_ate[i] <- ate
}
df_ate <- df_ate |>
bind_rows(tibble(model="xgboost_outcome_model",bias=bias(predicted_ate,true_ATE),variance=variance(predicted_ate)))
| model | bias | variance |
|---|---|---|
| logreg_correct_outcome | -0.0009778 | 0.0001410 |
| logreg_treatment_outcome | 0.0019444 | 0.0001581 |
| logreg_wrong_outcome | -0.0464683 | 0.0001354 |
| xgboost_outcome_model | -0.0164066 | 0.0000459 |
Wow, look at that! When the outcome or treatment model was correctly specified, logistic regression was still the best, with the lowest bias and low variance. When the outcome model was incorrectly specified, it became much more biased, while the variance didn’t really change much. When we used xgboost for the outcome model only, it was less biased than the misspecified logistic regression outcome model and, interestingly, had the lowest variance. Very interesting! I think this is helpful because a tree-based model seems able to tease out quadratic and interaction relationships without us having to specify them. Now, what if we use xgboost for both the outcome and treatment models and then apply the TMLE framework? Will that be any better?
Using TMLE procedures with Xgboost
We’ll do the same as above: resample 1000 times with n of 6000 with replacement, then assess the bias and variance of the ATE. We will use an xgboost model for both the treatment and outcome models, then apply the TMLE procedure shown above. As for tuning, we will use a space-filling grid search (grid_space_filling with size of 5).
My messy code
library(tidymodels)
library(tidyverse)
library(future)
plan(multisession, workers = 4)
predicted_ate <- vector(mode = "numeric", length = n_sample)
for (i in n_sample:1) {
# sample
data <- slice_sample(df, n = 6000, replace = T) |> select(Y,A,W1:W4) |> mutate(Y = as.factor(Y), A = as.factor(A))
train <- data
# outcome model
xgb_spec <- boost_tree(
trees = tune(),
tree_depth = tune(),
min_n = tune(),
loss_reduction = tune(),
# sample_size = tune(),
mtry = tune(),
learn_rate = tune()
) %>%
set_engine("xgboost") %>%
set_mode("classification")
# Create workflow
xgb_wf <- workflow() %>%
add_model(xgb_spec) %>%
add_formula(Y ~ .)
# Tuning grid
xgb_grid <- grid_space_filling(
trees(),
tree_depth(),
min_n(),
loss_reduction(),
finalize(mtry(),train),
learn_rate(),
size = 5
)
# Cross-validation and tuning
folds <- vfold_cv(train, v = 5)
xgb_res <- tune_grid(
xgb_wf,
resamples = folds,
grid = xgb_grid,
control = control_grid(save_pred = TRUE,
# parallel_over = "resamples",
verbose = T)
)
# Select best model
best_xgb <- select_best(xgb_res, metric = "roc_auc")
# Finalize and fit
final_xgb <- finalize_workflow(xgb_wf, best_xgb)
final_fit <- fit(final_xgb, data = train)
# predict
outcome <- predict(final_fit, new_data = train, type = "prob")[,2] |> pull()
outcome_0 <- predict(final_fit, new_data = train |> mutate(A = as.factor(0)), type = "prob")[,2] |> pull()
outcome_1 <- predict(final_fit, new_data = train |> mutate(A = as.factor(1)), type = "prob")[,2] |> pull()
# -------------------------------------------------------
# treatment model
train_tx <- train |> select(-Y)
xgb_wf_tx <- workflow() %>%
add_model(xgb_spec) %>%
add_formula(A ~ .)
xgb_grid_tx <- grid_space_filling(
trees(),
tree_depth(),
min_n(),
loss_reduction(),
finalize(mtry(),train_tx),
learn_rate(),
size = 5
)
# Cross-validation and tuning
folds_tx <- vfold_cv(train_tx, v = 5)
xgb_res_tx <- tune_grid(
xgb_wf_tx,
resamples = folds_tx,
grid = xgb_grid_tx,
control = control_grid(save_pred = TRUE,
verbose = T)
)
# Select best model
best_xgb_tx <- select_best(xgb_res_tx, metric = "roc_auc")
# Finalize and fit
final_xgb_tx <- finalize_workflow(xgb_wf_tx, best_xgb_tx)
final_fit_tx <- fit(final_xgb_tx, data = train_tx)
ps <- predict(final_fit_tx, new_data = train_tx |> select(-A), type = "prob")[,2] |> pull()
ps_final <- ps
a_1 <- 1/(predict(final_fit_tx, new_data = train_tx, type = "prob")[,2] |> pull())
a_0 <- -1/(1 - (predict(final_fit_tx, new_data = train_tx, type = "prob")[,2] |> pull()))
clever_covariate <- ifelse(train_tx$A == 1, 1/ps_final, -1/(1-ps_final))
# step 3
epsilon_model <- glm(train$Y ~ -1 + offset(qlogis(outcome)) + clever_covariate, family = "binomial")
summary(epsilon_model)
epsilon <- epsilon_model$coefficients
#### Step 4. Update Initial Outcomes
updated_outcome_1 <- plogis(qlogis(outcome_1)+epsilon*a_1)
updated_outcome_0 <- plogis(qlogis(outcome_0)+epsilon*a_0)
#### Step 5. Compute ATE
ate <- mean(updated_outcome_1-updated_outcome_0)
predicted_ate[i] <- ate
}
df_ate <- df_ate |>
bind_rows(tibble(model="xgboost_tmle_model_size5",bias=bias(predicted_ate,true_ATE),variance=variance(predicted_ate)))
| model | bias | variance |
|---|---|---|
| logreg_correct_outcome | -0.0009778 | 0.0001410 |
| logreg_treatment_outcome | 0.0019444 | 0.0001581 |
| logreg_wrong_outcome | -0.0464683 | 0.0001354 |
| xgboost_outcome_model | -0.0164066 | 0.0000459 |
| xgboost_tmle_model_size5 | 0.0066270 | 0.0001399 |
Wow, how about that! Less biased than the xgboost outcome model alone! Although the variance appears higher than the pure xgboost outcome model, it’s about the same as logistic regression. It does seem like TMLE can improve bias. Do you think we can do better? What if we increase the grid size to 20?
My messy code
predicted_ate <- vector(mode = "numeric", length = n_sample)
for (i in 1:n_sample) {
set.seed(i)
data <- slice_sample(df, n = 6000, replace = T) |> select(Y,A,W1:W4) |> mutate(Y = as.factor(Y), A = as.factor(A))
train <- data
# train <- df |> select(Y,A,W1:W4) |> mutate(Y = as.factor(Y), A = as.factor(A))
# outcome model
xgb_spec <- boost_tree(
trees = tune(),
tree_depth = tune(),
min_n = tune(),
loss_reduction = tune(),
# sample_size = tune(),
mtry = tune(),
learn_rate = tune()
) %>%
set_engine("xgboost") %>%
set_mode("classification")
# Create workflow
xgb_wf <- workflow() %>%
add_model(xgb_spec) %>%
add_formula(Y ~ .)
# Tuning grid
xgb_grid <- grid_space_filling(
trees(),
tree_depth(),
min_n(),
loss_reduction(),
finalize(mtry(),train),
learn_rate(),
size = 20
)
# Cross-validation and tuning
folds <- vfold_cv(train, v = 5)
xgb_res <- tune_grid(
xgb_wf,
resamples = folds,
grid = xgb_grid,
control = control_grid(save_pred = TRUE,
# parallel_over = "resamples",
verbose = T)
)
# Select best model
best_xgb <- select_best(xgb_res, metric = "roc_auc")
# Finalize and fit
final_xgb <- finalize_workflow(xgb_wf, best_xgb)
final_fit <- fit(final_xgb, data = train)
# predict
outcome <- predict(final_fit, new_data = train, type = "prob")[,2] |> pull()
outcome_0 <- predict(final_fit, new_data = train |> mutate(A = as.factor(0)), type = "prob")[,2] |> pull()
outcome_1 <- predict(final_fit, new_data = train |> mutate(A = as.factor(1)), type = "prob")[,2] |> pull()
# -------------------------------------------------------
# treatment model
train_tx <- train |> select(-Y)
xgb_wf_tx <- workflow() %>%
add_model(xgb_spec) %>%
add_formula(A ~ .)
xgb_grid_tx <- grid_space_filling(
trees(),
tree_depth(),
min_n(),
loss_reduction(),
finalize(mtry(),train_tx),
learn_rate(),
size = 20
)
# Cross-validation and tuning
folds_tx <- vfold_cv(train_tx, v = 5)
xgb_res_tx <- tune_grid(
xgb_wf_tx,
resamples = folds_tx,
grid = xgb_grid_tx,
control = control_grid(save_pred = TRUE,
verbose = T)
)
# Select best model
best_xgb_tx <- select_best(xgb_res_tx, metric = "roc_auc")
# Finalize and fit
final_xgb_tx <- finalize_workflow(xgb_wf_tx, best_xgb_tx)
final_fit_tx <- fit(final_xgb_tx, data = train_tx)
ps <- predict(final_fit_tx, new_data = train_tx |> select(-A), type = "prob")[,2] |> pull()
ps_final <- ps
# ps_final <- pmax(pmin(ps, 0.95), 0.05)
a_1 <- 1/(predict(final_fit_tx, new_data = train_tx, type = "prob")[,2] |> pull())
a_0 <- -1/(1 - (predict(final_fit_tx, new_data = train_tx, type = "prob")[,2] |> pull()))
clever_covariate <- ifelse(train_tx$A == 1, 1/ps_final, -1/(1-ps_final))
# step 3
epsilon_model <- glm(train$Y ~ -1 + offset(qlogis(outcome)) + clever_covariate, family = "binomial")
summary(epsilon_model)
epsilon <- epsilon_model$coefficients
#### Step 4. Update Initial Outcomes
updated_outcome_1 <- plogis(qlogis(outcome_1)+epsilon*a_1)
updated_outcome_0 <- plogis(qlogis(outcome_0)+epsilon*a_0)
#### Step 5. Compute ATE
ate <- mean(updated_outcome_1-updated_outcome_0)
predicted_ate[i] <- ate
}
df_ate <- df_ate |>
bind_rows(tibble(model="xgboost_tmle_model_size20",bias=bias(predicted_ate,true_ATE),variance=variance(predicted_ate)))
| model | bias | variance |
|---|---|---|
| logreg_correct_outcome | -0.0009778 | 0.0001410 |
| logreg_treatment_outcome | 0.0019444 | 0.0001581 |
| logreg_wrong_outcome | -0.0464683 | 0.0001354 |
| xgboost_outcome_model | -0.0164066 | 0.0000459 |
| xgboost_tmle_model_size5 | 0.0066270 | 0.0001399 |
| xgboost_tmle_model_size20 | 0.0041615 | 0.0001645 |
Look at that! Less biased than TMLE with a grid size of 5, but you can see the variance starting to increase. This is really cool!
Is there a traditional statistical method for this?
Yes, apparently there is! The Augmented IPW (AIPW) estimator provides a doubly robust method for estimating causal effects that combines propensity score weighting with outcome modeling. AIPW remains consistent as long as either the propensity score model or the outcome model is correctly specified (sounds familiar? It’s like TMLE!), making it more reliable than traditional methods when model specification is uncertain. Here is the AIPW R package, in case we need it in the future.
This doubly robust method also reminds me of Double Machine Learning (DML); we should compare all these methods in the future!
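For intuition, here is a minimal AIPW sketch written by hand (my addition, not the AIPW package), reusing the misspecified outcome model and the correctly specified propensity model from earlier. With the propensity model correct, the estimate should land near the true ATE even though the outcome model is wrong:
q_fit <- glm(Y ~ A + W1 + W2 + W3 + W4, data = df, family = "binomial")
g_fit <- glm(A ~ W1 + I(W2^2) + W3 + W1:W2 + W4, data = df, family = "binomial")
q1 <- predict(q_fit, newdata = df |> mutate(A = 1), type = "response")
q0 <- predict(q_fit, newdata = df |> mutate(A = 0), type = "response")
e <- predict(g_fit, newdata = df, type = "response")
aipw_ate <- mean(q1 - q0 + df$A/e * (df$Y - q1) - (1 - df$A)/(1 - e) * (df$Y - q0))
aipw_ate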
But Does It Really Work?
After the blog was published on 11/16/25, Frank Harrell stated: “No examination of TMLE is complete without critical examination of the non-coverage probabilities in both tails of confidence intervals. Prepare to be disappointed. Be sure to estimate non-coverage separately in the two tails. Some confidence interval procedures are accurate with overall coverage despite being wrong in both tails.” It’s true that we did not assess the 95% confidence intervals in our resampling. Coming from a very experienced statistician, he is most likely right. So let’s examine the non-coverage tails! We’ll compare 1) logistic regression (correctly specified outcome model), 2) logistic regression with IPW (correctly specified treatment model), 3) logistic regression (misspecified outcome model), 4) TMLE with xgboost (grid search size of 5), and 5) TMLE with xgboost (grid search size of 20). Then we’ll compute their 95% CIs, calculate the proportion of non-coverage in the left and right tails for each of the 5 methods, and visualize the results!
For the logistic regression models, we use the bootstrap with 1000 resamples to estimate the standard error. For the TMLE models, we use the standard error estimated from the TMLE procedure.
my messy code for logistic regression
library(dplyr)
library(future)
library(future.apply)
set.seed(1)
n <- 10000
W1 <- rnorm(n)
W2 <- rnorm(n)
W3 <- rbinom(n, 1, 0.5)
W4 <- rnorm(n)
# TRUE propensity score model
A <- rbinom(n, 1, plogis(-0.5 + 0.8*W1 + 0.5*W2^2 + 0.3*W3 - 0.4*W1*W2 + 0.2*W4))
# TRUE outcome model
Y <- rbinom(n, 1, plogis(-1 + 0.2*A + 0.6*W1 - 0.4*W2^2 + 0.5*W3 + 0.3*W1*W3 + 0.2*W4^2))
# Calculate TRUE ATE
logit_Y1 <- -1 + 0.2 + 0.6*W1 - 0.4*W2^2 + 0.5*W3 + 0.3*W1*W3 + 0.2*W4^2
logit_Y0 <- -1 + 0 + 0.6*W1 - 0.4*W2^2 + 0.5*W3 + 0.3*W1*W3 + 0.2*W4^2
Y1_true <- plogis(logit_Y1)
Y0_true <- plogis(logit_Y0)
true_ATE <- mean(Y1_true - Y0_true)
df <- tibble(W1 = W1, W2 = W2, W3 = W3, W4 = W4, A = A, Y = Y)
g_comp <- function(model,data,ml=F) {
if (ml==T) {
y1 <- predict(model, new_data=data |> mutate(A=as.factor(1)), type = "prob")[,2] |> pull()
y0 <- predict(model, new_data=data |> mutate(A=as.factor(0)), type = "prob")[,2] |> pull()
} else {
y1 <- predict(model, newdata=data |> mutate(A=1), type = "response")
y0 <- predict(model, newdata=data |> mutate(A=0), type = "response")
}
return(mean(y1-y0))
}
# Set up parallel backend
plan(multisession, workers = 10) # or plan(multisession)
n_sample <- 1000
predicted_ate <- predicted_lower <- predicted_upper <- vector(mode = "numeric", length = n_sample)
n_boot <- 1000
i_boot_seed <- runif(n_boot, 0, n_boot * 1000) # Better seed range
n_boot_sample <- 6000
df_se <- tibble(method=character(),ate=numeric(),lower=numeric(),upper=numeric())
formula_list <- list()
formula_list[[1]] <- as.formula(Y ~ A + W1 + I(W2^2) + W3 + W1:W3 + I(W4^2))
formula_list[[2]] <- as.formula(Y ~ A + W1 + W2 + W3 + W4)
formula_list[[3]] <- as.formula(A ~ W1 + I(W2^2) + W1:W2 + I(W4^2))
model_function <- function(formula,data,method) {
if (method=="logreg_outcome_correct") {
model_final <- glm(formula = formula, data = data, family = "binomial")
}
if (method=="logreg_treatment_correct") {
ps_model <- glm(formula=formula,data=data,family="binomial")
ps <- ps_model$fitted.values
ps_final <- pmax(pmin(ps, 0.95), 0.05)
weights <- ifelse(data$A == 1, 1/ps_final, 1/(1-ps_final))
model_final <- glm(Y~A,data=data,family="binomial",weights=weights)
}
if (method=="logreg_outcome_wrong") {
model_final <- glm(formula = formula, data = data, family = "binomial")
}
if (is.null(method)) { stop("need method") }
ate <- g_comp(model_final,data)
return(ate)
}
# Create the bootstrap function
bootstrap_ate <- function(j, data, formula, n_boot_sample, i_boot_seed,method) {
set.seed(i_boot_seed[j])
data_boot <- slice_sample(data, n = n_boot_sample, replace = TRUE)
ate_boot <- model_function(formula,data_boot,method=method)
return(ate_boot)
}
# method vector
method_vec <- c("logreg_outcome_correct","logreg_treatment_correct","logreg_outcome_wrong")
for (method in method_vec) {
if (method=="logreg_outcome_correct") { formula <- formula_list[[1]] }
if (method=="logreg_treatment_correct") { formula <- formula_list[[3]] }
if (method=="logreg_outcome_wrong") { formula <- formula_list[[2]] }
for (i in 1:n_sample) {
set.seed(i)
data <- slice_sample(df, n = n_boot_sample, replace = TRUE)
# model <- glm(formula = formula, data = data, family = "binomial")
ate <- model_function(formula,data,method=method)
predicted_ate[i] <- ate
ate_vec <- future_lapply(1:n_boot, bootstrap_ate,
data = data,
formula = formula,
n_boot_sample = n_boot_sample,
i_boot_seed = i_boot_seed,
method = method,
future.globals = list(g_comp = g_comp, model_function=model_function),
future.seed = NULL, # Use NULL since you're doing manual seeding
future.packages = c("dplyr"))
# Convert list to numeric vector
ate_vec <- unlist(ate_vec)
# Calculate confidence intervals
predicted_lower[i] <- quantile(ate_vec, 0.025)
predicted_upper[i] <- quantile(ate_vec, 0.975)
print(paste("Sample", i, "ATE:", round(ate, 4),
"CI: [", round(predicted_lower[i], 4), ",", round(predicted_upper[i], 4), "]"))
df_se <- df_se |>
rbind(tibble(method = method, ate=predicted_ate[i], lower=predicted_lower[i],upper=predicted_upper[i]))
}
}
# Clean up
plan(sequential)
df_se2 <- df_se
save(df_se2, file = "logreg_bootstrap_tails.rda")
my messy code for tmle
### change size 5 vs 20
library(dplyr)
library(tidymodels)
library(future)
plan(multisession, workers = 5)
set.seed(1)
n <- 10000
W1 <- rnorm(n)
W2 <- rnorm(n)
W3 <- rbinom(n, 1, 0.5)
W4 <- rnorm(n)
# TRUE propensity score model
A <- rbinom(n, 1, plogis(-0.5 + 0.8*W1 + 0.5*W2^2 + 0.3*W3 - 0.4*W1*W2 + 0.2*W4))
# TRUE outcome model
Y <- rbinom(n, 1, plogis(-1 + 0.2*A + 0.6*W1 - 0.4*W2^2 + 0.5*W3 + 0.3*W1*W3 + 0.2*W4^2))
# Calculate TRUE ATE
logit_Y1 <- -1 + 0.2 + 0.6*W1 - 0.4*W2^2 + 0.5*W3 + 0.3*W1*W3 + 0.2*W4^2
logit_Y0 <- -1 + 0 + 0.6*W1 - 0.4*W2^2 + 0.5*W3 + 0.3*W1*W3 + 0.2*W4^2
Y1_true <- plogis(logit_Y1)
Y0_true <- plogis(logit_Y0)
true_ATE <- mean(Y1_true - Y0_true)
df <- tibble(W1 = W1, W2 = W2, W3 = W3, W4 = W4, A = A, Y = Y)
# sample
n_sample <- 1000
n_i <- 6000
predicted_ate <- vector(mode = "numeric", length = n_sample)
pred_se <- vector(mode = "numeric", length = n_sample)
for (i in 1:n_sample) {
set.seed(i)
data <- slice_sample(df, n = n_i, replace = T) |> select(Y,A,W1:W4) |> mutate(Y = as.factor(Y), A = as.factor(A))
train <- data
# outcome model
xgb_spec <- boost_tree(
trees = tune(),
tree_depth = tune(),
min_n = tune(),
loss_reduction = tune(),
# sample_size = tune(),
mtry = tune(),
learn_rate = tune()
) %>%
set_engine("xgboost") %>%
set_mode("classification")
# Create workflow
xgb_wf <- workflow() %>%
add_model(xgb_spec) %>%
add_formula(Y ~ .)
# Tuning grid
xgb_grid <- grid_space_filling(
trees(),
tree_depth(),
min_n(),
loss_reduction(),
finalize(mtry(),train),
learn_rate(),
size = 5 # here <------------
)
# Cross-validation and tuning
folds <- vfold_cv(train, v = 5)
xgb_res <- tune_grid(
xgb_wf,
resamples = folds,
grid = xgb_grid,
control = control_grid(save_pred = TRUE,
parallel_over = "everything",
verbose = T)
)
# Select best model
best_xgb <- select_best(xgb_res, metric = "roc_auc")
# Finalize and fit
final_xgb <- finalize_workflow(xgb_wf, best_xgb)
final_fit <- fit(final_xgb, data = train)
# predict
outcome <- predict(final_fit, new_data = train, type = "prob")[,2] |> pull()
outcome_0 <- predict(final_fit, new_data = train |> mutate(A = as.factor(0)), type = "prob")[,2] |> pull()
outcome_1 <- predict(final_fit, new_data = train |> mutate(A = as.factor(1)), type = "prob")[,2] |> pull()
# -------------------------------------------------------
# treatment model
train_tx <- train |> select(-Y)
xgb_wf_tx <- workflow() %>%
add_model(xgb_spec) %>%
add_formula(A ~ .)
xgb_grid_tx <- grid_space_filling(
trees(),
tree_depth(),
min_n(),
loss_reduction(),
finalize(mtry(),train_tx),
learn_rate(),
size = 5 # and here <-----------------
)
# Cross-validation and tuning
folds_tx <- vfold_cv(train_tx, v = 5)
xgb_res_tx <- tune_grid(
xgb_wf_tx,
resamples = folds_tx,
grid = xgb_grid_tx,
control = control_grid(save_pred = TRUE,
verbose = T)
)
# Select best model
best_xgb_tx <- select_best(xgb_res_tx, metric = "roc_auc")
# Finalize and fit
final_xgb_tx <- finalize_workflow(xgb_wf_tx, best_xgb_tx)
final_fit_tx <- fit(final_xgb_tx, data = train_tx)
ps <- predict(final_fit_tx, new_data = train_tx |> select(-A), type = "prob")[,2] |> pull()
ps_final <- ps
# ps_final <- pmax(pmin(ps, 0.95), 0.05)
a_1 <- 1/(predict(final_fit_tx, new_data = train_tx, type = "prob")[,2] |> pull())
a_0 <- -1/(1 - (predict(final_fit_tx, new_data = train_tx, type = "prob")[,2] |> pull()))
clever_covariate <- ifelse(train_tx$A == 1, 1/ps_final, -1/(1-ps_final))
# step 3
epsilon_model <- glm(train$Y ~ -1 + offset(qlogis(outcome)) + clever_covariate, family = "binomial")
summary(epsilon_model)
epsilon <- epsilon_model$coefficients
#### Step 4. Update Initial Outcomes
updated_outcome_1 <- plogis(qlogis(outcome_1)+epsilon*a_1)
updated_outcome_0 <- plogis(qlogis(outcome_0)+epsilon*a_0)
#### Step 5. Compute ATE
ate <- mean(updated_outcome_1-updated_outcome_0)
### step 6. SE
updated_outcome <- ifelse(train$A == 1, updated_outcome_1, updated_outcome_0)
se <- sqrt(var((as.numeric(train$Y==1)-updated_outcome)*clever_covariate+updated_outcome_1-updated_outcome_0-ate)/n_i)
predicted_ate[i] <- ate
pred_se[i] <- se
print(paste0("i: ",i, " ate: ", ate, " se: ",se))
if ((i %% 2)==0) { save(predicted_ate, pred_se, file = "predicted_ate_tmle_se_5.rda") } # and make sure you change file name when saving too <--------- size 5 vs 20
}
visualization code
library(tidyverse)
load("logreg_bootstrap_tails.rda")
load("predicted_ate_tmle_se_5.rda")
xgb_tmle_5_se_df <- tibble(ate=predicted_ate,se=pred_se)
load("predicted_ate_tmle_se_20.rda")
xgb_tmle_20_se_df <- tibble(ate=predicted_ate,se=pred_se)
#### combine all
df_all_ci <- xgb_tmle_20_se_df |>
mutate(lower_ci = ate - 1.96*se,
upper_ci = ate + 1.96*se) |>
mutate(
method = "tmle_xgboost_size20"
) |>
select(method, ate, lower_ci, upper_ci) |>
rbind(xgb_tmle_5_se_df |>
mutate(lower_ci = ate - 1.96*se,
upper_ci = ate + 1.96*se) |>
mutate(
method = "tmle_xgboost_size5"
) |>
select(method,ate,lower_ci,upper_ci)) |>
rbind(df_se2 |>
rename(lower_ci=lower,upper_ci=upper)) |>
mutate(coverage = case_when(
lower_ci > true_ATE ~ "left_miss",
upper_ci < true_ATE ~ "right_miss",
TRUE ~ "within"
))
##### calculate prop
coverage_df <- df_all_ci |>
group_by(method,coverage) |>
summarize(prop = n()*100/1000) |>
ungroup(coverage) |>
pivot_wider(id_cols = c("method"), names_from = "coverage", values_from = "prop", values_fill = 0) |>
mutate(stat = paste0("right missed: ", right_miss,"%, covered: ",within,"%, left missed: ",left_miss,"%"))
coverage_order <- coverage_df |>
arrange(desc(within)) |>
pull(method)
coverage_df <- coverage_df |>
mutate(method = factor(method, levels = coverage_order))
plot <- df_all_ci |>
group_by(method) |>
arrange(ate) |>
mutate(row = row_number(),
method = factor(method, levels = coverage_order)) |>
ggplot(aes(x=row,y=ate,color=coverage)) +
geom_point() +
geom_ribbon(aes(ymin=lower_ci,ymax=upper_ci,fill=coverage),alpha=0.5) +
geom_hline(yintercept = true_ATE, linewidth=0.5, color="blue") +
geom_text(data=coverage_df, aes(x=500,y=-0.05,label=stat), inherit.aes = F) +
theme_bw() +
facet_wrap(.~method, ncol=1) +
theme(legend.position = "bottom") +
xlab("bootstrap no.")
Wow, Frank was right to say “prepare to be disappointed!” The correctly specified logistic regression outcome model was the clear winner with 95.0% coverage and pretty symmetric tails (1.7% left miss, 3.3% right miss) - basically exactly what you’d want to see. The correctly specified treatment model with IPW did reasonably well at 92.4% coverage, though it had some asymmetry (7.0% left miss, 0.6% right miss). But wow, the misspecified logistic regression outcome model was a disaster - only 2.8% coverage with almost everything missing on the right tail (97.2% right miss)!
Now here’s where it gets interesting: TMLE with XGBoost grid search size 5 achieved 87.6% coverage, which isn’t terrible, but it had this concerning asymmetry (11.3% left miss, 1.1% right miss). When I cranked up the grid search to size 20, thinking it would get better, it actually got worse - dropped to 71.1% coverage with misses on both sides (21.9% left miss, 7.0% right miss). So Frank’s warning was spot on! While TMLE definitely saved us from the catastrophic failure of misspecified parametric models, those confidence intervals just don’t behave properly. The machine learning component does great at capturing complex relationships without us having to specify all those weird interactions, but the price is wonky uncertainty estimates. Seems like TMLE’s real strength is getting better point estimates, not reliable confidence intervals.
Acknowledgements
Thanks to Frank Harrell for the pointers! We can now see the non-coverage tails much more clearly for all methods!
Opportunities for improvement
- learn to use future_lapply with distributed computing
- test out different ML models for TMLE
- does it matter if we use all data (like above) or does ATE change with train/test split?
Lessons learnt
- learnt to include anchor
- learnt what offset does in glm
- learnt TMLE’s procedure
- learnt to use future for parallel computing
If you like this article:
- please feel free to send me a comment or visit my other blogs
- please feel free to follow me on BlueSky, twitter, GitHub or Mastodon
- if you would like to collaborate, please feel free to contact me