library(tidyverse)
library(marginaleffects)
<- psych::bfi |>
bfi mutate(
gender = factor(gender),
A1 = ordered(A1)
|>
) drop_na(A1, age, gender)
nrow(bfi)
#> [1] 2784
<- MASS::polr(A1 ~ age + gender,
fit data = bfi,
Hess = TRUE)
Given an ordinal regression model, it is relatively easy to get class-wise predictions - the conditional predicted probability of each level of the outcome. However, often, one might be interested in summarizing effects not on the class-wise probability scale (nor on the latent scale), but instead on the rank scale - the expected ordinal level, expressed as a single number.
The {emmeans}
package has the mode = "mean.class"
that does just this.
Let’s see how we can do this in {marginaleffects}
by utilizing the powerful hypothesis=
argument.
Fit a model
Class-Wise Predictions
<- predictions(fit, variables = "gender")
pr1
pr1#>
#> Group Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
#> 1 0.1800 0.01136 15.85 <0.001 185.5 0.15777 0.2023
#> 1 0.1886 0.01128 16.71 <0.001 205.9 0.16649 0.2107
#> 1 0.1843 0.01132 16.28 <0.001 195.6 0.16210 0.2065
#> 1 0.1843 0.01132 16.28 <0.001 195.6 0.16210 0.2065
#> 1 0.1843 0.01132 16.28 <0.001 195.6 0.16210 0.2065
#> --- 33398 rows omitted. See ?print.marginaleffects ---
#> 6 0.0289 0.00335 8.64 <0.001 57.3 0.02234 0.0355
#> 6 0.0231 0.00264 8.78 <0.001 59.1 0.01798 0.0283
#> 6 0.0219 0.00250 8.76 <0.001 58.8 0.01699 0.0268
#> 6 0.0207 0.00238 8.71 <0.001 58.1 0.01604 0.0254
#> 6 0.0121 0.00165 7.35 <0.001 42.2 0.00891 0.0154
#> Type: probs
nrow(pr1) # original data * 2 (counterfactual gender) * 6 (levels of A1)
#> [1] 33408
For each observation in the original data frame we now have 12 predictions:
- 2 counterfactual predictions for the two levels of gender, times
- 6 (arguably also counterfactual) predictions for each of the possible outcome levels.
Sum-scores (mean rank)
We can average the class ranks by their predicted probability to obtain “sum scores”:
rowidcf
variable):
<- function(x) {
sum_score |>
x arrange(rowidcf, group, gender) |>
# for each counterfactual row and level of gender
summarise(
term = "sum score",
estimate = sum(estimate * (1:6)),
.by = c(rowidcf, gender)
)
}
<- predictions(fit, variables = "gender",
pr2 hypothesis = sum_score)
pr2#>
#> Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
#> 3.02 0.0583 51.8 <0.001 Inf 2.90 3.13
#> 2.51 0.0451 55.6 <0.001 Inf 2.42 2.60
#> 2.97 0.0553 53.7 <0.001 Inf 2.86 3.08
#> 2.46 0.0415 59.4 <0.001 Inf 2.38 2.55
#> 3.00 0.0568 52.8 <0.001 Inf 2.88 3.11
#> --- 5558 rows omitted. See ?print.marginaleffects ---
#> 2.24 0.0305 73.4 <0.001 Inf 2.18 2.30
#> 2.67 0.0473 56.4 <0.001 Inf 2.57 2.76
#> 2.20 0.0304 72.2 <0.001 Inf 2.14 2.26
#> 2.26 0.0645 35.0 <0.001 890.8 2.13 2.38
#> 1.85 0.0457 40.6 <0.001 Inf 1.77 1.94
#> Term: sum score
#> Type: probs
nrow(pr2) # original data * 2 (counterfactual gender)
#> [1] 5568
As expected, we now have 2 predictions for each observation in the original data frame: - 2 counterfactual predictions for the two levels of gender
Note that we could have also computed the mean rank for the two levels of gender, or literally anything else. But here I am trying to mimic the basic behavior of the avg_/comparisons()
functions by keeping with the workflow:
- Compute observation wise counterfactual predictions
- Contrast them for each observation
- (Possibly) average them
So let’s get those contrasts!
Contrast the ranks
<- function(x) {
sum_score.contr # same as before, and...
sum_score(x) |>
# compute a single contrast for each counterfactual row
summarise(
term = "gender (2-1)",
estimate = estimate[gender == "2"] - estimate[gender == "1"],
.by = rowidcf
) }
This function will compute a difference between the rank for each gender for each observation.
<- predictions(fit, variables = "gender",
pr3 hypothesis = sum_score.contr)
pr3#>
#> Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
#> -0.511 0.0588 -8.68 <0.001 57.8 -0.626 -0.395
#> -0.506 0.0584 -8.67 <0.001 57.7 -0.621 -0.392
#> -0.509 0.0586 -8.68 <0.001 57.8 -0.623 -0.394
#> -0.509 0.0586 -8.68 <0.001 57.8 -0.623 -0.394
#> -0.509 0.0586 -8.68 <0.001 57.8 -0.623 -0.394
#> --- 2774 rows omitted. See ?print.marginaleffects ---
#> -0.504 0.0582 -8.67 <0.001 57.6 -0.618 -0.390
#> -0.483 0.0560 -8.62 <0.001 57.1 -0.593 -0.373
#> -0.477 0.0554 -8.61 <0.001 56.9 -0.586 -0.368
#> -0.471 0.0548 -8.59 <0.001 56.7 -0.578 -0.363
#> -0.403 0.0488 -8.26 <0.001 52.6 -0.499 -0.307
#> Term: gender (2-1)
#> Type: probs
nrow(pr3) # original data
#> [1] 2784
Average Contrast
Finally, we can get the average difference:
# average contrast
<- function(x) {
avg_sum_score.contr sum_score.contr(x) |>
summarise(
term = "avg. gender (2-1)",
estimate = mean(estimate)
)
}
predictions(fit, variables = "gender",
hypothesis = avg_sum_score.contr)
#>
#> Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
#> -0.474 0.0551 -8.61 <0.001 56.9 -0.582 -0.366
#>
#> Term: avg. gender (2-1)
#> Type: probs
POMP
POMP (percent of maximum possible) are a for of standardized units for Likert-type items (see Solomon Kurz’s excellent blog post for more – better – info).
These are linear transformations of the ranks described above:
<- function(x) {
POMP sum_score(x) |>
mutate(
estimate = 100 * (estimate - 1) / (6 - 1)
)
}
predictions(fit, variables = "gender",
hypothesis = POMP)
#>
#> Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
#> 40.4 1.166 34.6 <0.001 870.6 38.1 42.7
#> 30.2 0.902 33.5 <0.001 812.8 28.4 31.9
#> 39.4 1.107 35.6 <0.001 920.9 37.3 41.6
#> 29.3 0.829 35.3 <0.001 905.5 27.7 30.9
#> 39.9 1.135 35.1 <0.001 896.3 37.7 42.1
#> --- 5558 rows omitted. See ?print.marginaleffects ---
#> 24.7 0.609 40.6 <0.001 Inf 23.5 25.9
#> 33.3 0.945 35.3 <0.001 902.9 31.5 35.2
#> 23.9 0.608 39.3 <0.001 Inf 22.7 25.1
#> 25.2 1.289 19.5 <0.001 279.4 22.6 27.7
#> 17.1 0.914 18.7 <0.001 256.9 15.3 18.9
#> Term: sum score
#> Type: probs
And likewise get contrasts -
<- function(x) {
POMP.contr POMP(x) |>
summarise(
term = "gender (2-1)",
estimate = estimate[gender == "2"] - estimate[gender == "1"],
.by = rowidcf
)
}
predictions(fit, variables = "gender",
hypothesis = sum_score.contr)
#>
#> Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
#> -0.511 0.0588 -8.68 <0.001 57.8 -0.626 -0.395
#> -0.506 0.0584 -8.67 <0.001 57.7 -0.621 -0.392
#> -0.509 0.0586 -8.68 <0.001 57.8 -0.623 -0.394
#> -0.509 0.0586 -8.68 <0.001 57.8 -0.623 -0.394
#> -0.509 0.0586 -8.68 <0.001 57.8 -0.623 -0.394
#> --- 2774 rows omitted. See ?print.marginaleffects ---
#> -0.504 0.0582 -8.67 <0.001 57.6 -0.618 -0.390
#> -0.483 0.0560 -8.62 <0.001 57.1 -0.593 -0.373
#> -0.477 0.0554 -8.61 <0.001 56.9 -0.586 -0.368
#> -0.471 0.0548 -8.59 <0.001 56.7 -0.578 -0.363
#> -0.403 0.0488 -8.26 <0.001 52.6 -0.499 -0.307
#> Term: gender (2-1)
#> Type: probs
And average contrasts -
# average contrast
<- function(x) {
avg_POMP.contr POMP.contr(x) |>
summarise(
term = "avg. gender (2-1)",
estimate = mean(estimate)
)
}
predictions(fit, variables = "gender",
hypothesis = avg_POMP.contr)
#>
#> Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
#> -9.49 1.1 -8.61 <0.001 56.9 -11.6 -7.33
#>
#> Term: avg. gender (2-1)
#> Type: probs
A note for Bayesians
Hey, Bayesians, how are you doing?
I have some bad news: {marginaleffects}
does not support hypothesis=<function>
.
But I also have some good news! You can basically do all of this by getting the posterior draws in an rvar
format, and then directly manipulating the posterior(s) - so the examples above should basically work out of the box.
For example:
library(brms)
library(posterior)
<- brm(A1 ~ age + gender,
fit_b data = bfi,
family = cumulative(),
prior = NULL) # obviously this is a bad prior
<- predictions(fit_b, variables = "gender") pr1_b
We just need to adapt the function(s) written above to work properly with rvar
s. I’ve marked
<- function(x) {
avg_POMP.contr_b |>
x arrange(rowidcf, group, gender) |>
# for each counterfactual row and level of gender
summarise(
term = "sum score",
# estimate = sum(estimate * (1:6)),
1estimate = rvar_sum(rvar * (1:6)),
.by = c(rowidcf, gender)
|>
) # POMP
mutate(
estimate = 100 * (estimate - 1) / (6 - 1)
|>
) # contrasts
summarise(
term = "gender (2-1)",
estimate = estimate[gender == "2"] - estimate[gender == "1"],
.by = rowidcf
|>
) # average
summarise(
term = "avg. gender (2-1)",
# estimate = mean(estimate)
2estimate = rvar_mean(estimate)
) }
- 1
-
Use the
rvar
column and thervar_sum()
function (instead ofsum()
) - 2
-
Use the
rvar_mean()
function (instead ofmean()
)
get_draws(pr1_b, shape = "rvar") |>
avg_POMP.contr_b()
#> term estimate
#> 1 avg. gender (2-1) -9.4 ± 1.1
Compare this to the frequentists estimate of -9.49, SE = 1.10 we got above.