{"id":19655,"date":"2024-09-21T05:00:00","date_gmt":"2024-09-21T05:00:00","guid":{"rendered":"https:\/\/www.jtrive.com\/posts\/introduction-to-metropolis-hastings\/introduction-to-metropolis-hastings.html"},"modified":"2024-09-21T05:00:00","modified_gmt":"2024-09-21T05:00:00","slug":"introduction-to-markov-chain-monte-carlo-the-metropolis-hastings-algorithm","status":"publish","type":"post","link":"https:\/\/python-bloggers.com\/2024\/09\/introduction-to-markov-chain-monte-carlo-the-metropolis-hastings-algorithm\/","title":{"rendered":"Introduction to Markov Chain Monte Carlo &#8211; The Metropolis-Hastings Algorithm"},"content":{"rendered":"<div style=\\\"border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;\\\">\r\n<i>This article was first published on  <strong>\r\n<a href=\"https:\/\/www.jtrive.com\/posts\/introduction-to-metropolis-hastings\/introduction-to-metropolis-hastings.html\"> The Pleasure of Finding Things Out: A blog by James Triveri <\/a><\/strong>, and kindly contributed to <a href=\/about\/>python-bloggers<\/a>.<\/i>\r\n<\/div>\n\n<p>Markov Chain Monte Carlo (MCMC) is a class of algorithms used to sample from probability distributions when direct sampling is difficult or inefficient. It leverages Markov chains to explore the target distribution and Monte Carlo methods to perform repeated random sampling. MCMC algorithms are widely used in the insurance industry, particularly in areas involving risk assessment, pricing, reserving, and capital modeling. Markov Chain Monte Carlo is an alternative to rejection sampling, which can be inefficient when dealing with high-dimensional probability distributions. 
MCMC is considered a Bayesian approach to statistical inference since it incorporates both prior knowledge and observed data into the estimation of the posterior distribution.<\/p>\n<p>The <em>Metropolis-Hastings<\/em> algorithm is a method used to generate a sequence of samples from a probability distribution for which direct sampling might be difficult. It is a particular variant of MCMC, which approximates a desired distribution by creating a chain of values that resemble samples drawn from that distribution. The algorithm proposes new candidates and decides whether to accept or reject them based on a ratio of densities from the target distribution.<\/p>\n<p>Before getting into the details of Metropolis-Hastings, a few key definitions:<\/p>\n<p><strong>Likelihood<\/strong>: <br \/> The a priori assumption specifying the distribution from which the data are assumed to originate. For example, if we assume losses follow an exponential distribution with unknown parameter <img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?%5Ctheta\">, this is equivalent to specifying an exponential likelihood. Symbolically, the likelihood is represented as <img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?f(x%7C%5Ctheta)\">.<\/p>\n<p><strong>Prior<\/strong>:<br \/> Sticking with the exponential likelihood example, once we\u2019ve proposed the likelihood, we need to specify a distribution for each parameter of the likelihood. In the case of the exponential there is only a single parameter, <img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?%5Ctheta\">. The prior distribution should have the same support as the parameter itself. 
When parameterizing the exponential distribution, we know that <img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?0%20%3C%20%5Ctheta%20%3C%20%5Cinfty\">, so the prior distribution should be valid on <img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?(0,%20%5Cinfty)\">. Valid distributions for <img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?%5Ctheta\"> include the gamma, lognormal, Pareto, and Weibull. Invalid choices include any discrete distribution, as well as the normal distribution, whose support extends below zero. Symbolically, the prior is represented as <img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?f(%5Ctheta)\">.<\/p>\n<p><strong>Posterior<\/strong>: <br \/> This is the expression that encapsulates the power, simplicity, and flexibility of the Bayesian approach and is given by:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?%0A%5Cmathrm%7BPosterior%7D%20%5Cpropto%20%5Cmathrm%7BLikelihood%7D%20%5Ctimes%20%5Cmathrm%7BPrior%7D%0A\"><\/p>\n<p>The posterior is represented as <img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?f(%5Ctheta%7Cx)\">, so the above expression becomes:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?%0Af(%5Ctheta%7Cx)%20%5Cpropto%20f(x%7C%5Ctheta)%20f(%5Ctheta).%0A\"><\/p>\n<p>We can replace the proportionality with equality by including a normalizing constant, which ensures the expression on the RHS integrates to 1:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?%0Af(%5Ctheta%7Cx)%20=%20%5Cfrac%7Bf(x%7C%5Ctheta)%20%20f(%5Ctheta)%7D%7Bf(x)%7D.%0A\"><\/p>\n<section id=\"metropolis-hastings-outline\" class=\"level3\">\n<h3 class=\"anchored\" data-anchor-id=\"metropolis-hastings-outline\">Metropolis-Hastings Outline<\/h3>\n<p>Suppose we have a collection of samples <img decoding=\"async\" 
src=\"https:\/\/latex.codecogs.com\/png.latex?%5C%7B%5Ctheta%5E%7B(1)%7D,%20%5Cdots%20%5Ctheta%5E%7B(s)%7D%5C%7D\">, to which we would like to add a new value <img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?%5Ctheta%5E%7B(s+1)%7D\">. From our transition kernel we generate a candidate <img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?%5Ctheta%5E%7B*%7D\"> near the current value <img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?%5Ctheta%5E%7B(s)%7D\">.<\/p>\n<ul>\n<li>If <img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?f(%5Ctheta%5E%7B*%7D%7Cy)%20%3E%20f(%5Ctheta%5E%7B(s)%7D%7Cy)\">, then we should include <img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?%5Ctheta%5E%7B*%7D\"> with probability 1.\n<\/li>\n<li>If <img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?f(%5Ctheta%5E%7B*%7D%7Cy)%20%3C%20f(%5Ctheta%5E%7B(s)%7D%7Cy)\">, we will include <img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?%5Ctheta%5E%7B*%7D\"> with probability determined by the acceptance ratio.<\/li>\n<\/ul>\n<p>For <img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?%5Ctheta%5E%7B*%7D\">, the posterior is given by<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?%0Af(%5Ctheta%5E%7B*%7D%7Cy)%20%20=%20%5Cfrac%7Bf(y%7C%5Ctheta%5E%7B*%7D)%20f(%5Ctheta%5E%7B*%7D)%7D%7Bf(y)%7D,%0A\"><\/p>\n<p>and for <img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?%5Ctheta%5E%7B(s)%7D\">, the posterior is given by<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?%0Af(%5Ctheta%5E%7B(s)%7D%7Cy)%20%20=%20%5Cfrac%7Bf(y%7C%5Ctheta%5E%7B(s)%7D)%20f(%5Ctheta%5E%7B(s)%7D)%7D%7Bf(y)%7D.%0A\"><\/p>\n<p><\/p>\n<p>Next, compute the acceptance ratio, <img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?%5Calpha\">, as <img decoding=\"async\" 
src=\"https:\/\/latex.codecogs.com\/png.latex?%5Cfrac%7Bf(%5Ctheta%5E%7B*%7D%7Cy)%7D%7Bf(%5Ctheta%5E%7B(s)%7D%7Cy)%7D\">:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?%0A%5Calpha%20=%20%5Cfrac%7Bf(%5Ctheta%5E%7B*%7D%7Cy)%7D%7Bf(%5Ctheta%5E%7B(s)%7D%7Cy)%7D%20=%20%5Cfrac%7Bf(y%7C%5Ctheta%5E%7B*%7D)%20f(%5Ctheta%5E%7B*%7D)%7D%7Bf(y)%7D%20%5Ctimes%20%5Cfrac%7Bf(y)%7D%7Bf(y%7C%5Ctheta%5E%7B(s)%7D)%20f(%5Ctheta%5E%7B(s)%7D)%7D%20=%20%5Cfrac%7Bf(y%7C%5Ctheta%5E%7B*%7D)%20f(%5Ctheta%5E%7B*%7D)%7D%7Bf(y%7C%5Ctheta%5E%7B(s)%7D)%20f(%5Ctheta%5E%7B(s)%7D)%7D.%0A\"> <\/p>\n<ul>\n<li>\n<p>If <img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?%5Calpha\"> &gt;= 1: We add <img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?%5Ctheta%5E%7B*%7D\"> to our collection of samples, since it has a higher likelihood than <img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?%5Ctheta%5E%7B(s)%7D\">. Set <img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?%5Ctheta%5E%7B(s%20+%201)%7D%20=%20%5Ctheta%5E%7B*%7D\">.<\/p>\n<\/li>\n<li>\n<p>If <img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?%5Calpha\"> &lt; 1: Set <img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?%5Ctheta%5E%7B(s%20+%201)%7D%20=%20%5Ctheta%5E%7B*%7D\"> with probability <img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?%5Calpha\">.<\/p>\n<\/li>\n<\/ul>\n<p>Notice that the acceptance ratio is calculated without needing to compute the normalizing constant <img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?f(y)\">, which can be difficult to do, especially in high-dimensional settings. 
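To make this concrete, here is a small, self-contained sketch (added for illustration; the normal likelihood, normal prior, and candidate values below are arbitrary choices, not part of the derivation) that computes the acceptance ratio using only the unnormalized posterior, likelihood times prior:

```python
import numpy as np
from scipy.stats import norm

# Illustrative observations; any data would do.
y = np.array([9.37, 10.18, 9.16, 11.60, 10.33])

def unnormalized_log_posterior(theta):
    # log f(y|theta) + log f(theta); the evidence f(y) never appears.
    loglik = norm(loc=theta, scale=1.0).logpdf(y).sum()
    logprior = norm(loc=5.0, scale=10**0.5).logpdf(theta)
    return loglik + logprior

theta_curr, theta_star = 9.0, 10.0

# Ratio of unnormalized posteriors: the normalizing constant cancels.
alpha = np.exp(unnormalized_log_posterior(theta_star) - unnormalized_log_posterior(theta_curr))
print(f"acceptance ratio: {alpha:.4f}")
```

Because the ratio here exceeds 1, the candidate would be accepted outright; had it been below 1, the candidate would be accepted with probability equal to the ratio.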
This is the power of Metropolis-Hastings and MCMC in general: It provides a way to approximate the posterior distribution by generating samples from it without direct calculation of the normalizing constant.<\/p>\n<p><\/p>\n<p>Metropolis-Hastings accept-reject logic can be summarized in three steps:<\/p>\n<ol type=\"1\">\n<li>Generate sample from proposal distribution \/ transition kernel <img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?%5Ctheta%5E%7B*%7D%20%5Csim%20J(%5Ctheta%7C%5Ctheta%5E%7B(s)%7D)\">.<\/li>\n<li>Compute acceptance ratio <img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?%5Calpha%20=%20%5Cfrac%7Bf(y%7C%5Ctheta%5E%7B*%7D)%20f(%5Ctheta%5E%7B*%7D)%7D%7Bf(y%7C%5Ctheta%5E%7B(s)%7D)%20f(%5Ctheta%5E%7B(s)%7D)%7D\">.<\/li>\n<li>Sample <img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?u%20%5Csim%20%5Cmathrm%7Buniform%7D(0,%201)\">.\n<ul>\n<li>If <img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?%5Calpha%20%5Cgeq%20u\">, set <img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?%5Ctheta%5E%7B(s%20+%201)%7D%20=%20%5Ctheta%5E%7B*%7D\">.<\/li>\n<li>If <img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?%5Calpha%20%3C%20u\">, set <img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?%5Ctheta%5E%7B(s%20+%201)%7D%20=%20%5Ctheta%5E%7B(s)%7D\">.<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n<p><\/p>\n<\/section>\n<section id=\"conjugate-priors\" class=\"level3\">\n<h3 class=\"anchored\" data-anchor-id=\"conjugate-priors\">Conjugate Priors<\/h3>\n<p><a href=\"https:\/\/www.jtrive.com\/posts\/introduction-to-metropolis-hastings\/introduction-to-metropolis-hastings.html\">Conjugate priors<\/a> are a class of prior distributions in Bayesian statistics that result in a posterior distribution that has the same functional form as the prior when combined with a particular likelihood function. 
This makes the posterior distribution easier to compute and analyze, as it remains within the same family of distributions as the prior. For example, if we select an exponential likelihood with a gamma prior, the posterior distribution is also gamma, with a specified parameterization.<\/p>\n<p>Further, many of these conjugate priors have analytical expressions for the posterior predictive distribution, which represents the modeled target output of our analysis. We can use conjugate prior relationships as a means to validate the output of our MCMC sampler.<\/p>\n<p><\/p>\n<p><\/p>\n<\/section>\n<section id=\"example-conjugate-normal-normal-model-with-known-variance\" class=\"level3\">\n<h3 class=\"anchored\" data-anchor-id=\"example-conjugate-normal-normal-model-with-known-variance\">Example: Conjugate Normal-Normal Model with Known Variance<\/h3>\n<p>Let\u2019s start with a simple example where we assume a model with normal likelihood and prior (adapted from Chapter 10 of <em>A First Course in Bayesian Statistical Methods<\/em> by Peter Hoff). 
Assume:<\/p>\n<ul>\n<li><img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?%5C%7By_%7B1%7D,%20%5Cdots,%20y_%7Bn%7D%5C%7D%20%5Csim%20%5Cmathcal%7BN%7D(%5Cmu,%20%5Csigma%5E%7B2%7D)\">.<\/li>\n<li><img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?%5Cmu%20%5Csim%20%5Cmathcal%7BN%7D(%5Cmu_%7B0%7D,%20%5Csigma%5E%7B2%7D_%7B0%7D)\"><\/li>\n<li><img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?%5Csigma%5E%7B2%7D%20=%201\"><\/li>\n<li><img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?%5Cmu_%7B0%7D%20=%205\"><\/li>\n<li><img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?%5Csigma%5E%7B2%7D_%7B0%7D%20=%2010\"><\/li>\n<li><img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?y%20=%20(9.37,%2010.18,%209.16,%2011.60,%2010.33)\"><\/li>\n<\/ul>\n<p> Because this model is conjugate, we have analytical expressions for the posterior parameters:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?%0A%5Cmu_%7B0%7D%5E%7B'%7D%20=%20%5Cfrac%7B1%7D%7B%5Cfrac%7B1%7D%7B%5Csigma%5E%7B2%7D_%7B0%7D%7D%20+%20%5Cfrac%7Bn%7D%7B%5Csigma%5E%7B2%7D%7D%7D%20%5CBigg(%5Cfrac%7B%5Cmu_%7B0%7D%7D%7B%5Csigma%5E%7B2%7D_%7B0%7D%7D%20+%20%5Cfrac%7B%5Csum_%7Bi=1%7D%5E%7Bn%7D%20y_%7Bi%7D%7D%7B%5Csigma%5E%7B2%7D%7D%20%5CBigg);%20%5Chspace%7B.50em%7D%20%7B%5Csigma%5E%7B2%7D_%7B0%7D%7D%5E%7B'%7D%20=%20%5CBigg(%20%5Cfrac%7B1%7D%7B%5Csigma%5E%7B2%7D_%7B0%7D%7D%20+%20%5Cfrac%7Bn%7D%7B%5Csigma%5E%7B2%7D%7D%20%5CBigg)%5E%7B-1%7D%0A\"><\/p>\n<p><\/p>\n<p>We can compute these quantities for later reference:<\/p>\n<div id=\"cell-2\" class=\"cell\" data-tags=\"[]\" data-execution_count=\"1\">\n<pre>\n# Compute posterior mean and variance using closed-form expressions. 
\ny = [9.37, 10.18, 9.16, 11.60, 10.33]\ns2 = 1\nmu_prior = 5\ns2_prior = 10\nn = len(y)\n\nmu_posterior = (1 \/ (1 \/ s2_prior + n \/ s2)) * (mu_prior \/  s2_prior + sum(y) \/ s2)\ns2_posterior  = 1 \/ (1 \/ s2_prior + n \/ s2)\n\nprint(f\"mu_prior     : {mu_prior:.3f}\")\nprint(f\"s2_prior     : {s2_prior:.3f}\")\nprint(f\"mu_posterior : {mu_posterior:.3f}\")\nprint(f\"s2_posterior : {s2_posterior:.3f}\")<\/pre>\n<div class=\"cell-output cell-output-stdout\">\n<pre>mu_prior     : 5.000\ns2_prior     : 10.000\nmu_posterior : 10.027\ns2_posterior : 0.196<\/pre>\n<\/div>\n<\/div>\n<p>The posterior distribution parameter estimates have been updated in the direction of the data. Next imagine a scenario in which closed form expressions for posterior parameters did not exist, and it was necessary to use Metropolis-Hastings to approximate the posterior. The acceptance ratio comparing <img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?%5Ctheta%5E%7B*%7D\"> to <img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?%5Ctheta%5E%7B(s)%7D\"> is:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?%0A%5Calpha%20=%20%5Cfrac%7Bf(%5Ctheta%5E%7B*%7D%7Cy)%7D%7Bf(%5Ctheta%5E%7B(s)%7D%7Cy)%7D%20=%20%5CBigg(%5Cfrac%7B%5Cprod_%7Bi=1%7D%5E%7Bn%7D%20%5Cmathrm%7Bdnorm%7D(y_%7Bi%7D,%20%5Ctheta%5E%7B*%7D,%20%5Csigma)%7D%7B%5Cprod_%7Bi=1%7D%5E%7Bn%7D%20%5Cmathrm%7Bdnorm%7D(y_%7Bi%7D,%20%5Ctheta%5E%7B(s)%7D,%20%5Csigma)%7D%5CBigg)%20%5Ctimes%20%5CBigg(%5Cfrac%7B%5Cmathrm%7Bdnorm%7D(%5Ctheta%5E%7B*%7D,%20%5Cmu_%7B0%7D,%20%5Csigma_%7B0%7D)%7D%7B%5Cmathrm%7Bdnorm%7D(%5Ctheta%5E%7B(s)%7D,%20%5Cmu_%7B0%7D,%20%5Csigma_%7B0%7D)%7D%5CBigg).%0A\"><\/p>\n<p><\/p>\n<p>An implementation of Metropolis-Hastings to recover the posterior mean is provided below.<\/p>\n<div id=\"cell-4\" class=\"cell\" data-execution_count=\"2\">\n<pre>\"\"\"\nImplementation of Metropolis-Hastings algorithm for normal likelihood \nand normal prior with known 
variance.\nGoal is to recover the posterior distribution of the unknown parameter mu. \n\"\"\"\nimport numpy as np\nfrom scipy.stats import norm\n\nrng = np.random.default_rng(516)\n\n\ny = [9.37, 10.18, 9.16, 11.60, 10.33]\n\nnbr_samples = 10000  # Number of samples to generate.\ns = 1                # Standard deviation of likelihood.\ns0 = 10**.5          # Prior standard deviation.\nmu0 = 5              # Prior mean.\ns_prop = 2           # Standard deviation of proposal distribution \/ transition kernel.\n\n# Array to hold posterior samples; the chain is initialized at zero.\nsamples = np.zeros(nbr_samples)\n\n# Initialize prior density.\nprior = norm(loc=mu0, scale=s0)\n\n# Track the number of accepted samples. \naccepted = 0\n\nfor ii in range(1, nbr_samples):\n\n    # Get most recently accepted sample.\n    theta = samples[ii - 1]\n\n    # Generate sample from proposal distribution.\n    theta_star = rng.normal(loc=theta, scale=s_prop)\n\n    # Compute numerator and denominator of acceptance ratio.\n    numer = np.prod(norm(loc=theta_star, scale=s).pdf(y)) * prior.pdf(theta_star)\n    denom = np.prod(norm(loc=theta, scale=s).pdf(y)) * prior.pdf(theta)\n    ar = numer \/ denom\n\n    # Generate random uniform sample.\n    u = rng.uniform(low=0, high=1)\n    \n    # Check whether theta_star should be added to samples by comparing ar with u.\n    if ar &gt;= u:\n        theta = theta_star\n        accepted += 1\n\n    # Update samples array.\n    samples[ii] = theta\n\n    if ii % 1000 == 0:\n        print(f\"{ii}: theta_star: {theta_star:.5f}, ar: {ar:.5f}, curr_rate: {accepted \/ ii:.5f}\")\n\nacc_rate = accepted \/ nbr_samples\n\nprint(f\"\\nAcceptance rate     : {acc_rate:.3f}.\")\nprint(f\"Posterior mean (mh) : {samples.mean():.5f}.\")\nprint(f\"Posterior mean (cp) : {mu_posterior:.5f}.\")<\/pre>\n<div class=\"cell-output cell-output-stdout\">\n<pre>1000: theta_star: 9.35597, ar: 1.08038, curr_rate: 0.24600\n2000: theta_star: 13.01504, ar: 0.00000, curr_rate: 
0.25900\n3000: theta_star: 9.28924, ar: 0.27716, curr_rate: 0.25567\n4000: theta_star: 8.22652, ar: 0.00034, curr_rate: 0.26200\n5000: theta_star: 6.09725, ar: 0.00000, curr_rate: 0.26440\n6000: theta_star: 10.27182, ar: 2.13374, curr_rate: 0.26483\n7000: theta_star: 10.73066, ar: 0.30604, curr_rate: 0.26414\n8000: theta_star: 12.76409, ar: 0.00000, curr_rate: 0.26637\n9000: theta_star: 8.88424, ar: 0.03592, curr_rate: 0.26522\n\nAcceptance rate     : 0.267.\nPosterior mean (mh) : 10.03768.\nPosterior mean (cp) : 10.02745.<\/pre>\n<\/div>\n<\/div>\n<p>Generally the acceptance rate should fall between 20% and 40%, so our result seems reasonable, if a little on the low side.<\/p>\n<p>In the Metropolis-Hastings update step, we compute the product of many potentially small numbers to determine the acceptance ratio, which can be numerically unstable. We can instead compute the log of the RHS of the acceptance ratio, which is more stable, especially as the number of data points increases. The posterior estimates will be no different, but we reduce the chance of numerical underflow by replacing the product with a sum. The update step using the log basis is given below:<\/p>\n<div id=\"cell-6\" class=\"cell\" data-execution_count=\"3\">\n<pre>\"\"\"\nImplementation of Metropolis-Hastings algorithm for normal likelihood \nand normal prior with known variance.\nGoal is to recover the posterior distribution of the unknown parameter mu. 
\n\"\"\"\nimport numpy as np\nfrom scipy.stats import norm\n\nrng = np.random.default_rng(516)\n\n\ny = [9.37, 10.18, 9.16, 11.60, 10.33]\n\nnbr_samples = 10000  # Number of samples to generate.\ns = 1                # Standard deviation of likelihood.\ns0 = 10**.5          # Prior standard deviation.\nmu0 = 5              # Prior mean.\ns_prop = 2           # Standard deviation of proposal distribution \/ transition kernel.\n\n# Array to hold posterior samples; the chain is initialized at zero.\nsamples = np.zeros(nbr_samples)\n\n# Initialize prior density.\nprior = norm(loc=mu0, scale=s0)\n\n# Track the number of accepted samples. \naccepted = 0\n\nfor ii in range(1, nbr_samples):\n\n    # Get most recently accepted sample.\n    theta = samples[ii - 1]\n\n    # Generate sample from proposal distribution.\n    theta_star = rng.normal(loc=theta, scale=s_prop)\n\n    # Compute numerator and denominator of acceptance ratio using log basis. \n    ar = (np.sum(norm(loc=theta_star, scale=s).logpdf(y)) + prior.logpdf(theta_star)) -\\\n         (np.sum(norm(loc=theta, scale=s).logpdf(y)) + prior.logpdf(theta))\n\n    # Generate random uniform sample.\n    u = rng.uniform(low=0, high=1)\n    \n    # Check whether theta_star should be added to samples by comparing ar with log(u).\n    if ar &gt;= np.log(u):\n        theta = theta_star\n        accepted += 1\n\n    # Update samples array.\n    samples[ii] = theta\n\n    if ii % 1000 == 0:\n        print(f\"{ii}: theta_star: {theta_star:.5f}, ar: {ar:.5f}, curr_rate: {accepted \/ ii:.5f}\")\n\n\nacc_rate = accepted \/ nbr_samples\n\nprint(f\"\\nAcceptance rate   : {acc_rate:.3f}.\")\nprint(f\"Posterior mean (mh) : {samples.mean():.5f}.\")\nprint(f\"Posterior mean (cp) : {mu_posterior:.5f}.\")<\/pre>\n<div class=\"cell-output cell-output-stdout\">\n<pre>1000: theta_star: 9.35597, ar: 0.07731, curr_rate: 0.24600\n2000: theta_star: 13.01504, ar: -20.59962, curr_rate: 0.25900\n3000: theta_star: 9.28924, ar: -1.28315, curr_rate: 0.25567\n4000: 
theta_star: 8.22652, ar: -7.99570, curr_rate: 0.26200\n5000: theta_star: 6.09725, ar: -39.02023, curr_rate: 0.26440\n6000: theta_star: 10.27182, ar: 0.75788, curr_rate: 0.26483\n7000: theta_star: 10.73066, ar: -1.18404, curr_rate: 0.26414\n8000: theta_star: 12.76409, ar: -18.37748, curr_rate: 0.26637\n9000: theta_star: 8.88424, ar: -3.32637, curr_rate: 0.26522\n\nAcceptance rate   : 0.267.\nPosterior mean (mh) : 10.03768.\nPosterior mean (cp) : 10.02745.<\/pre>\n<\/div>\n<\/div>\n<p>As expected, this aligns with the original non-log basis results.<\/p>\n<p>We can visualize the distribution of posterior samples as well as the traceplot. Traceplots are graphical tools used to diagnose the convergence and mixing of MCMC simulations. They help assess whether the algorithm has properly explored the target distribution and whether the samples are representative of the posterior distribution.<\/p>\n<div id=\"cell-8\" class=\"cell\" data-execution_count=\"4\">\n<pre>\nimport matplotlib as mpl\nimport matplotlib.pyplot as plt\n\n\nfig, ax = plt.subplots(nrows=1, ncols=2, figsize=(10, 5), tight_layout=True) \n\nax[0].set_title(\"Posterior samples (normal likelihood)\", color=\"#000000\", loc=\"center\", fontsize=9)\nax[0].hist(\n    samples, 50, density=True, alpha=1, color=\"#ff7595\", \n    edgecolor=\"#FFFFFF\", linewidth=1.0\n    )\n    \nax[0].axvline(mu_posterior, color=\"#000000\", linewidth=1.5, linestyle=\"--\", label=r\"$\\mu^{'}$\")\nax[0].set_yticklabels([])\nax[0].set_xlabel(\"\")\nax[0].set_ylabel(\"\")\nax[0].set_xlim(8)\nax[0].tick_params(axis=\"x\", which=\"major\", direction='in', labelsize=7)\nax[0].tick_params(axis=\"x\", which=\"minor\", direction='in', labelsize=7)\nax[0].tick_params(axis=\"y\", which=\"major\", direction='in', labelsize=7)\nax[0].tick_params(axis=\"y\", which=\"minor\", direction='in', labelsize=7)\nax[0].xaxis.set_ticks_position(\"none\")\nax[0].yaxis.set_ticks_position(\"none\")\nax[0].grid(True)   \nax[0].set_axisbelow(True) 
\nax[0].legend(loc=\"upper right\", fancybox=True, framealpha=1, fontsize=\"small\")\n\nax[1].set_title(\"Traceplot\", color=\"#000000\", loc=\"center\", fontsize=9)\nax[1].plot(samples, color=\"#000000\", linewidth=.25, linestyle=\"-\")\nax[1].xaxis.set_major_formatter(mpl.ticker.StrMethodFormatter(\"{x:,.0f}\"))\nax[1].set_xlabel(\"sample\", fontsize=9)\nax[1].set_ylabel(r\"$\\hat \\mu$\")\nax[1].tick_params(axis=\"x\", which=\"major\", direction='in', labelsize=7)\nax[1].tick_params(axis=\"y\", which=\"major\", direction='in', labelsize=7)\nax[1].xaxis.set_ticks_position(\"none\")\nax[1].yaxis.set_ticks_position(\"none\")\n\nplt.show()<\/pre>\n<div class=\"cell-output cell-output-display\">\n<div>\n<figure class=\"figure\">\n<p><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/www.jtrive.com\/posts\/introduction-to-metropolis-hastings\/introduction-to-metropolis-hastings_files\/figure-html\/cell-5-output-1.png?w=578&#038;ssl=1\" class=\"img-fluid figure-img\"><\/p>\n<\/figure>\n<\/div>\n<\/div>\n<\/div>\n<p>Notice in the traceplot that even though we started far from the estimated posterior mean, it made little difference, as the algorithm was able to quickly zero in on the region of higher likelihood. You want to see that the samples have stabilized around a certain value after an initial \u201cburn-in\u201d period. If the trace shows significant fluctuations without settling, it may indicate that the chain has not yet converged. This is not the case with our samples.<\/p>\n<p><\/p>\n<\/section>\n<section id=\"severity-modeling\" class=\"level2\">\n<h2 class=\"anchored\" data-anchor-id=\"severity-modeling\">Severity Modeling<\/h2>\n<p>MCMC approaches can be leveraged to estimate severity or size-of-loss curves for a given line of business based on past claim history. Severity estimates are used in multiple actuarial contexts, especially reserving and capital modeling. 
Imagine we have loss data we believe originates from an exponential distribution with unknown rate parameter:<\/p>\n<blockquote class=\"blockquote\">\n<p>266, 934, 138<\/p>\n<\/blockquote>\n<p>We again assume a conjugate relationship between the prior and the likelihood:<\/p>\n<ul>\n<li>\n<p><strong>Likelihood<\/strong>: Losses are exponentially distributed with unknown rate <img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?%5Clambda\">.<\/p>\n<\/li>\n<li>\n<p><strong>Prior<\/strong>: Gamma with <img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?%5Calpha_%7B0%7D\">, <img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?%5Cbeta_%7B0%7D\">.<\/p>\n<\/li>\n<li>\n<p><strong>Posterior<\/strong>: Gamma with <img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?%5Calpha%5E%7B'%7D%20=%20%5Calpha_%7B0%7D%20+%20n\"> and <img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?%5Cbeta%5E%7B'%7D%20=%20%5Cbeta_%7B0%7D%20+%20%5Csum_%7Bi=1%7D%5E%7Bn%7D%20x_%7Bi%7D\">.<\/p>\n<\/li>\n<li>\n<p><strong>Posterior predictive<\/strong>: Lomax (shifted Pareto with support beginning at zero) with <img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?%5Calpha%5E%7B'%7D,%20%5Cbeta%5E%7B'%7D\">. The expected value of the posterior predictive distribution is <img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?%5Cfrac%7B%5Cbeta%5E%7B'%7D%7D%7B%5Calpha%5E%7B'%7D%20-%201%7D\">.<\/p>\n<\/li>\n<\/ul>\n<p>We judgmentally set <img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?%5Calpha_%7B0%7D%20=%202\"> and <img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?%5Cbeta_%7B0%7D%20=%201,000\">. 
Prior and posterior predictive means are computed in the next cell.<\/p>\n<div id=\"cell-11\" class=\"cell\" data-execution_count=\"5\">\n<pre>\ny = [266, 934, 138]\n\n# Judgmentally select a0 and b0.\na0 = 2\nb0 = 1000\nn = len(y)\na_posterior = a0 + n\nb_posterior = b0 + np.sum(y)\nprior_mean = a0 * b0\npost_pred_mean = b_posterior \/ (a_posterior - 1)\n\nprint(f\"a0              : {a0}\")\nprint(f\"b0              : {b0}\")\nprint(f\"a_posterior     : {a_posterior}.\")\nprint(f\"b_posterior     : {b_posterior}.\")\nprint(f\"Empirical mean  : {np.mean(y)}\")\nprint(f\"Prior mean      : {prior_mean}\")\nprint(f\"Post. pred. mean: {post_pred_mean:.2f}\")<\/pre>\n<div class=\"cell-output cell-output-stdout\">\n<pre>a0              : 2\nb0              : 1000\na_posterior     : 5.\nb_posterior     : 2338.\nEmpirical mean  : 446.0\nPrior mean      : 2000\nPost. pred. mean: 584.50<\/pre>\n<\/div>\n<\/div>\n<p><\/p>\n<p>Using Metropolis-Hastings, the mean of generated samples should match the posterior predictive mean obtained from the analytical expression (584.50 above). Adapting the sampling code from the previous model, an exponential distribution is used to generate proposals, since the exponential scale parameter must be strictly greater than 0. Note that this exponential proposal is asymmetric: a fully general Metropolis-Hastings implementation would also multiply the acceptance ratio by the proposal density ratio <img decoding=\"async\" src=\"https:\/\/latex.codecogs.com\/png.latex?J(%5Ctheta%5E%7B(s)%7D%7C%5Ctheta%5E%7B*%7D)%2FJ(%5Ctheta%5E%7B*%7D%7C%5Ctheta%5E%7B(s)%7D)\">, a correction that cancels for symmetric kernels like the normal proposal used earlier and is omitted here for simplicity. We have the following:<\/p>\n<div id=\"cell-13\" class=\"cell\" data-execution_count=\"6\">\n<pre>\"\"\"\nImplementation of Metropolis-Hastings algorithm for exponential likelihood\nwith gamma prior.\nGoal is to recover the posterior distribution of the unknown scale parameter. \n\"\"\"\nimport numpy as np\nfrom scipy.stats import expon, norm, gamma, lomax\n\nrng = np.random.default_rng(516)\n\nnbr_samples = 10000\ny = [266, 934, 138]\na0 = 2\nb0 = 1000\n\n\n# Array to hold posterior samples, initialized with the empirical mean of y.\nsamples = np.zeros(nbr_samples)\nsamples[0] = np.mean(y)\n\n# Initialize prior density.\nprior = gamma(a=a0, loc=0, scale=b0)\n\n# Track the number of accepted samples. 
\naccepted = 0\n\nfor ii in range(1, nbr_samples):\n\n    # Get most recently accepted sample.\n    theta = samples[ii - 1]\n\n    # Generate sample from proposal distribution.\n    theta_star = rng.exponential(scale=theta)\n\n    # Compute numerator and denominator of acceptance ratio.\n    numer = np.prod(expon(scale=theta_star).pdf(y)) * prior.pdf(theta_star)\n    denom = np.prod(expon(scale=theta).pdf(y)) * prior.pdf(theta)\n    ar = numer \/ denom\n\n    # Generate random uniform sample.\n    u = rng.uniform(low=0, high=1)\n    \n    # Check whether theta_star should be added to samples by comparing ar with u.\n    if ar &gt;= u:\n        theta = theta_star\n        accepted+=1\n\n    # Update samples array.\n    samples[ii] = theta\n\n    if ii % 1000 == 0:\n        print(f\"{ii}: theta_star: {theta_star:.5f}, ar: {ar:.5f}, curr_rate: {accepted \/ ii:.5f}\")\n\nacc_rate = accepted \/ nbr_samples\n\nprint(f\"\\nAcceptance rate    : {acc_rate:.3f}.\")\nprint(f\"Posterior sample mean: {samples.mean():.3f}.\")<\/pre>\n<div class=\"cell-output cell-output-stdout\">\n<pre>1000: theta_star: 82.99100, ar: 0.00008, curr_rate: 0.45900\n2000: theta_star: 544.00186, ar: 1.07857, curr_rate: 0.46450\n3000: theta_star: 246.98682, ar: 0.33871, curr_rate: 0.47433\n4000: theta_star: 73.75990, ar: 0.00002, curr_rate: 0.47900\n5000: theta_star: 451.83333, ar: 0.97910, curr_rate: 0.47880\n6000: theta_star: 607.07687, ar: 1.45090, curr_rate: 0.47833\n7000: theta_star: 1509.11070, ar: 0.74937, curr_rate: 0.48186\n8000: theta_star: 677.30894, ar: 1.04365, curr_rate: 0.48187\n9000: theta_star: 740.90659, ar: 1.58914, curr_rate: 0.48156\n\nAcceptance rate    : 0.481.\nPosterior sample mean: 586.353.<\/pre>\n<\/div>\n<\/div>\n<p><\/p>\n<p>Visualizing the histogram of posterior samples along with the traceplot:<\/p>\n<div id=\"cell-15\" class=\"cell\" data-execution_count=\"8\">\n<pre>\nfig, ax = plt.subplots(nrows=1, ncols=2, figsize=(10, 5), tight_layout=True) \n\nax[0].set_title(\n   
 \"Posterior samples (exponential likelihood)\", \n    color=\"#000000\", loc=\"center\", fontsize=9\n)\nax[0].hist(\n    samples, 50, density=True, alpha=1, color=\"#ff7595\", \n    edgecolor=\"#FFFFFF\", linewidth=1.0\n)\nax[0].axvline(\n    samples.mean(), color=\"#000000\", linewidth=1.5, linestyle=\"--\", \n    label=\"posterior mean\"\n)\n\nax[0].set_xlabel(\"\")\nax[0].set_ylabel(\"\")\nax[0].set_xlim(0, 2500)\nax[0].xaxis.set_major_formatter(mpl.ticker.StrMethodFormatter(\"{x:,.0f}\"))\nax[0].tick_params(axis=\"x\", which=\"major\", direction='in', labelsize=7)\nax[0].tick_params(axis=\"x\", which=\"minor\", direction='in', labelsize=7)\nax[0].tick_params(axis=\"y\", which=\"major\", direction='in', labelsize=7)\nax[0].tick_params(axis=\"y\", which=\"minor\", direction='in', labelsize=7)\nax[0].xaxis.set_ticks_position(\"none\")\nax[0].yaxis.set_ticks_position(\"none\")\nax[0].grid(True)   \nax[0].set_axisbelow(True) \nax[0].legend(loc=\"upper right\", fancybox=True, framealpha=1, fontsize=\"medium\")\n\nax[1].set_title(\"Traceplot\", color=\"#000000\", loc=\"center\", fontsize=9)\nax[1].plot(samples, color=\"#000000\", linewidth=.25, linestyle=\"-\")\nax[1].xaxis.set_major_formatter(mpl.ticker.StrMethodFormatter(\"{x:,.0f}\"))\nax[1].set_xlabel(\"sample\", fontsize=9)\nax[1].set_ylabel(r\"$\\hat \\beta$\")\nax[1].tick_params(axis=\"x\", which=\"major\", direction='in', labelsize=7)\nax[1].tick_params(axis=\"y\", which=\"major\", direction='in', labelsize=7)\nax[1].xaxis.set_ticks_position(\"none\")\nax[1].yaxis.set_ticks_position(\"none\")\n\nplt.show()<\/pre>\n<div class=\"cell-output cell-output-display\">\n<div>\n<figure class=\"figure\">\n<p><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/www.jtrive.com\/posts\/introduction-to-metropolis-hastings\/introduction-to-metropolis-hastings_files\/figure-html\/cell-8-output-1.png?w=578&#038;ssl=1\" class=\"img-fluid 
<\/p>">
figure-img\"><\/p>\n<\/figure>\n<\/div>\n<\/div>\n<\/div>\n<p><\/p>\n<p>The distribution of posterior samples resembles a gamma distribution, as expected.<\/p>\n<p>Next, to generate posterior predictive samples, we randomly sample from an exponential distribution parameterized by each posterior scale sample. This is accomplished in the next cell:<\/p>\n<div id=\"cell-17\" class=\"cell\" data-execution_count=\"9\">\n<pre>\"\"\"\nGenerate posterior predictive samples, one random draw per posterior scale sample.\n\"\"\"\n\npost_pred_samples = rng.exponential(scale=samples)\n\nprint(f\"Posterior predictive mean (cp): {post_pred_mean:.5f}\")\nprint(f\"Posterior predictive mean (mh): {post_pred_samples.mean():.5f}\")<\/pre>\n<div class=\"cell-output cell-output-stdout\">\n<pre>Posterior predictive mean (cp): 584.50000\nPosterior predictive mean (mh): 585.71307<\/pre>\n<\/div>\n<\/div>\n<p><\/p>\n<p>We can overlay the posterior predictive distribution with the histogram of posterior predictive samples and assess how well they match:<\/p>\n<div id=\"cell-19\" class=\"cell\" data-execution_count=\"10\">\n<pre>\nimport matplotlib as mpl\nimport matplotlib.pyplot as plt\n\npp = lomax(c=a_posterior, scale=b_posterior)\nxx = np.linspace(0, pp.ppf(.995), 1000)\nyy = pp.pdf(xx)\n\n\nfig, ax = plt.subplots(nrows=1, ncols=1, figsize=(8, 5.5), tight_layout=True) \n\nax.set_title(\"Posterior predictive samples with analytical density\", color=\"#000000\", loc=\"center\", fontsize=10)\n\nax.hist(\n    post_pred_samples, 100, density=True, alpha=1, color=\"#ff7595\", \n    edgecolor=\"#FFFFFF\", linewidth=1.0, label=\"Posterior predictive samples\"\n)\nax.plot(\n    xx, yy, alpha=1, color=\"#000000\", linewidth=1.5, linestyle=\"--\", \n    label=r\"$\\mathrm{Lomax}($\" + f\"{a_posterior:.0f}, {b_posterior:.0f}\" + r\"$)$\"\n)\nax.axvline(\n    post_pred_mean, color=\"green\", linewidth=1.5, linestyle=\"-.\", \n    label=\"posterior predictive mean\"\n)\n\nax.set_xlim(0, 
5000)\nax.set_xlabel(\"\")\nax.set_ylabel(\"\")\nax.tick_params(axis=\"x\", which=\"major\", direction='in', labelsize=7)\nax.tick_params(axis=\"x\", which=\"minor\", direction='in', labelsize=7)\nax.tick_params(axis=\"y\", which=\"major\", direction='in', labelsize=6)\nax.tick_params(axis=\"y\", which=\"minor\", direction='in', labelsize=6)\nax.xaxis.set_major_formatter(mpl.ticker.StrMethodFormatter(\"{x:,.0f}\"))\nax.xaxis.set_ticks_position(\"none\")\nax.yaxis.set_ticks_position(\"none\")\nax.grid(True)   \nax.set_axisbelow(True) \nax.legend(loc=\"upper right\", fancybox=True, framealpha=1, fontsize=\"medium\")\nplt.show()<\/pre>\n<div class=\"cell-output cell-output-display\">\n<div>\n<figure class=\"figure\">\n<p><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/www.jtrive.com\/posts\/introduction-to-metropolis-hastings\/introduction-to-metropolis-hastings_files\/figure-html\/cell-10-output-1.png?w=578&#038;ssl=1\" class=\"img-fluid figure-img\"><\/p>\n<\/figure>\n<\/div>\n<\/div>\n<\/div>\n<p><\/p>\n<p>The plot shows the two distributions align well. 
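<\/p>
<p>One practical note on the sampler above: computing the acceptance ratio with <code>np.prod<\/code> over per-observation densities can underflow to zero for larger datasets, and because the exponential proposal is not symmetric, the full Metropolis-Hastings ratio strictly also includes a proposal-density correction term (the loop above uses the plain Metropolis ratio). The sketch below evaluates the ratio in log space and adds that correction. The data, prior, and seed here are hypothetical stand-ins for the <code>y<\/code>, <code>prior<\/code>, and <code>rng<\/code> objects defined earlier; the gamma prior in particular is an assumption for illustration only.<\/p>

```python
import numpy as np
from scipy.stats import expon, gamma

# Hypothetical stand-ins for the y, prior, and rng objects defined earlier.
rng = np.random.default_rng(516)
y = rng.exponential(scale=500.0, size=25)   # observed losses (illustrative)
prior = gamma(a=2.0, scale=300.0)           # assumed prior on the exponential scale

def log_accept_ratio(theta, theta_star):
    """Log of the Metropolis-Hastings acceptance ratio for the
    asymmetric exponential proposal q(theta_star | theta) = Exp(scale=theta)."""
    # Sum log-densities instead of multiplying densities: no underflow.
    log_post_star = expon(scale=theta_star).logpdf(y).sum() + prior.logpdf(theta_star)
    log_post_curr = expon(scale=theta).logpdf(y).sum() + prior.logpdf(theta)
    # Hastings correction: log q(theta | theta_star) - log q(theta_star | theta).
    log_q_rev = expon(scale=theta_star).logpdf(theta)
    log_q_fwd = expon(scale=theta).logpdf(theta_star)
    return (log_post_star - log_post_curr) + (log_q_rev - log_q_fwd)

# Accept theta_star with probability min(1, exp(log_ar)), compared in log space.
theta, theta_star = 400.0, 550.0
log_ar = log_accept_ratio(theta, theta_star)
accept = np.log(rng.uniform()) < log_ar
```

<p>Dropping the correction term recovers the plain Metropolis ratio used in the loop above; for this proposal the two ratios differ, which can bias the chain slightly.<\/p>
<p>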
Finally, we can compare quantiles of our posterior predictive samples with the analytical density to see how well they agree in the extreme left and right tails:<\/p>\n<div id=\"cell-21\" class=\"cell\" data-execution_count=\"11\">\n<pre>\nimport pandas as pd\n\nq = [.01, .025, .05, .10, .25, .50, .75, .90, .95, .99, .995, .999]\n\ndf = pd.DataFrame({\n    \"q\": q,\n    \"cp\": pp.ppf(q), \n    \"mh\": np.quantile(post_pred_samples, q)\n})\n\ndf[\"error\"] = 100 * (df[\"cp\"] - df[\"mh\"]) \/ df[\"cp\"] \n\ndf.head(12)<\/pre>\n<div class=\"cell-output cell-output-display\" data-execution_count=\"11\">\n<div>\n<table class=\"dataframe caption-top table table-sm table-striped small\" data-quarto-postprocess=\"true\" data-border=\"1\">\n<thead>\n<tr class=\"header\">\n<th data-quarto-table-cell-role=\"th\"><\/th>\n<th data-quarto-table-cell-role=\"th\">q<\/th>\n<th data-quarto-table-cell-role=\"th\">cp<\/th>\n<th data-quarto-table-cell-role=\"th\">mh<\/th>\n<th data-quarto-table-cell-role=\"th\">error<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr class=\"odd\">\n<td data-quarto-table-cell-role=\"th\">0<\/td>\n<td>0.010<\/td>\n<td>4.704263<\/td>\n<td>4.841719<\/td>\n<td>-2.921931<\/td>\n<\/tr>\n<tr class=\"even\">\n<td data-quarto-table-cell-role=\"th\">1<\/td>\n<td>0.025<\/td>\n<td>11.868630<\/td>\n<td>12.136623<\/td>\n<td>-2.257991<\/td>\n<\/tr>\n<tr class=\"odd\">\n<td data-quarto-table-cell-role=\"th\">2<\/td>\n<td>0.050<\/td>\n<td>24.108192<\/td>\n<td>23.368946<\/td>\n<td>3.066368<\/td>\n<\/tr>\n<tr class=\"even\">\n<td data-quarto-table-cell-role=\"th\">3<\/td>\n<td>0.100<\/td>\n<td>49.789318<\/td>\n<td>47.340357<\/td>\n<td>4.918647<\/td>\n<\/tr>\n<tr class=\"odd\">\n<td data-quarto-table-cell-role=\"th\">4<\/td>\n<td>0.250<\/td>\n<td>138.465340<\/td>\n<td>135.870227<\/td>\n<td>1.874197<\/td>\n<\/tr>\n<tr class=\"even\">\n<td data-quarto-table-cell-role=\"th\">5<\/td>\n<td>0.500<\/td>\n<td>347.656754<\/td>\n<td>347.845394<\/td>\n<td>-0.054260<\/td>\n<\/tr>\n<tr 
class=\"odd\">\n<td data-quarto-table-cell-role=\"th\">6<\/td>\n<td>0.750<\/td>\n<td>747.009495<\/td>\n<td>736.928440<\/td>\n<td>1.349522<\/td>\n<\/tr>\n<tr class=\"even\">\n<td data-quarto-table-cell-role=\"th\">7<\/td>\n<td>0.900<\/td>\n<td>1367.480284<\/td>\n<td>1387.431296<\/td>\n<td>-1.458962<\/td>\n<\/tr>\n<tr class=\"odd\">\n<td data-quarto-table-cell-role=\"th\">8<\/td>\n<td>0.950<\/td>\n<td>1918.479107<\/td>\n<td>1958.620885<\/td>\n<td>-2.092375<\/td>\n<\/tr>\n<tr class=\"even\">\n<td data-quarto-table-cell-role=\"th\">9<\/td>\n<td>0.990<\/td>\n<td>3534.790477<\/td>\n<td>3488.275714<\/td>\n<td>1.315913<\/td>\n<\/tr>\n<tr class=\"odd\">\n<td data-quarto-table-cell-role=\"th\">10<\/td>\n<td>0.995<\/td>\n<td>4408.064760<\/td>\n<td>4380.679896<\/td>\n<td>0.621245<\/td>\n<\/tr>\n<tr class=\"even\">\n<td data-quarto-table-cell-role=\"th\">11<\/td>\n<td>0.999<\/td>\n<td>6969.745648<\/td>\n<td>6872.765923<\/td>\n<td>1.391439<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<\/div>\n<\/div>\n<p><\/p>\n<p>In the table above:<\/p>\n<ul>\n<li><code>q<\/code> represents the target quantile.<\/li>\n<li><code>cp<\/code> represents analytical quantiles from the conjugate prior posterior predictive distribution.<\/li>\n<li><code>mh<\/code> represents quantiles from the Metropolis-Hastings generated posterior predictive samples.<\/li>\n<li><code>error<\/code> represents the percent deviation from analytical quantiles.<\/li>\n<\/ul>\n<p>Even at q=0.999, <code>cp<\/code> and <code>mh<\/code> differ by less than 1.50%.<\/p>\n<p>Unfortunately, most distributional relationships used in practice are not conjugate. 
But by leveraging conjugate relationships, we were able to demonstrate that when the same likelihood, prior, and loss data are used, Metropolis-Hastings yields estimates of the posterior predictive distribution very close to the analytical distribution.<\/p>\n<p>While implementing your own MCMC sampler is a great way to gain a better understanding of the inner workings of Markov Chain Monte Carlo, in practice it is almost always preferable to use an optimized MCMC library such as PyStan or PyMC3. These will be explored in a future post.<\/p>\n<\/section>","protected":false},"excerpt":{"rendered":"<div style = \"width: 60%; display: inline-block; float:left; \">\n<p>Markov Chain Monte Carlo (MCMC) is a class of algorithms used to sample from probability distributions when direct sampling is difficult or inefficient. 
It leverages Markov chains to explore the target distribution and Monte Carlo methods to per&#8230;<\/p><\/div>\n<div style = \"width: 40%; display: inline-block; float:right;\"><img id=\"excerpts_images\" src=' https:\/\/latex.codecogs.com\/png.latex?f(x%7C%5Ctheta)' width = \"200\"  style = \"padding: 10px;\" \/><\/div>\n<div style=\"clear: both;\"><\/div>\n","protected":false},"author":104,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[4],"tags":[],"class_list":["post-19655","post","type-post","status-publish","format-standard","hentry","category-data-science"],"aioseo_notices":[],"jetpack_featured_media_url":"","jetpack-related-posts":[],"jetpack_sharing_enabled":true,"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/python-bloggers.com\/wp-json\/wp\/v2\/posts\/19655","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/python-bloggers.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/python-bloggers.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/python-bloggers.com\/wp-json\/wp\/v2\/users\/104"}],"replies":[{"embeddable":true,"href":"https:\/\/python-bloggers.com\/wp-json\/wp\/v2\/comments?post=19655"}],"version-history":[{"count":1,"href":"https:\/\/python-bloggers.com\/wp-json\/wp\/v2\/posts\/19655\/revisions"}],"predecessor-version":[{"id":19656,"href":"https:\/\/python-bloggers.com\/wp-json\/wp\/v2\/posts\/19655\/revisions\/19656"}],"wp:attachment":[{"href":"https:\/\/python-bloggers.com\/wp-json\/wp\/v2\/media?parent=19655"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/python-bloggers.com\/wp-json\/wp\/v2\/categories?post=19655"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/python-bloggers.com\/wp-json\/wp\/v2\/tags?post=19655"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templat
ed":true}]}}