{"id":9277,"date":"2012-11-12T09:14:46","date_gmt":"2012-11-12T09:14:46","guid":{"rendered":"https:\/\/www.portfolioprobe.com\/?p=9277"},"modified":"2014-01-19T12:28:55","modified_gmt":"2014-01-19T12:28:55","slug":"the-guts-of-a-statistical-factor-model","status":"publish","type":"post","link":"https:\/\/www.portfolioprobe.com\/2012\/11\/12\/the-guts-of-a-statistical-factor-model\/","title":{"rendered":"The guts of a statistical factor model"},"content":{"rendered":"<p>Specifics of statistical factor models and of a particular implementation of them.<\/p>\n<h2>Previously<\/h2>\n<p>Posts that are background for this one include:<\/p>\n<ul>\n<li><a href=\"https:\/\/www.portfolioprobe.com\/2012\/04\/09\/three-things-factor-models-do\/\">Three things factor models do<\/a><\/li>\n<li><a href=\"https:\/\/www.portfolioprobe.com\/2011\/03\/07\/factor-models-of-variance-in-finance\/\">Factor models of variance in finance<\/a><\/li>\n<li><a href=\"https:\/\/www.portfolioprobe.com\/2012\/02\/16\/the-burstfin-r-package\/\">The BurStFin R package<\/a><\/li>\n<li><a href=\"https:\/\/www.portfolioprobe.com\/2012\/03\/12\/the-quality-of-variance-matrix-estimation\/\">The quality of variance matrix estimation<\/a><\/li>\n<\/ul>\n<h2>The problem<\/h2>\n<p>Someone asked me some questions about the statistical factor model in <code>BurStFin<\/code>.\u00a0 The response &#8220;I don&#8217;t know either&#8221; didn&#8217;t seem quite optimal.<\/p>\n<h2>The method<\/h2>\n<p>My way of getting to the answers was:<\/p>\n<ol>\n<li>Go back to the mathematical model<\/li>\n<li>Create a small, simple data example<\/li>\n<\/ol>\n<p>Small data examples can be an effective way to clarify your thoughts.\u00a0 We&#8217;ll see that &#8220;small&#8221; means small for us, not necessarily small for the computer.\u00a0 That those can be different is part of <a href=\"https:\/\/www.portfolioprobe.com\/user-area\/some-hints-for-the-r-beginner\/\">the power of R<\/a>.<\/p>\n<h2>The model<\/h2>\n<p>The post 
<a href=\"https:\/\/www.portfolioprobe.com\/2011\/03\/07\/factor-models-of-variance-in-finance\/\">&#8220;Factor models of variance in finance&#8221;<\/a> said:<\/p>\n<p>In matrix notation a factor model is:<\/p>\n<p style=\"text-align: center;\">V = B\u2019FB + D<\/p>\n<p>This notation hides a lot of details:<\/p>\n<ul>\n<li>V is a square of numbers (of size number-of-assets by number-of-assets).<\/li>\n<li>B is a rectangle of sensitivities of size number-of-factors by number-of-assets.<\/li>\n<li>F is the variance matrix of the factors (of size number-of-factors by number-of-factors).<\/li>\n<li>D is a diagonal matrix (all off-diagonal elements are zero) of the idiosyncratic variance of each asset.\u00a0 The total size of D is number-of-assets by number-of-assets, but there are only number-of-assets values that are not zero.<\/li>\n<\/ul>\n<p>(end quote)<\/p>\n<p>But that is the model in terms of the variance matrix.\u00a0 The more fundamental model is in terms of the returns.\u00a0 We can write that as:<\/p>\n<p style=\"text-align: center;\">r = fB + e<\/p>\n<p>where:<\/p>\n<ul>\n<li>r is a matrix of returns of size number-of-times by number-of-assets.<\/li>\n<li>f is a matrix of factor returns of size number-of-times by number-of-factors.<\/li>\n<li>B is a matrix of sensitivities of size number-of-factors by number-of-assets.<\/li>\n<li>e is a matrix of idiosyncratic (specific) returns of size number-of-times by number-of-assets.<\/li>\n<\/ul>\n<p>The return equation looks suspiciously like a multivariate linear regression.\u00a0 In fact that is precisely how macroeconomic factor models are built.\u00a0 You could think of &#8220;f&#8221; as being the history of market returns, interest rates and so on; you would then do regressions to get the coefficients in B.<\/p>\n<p>But we are doing a statistical factor model.\u00a0 The more proper name is a latent factor model.\u00a0 That is, we don&#8217;t get to see the factors &#8212; we only infer them.\u00a0 Since we 
are imagining the existence of phantoms, we might as well assume they are nice: we assume that they are uncorrelated with each other, and each has variance one.\u00a0 This means that &#8220;F&#8221; is the identity matrix, so it disappears from the first equation, leaving V = B\u2019B + D.<\/p>\n<h2>An example<\/h2>\n<p>Let&#8217;s use R to build a 2-factor model for 3 assets.<\/p>\n<h4>generate variance<\/h4>\n<p>The first thing to do is create a matrix of factor sensitivities:<\/p>\n<pre>set.seed(3)\r\nrealfac &lt;- matrix(runif(6, -1,1), nrow=3)<\/pre>\n<p>The first command sets the seed for the random number generator so that the result will be reproducible.\u00a0 The second command creates a 3 by 2 matrix of random uniforms between -1 and 1.<\/p>\n<p>In statistical factor models it is traditional for the sensitivities to different factors (the columns of this matrix) to be orthogonal to each other.\u00a0 We can achieve that in our case by replacing the second column with the residuals from a linear regression on the first column:<\/p>\n<pre>realfac[,2] &lt;- resid(lm(realfac[,2] ~ realfac[,1]))<\/pre>\n<p>The sensitivities are:<\/p>\n<pre>&gt; realfac\r\n           [,1]       [,2]\r\n[1,] -0.6639169 -0.1563740\r\n[2,]  0.6150328 -0.0802644\r\n[3,] -0.2301153  0.2366384<\/pre>\n<p>Now we can create specific variances and the actual variance matrix:<\/p>\n<pre>realspec &lt;- c(.04, .12, .32)\r\nrealvar &lt;- realfac %*% t(realfac)\r\ndiag(realvar) &lt;- diag(realvar) + realspec<\/pre>\n<p>The variance matrix is:<\/p>\n<pre>&gt; realvar\r\n           [,1]       [,2]       [,3]\r\n[1,]  0.5052385 -0.3957794  0.1157734\r\n[2,] -0.3957794  0.5047077 -0.1605221\r\n[3,]  0.1157734 -0.1605221  0.4289508<\/pre>\n<h4>generate returns<\/h4>\n<p>We&#8217;ll generate returns with a multivariate normal distribution with the variance that we&#8217;ve created.\u00a0 We&#8217;ll use a function from the <code>MASS<\/code> package to do that:<\/p>\n<pre>require(MASS)\r\nretOneGo &lt;- mvrnorm(1e6, mu=c(0,0,0), Sigma=realvar)\r\ncolnames(retOneGo) &lt;- paste0(\"A\", 1:3)<\/pre>\n<p>This creates a matrix that has 3 columns and 1 million 
rows.\u00a0 The last line names the assets.<\/p>\n<h2>The gore<\/h2>\n<p>We can now estimate a factor model from the returns:<\/p>\n<pre>require(BurStFin)\r\nsfmOneGo &lt;- factor.model.stat(retOneGo, weights=1, \r\n    output=\"factor\", range.factors=c(2,2))<\/pre>\n<p>Often this function is called by just giving it the matrix of returns, but here we are overriding the default values of some arguments.\u00a0 By default the estimation uses time weights; saying &#8220;<code>weights=1<\/code>&#8221; is the easiest way of specifying equal weights.\u00a0 The default is to return the estimated variance matrix; here <code>output=\"factor\"<\/code> asks for the object containing the pieces instead.\u00a0 Finally, <code>range.factors=c(2,2)<\/code> demands that exactly 2 factors be used.<\/p>\n<p>The object is:<\/p>\n<pre>&gt; sfmOneGo\r\n$loadings\r\n         [,1]       [,2]\r\nA1  0.8617398 -0.3310512\r\nA2 -0.8883301  0.2148856\r\nA3  0.5923341  0.8038865\r\nattr(,\"scaled:scale\")\r\n[1] 0.728824 1.116636\r\n\r\n$uniquenesses\r\n        A1         A2         A3 \r\n0.14780966 0.16469372 0.03188738 \r\n\r\n$sdev\r\n       A1        A2        A3 \r\n0.7119102 0.7111206 0.6551395 \r\n\r\n$timestamp\r\n[1] \"Sun Nov 11 10:07:19 2012\"\r\n\r\n$call\r\nfactor.model.stat(x = retOneGo, weights = 1, \r\n    output = \"factor\", range.factors = c(2, 2))\r\n\r\nattr(,\"class\")\r\n[1] \"statfacmodBurSt\"<\/pre>\n<p>The components that pertain to the actual model are:<\/p>\n<ul>\n<li><code>loadings<\/code><\/li>\n<li><code>uniquenesses<\/code><\/li>\n<li><code>sdev<\/code><\/li>\n<\/ul>\n<h4>sensitivities<\/h4>\n<p>The B in the equations above is:<\/p>\n<pre>&gt; t(sfmOneGo$sdev * sfmOneGo$loadings)\r\n             A1         A2        A3\r\n[1,]  0.6134813 -0.6317098 0.3880615\r\n[2,] -0.2356787  0.1528096 0.5266578<\/pre>\n<p>The operation to get this is to multiply each row of <code>loadings<\/code> by the corresponding element of <code>sdev<\/code> and then take the 
transpose.<\/p>\n<h4>specific variances<\/h4>\n<p>The uniquenesses are the fractions of the asset variances that are not explained by the factors.\u00a0 Thus the specific variances are the square of <code>sdev<\/code> times <code>uniquenesses<\/code>:<\/p>\n<pre>&gt; sfmOneGo$sdev^2 * sfmOneGo$uniquenesses\r\n        A1         A2         A3 \r\n0.07491232 0.08328438 0.01368631<\/pre>\n<h4>orthogonality<\/h4>\n<p>Now let&#8217;s hunt for orthogonality.\u00a0 It is in the columns of <code>loadings<\/code>: the off-diagonal elements below are zero up to rounding error.<\/p>\n<pre>&gt; cov.wt(sfmOneGo$loadings, center=FALSE)\r\n$cov\r\n             [,1]         [,2]\r\n[1,] 9.412928e-01 4.163336e-17\r\n[2,] 4.163336e-17 4.010021e-01\r\n\r\n$center\r\n[1] 0\r\n\r\n$n.obs\r\n[1] 3<\/pre>\n<p>Note that <code>var<\/code> is the wrong tool here because it subtracts off the column means:<\/p>\n<pre>&gt; var(sfmOneGo$loadings)\r\n            [,1]        [,2]\r\n[1,]  0.88794845 -0.06484563\r\n[2,] -0.06484563  0.32217545<\/pre>\n<p>The sensitivities that include the standard deviations (that is, B) are not orthogonal:<\/p>\n<pre>&gt; cov.wt(sfmOneGo$loadings * sfmOneGo$sdev, center=FALSE)\r\n$cov\r\n            [,1]        [,2]\r\n[1,]  0.46300419 -0.01837012\r\n[2,] -0.01837012  0.17813184\r\n\r\n$center\r\n[1] 0\r\n\r\n$n.obs\r\n[1] 3<\/pre>\n<p>This suggests that perhaps an option could be added to get orthogonality on this scale.<\/p>\n<h4>accuracy<\/h4>\n<p>We can see how close the estimate of the variance matrix is to the actual value. 
The estimated variance is:<\/p>\n<pre>&gt; fitted(sfmOneGo)\r\n           A1         A2         A3\r\nA1  0.5068161 -0.4235562  0.1139464\r\nA2 -0.4235562  0.5056925 -0.1646639\r\nA3  0.1139464 -0.1646639  0.4416465\r\nattr(,\"number.of.factors\")\r\n[1] 2\r\nattr(,\"timestamp\")\r\n[1] \"Sun Nov 11 10:07:19 2012\"<\/pre>\n<p>The difference between the estimated variance and its true value is:<\/p>\n<pre>&gt; fitted(sfmOneGo) - realvar\r\n             A1            A2           A3\r\nA1  0.001577602 -0.0277767353 -0.001826927\r\nA2 -0.027776735  0.0009847477 -0.004141787\r\nA3 -0.001826927 -0.0041417871  0.012695676\r\nattr(,\"number.of.factors\")\r\n[1] 2\r\nattr(,\"timestamp\")\r\n[1] \"Sun Nov 11 10:07:19 2012\"<\/pre>\n<p>The difference between the estimate and the actual is small, but given a million observations it is not especially close to zero.<\/p>\n<h2>More gore (factor returns)<\/h2>\n<p>Now we&#8217;ll use the model we just estimated to simulate another set of returns.\u00a0 This time, though, we will create factor returns and idiosyncratic returns, and then put them together.<\/p>\n<pre># generate factor returns ('f' in 2nd equation)\r\nrealFacRet &lt;- matrix(rnorm(2e6), ncol=2)\r\n# generate idiosyncratic returns ('e' in 2nd equation)\r\nrealFacSpec &lt;- matrix(rnorm(3e6, \r\n    sd=rep(sfmOneGo$sdev * sqrt(sfmOneGo$uniquenesses), \r\n    each=1e6)), ncol=3)\r\n# compute asset returns\r\nretFR &lt;- realFacRet %*% t(sfmOneGo$sdev * \r\n    sfmOneGo$loadings) + realFacSpec\r\n# name the assets\r\ncolnames(retFR) &lt;- paste0(\"B\", 1:3)<\/pre>\n<p>We use the new return matrix to estimate a factor model:<\/p>\n<pre>sfmFR &lt;- factor.model.stat(retFR, weights=1, \r\n    output=\"factor\", range.factors=c(2,2))<\/pre>\n<p>Now we can compare the sensitivities of the two models:<\/p>\n<pre>&gt; t(sfmOneGo$sdev * sfmOneGo$loadings)\r\n             A1         A2        A3\r\n[1,]  0.6134813 -0.6317098 0.3880615\r\n[2,] -0.2356787  
0.1528096 0.5266578\r\nattr(,\"scaled:scale\")\r\n[1] 0.728824 1.116636\r\n&gt; t(sfmFR$sdev * sfmFR$loadings)\r\n             B1         B2        B3\r\n[1,]  0.6305413 -0.6505287 0.3757351\r\n[2,] -0.2284438  0.1400226 0.5476813\r\nattr(,\"scaled:scale\")\r\n[1] 0.7171768 1.1040259<\/pre>\n<p>The <code>\"scaled:scale\"<\/code> attribute can be ignored.<\/p>\n<p>We can estimate the factor returns implied by the newly estimated factor model.\u00a0 This is again a multivariate regression, but it is turned sideways from before: the sensitivities are now known and the factor returns are unknown.\u00a0 There are number-of-times individual regressions to do.\u00a0 Each of them has number-of-assets observations and number-of-factors coefficients to estimate.\u00a0 The coefficients will be our estimates of the factor returns.<\/p>\n<p>As preparation, we create the transposed matrix of sensitivities (number-of-assets by number-of-factors):<\/p>\n<pre>sensFRt &lt;- sfmFR$sdev * sfmFR$loadings<\/pre>\n<p>Estimating a million regressions is more than we really want to do.\u00a0 Let&#8217;s do just the first 1000 observations in the return matrix:<\/p>\n<pre>estFacRet &lt;- t(coef(lm(t(retFR[1:1000,]) ~ 0 +\r\n    sensFRt)))<\/pre>\n<p>This does a linear regression (<code>lm<\/code>) with the transpose of the first 1000 returns as the response and the sensitivities as the explanatory variables with no intercept.\u00a0 We then transpose the coefficient matrix so that there is one row per time point.<\/p>\n<p>We see that the variance matrix of the estimated factor returns is close to the identity &#8212; as we want:<\/p>\n<pre>&gt; var(estFacRet)\r\n            sensFRt1    sensFRt2\r\nsensFRt1 1.015795719 0.007103762\r\nsensFRt2 0.007103762 1.064414085<\/pre>\n<p>and that the variance of the errors of the factor return estimates is smaller:<\/p>\n<pre>&gt; var(estFacRet - realFacRet[1:1000,])\r\n            sensFRt1    sensFRt2\r\nsensFRt1  0.07660990 -0.03752711\r\nsensFRt2 -0.03752711  0.06754317<\/pre>\n<p>Plotting the errors shows that there can be fairly large discrepancies, 
however.<\/p>\n<p>Figure 1: The estimated factor returns minus the true factor returns for the first 1000 observations. <a href=\"https:\/\/www.portfolioprobe.com\/2012\/11\/12\/the-guts-of-a-statistical-factor-model\/facreterr\/\" rel=\"attachment wp-att-9322\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-9322\" title=\"facreterr\" alt=\"\" src=\"https:\/\/www.portfolioprobe.com\/wp-content\/uploads\/2012\/11\/facreterr.png\" width=\"512\" height=\"480\" srcset=\"https:\/\/www.portfolioprobe.com\/wp-content\/uploads\/2012\/11\/facreterr.png 512w, https:\/\/www.portfolioprobe.com\/wp-content\/uploads\/2012\/11\/facreterr-250x234.png 250w\" sizes=\"(max-width: 512px) 100vw, 512px\" \/><\/a><\/p>\n<p>It seems plausible that the regressions to get factor returns would be better in practice, since there will be a large number of assets: the current case estimates 2 parameters from only 3 observations.\u00a0 However, the estimation of the sensitivities will be worse, since there will not be a million time points with which to estimate them (and the market is not going to follow the model anyway).<\/p>\n<p>Figures 2 and 3 compare the true and estimated factor returns.<\/p>\n<p>Figure 2: True and estimated returns for factor 1. <a href=\"https:\/\/www.portfolioprobe.com\/2012\/11\/12\/the-guts-of-a-statistical-factor-model\/facret1\/\" rel=\"attachment wp-att-9323\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-9323\" title=\"facret1\" alt=\"\" src=\"https:\/\/www.portfolioprobe.com\/wp-content\/uploads\/2012\/11\/facret1.png\" width=\"512\" height=\"480\" srcset=\"https:\/\/www.portfolioprobe.com\/wp-content\/uploads\/2012\/11\/facret1.png 512w, https:\/\/www.portfolioprobe.com\/wp-content\/uploads\/2012\/11\/facret1-250x234.png 250w\" sizes=\"(max-width: 512px) 100vw, 512px\" \/><\/a><\/p>\n<p>Figure 3: True and estimated returns for factor 2. 
<a href=\"https:\/\/www.portfolioprobe.com\/2012\/11\/12\/the-guts-of-a-statistical-factor-model\/facret2\/\" rel=\"attachment wp-att-9324\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-9324\" title=\"facret2\" alt=\"\" src=\"https:\/\/www.portfolioprobe.com\/wp-content\/uploads\/2012\/11\/facret2.png\" width=\"512\" height=\"480\" srcset=\"https:\/\/www.portfolioprobe.com\/wp-content\/uploads\/2012\/11\/facret2.png 512w, https:\/\/www.portfolioprobe.com\/wp-content\/uploads\/2012\/11\/facret2-250x234.png 250w\" sizes=\"(max-width: 512px) 100vw, 512px\" \/><\/a><\/p>\n<h2>Questions<\/h2>\n<p>I&#8217;m thinking that statistical factor models have no advantages over a Ledoit-Wolf shrinkage estimate (<code>var.shrink.eqcor<\/code> in <code>BurStFin<\/code>).\u00a0 Hence enhancing <code>factor.model.stat<\/code> is a waste of time.\u00a0 Why am I wrong?<\/p>\n<p>One of the features of the variance estimators in <code>BurStFin<\/code> is that they allow missing values.\u00a0 There is no guarantee that missing values are handled especially well.\u00a0 Would someone like to research how they should be handled?\u00a0 Please?<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Specifics of statistical factor models and of a particular implementation of them. 
Previously Posts that are background for this one include: Three things factor models do Factor models of variance in finance The BurStFin R package The quality of variance matrix estimation The problem Someone asked me some questions about the statistical factor model in &hellip; <a href=\"https:\/\/www.portfolioprobe.com\/2012\/11\/12\/the-guts-of-a-statistical-factor-model\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><!-- AddThis Advanced Settings generic via filter on get_the_excerpt --><!-- AddThis Share Buttons generic via filter on get_the_excerpt --><\/p>\n","protected":false},"author":5,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[11,6],"tags":[114,228],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.portfolioprobe.com\/wp-json\/wp\/v2\/posts\/9277"}],"collection":[{"href":"https:\/\/www.portfolioprobe.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.portfolioprobe.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.portfolioprobe.com\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/www.portfolioprobe.com\/wp-json\/wp\/v2\/comments?post=9277"}],"version-history":[{"count":0,"href":"https:\/\/www.portfolioprobe.com\/wp-json\/wp\/v2\/posts\/9277\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.portfolioprobe.com\/wp-json\/wp\/v2\/media?parent=9277"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.portfolioprobe.com\/wp-json\/wp\/v2\/categories?post=9277"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.portfolioprobe.com\/wp-json\/wp\/v2\/tags?post=9277"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}