{"id":9060,"date":"2012-10-15T10:46:59","date_gmt":"2012-10-15T09:46:59","guid":{"rendered":"https:\/\/www.portfolioprobe.com\/?p=9060"},"modified":"2012-10-16T10:20:16","modified_gmt":"2012-10-16T09:20:16","slug":"annotations-for-r-for-dummies","status":"publish","type":"post","link":"https:\/\/www.portfolioprobe.com\/2012\/10\/15\/annotations-for-r-for-dummies\/","title":{"rendered":"Annotations for &#8220;R For Dummies&#8221;"},"content":{"rendered":"<p>Here are detailed comments on the book.\u00a0 Elsewhere there is a <a href=\"https:\/\/www.portfolioprobe.com\/2012\/10\/15\/review-of-r-for-dummies\/\">review of the book<\/a>.<\/p>\n<h2>How to read <em>R For Dummies<\/em><\/h2>\n<p>In order to learn R you need to do something with it.\u00a0 After you have read a little of the book, find something to do.\u00a0 Mix reading and doing your project.<\/p>\n<p>You cannot win if you do not play.<\/p>\n<h2>Two complementary documents<\/h2>\n<p>They are also complimentary.<\/p>\n<h3>Some hints for the R beginner<\/h3>\n<p><a href=\"https:\/\/www.portfolioprobe.com\/user-area\/some-hints-for-the-r-beginner\/\">&#8220;Some hints for the R beginner&#8221;<\/a> is a set of pages that give you the basics of the R language.\u00a0 It is a completely different approach to the one <em>R For Dummies<\/em> takes &#8212; you may want to investigate it.<\/p>\n<h3><em>The R Inferno<\/em><\/h3>\n<p>If you are just at the beginning of learning R, you should ignore <a href=\"http:\/\/www.burns-stat.com\/pages\/Tutor\/R_inferno.pdf\" target=\"_blank\"><em>The R Inferno<\/em><\/a> (except perhaps Circle 1).<\/p>\n<p>When you start using R for real <strong>and<\/strong> run into problems, that is the time to pick it up and see if it helps.<\/p>\n<h2>Missing piece<\/h2>\n<p>There is one thing that I think is missing in <em>R For Dummies<\/em>.\u00a0 Actually it isn&#8217;t missing; it comes at the very end, while I think it should be at the start.<\/p>\n<p>That piece is the 
<code>search<\/code> function.\u00a0 More specifically, it is the way R looks for objects, as revealed by the results of the <code>search<\/code> function.<\/p>\n<p>The start of <a href=\"https:\/\/www.portfolioprobe.com\/user-area\/some-hints-for-the-r-beginner\/\">&#8220;Some hints for the R beginner&#8221;<\/a> talks about <code>search<\/code> and how R finds objects.<\/p>\n<h2>How to use these annotations<\/h2>\n<h4>first learning<\/h4>\n<p>If you are new to R and first reading the book, then you should probably mostly ignore my comments.\u00a0 However, when you are confused by something in the book, you can look to see if there is a comment on that page that pertains to what you are confused about.<\/p>\n<h4>revising<\/h4>\n<p>On further reading, these comments are more likely to be of use.\u00a0 Some are clarifications; some are extensions.<\/p>\n<h2>Page by page comments<\/h2>\n<p>These comments are based on the first printing.<\/p>\n<h3>Page 10<\/h3>\n<p>There is more history in the <a href=\"https:\/\/www.portfolioprobe.com\/2012\/05\/31\/inferno-ish-r\/\">Inferno-ish R presentation<\/a>.<\/p>\n<h3>Page 11<\/h3>\n<h4>distribution<\/h4>\n<p>I&#8217;m not a lawyer, but I think the phrasing about redistribution is not right.\u00a0 I think it should say &#8220;change <strong>and<\/strong> redistribute&#8221; rather than &#8220;change or redistribute&#8221;.<\/p>\n<p>If what you do never leaves your entity, then you can do absolutely whatever you want.\u00a0 That is the free as in speech part.\u00a0 Legalities only come into play if what you do is made available to others.\u00a0 It is a common misunderstanding that you are restricted in what you do within your own world.<\/p>\n<h4>runs anywhere<\/h4>\n<p>The book highlights that R runs on many operating systems.\u00a0 It fails to make clear that the objects R creates are the same on all of these operating systems.\u00a0 You can start a project on a Linux machine at work, continue it while you commute with your Mac 
laptop, and then finish it on your Windows machine at home.\u00a0 No problem.<\/p>\n<h3>Page 12<\/h3>\n<p>The book should tell you not to be afraid of new words, such as &#8220;vector&#8221;.\u00a0 You don&#8217;t need to make friends with them right away, but don&#8217;t be scared off.<\/p>\n<p>(<strong>technical<\/strong>) The word &#8220;vector&#8221; has several meanings in R, so it is unfortunate that it is the first new word.\u00a0 The meaning used throughout the book is the most common one.\u00a0 See <em><a href=\"http:\/\/www.burns-stat.com\/pages\/Tutor\/R_inferno.pdf\" target=\"_blank\">The R Inferno<\/a><\/em> (Circle 5.1) for the gory details.<\/p>\n<h3>Page 13<\/h3>\n<h4>statistics<\/h4>\n<p>Pretty much everywhere in the book where it says &#8220;statistics&#8221;, I would prefer &#8220;data analysis&#8221; instead.\u00a0 In many people&#8217;s minds statistics is formal and academic, not like what they do.\u00a0 More people can feel comfortable doing data analysis than statistics.<\/p>\n<p>In addition to the fear factor, there really is a (slight) difference between data analysis and statistics.\u00a0 I think data analysis is more important, even though I&#8217;m trained as a statistician.<\/p>\n<h4>fields of study<\/h4>\n<p>R is also used in fields of study that are not considered to be data hotbeds, such as music and literature.\u00a0 The flexibility of R becomes very important for data in non-traditional forms.<\/p>\n<h3>Page 23<\/h3>\n<h4>vectors<\/h4>\n<p>If you are new to R, you shouldn&#8217;t expect yourself to understand this discussion.\u00a0 Just let it sink in over time.<\/p>\n<h3>Page 24<\/h3>\n<h4>assignment operator<\/h4>\n<p>Always put spaces around the assignment operator. 
That makes the code much more readable.<\/p>\n<p>The book tells you on page 63 that you can use <code>=<\/code> as well.\u00a0 You will see both used.\u00a0 They are mostly the same (differences are explained in <em>The R Inferno<\/em>, Circle 8.2.26).\u00a0 I agree with the book&#8217;s choice of <code>&lt;-<\/code>, but really you can use either.<\/p>\n<h3>Page 28<\/h3>\n<h4>RStudio<\/h4>\n<p>A nice feature of the RStudio workspace view is that it categorizes the objects.<\/p>\n<h3>Page 29<\/h3>\n<h4>Windows pathnames (technical)<\/h4>\n<p>The book implies that you cannot write Windows pathnames with backslashes.\u00a0 Actually you can; you just need to type a double backslash wherever you want a single one.\u00a0 Because that is easy to get wrong, it is easier and (often) less confusing to use slashes rather than backslashes.<\/p>\n<h3>Page 30<\/h3>\n<h4>loading objects (technical)<\/h4>\n<p>It is possible to use <code>attach<\/code> instead of <code>load<\/code>.\u00a0 If you load a file of saved objects, the objects are put into your global environment.\u00a0 If you attach the file, its objects are put in a separate position on the search list.\u00a0 If you modify an object that has been attached, then the modified version goes into your global environment (the attached copy is left unchanged).<\/p>\n<h3>Page 32<\/h3>\n<h4>vectorization<\/h4>\n<p>There are different forms of vectorization, and the book doesn&#8217;t make that explicit.\u00a0 Vectorization can be put into three categories:<\/p>\n<ul>\n<li>vectorization along vectors<\/li>\n<li>summary<\/li>\n<li>vectorization across arguments<\/li>\n<\/ul>\n<p>Functions like <code>sum<\/code> and <code>mean<\/code> are vectorized in the sense that they take a vector and summarize it.\u00a0 This is done in pretty much all languages; it is nothing special.<\/p>\n<p>Vectorization as it is commonly spoken of in R is vectorization along vectors.\u00a0 The addition operator, as seen on page 24, is an example.\u00a0 This is the form of vectorization that is so useful and powerful in R.<\/p>\n<p>You should not expect the third 
form of vectorization in R.\u00a0 However, it does exist in a few functions.\u00a0 The <code>sum<\/code> and <code>mean<\/code> functions do summary-type vectorization:<\/p>\n<pre>&gt; sum(1:3)\r\n[1] 6\r\n&gt; mean(1:3)\r\n[1] 2<\/pre>\n<p>The <code>sum<\/code> function also does vectorization across arguments:<\/p>\n<pre>&gt; sum(1, 2, 3)\r\n[1] 6<\/pre>\n<p>That is basically anomalous.\u00a0 The <code>mean<\/code> function is more typical in that it does not do this form of vectorization:<\/p>\n<pre>&gt; mean(1, 2, 3) # WRONG\r\n[1] 1<\/pre>\n<p>Unfortunately you don&#8217;t get an error or a warning in this case.\u00a0 Do not expect this form of vectorization.<\/p>\n<h3>Page 33<\/h3>\n<h4>error message<\/h4>\n<p>Getting error messages can be frightening for a while.\u00a0 But it&#8217;s not the end of the world.\u00a0 Relax.<\/p>\n<h3>Page 36<\/h3>\n<h4>names (technical)<\/h4>\n<p>In fact it is possible to use any name that you want (by enclosing it in backquotes), but you probably don&#8217;t want to.<\/p>\n<h4>return (technical)<\/h4>\n<p>Actually <code>return<\/code> is not a reserved word, but you should treat it as if it were.\u00a0 True reserved words such as <code>break<\/code> and <code>while<\/code> cannot be assigned to:<\/p>\n<pre>&gt; break &lt;- 1\r\nError in break &lt;- 1 : invalid (NULL) left side of assignment\r\n&gt; while &lt;- 1\r\nError: unexpected assignment in \"while &lt;-\"<\/pre>\n<p>An assignment to <code>return<\/code>, however, is silently accepted:<\/p>\n<pre>&gt; return &lt;- 1 #do NOT do this\r\n&gt;<\/pre>\n<h3>Page 37<\/h3>\n<h4>F and T<\/h4>\n<p>I wish to emphasize the advice in the book:<\/p>\n<ul>\n<li>never abbreviate <code>TRUE<\/code> and <code>FALSE<\/code> to <code>T<\/code> and <code>F<\/code><\/li>\n<li>avoid using <code>T<\/code> and <code>F<\/code> as object names<\/li>\n<\/ul>\n<h3>Page 42<\/h3>\n<h4>library<\/h4>\n<p>The book suggests (with a slight revision on page 361) loading packages with the <code>library<\/code> function.\u00a0 Some of us prefer <code>require<\/code> instead of <code>library<\/code> for this use.\u00a0 The best use of <code>library<\/code> is without arguments &#8212; this gives you a list of available 
packages.<\/p>\n<pre>&gt; library(fortunes) # load package\r\n&gt; require(fortunes) # same thing\r\n&gt; library() # get list of packages\r\n&gt; require() # don't do this\r\nLoading required package: \r\nFailed with error:  \u2018invalid package name\u2019<\/pre>\n<h4>contributed packages<\/h4>\n<p>I think the authors might be a little too polite in their description of the quality of contributed packages.<\/p>\n<p>I find base R to be phenomenally clean code &#8212; it is hard to find commercial code that is less buggy.\u00a0 The quality of contributed packages varies widely.\u00a0 A few are up to the standards of base R, some are quite good, and I&#8217;m sure there are a few dreadful ones.<\/p>\n<p>With contributed packages you need to be more cautious than when only using base R functionality.\u00a0 Or perhaps I should say that you always need to be vigilant, but if you are using contributed packages, there is a larger chance that a problem is due to a package rather than to your own mistake.<\/p>\n<p>Without inspecting the code, I know of two clues that suggest a package is of good quality:<\/p>\n<ul>\n<li>widely used<\/li>\n<li>good documentation<\/li>\n<\/ul>\n<p>A widely used package &#8212; such as those highlighted in the book &#8212; is an indication that a lot of problems with the code have been fixed or didn&#8217;t exist in the first place.<\/p>\n<p>Many people judge the cleanliness of a restaurant&#8217;s kitchen by the cleanliness of its restrooms.\u00a0 Likewise, carefully written documentation is likely to be a sign of clean code.<\/p>\n<h3>Page 46<\/h3>\n<h4>exponentiation (technical)<\/h4>\n<p>It is not a good idea to use <code>**<\/code> to mean exponentiation &#8212; it is not out of the question that it will go away.\u00a0 Stick to using the <code>^<\/code> operator.<\/p>\n<h3>Page 49<\/h3>\n<h4>log and exp<\/h4>\n<p>The sentence a little below mid-page about creating the vector inside <code>exp<\/code> should say inside the 
<code>log<\/code> function.<\/p>\n<h3>Page 52<\/h3>\n<h4>infinity<\/h4>\n<p>The last sentence on the page should say <code>10^309<\/code> and <code>10^310<\/code> rather than <code>10^308<\/code> and <code>10^309<\/code>.<\/p>\n<h3>Page 54<\/h3>\n<h4>table 4-3<\/h4>\n<p>You are unlikely to use any of these except for <code>is.na<\/code>, which you may use quite a lot.<\/p>\n<h3>Page 55<\/h3>\n<h4>types of vectors<\/h4>\n<p>All of the types of vectors listed may have missing values (<code>NA<\/code>).<\/p>\n<h3>Page 56<\/h3>\n<h4>integer versus double<\/h4>\n<p>One of the nice things about R is that you hardly ever need to worry about whether something is stored as an integer or a double.<\/p>\n<h4>largest integer (technical)<\/h4>\n<p>We can see how big the biggest integer is in a couple of different ways:<\/p>\n<pre>&gt; format(2^31 - 1, big.mark=\",\")\r\n[1] \"2,147,483,647\"\r\n&gt; .Machine$integer.max\r\n[1] 2147483647<\/pre>\n<h3>Page 59<\/h3>\n<h4>indexing<\/h4>\n<p>What is called &#8220;indexing&#8221; in the book is more commonly called &#8220;subscripting&#8221;.<\/p>\n<h3>Page 64<\/h3>\n<h4>missing value testing<\/h4>\n<p>It is a common mistake to try testing missing values with a command like:<\/p>\n<pre>&gt; x == NA<\/pre>\n<p>That doesn&#8217;t work, because any comparison with <code>NA<\/code> gives <code>NA<\/code>.\u00a0 You need to use <code>is.na<\/code>.<\/p>\n<h3>Page 65<\/h3>\n<h4>any and all<\/h4>\n<p>The last sentence on the page is a false statement.\u00a0 The <code>any<\/code> and <code>all<\/code> functions are smart enough to give a definite answer whenever the missing values cannot change it:<\/p>\n<pre>&gt; all(c(NA, FALSE))\r\n[1] FALSE\r\n&gt; all(c(NA, TRUE))\r\n[1] NA\r\n&gt; any(c(NA, FALSE))\r\n[1] NA\r\n&gt; any(c(NA, TRUE))\r\n[1] TRUE<\/pre>\n<h3>Page 72<\/h3>\n<h4>assigning to character (technical)<\/h4>\n<p>It is more correct to say that the mode is character than that the class is character.<\/p>\n<h3>Page 82<\/h3>\n<h4>grep<\/h4>\n<p>Alternatively, you can use the <code>value<\/code> argument of 
<code>grep<\/code>:<\/p>\n<pre>&gt; grep(\"New\", state.name, value=TRUE)\r\n[1] \"New Hampshire\" \"New Jersey\"    \"New Mexico\"   \r\n[4] \"New York\"<\/pre>\n<h3>Page 83<\/h3>\n<h4>sub versus gsub<\/h4>\n<p>The <code>sub<\/code> function replaces only the first match in each string, while <code>gsub<\/code> replaces every match.\u00a0 Here is an example that should make the difference clear:<\/p>\n<pre>&gt; gsub(\"e\", \"a\", c(\"sheep\", \"cheap\", \"cheep\"))\r\n[1] \"shaap\" \"chaap\" \"chaap\"\r\n&gt; sub(\"e\", \"a\", c(\"sheep\", \"cheap\", \"cheep\"))\r\n[1] \"shaep\" \"chaap\" \"chaep\"<\/pre>\n<h3>Page 86<\/h3>\n<h4>factor attributes (technical)<\/h4>\n<p>The book says:<\/p>\n<blockquote><p>[factors are] neither character vectors nor numeric vectors, although they have some attributes of both.<\/p><\/blockquote>\n<p>This sentence is using &#8220;attribute&#8221; in the non-technical sense.\u00a0 But attributes in the technical sense do come into play: factors have &#8220;class&#8221; and &#8220;levels&#8221; attributes.<\/p>\n<h3>Page 87<\/h3>\n<h4>factor versus character<\/h4>\n<p>Notice how the factor is printed differently from the character vector.<\/p>\n<h3>Page 91<\/h3>\n<h4>American regions (off topic)<\/h4>\n<p>There is a brilliant analysis of North American regions called <a href=\"http:\/\/en.wikipedia.org\/wiki\/The_Nine_Nations_of_North_America\" target=\"_blank\">The Nine Nations of North America<\/a>.<\/p>\n<h3>Page 94<\/h3>\n<h4>date sequences<\/h4>\n<p>You might wonder what happens if you start on the thirty-first of the month rather than the first.\u00a0 When you wonder something like that, try it out and see:<\/p>\n<pre>&gt; myStart &lt;- as.Date(\"2012-12-31\")\r\n&gt; seq(myStart, by=\"1 month\", length=6)\r\n[1] \"2012-12-31\" \"2013-01-31\" \"2013-03-03\" \"2013-03-31\"\r\n[5] \"2013-05-01\" \"2013-05-31\"<\/pre>\n<p>The result applies the calendar arithmetic very literally (&#8220;February 31&#8221; rolls over to March 3), which is not to everyone&#8217;s taste.\u00a0 But perhaps we can do better:<\/p>\n<pre>&gt; seq(myStart + 1, by=\"1 month\", length=6) - 
1\r\n[1] \"2012-12-31\" \"2013-01-31\" \"2013-02-28\" \"2013-03-31\"\r\n[5] \"2013-04-30\" \"2013-05-31\"<\/pre>\n<p>Wondering is great; experimenting is even greater.<\/p>\n<h3>Page 104<\/h3>\n<h4>one-dimensional arrays (technical)<\/h4>\n<p>In the technical sense regular vectors have no dimensions at all (their <code>dim<\/code> is <code>NULL<\/code>), but we think of them as being one-dimensional.\u00a0 But there really are one-dimensional arrays.\u00a0 They are almost like plain vectors but not quite.<\/p>\n<h3>Page 106<\/h3>\n<h4>playing with attributes<\/h4>\n<p>For large objects you often won&#8217;t like the response you get when you do:<\/p>\n<pre>&gt; attributes(x)<\/pre>\n<p>It is often better to just look at which attributes the object has:<\/p>\n<pre>&gt; names(attributes(x))<\/pre>\n<h3>Page 109<\/h3>\n<h4>extracting values from matrices<\/h4>\n<p>The flexibility of subscripting matrices (and data frames) as vectors is a curse as well as a blessing.<\/p>\n<p>If you want to do:<\/p>\n<pre>&gt; x[-2,]<\/pre>\n<p>and you do:<\/p>\n<pre>&gt; x[-2]<\/pre>\n<p>then you will get an entirely different result (for a matrix, a vector of all the elements except the second one).\u00a0 This can be a hard mistake to find &#8212; a few pixels&#8217; difference on your screen can have a big impact.<\/p>\n<h3>Page 113<\/h3>\n<h4>first.matrix<\/h4>\n<p>The example on this page assumes that <code>first.matrix<\/code> is as it was first created, not as it has been modified in the intervening exercises.<\/p>\n<h3>Page 114<\/h3>\n<h4>matrix operations<\/h4>\n<p>So adding numbers by row is easy.\u00a0 How do you add them by column?\u00a0 One way is:<\/p>\n<pre>&gt; fmat &lt;- matrix(1:12, ncol=4)\r\n&gt; fmat + rep((1:4)*10, each=nrow(fmat))\r\n     [,1] [,2] [,3] [,4]\r\n[1,]   11   24   37   50\r\n[2,]   12   25   38   51\r\n[3,]   13   26   39   52<\/pre>\n<p>This uses the <code>rep<\/code> function to create a vector with as many elements as the matrix has 
(assuming the vector being replicated has length equal to the number of columns).\u00a0 Because each value is repeated once for every row and matrices are filled column by column, the replicated values land in the desired positions.<\/p>\n<h3>Page 116<\/h3>\n<h4>inverting a matrix<\/h4>\n<p>The reason that the command to invert a matrix (<code>solve<\/code>) is not intuitive is that it is seldom a good idea to invert a matrix explicitly.<\/p>\n<h3>Page 117<\/h3>\n<h4>vectors as arrays (technical)<\/h4>\n<p>Actually vectors, in general, are not arrays at all.\u00a0 The difference is of little consequence, however.<\/p>\n<h4>third array dimension (technical)<\/h4>\n<p>I call the items in the third dimension of an array &#8220;slices&#8221; rather than &#8220;tables&#8221;.\u00a0 I&#8217;m not aware of any standardized nomenclature.\u00a0 I don&#8217;t think &#8220;tables&#8221; is such a good choice because there are other meanings of &#8220;table&#8221; in R.<\/p>\n<h4>array filling (technical)<\/h4>\n<p>I&#8217;m not able to follow the sentence in the book describing how arrays are filled.\u00a0 The way I think of it is that the first subscript varies fastest, then the second, and so on (no matter how many dimensions are in the array).<\/p>\n<h3>Page 119<\/h3>\n<h4>rows and columns (technical)<\/h4>\n<p>Maybe my brain went on strike, but I think that &#8220;rows&#8221; and &#8220;columns&#8221; are reversed in the first paragraph on the page.<\/p>\n<h3>Page 120<\/h3>\n<h4>data frame structure<\/h4>\n<p>Note that all the vectors that make up the columns need to be the same length.<\/p>\n<h4>data frame structure (technical)<\/h4>\n<p>It is possible for a &#8220;column&#8221; of a data frame to be a matrix, in which case the matrix needs to have the same number of rows as the data frame.<\/p>\n<h4>data frame length<\/h4>\n<p>Note that the length of a data frame is different from the length of the equivalent matrix.\u00a0 The length of the data frame is the number of columns, while the length of the matrix is the number of columns times the number of rows.<\/p>\n<h3>Page 122<\/h3>\n<h4>character versus factor<\/h4>\n<p>The 
book suggests always making sure that data frames hold character vectors instead of factors in order to reduce problems.\u00a0 The other main route to avoid frustration is to always assume that there are factors.<\/p>\n<p>The thing you don&#8217;t want to do is assume that what is really a factor is a character vector.<\/p>\n<h4>naming variables<\/h4>\n<p>If in the middle of the page where it says &#8220;In the previous section&#8221; you don&#8217;t know what they are talking about, not to worry &#8212; you&#8217;re not alone.<\/p>\n<h4>as with matrices<\/h4>\n<p>I&#8217;m not clear on the reference to matrices at the very bottom of the page.<\/p>\n<h3>Page 124<\/h3>\n<h4>data frame subscripting<\/h4>\n<p>You can get a column of a data frame using either the <code>$<\/code> or the <code>[<\/code> form of subscripting.\u00a0 But there is a difference: <code>$<\/code> takes the bare column name, while <code>[<\/code> needs the name as a character string:<\/p>\n<pre>&gt; baskets.df$Granny\r\n[1] 12  4  5  6  9  3\r\n&gt; baskets.df[,Granny]\r\nError in `[.data.frame`(baskets.df, , Granny) : \r\n  object 'Granny' not found\r\n&gt; baskets.df[,\"Granny\"]\r\n[1] 12  4  5  6  9  3<\/pre>\n<p>Note the quotes or lack thereof.<\/p>\n<h3>Page 130<\/h3>\n<h4>pieces of a list<\/h4>\n<p>I prefer calling the pieces of a list &#8220;components&#8221; rather than &#8220;elements&#8221;.\u00a0 One reason is that a component of a list can be another list, and hence not very elementary.<\/p>\n<h3>Page 139<\/h3>\n<p>The functions that you write are essentially the same as the built-in functions.\u00a0 They are first-class citizens.<\/p>\n<h3>Page 152<\/h3>\n<h4>functional programming<\/h4>\n<p>You can very effectively use R without having a clue what &#8220;functional programming&#8221; means.\u00a0 The important idea behind functional programming is safety &#8212; the data that you want to use is almost surely the data that really is being used.<\/p>\n<h3>Page 153<\/h3>\n<h4>calculation example<\/h4>\n<p>The object names were obviously changed midstream. 
<code>fifty<\/code> should be <code>half<\/code> and <code>hundred<\/code> should be <code>full<\/code>.<\/p>\n<h3>Page 157<\/h3>\n<h4>generic functions (technical)<\/h4>\n<p>A detail that only occasionally really matters is that the argument names in methods should match the argument names in the generic.\u00a0 You don&#8217;t want to have the argument called <code>x<\/code> in the generic but <code>object<\/code> in a method.<\/p>\n<h3>Page 171<\/h3>\n<h4>looping without loops<\/h4>\n<p>Using the apply functions really hides loops rather than eliminating them.<\/p>\n<h3>Page 172<\/h3>\n<h4>number of apply functions<\/h4>\n<p>Not that it matters, but I count 8 apply functions in the base package in version 2.15.0.\u00a0 There is also a reasonably large number of apply functions in contributed packages.<\/p>\n<h3>Page 188<\/h3>\n<h4>error checking (technical)<\/h4>\n<p>Another way to write the check for out-of-bounds values is:<\/p>\n<pre>stopifnot(all(x &gt;= 0 &amp; x &lt;= 1))<\/pre>\n<p>This will create an appropriate error message if there is a violation.<\/p>\n<p><code>stopifnot<\/code> also accepts multiple conditions separated by commas.\u00a0 So you can have checks like:<\/p>\n<pre>stopifnot(is.matrix(x), is.data.frame(y))<\/pre>\n<p>to make sure that <code>x<\/code> is a matrix and <code>y<\/code> is a data frame.<\/p>\n<h3>Page 190<\/h3>\n<h4>technical tip (technical)<\/h4>\n<p>The first sentence starts:<\/p>\n<blockquote><p>In fact, functions are generic &#8230;<\/p><\/blockquote>\n<p>It should read:<\/p>\n<blockquote><p>In fact, some functions are generic &#8230;<\/p><\/blockquote>\n<h3>Page 192<\/h3>\n<h4>factor to numeric<\/h4>\n<p>The book gives the efficient method of converting a factor to numeric:<\/p>\n<pre>as.numeric(levels(x))[x]<\/pre>\n<p>The slightly less efficient but easier-to-remember method is:<\/p>\n<pre>as.numeric(as.character(x))<\/pre>\n<p>Don&#8217;t forget the <code>as.character<\/code>: without it you get the factor&#8217;s internal integer codes rather than the numbers in its levels.<\/p>\n<h4>problems with factors 
(technical)<\/h4>\n<p>Circle 8.2 of <em>The R Inferno<\/em> starts with a number of items about factors.<\/p>\n<h3>Page 193<\/h3>\n<h4>documentation quality<\/h4>\n<p>Unfortunately, I think the authors are painting too rosy a picture of the quality of R documentation.\u00a0 There probably is some great documentation for any task or issue that you have, but you may have a significant search on your hands to find it.<\/p>\n<h3>Page 194<\/h3>\n<h4>help files<\/h4>\n<p>It takes practice to learn how to use help files well.\u00a0 It doesn&#8217;t help that sections of the help files are in the wrong order (in my opinion).\u00a0 The &#8220;See also&#8221; and &#8220;Examples&#8221; sections should be near the top, and &#8220;Details&#8221; should be at the bottom.<\/p>\n<p>The examples are often the most important part.\u00a0 The book implies that all examples are reproducible.\u00a0 Not all are, but many are.<\/p>\n<p>You don&#8217;t need to understand the whole of a help file the first time around.\u00a0 The goal should be to <strong>improve<\/strong> your understanding of the function.<\/p>\n<h3>Page 199<\/h3>\n<h4>Stack Overflow<\/h4>\n<p>It is possible to subscribe via RSS to R tags.<\/p>\n<h3>Page 200<\/h3>\n<h4>cards<\/h4>\n<p>With the cards I&#8217;m used to, the command to create cards should include <code>2:10<\/code> rather than <code>1:9<\/code>.<\/p>\n<h3>Page 202<\/h3>\n<h4>session info<\/h4>\n<p>The book says that it is sometimes helpful to include the results of <code>sessionInfo()<\/code> in questions.\u00a0 I would change that from &#8220;sometimes&#8221; to &#8220;often&#8221;.<\/p>\n<h3>Page 210<\/h3>\n<h4>reading in data<\/h4>\n<p>The start of Circle 8.3 in <em>The R Inferno<\/em> has a number of items about problems reading data in.<\/p>\n<h3>Page 216<\/h3>\n<h4>changing directories<\/h4>\n<p>If you are using the RGui, there is a &#8220;change dir&#8221; item in the File menu.<\/p>\n<h3>Page 221<\/h3>\n<h4>three subset 
operators<\/h4>\n<p>The <code>[[<\/code> operator always gets one component, so the result is often not a list.<\/p>\n<p>In contrast, the <code>[<\/code> operator can get any number of items and (except for dropping) gives you back the same type of object.<\/p>\n<h3>Page 226<\/h3>\n<h4>removing duplicates<\/h4>\n<p>The book shows the removal of duplicates using both logical subscripts and negative numeric subscripts.\u00a0 Be careful with the latter of these:<\/p>\n<pre>&gt; vec &lt;- 1:5\r\n&gt; dups &lt;- duplicated(vec)\r\n&gt; vec[!dups]\r\n[1] 1 2 3 4 5\r\n&gt; vec[-which(dups)]\r\ninteger(0)<\/pre>\n<p>If you create a vector of negative subscripts, you need to make sure it has at least one element.\u00a0 Here there are no duplicates, so <code>-which(dups)<\/code> is empty, and subscripting with an empty vector selects nothing.\u00a0 You get nothing when you want everything.<\/p>\n<h3>Page 240<\/h3>\n<h4>apply output<\/h4>\n<p>The book is in error when it says that the result of <code>apply<\/code> is always a vector.\u00a0 Other possible results include a matrix and a list.<\/p>\n<h3>Page 243<\/h3>\n<h4>sapply example (technical)<\/h4>\n<p>The example at the very top of the page that uses <code>ifelse<\/code> would be more in the spirit of R if it instead used:<\/p>\n<pre>if(is.numeric(x)) mean(x) else NA<\/pre>\n<h3>Page 245<\/h3>\n<h4>aggregate (technical)<\/h4>\n<p>Alternatives to <code>aggregate<\/code> include the <code>by<\/code> function (if you have a data frame) and the <code>data.table<\/code> package.<\/p>\n<h3>Page 253<\/h3>\n<h4>third paragraph<\/h4>\n<p>Something seems to have gone wrong.\u00a0 That the phrase &#8220;doesn&#8217;t make sense at all&#8221; appears in the paragraph seems apropos.<\/p>\n<h3>Page 254<\/h3>\n<h4>checking data<\/h4>\n<p>Often checking data with graphics is best. 
Do plots look as expected?<\/p>\n<h3>Page 260<\/h3>\n<h4>mode<\/h4>\n<p>There is a <code>mode<\/code> function in R, but it does not have the meaning used in the discussion of location: it tells you how an object is stored (such as &#8220;numeric&#8221; or &#8220;character&#8221;), not its most common value.<\/p>\n<h3>Page 270<\/h3>\n<h4>missing values (technical)<\/h4>\n<p>You might think that <code>\"pairwise\"<\/code> should be the default choice since it uses the most data.\u00a0 The problem with it is that the resulting correlation matrix is not guaranteed to be positive semi-definite.<\/p>\n<h3>Page 274<\/h3>\n<h4>prop.table (technical)<\/h4>\n<p>I wondered if <code>prop.table<\/code> recognized a table that had added margins.\u00a0 The answer is no: it treats the margins as part of the data.<\/p>\n<h3>Page 312<\/h3>\n<h4>multiple plots (technical)<\/h4>\n<p>If you want to put the graphics device back into a single-plot state without using the <code>old.par<\/code> trick, then say:<\/p>\n<pre>par(mfcol=c(1,1))<\/pre>\n<p>or<\/p>\n<pre>par(mfrow=c(1,1))<\/pre>\n<p>It doesn&#8217;t matter which you say.<\/p>\n<h3>Page 314<\/h3>\n<h4>hardcopy graphics<\/h4>\n<p>If you are putting your graphics into a word processor, then often <code>pdf<\/code> is a good choice.<\/p>\n<p>If you are putting your graphics onto a webpage or into a presentation, then <code>png<\/code> can be a good choice.<\/p>\n<h3>Page 326<\/h3>\n<h4>boxplots (technical)<\/h4>\n<p>To be clear: whiskers are <strong>at most<\/strong> 1.5 times as long as the box (the interquartile range).<\/p>\n<h3>Page 332<\/h3>\n<h4>changing directory (technical)<\/h4>\n<p>To change the working directory and then change it back to the original, you would do something like:<\/p>\n<pre>&gt; origwd &lt;- getwd()\r\n&gt; setwd(\"blah\/blah\")\r\n&gt; # do stuff\r\n&gt; setwd(origwd)<\/pre>\n<h3>Page 359<\/h3>\n<h4>CRAN mirrors (technical)<\/h4>\n<p>While all mirrors are conceptually the same as the primary CRAN site, it takes time for changes to propagate.\u00a0 This is unlikely to be an issue unless you are trying to get a brand new release.<\/p>\n<h3>Page 
360<\/h3>\n<h4>CRAN packages<\/h4>\n<p>As of 2012 October 14, CRAN has 4087 contributed packages.<\/p>\n<h3>Page 362<\/h3>\n<h4>unloading packages<\/h4>\n<p>I&#8217;ve used R pretty much every day for over a decade and never unloaded a package.\u00a0 I doubt this will be a big issue for you.<\/p>\n<h3>Page 363<\/h3>\n<h4>R-Forge<\/h4>\n<p>R-Forge also provides mailing lists.\u00a0 The immediate significance of this for you is that some of your favorite contributed packages might have a dedicated mailing list.<\/p>\n<h3>Page 364<\/h3>\n<h4>own repository (technical)<\/h4>\n<p>You can even set up your own repository and fill it with packages that you write.<\/p>\n<h3>Page 1<\/h3>\n<p>Do you appreciate the meaning of:<\/p>\n<pre>knowledge &lt;- apply(theory, 1, sum)<\/pre>\n<p>as promised?<\/p>\n<h2>Epilogue<\/h2>\n<blockquote><p>I saw a little teddy bear.<br \/>\nWell, I said to myself,<br \/>\n&#8220;I know what I want. I gotta get a bear some way.&#8221;<\/p><\/blockquote>\n<p>from &#8220;You cannot win if you do not play&#8221; by Steve Forbert<br \/>\n<object width=\"520\" height=\"390\" classid=\"clsid:d27cdb6e-ae6d-11cf-96b8-444553540000\" codebase=\"http:\/\/download.macromedia.com\/pub\/shockwave\/cabs\/flash\/swflash.cab#version=6,0,40,0\"><param name=\"allowFullScreen\" value=\"true\" \/><param name=\"allowscriptaccess\" value=\"always\" \/><param name=\"src\" value=\"http:\/\/www.youtube.com\/v\/x9VL48cPVPw?version=3&amp;hl=en_GB\" \/><param name=\"allowfullscreen\" value=\"true\" \/><embed width=\"520\" height=\"390\" type=\"application\/x-shockwave-flash\" src=\"http:\/\/www.youtube.com\/v\/x9VL48cPVPw?version=3&amp;hl=en_GB\" allowFullScreen=\"true\" allowscriptaccess=\"always\" allowfullscreen=\"true\" \/><\/object><\/p>\n","protected":false},"author":5,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[23,6],"tags":[],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.portfolioprobe.com\/wp-json\/wp\/v2\/posts\/9060"}],"collection":[{"href":"https:\/\/www.portfolioprobe.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.portfolioprobe.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.portfolioprobe.com\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/www.portfolioprobe.com\/wp-json\/wp\/v2\/comments?post=9060"}],"version-history":[{"count":0,"href":"https:\/\/www.portfolioprobe.com\/wp-json\/wp\/v2\/posts\/9060\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.portfolioprobe.com\/wp-json\/wp\/v2\/media?parent=9060"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.portfolioprobe.com\/wp-json\/wp\/v2\/categories?post=9060"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.portfolioprobe.com\/wp-json\/wp\/v2\/tags?post=9060"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}