{"id":383303,"date":"2024-04-23T06:38:00","date_gmt":"2024-04-23T12:38:00","guid":{"rendered":"https:\/\/quantixed.org\/?p=3269"},"modified":"2024-04-23T06:38:00","modified_gmt":"2024-04-23T12:38:00","slug":"prehistoric-when-do-authors-preprint-their-papers","status":"publish","type":"post","link":"https:\/\/www.r-bloggers.com\/2024\/04\/prehistoric-when-do-authors-preprint-their-papers\/","title":{"rendered":"Prehistoric: when do authors preprint their papers?"},"content":{"rendered":"<!-- \r\n<div style=\"min-height: 30px;\">\r\n[social4i size=\"small\" align=\"align-left\"]\r\n<\/div>\r\n-->\r\n\r\n<div style=\"border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;\">\r\n[This article was first published on  <strong><a href=\"https:\/\/quantixed.org\/2024\/04\/23\/prehistoric-when-do-authors-preprint-their-papers\/\"> Rstats \u2013 quantixed<\/a><\/strong>, and kindly contributed to <a href=\"https:\/\/www.r-bloggers.com\/\" rel=\"nofollow\">R-bloggers<\/a>].  (You can report issue about the content on this page <a href=\"https:\/\/www.r-bloggers.com\/contact-us\/\">here<\/a>)\r\n<hr>Want to share your content on R-bloggers?<a href=\"https:\/\/www.r-bloggers.com\/add-your-blog\/\" rel=\"nofollow\"> click here<\/a> if you have a blog, or <a href=\"http:\/\/r-posts.com\/\" rel=\"nofollow\"> here<\/a> if you don't.\r\n<\/div>\n\n<p>Previously, I took advantage of a dataset that linked preprints to their published counterparts to look at the <a href=\"https:\/\/quantixed.org\/2024\/03\/09\/pre-self-what-fraction-of-a-journals-papers-are-preprinted\/\" data-type=\"post\" data-id=\"3204\" rel=\"nofollow\" target=\"_blank\">fraction of papers in a journal that are preprinted<\/a>. This linkage can be used to answer other interesting questions. Such as: <strong>when do authors preprint their papers relative to submission?<\/strong> And does this differ by journal?<\/p>\n\n\n\n<p>There\u2019s a bit of preamble. 
If you just want to know the answer, click <a href=\"https:\/\/quantixed.org\/2024\/04\/23\/prehistoric-when-do-authors-preprint-their-papers\/#answer\" rel=\"nofollow\" target=\"_blank\">here<\/a>. If you want to see the code, click <a href=\"https:\/\/quantixed.org\/2024\/04\/23\/prehistoric-when-do-authors-preprint-their-papers\/#code\" rel=\"nofollow\" target=\"_blank\">here<\/a>.<\/p>\n\n\n\n<p>For each paper, we can extract from PubMed the \u201creceived\u201d date and the \u201caccepted\u201d date. Because we have linked published papers to preprints, we also know the date when the preprint of the paper was first posted. Subtracting this date from the received date gives a quantity we\u2019ll call \u201cpretime\u201d. A positive pretime means the preprint was posted before the journal received the paper; a negative pretime means it was posted afterwards.<\/p>\n\n\n\n<p>Now let\u2019s plot the pretime versus the received-to-accepted time.<\/p>\n\n\n\n<figure data-wp-context=\"{\"uploadedSrc\":\"https:\\\/\\\/quantixed.org\\\/wp-content\\\/uploads\\\/2024\\\/03\\\/pretime.png\",\"figureClassNames\":\"wp-block-image size-large\",\"figureStyles\":null,\"imgClassNames\":\"wp-image-3270\",\"imgStyles\":null,\"targetWidth\":3000,\"targetHeight\":1500,\"scaleAttr\":false,\"ariaLabel\":\"Enlarge image\",\"alt\":\"\"}\" data-wp-interactive=\"core\/image\" class=\"wp-block-image size-large wp-lightbox-container\"><img loading=\"lazy\" fetchpriority=\"high\" decoding=\"async\" width=\"450\" data-attachment-id=\"3270\" data-permalink=\"https:\/\/quantixed.org\/2024\/04\/23\/prehistoric-when-do-authors-preprint-their-papers\/pretime\/\" data-orig-file=\"https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2024\/03\/pretime.png?fit=3000%2C1500&#038;ssl=1\" data-orig-size=\"3000,1500\" data-comments-opened=\"1\" data-image-meta=\"{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}\" data-image-title=\"pretime\" data-image-description=\"\" 
data-image-caption=\"\" data-medium-file=\"https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2024\/03\/pretime.png?fit=300%2C150&#038;ssl=1\" data-large-file=\"https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2024\/03\/pretime.png?fit=640%2C320&#038;ssl=1\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on--click=\"actions.showLightbox\" data-wp-on--load=\"callbacks.setButtonStyles\" data-wp-on-window--resize=\"callbacks.setButtonStyles\" src=\"https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2024\/03\/pretime.png?resize=640%2C320&#038;ssl=1\" alt=\"\" class=\"wp-image-3270\" srcset_temp=\"https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2024\/03\/pretime.png?resize=1024%2C512&#038;ssl=1 1024w, https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2024\/03\/pretime.png?resize=300%2C150&#038;ssl=1 300w, https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2024\/03\/pretime.png?resize=768%2C384&#038;ssl=1 768w, https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2024\/03\/pretime.png?resize=1536%2C768&#038;ssl=1 1536w, https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2024\/03\/pretime.png?resize=2048%2C1024&#038;ssl=1 2048w, https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2024\/03\/pretime.png?w=1280&#038;ssl=1 1280w, https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2024\/03\/pretime.png?w=1920&#038;ssl=1 1920w\" sizes=\"(max-width: 640px) 100vw, 640px\" data-recalc-dims=\"1\" \/><button\n\t\t\tclass=\"lightbox-trigger\"\n\t\t\ttype=\"button\"\n\t\t\taria-haspopup=\"dialog\"\n\t\t\taria-label=\"Enlarge image\"\n\t\t\tdata-wp-init=\"callbacks.initTriggerButton\"\n\t\t\tdata-wp-on--click=\"actions.showLightbox\"\n\t\t\tdata-wp-style--right=\"context.imageButtonRight\"\n\t\t\tdata-wp-style--top=\"context.imageButtonTop\"\n\t\t>\n\t\t\t<svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"12\" height=\"12\" fill=\"none\" viewBox=\"0 0 12 12\">\n\t\t\t\t<path fill=\"#fff\" d=\"M2 0a2 2 0 0 0-2 
2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z\" \/>\n\t\t\t<\/svg>\n\t\t<\/button><\/figure>\n\n\n\n<p>In the plots above, we see ~3 years of a paper\u2019s journey to acceptance. Let\u2019s zoom in a bit to look at the first year. <\/p>\n\n\n\n<figure data-wp-context=\"{\"uploadedSrc\":\"https:\\\/\\\/quantixed.org\\\/wp-content\\\/uploads\\\/2024\\\/03\\\/pretime_1yr.png\",\"figureClassNames\":\"wp-block-image size-large\",\"figureStyles\":null,\"imgClassNames\":\"wp-image-3271\",\"imgStyles\":null,\"targetWidth\":3000,\"targetHeight\":1500,\"scaleAttr\":false,\"ariaLabel\":\"Enlarge image\",\"alt\":\"\"}\" data-wp-interactive=\"core\/image\" class=\"wp-block-image size-large wp-lightbox-container\"><img loading=\"lazy\" decoding=\"async\" width=\"450\" data-attachment-id=\"3271\" data-permalink=\"https:\/\/quantixed.org\/2024\/04\/23\/prehistoric-when-do-authors-preprint-their-papers\/pretime_1yr\/\" data-orig-file=\"https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2024\/03\/pretime_1yr.png?fit=3000%2C1500&#038;ssl=1\" data-orig-size=\"3000,1500\" data-comments-opened=\"1\" data-image-meta=\"{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}\" data-image-title=\"pretime_1yr\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2024\/03\/pretime_1yr.png?fit=300%2C150&#038;ssl=1\" data-large-file=\"https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2024\/03\/pretime_1yr.png?fit=640%2C320&#038;ssl=1\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on--click=\"actions.showLightbox\" data-wp-on--load=\"callbacks.setButtonStyles\" 
data-wp-on-window--resize=\"callbacks.setButtonStyles\" src=\"https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2024\/03\/pretime_1yr.png?resize=640%2C320&#038;ssl=1\" alt=\"\" class=\"wp-image-3271\" srcset_temp=\"https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2024\/03\/pretime_1yr.png?resize=1024%2C512&#038;ssl=1 1024w, https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2024\/03\/pretime_1yr.png?resize=300%2C150&#038;ssl=1 300w, https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2024\/03\/pretime_1yr.png?resize=768%2C384&#038;ssl=1 768w, https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2024\/03\/pretime_1yr.png?resize=1536%2C768&#038;ssl=1 1536w, https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2024\/03\/pretime_1yr.png?resize=2048%2C1024&#038;ssl=1 2048w, https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2024\/03\/pretime_1yr.png?w=1280&#038;ssl=1 1280w, https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2024\/03\/pretime_1yr.png?w=1920&#038;ssl=1 1920w\" sizes=\"(max-width: 640px) 100vw, 640px\" data-recalc-dims=\"1\" \/><button\n\t\t\tclass=\"lightbox-trigger\"\n\t\t\ttype=\"button\"\n\t\t\taria-haspopup=\"dialog\"\n\t\t\taria-label=\"Enlarge image\"\n\t\t\tdata-wp-init=\"callbacks.initTriggerButton\"\n\t\t\tdata-wp-on--click=\"actions.showLightbox\"\n\t\t\tdata-wp-style--right=\"context.imageButtonRight\"\n\t\t\tdata-wp-style--top=\"context.imageButtonTop\"\n\t\t>\n\t\t\t<svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"12\" height=\"12\" fill=\"none\" viewBox=\"0 0 12 12\">\n\t\t\t\t<path fill=\"#fff\" d=\"M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z\" \/>\n\t\t\t<\/svg>\n\t\t<\/button><\/figure>\n\n\n\n<p>What does this mean? 
To help interpret the plot, here\u2019s a key:<\/p>\n\n\n\n<figure data-wp-context=\"{\"uploadedSrc\":\"https:\\\/\\\/quantixed.org\\\/wp-content\\\/uploads\\\/2024\\\/03\\\/pretime-copy.png\",\"figureClassNames\":\"wp-block-image size-large\",\"figureStyles\":null,\"imgClassNames\":\"wp-image-3272\",\"imgStyles\":null,\"targetWidth\":2606,\"targetHeight\":920,\"scaleAttr\":false,\"ariaLabel\":\"Enlarge image\",\"alt\":\"\"}\" data-wp-interactive=\"core\/image\" class=\"wp-block-image size-large wp-lightbox-container\"><img loading=\"lazy\" decoding=\"async\" width=\"450\" data-attachment-id=\"3272\" data-permalink=\"https:\/\/quantixed.org\/2024\/04\/23\/prehistoric-when-do-authors-preprint-their-papers\/pretime-copy\/\" data-orig-file=\"https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2024\/03\/pretime-copy.png?fit=2606%2C920&#038;ssl=1\" data-orig-size=\"2606,920\" data-comments-opened=\"1\" data-image-meta=\"{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}\" data-image-title=\"pretime-copy\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2024\/03\/pretime-copy.png?fit=300%2C106&#038;ssl=1\" data-large-file=\"https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2024\/03\/pretime-copy.png?fit=640%2C226&#038;ssl=1\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on--click=\"actions.showLightbox\" data-wp-on--load=\"callbacks.setButtonStyles\" data-wp-on-window--resize=\"callbacks.setButtonStyles\" src=\"https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2024\/03\/pretime-copy.png?resize=640%2C226&#038;ssl=1\" alt=\"\" class=\"wp-image-3272\" srcset_temp=\"https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2024\/03\/pretime-copy.png?resize=1024%2C362&#038;ssl=1 1024w, 
https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2024\/03\/pretime-copy.png?resize=300%2C106&#038;ssl=1 300w, https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2024\/03\/pretime-copy.png?resize=768%2C271&#038;ssl=1 768w, https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2024\/03\/pretime-copy.png?resize=1536%2C542&#038;ssl=1 1536w, https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2024\/03\/pretime-copy.png?resize=2048%2C723&#038;ssl=1 2048w, https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2024\/03\/pretime-copy.png?w=1280&#038;ssl=1 1280w, https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2024\/03\/pretime-copy.png?w=1920&#038;ssl=1 1920w\" sizes=\"(max-width: 640px) 100vw, 640px\" data-recalc-dims=\"1\" \/><button\n\t\t\tclass=\"lightbox-trigger\"\n\t\t\ttype=\"button\"\n\t\t\taria-haspopup=\"dialog\"\n\t\t\taria-label=\"Enlarge image\"\n\t\t\tdata-wp-init=\"callbacks.initTriggerButton\"\n\t\t\tdata-wp-on--click=\"actions.showLightbox\"\n\t\t\tdata-wp-style--right=\"context.imageButtonRight\"\n\t\t\tdata-wp-style--top=\"context.imageButtonTop\"\n\t\t>\n\t\t\t<svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"12\" height=\"12\" fill=\"none\" viewBox=\"0 0 12 12\">\n\t\t\t\t<path fill=\"#fff\" d=\"M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z\" \/>\n\t\t\t<\/svg>\n\t\t<\/button><\/figure>\n\n\n\n<p>There are four categories; manuscripts are posted to bioRxiv:<\/p>\n\n\n\n<ul>\n<li>Prior to submission<\/li>\n\n\n\n<li>Approximately at submission time<\/li>\n\n\n\n<li>After submission<\/li>\n\n\n\n<li>After acceptance<\/li>\n<\/ul>\n\n\n\n<p>Note that we are looking at the final journal destination for each paper, which might not be the first place a paper is submitted. 
It\u2019s likely that papers posted prior to submission, especially those with long pretimes, were submitted elsewhere first, rather than the authors posting their work early for the purpose of gathering feedback before a first submission. All journals have such papers, not just the sibling journals like Nature Communications and Cell Reports, which were created to capture papers following rejection from other titles.<\/p>\n\n\n\n<p>The plots indicate that many papers are preprinted at the same time as submission. A surprising number are also preprinted after submission. Very few preprints are posted after acceptance, for obvious reasons.<\/p>\n\n\n\n<p>To simplify things, we can classify preprints with pretimes of -7 to 30 days as those papers preprinted at submission. Papers with shorter pretimes are post-submission; those with longer pretimes are pre-submission.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td>Category<\/td><td>Papers<\/td><td>Percentage<\/td><\/tr><tr><td>Pre-submission<\/td><td>7025<\/td><td>33.9%<\/td><\/tr><tr><td>On-submission<\/td><td>8286<\/td><td>40.0%<\/td><\/tr><tr><td>Post-submission<\/td><td>5399<\/td><td>26.1%<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"answer\">The answer<\/h3>\n\n\n\n<p>The analysis shows that, overall, <strong>authors most often preprint their work around the moment of submission<\/strong>.<\/p>\n\n\n\n<p>Let\u2019s look at how these fractions break down at each journal.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"450\" data-attachment-id=\"3277\" data-permalink=\"https:\/\/quantixed.org\/2024\/04\/23\/prehistoric-when-do-authors-preprint-their-papers\/preprint_status\/\" data-orig-file=\"https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2024\/04\/preprint_status.png?fit=3000%2C1500&#038;ssl=1\" data-orig-size=\"3000,1500\" data-comments-opened=\"1\" 
data-image-meta=\"{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}\" data-image-title=\"preprint_status\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2024\/04\/preprint_status.png?fit=300%2C150&#038;ssl=1\" data-large-file=\"https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2024\/04\/preprint_status.png?fit=640%2C320&#038;ssl=1\" src=\"https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2024\/04\/preprint_status.png?resize=640%2C320&#038;ssl=1\" alt=\"\" class=\"wp-image-3277\" srcset_temp=\"https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2024\/04\/preprint_status.png?resize=1024%2C512&#038;ssl=1 1024w, https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2024\/04\/preprint_status.png?resize=300%2C150&#038;ssl=1 300w, https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2024\/04\/preprint_status.png?resize=768%2C384&#038;ssl=1 768w, https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2024\/04\/preprint_status.png?resize=1536%2C768&#038;ssl=1 1536w, https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2024\/04\/preprint_status.png?resize=2048%2C1024&#038;ssl=1 2048w, https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2024\/04\/preprint_status.png?w=1280&#038;ssl=1 1280w, https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2024\/04\/preprint_status.png?w=1920&#038;ssl=1 1920w\" sizes=\"(max-width: 640px) 100vw, 640px\" data-recalc-dims=\"1\" \/><\/figure>\n\n\n\n<p>The fraction of papers preprinted upon submission is largest at several journals including Biochem J, Development, EMBO J etc. 
If we consider that many of the pre-submission preprints were posted around the time of submission to a preceding journal, then <strong>preprinting upon submission is the most likely behaviour<\/strong>.<\/p>\n\n\n\n<p>Posting <em>after<\/em> submission is a minority activity, but it is sizeable at some journals, notably Nature Cell Biol and Dev Cell. Possible reasons why authors post only after submission (in some cases many months later) include: a belief that preprinting may cause desk rejection, a preference for preprinting only once the paper has gone out to review, or getting twitchy about priority during a lengthy peer review process.<\/p>\n\n\n\n<p>We can break down the data by year of publication to see that the patterns are fairly consistent over time.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"450\" data-attachment-id=\"3278\" data-permalink=\"https:\/\/quantixed.org\/2024\/04\/23\/prehistoric-when-do-authors-preprint-their-papers\/preprint_status_facet\/\" data-orig-file=\"https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2024\/04\/preprint_status_facet.png?fit=3000%2C1500&#038;ssl=1\" data-orig-size=\"3000,1500\" data-comments-opened=\"1\" data-image-meta=\"{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}\" data-image-title=\"preprint_status_facet\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2024\/04\/preprint_status_facet.png?fit=300%2C150&#038;ssl=1\" data-large-file=\"https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2024\/04\/preprint_status_facet.png?fit=640%2C320&#038;ssl=1\" 
src=\"https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2024\/04\/preprint_status_facet.png?resize=640%2C320&#038;ssl=1\" alt=\"\" class=\"wp-image-3278\" srcset_temp=\"https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2024\/04\/preprint_status_facet.png?resize=1024%2C512&#038;ssl=1 1024w, https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2024\/04\/preprint_status_facet.png?resize=300%2C150&#038;ssl=1 300w, https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2024\/04\/preprint_status_facet.png?resize=768%2C384&#038;ssl=1 768w, https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2024\/04\/preprint_status_facet.png?resize=1536%2C768&#038;ssl=1 1536w, https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2024\/04\/preprint_status_facet.png?resize=2048%2C1024&#038;ssl=1 2048w, https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2024\/04\/preprint_status_facet.png?w=1280&#038;ssl=1 1280w, https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2024\/04\/preprint_status_facet.png?w=1920&#038;ssl=1 1920w\" sizes=\"(max-width: 640px) 100vw, 640px\" data-recalc-dims=\"1\" \/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">Caveats<\/h3>\n\n\n\n<p>Any analysis like this is limited by the available data. First, the \u201creceived\u201d date on PubMed may not be accurate. A journal may \u201creset the clock\u201d on a submission and thereby make it appear that the preprint had been posted prior to submission when it may have actually been submitted to the publishing journal at the time of posting.<\/p>\n\n\n\n<p>This analysis is also limited to:<\/p>\n\n\n\n<ul>\n<li>papers that were preprinted on bioRxiv<\/li>\n\n\n\n<li>papers for which we had complete data (the PubMed data is missing for some journals)<\/li>\n\n\n\n<li>a subset of journals \u2013 other journal data can be retrieved by tweaking the code<\/li>\n<\/ul>\n\n\n\n<p>To reiterate, the analysis is limited to papers where the authors actually posted a preprint. 
At many of the journals analysed here, over half of the authors still choose not to preprint their work!<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"code\">The code<\/h2>\n\n\n\n<p>This R script is quite long and has a few dependencies from my earlier <a href=\"https:\/\/quantixed.org\/2024\/03\/09\/pre-self-what-fraction-of-a-journals-papers-are-preprinted\/\" data-type=\"post\" data-id=\"3204\" rel=\"nofollow\" target=\"_blank\">post<\/a>. Crunching through the XML files and through the bioRxiv DOIs to get the submission dates is sped up using parallel processing (on Mac\/Linux).<\/p>\n\n\n<pre>\nlibrary(dplyr)\nlibrary(ggplot2)\nlibrary(doParallel)\nlibrary(foreach)\nlibrary(rbiorxiv)\n\n## some pre-requisite files required for this script\n# preprint - paper relationships\ndf_all &lt;- read.csv(&quot;Data\/crossref-preprint-article-relationships-Aug-2023.csv&quot;)\n# code to extract data from Pubmed XML files\nsource(&quot;Script\/pubmedXML.R&quot;)\n# previously downloaded Pubmed XML files in the Data directory\nxml_files &lt;- list.files(&quot;Data&quot;, pattern = &quot;*.xml&quot;, full.names = TRUE)\n\n# setup parallel backend\ncores &lt;- detectCores()\ncl &lt;- makeCluster(cores[1] - 1) # leave one core free so as not to overload your computer\nregisterDoParallel(cl)\n\n# iterate over every file; note that 1 : seq_along(x) would only process the first one\npprs &lt;- foreach(i = seq_along(xml_files), .combine = rbind) %dopar% {\n  tempdf &lt;- extract_xml(xml_files[i])\n}\n\n# stop cluster\nstopCluster(cl)\n\n# remove duplicates\npprs &lt;- pprs[!duplicated(pprs$pmid), ]\n\n# remove unwanted publication types by using a vector of strings\nunwanted &lt;- c(&quot;Review&quot;, &quot;Comment&quot;, &quot;Retracted Publication&quot;,\n              &quot;Retraction of Publication&quot;, &quot;Editorial&quot;, &quot;Autobiography&quot;,\n              &quot;Biography&quot;, &quot;Historical&quot;, &quot;Published Erratum&quot;,\n              &quot;Expression of Concern&quot;)\n# subset pprs to remove unwanted publication types using grepl, 
call this &quot;pure&quot;\npure &lt;- pprs[!grepl(paste(unwanted, collapse = &quot;|&quot;), pprs$ptype), ]\n# ensure that ptype contains &quot;Journal Article&quot;\npure &lt;- pure[grepl(&quot;Journal Article&quot;, pure$ptype), ]\n# remove papers with &quot;NA NA&quot; as the sole author\npure &lt;- pure[!grepl(&quot;NA NA&quot;, pure$authors), ]\n\n# add factor column to pure that indicates if a row in pprs has a doi that is\n# also found in df_all$article_doi\npure$in_crossref &lt;- ifelse(tolower(pure$doi) %in%\n                             tolower(df_all$article_doi), &quot;yes&quot;, &quot;no&quot;)\n\n# lag times\npure$recacc &lt;- pure$accdate - pure$recdate\npure$recpub &lt;- pure$pubdate - pure$recdate\n\n# subset data for only in_crossref == &quot;yes&quot;\npure_yes &lt;- pure[pure$in_crossref == &quot;yes&quot;, ]\n# add column that has the preprint_doi from df_all where article_doi matches doi\npure_yes$preprint_doi &lt;- df_all$preprint_doi[match(tolower(pure_yes$doi),\n                                                   tolower(df_all$article_doi))]\n# subset for biorxiv doi, i.e. 
starts &quot;10.1101&quot;\npure_yes &lt;- pure_yes[grepl(&quot;^10[.]1101&quot;, pure_yes$preprint_doi), ]\n\n# if the preprint_doi is longer than 15 characters, parse the date from the doi;\n# otherwise set the date to NA\npure_yes$date &lt;- as.Date(ifelse(nchar(pure_yes$preprint_doi) &lt; 16,\n                                NA,\n                                as.Date(\n                                  substr(pure_yes$preprint_doi, 9, 18),\n                                  format = &quot;%Y.%m.%d&quot;)),\n                         origin = &quot;1970-01-01&quot;)\n\n# subset pure_yes for date is NA\npure_yes_na &lt;- pure_yes[is.na(pure_yes$date), ]\n\n# get the content of each preprint and assemble into large data frame\n# %do% runs this loop sequentially, so no parallel backend is registered here\n# (the cluster created earlier has already been stopped)\npreprints &lt;- foreach(i = seq_len(nrow(pure_yes_na)),\n                     .errorhandling = &quot;pass&quot;, .multicombine = TRUE) %do% {\n                       temp &lt;- NULL\n                       temp &lt;- as.data.frame(biorxiv_content(doi = pure_yes_na$preprint_doi[i]))\n                       # subset to only include the doi, authors, title, and date; and first row only\n                       if (!is.null(temp)) {\n                         temp &lt;- temp[1, c(&quot;doi&quot;, &quot;authors&quot;, &quot;title&quot;, &quot;date&quot;)]\n                       }\n                     }\n\n# the above code results in a large list of data frames, so we need to combine\n# them into one data frame. We didn&#039;t use .combine because one or more of the\n# preprints may have failed to download and must be removed first. 
The failed items do\n# not have 4 columns, so we can use ncol to check for this\n\nncol_preprints &lt;- sapply(preprints, ncol)\n# write a for loop to start at the end of the list and remove the failed items\nlist_preprints &lt;- preprints\nfor (i in rev(seq_along(list_preprints))) {\n  if (is.null(ncol_preprints[[i]])) {\n    list_preprints &lt;- list_preprints[-i]\n  }\n}\n\ndf_preprints &lt;- do.call(rbind, list_preprints)\n\n# add a column to pure_yes_na that has the date from df_preprints\npure_yes_na$date &lt;- df_preprints$date[match(tolower(pure_yes_na$preprint_doi),\n                                            tolower(df_preprints$doi))]\n# if pure_yes$date is NA, set to pure_yes_na$date\npure_yes_all &lt;- pure_yes\npure_yes_all$date &lt;- ifelse(is.na(pure_yes_all$date),\n                            as.Date(pure_yes_na$date[match(tolower(pure_yes_all$preprint_doi),\n                                                           tolower(pure_yes_na$preprint_doi))]),\n                            as.Date(pure_yes_all$date))\n# ensure date is as.Date\npure_yes_all$date &lt;- as.Date(pure_yes_all$date, format = &quot;%Y-%m-%d&quot;)\n# find pretime by subtracting the date from the recdate\npure_yes_all$pretime &lt;- pure_yes_all$recdate - pure_yes_all$date\n\npure_yes_all %&gt;% \n  filter(!is.na(pretime)) %&gt;%\n  ggplot(aes(x = as.numeric(recacc), y = as.numeric(pretime))) +\n  geom_abline(intercept = 0,\n              slope = -1, linetype = &quot;dashed&quot;, colour = &quot;#a3a3a3&quot;) +\n  geom_point(colour = &quot;#ae363b&quot;, shape = 16, size = 0.5, alpha = 0.2) +\n  theme_minimal(9) +\n  lims(x = c(0, 1000), y = c(-1000, 1000)) +\n  facet_wrap( ~ journal) +\n  labs(x = &quot;Received to Accepted (days)&quot;, y = &quot;Pretime (days)&quot;) +\n  theme(legend.position = &quot;none&quot;)\nggsave(&quot;Output\/Plots\/pretime.png&quot;,\n       width = 3000, height = 1500, dpi = 300, units = &quot;px&quot;, bg = &quot;white&quot;)  \n\npure_yes_all 
%&gt;% \n  filter(!is.na(pretime)) %&gt;%\n  ggplot(aes(x = as.numeric(recacc), y = as.numeric(pretime))) +\n  geom_abline(intercept = 0,\n              slope = -1, linetype = &quot;dashed&quot;, colour = &quot;#a3a3a3&quot;) +\n  geom_point(colour = &quot;#ae363b&quot;, shape = 16, size = 0.7, alpha = 0.2) +\n  theme_minimal(9) +\n  lims(x = c(0, 365), y = c(-365, 365)) +\n  facet_wrap( ~ journal) +\n  labs(x = &quot;Received to Accepted (days)&quot;, y = &quot;Pretime (days)&quot;) +\n  theme(legend.position = &quot;none&quot;)\nggsave(&quot;Output\/Plots\/pretime_1yr.png&quot;,\n       width = 3000, height = 1500, dpi = 300, units = &quot;px&quot;, bg = &quot;white&quot;)\n\n\n# pure_yes_all contains the data of interest. Let&#039;s classify the papers\n# into three categories: 1) preprinted on submission, 2) preprinted after\n# submission, and 3) preprinted prior to submission\n# pretime is recdate minus preprint date, so a positive value means the preprint\n# came first. Group 1 is a pretime of more than -7 and less than 31 days,\n# group 2 is a pretime of -7 days or less, and group 3 is a pretime of 31 days or more\n# make a factor column to classify the papers\npure_yes_all$preprint_status &lt;- ifelse(pure_yes_all$pretime &gt;= 31, &quot;Pre-submission&quot;,\n                                       ifelse(pure_yes_all$pretime &lt;= -7, &quot;Post-submission&quot;,\n                                              &quot;On-submission&quot;))\n# now summarise the fraction of papers at each journal that are in each category\nsummary_status &lt;- pure_yes_all %&gt;%\n  filter(!is.na(pretime)) %&gt;%\n  group_by(journal, preprint_status) %&gt;%\n  summarise(papers = n()) %&gt;%\n  group_by(journal) %&gt;%\n  mutate(fraction = papers \/ sum(papers))\n# order the factor levels so that pre-, on- and post-submission stack in that order\nsummary_status$preprint_status &lt;- factor(summary_status$preprint_status,\n                                         levels = c(&quot;Pre-submission&quot;, &quot;On-submission&quot;,\n                                                    
&quot;Post-submission&quot;))\n\n# make a stacked bar chart to show the fraction of papers in each category\n# for each journal\n# Pre submission at the top, on submission middle and post submission at the bottom\nsummary_status %&gt;%\n  ggplot(aes(x = journal, y = fraction, fill = preprint_status)) +\n  geom_bar(stat = &quot;identity&quot;, position = &quot;stack&quot;) +\n  scale_x_discrete(guide = guide_axis(n.dodge = 2)) +\n  theme_minimal(9) +\n  labs(x = &quot;Journal&quot;, y = &quot;Fraction of papers&quot;) +\n  theme(legend.position = &quot;right&quot;,\n        legend.title = element_blank()) +\n  scale_fill_manual(values = c(&quot;#534666&quot;, &quot;#138086&quot;, &quot;#cd7672&quot;))\nggsave(&quot;Output\/Plots\/preprint_status.png&quot;,\n       width = 3000, height = 1500, dpi = 300, units = &quot;px&quot;, bg = &quot;white&quot;)\n\n# let&#039;s do the same again but break each journal down by year, one facet per journal\npure_yes_all %&gt;%\n  filter(!is.na(pretime)) %&gt;%\n  filter(year != &quot;2024&quot;) %&gt;% \n  group_by(journal, year, preprint_status) %&gt;%\n  summarise(papers = n()) %&gt;%\n  group_by(journal, year) %&gt;%\n  mutate(fraction = papers \/ sum(papers)) %&gt;%\n  ggplot(aes(x = year,\n             y = fraction,\n             fill = factor(preprint_status,\n                           levels = c(&quot;Pre-submission&quot;, &quot;On-submission&quot;,\n                                      &quot;Post-submission&quot;)))) +\n  geom_bar(stat = &quot;identity&quot;, position = &quot;stack&quot;) +\n  theme_minimal(9) +\n  labs(x = &quot;Year&quot;, y = &quot;Fraction of papers&quot;) +\n  theme(legend.position = &quot;right&quot;,\n        legend.title = element_blank()) +\n  scale_fill_manual(values = c(&quot;#534666&quot;, &quot;#138086&quot;, &quot;#cd7672&quot;)) +\n  facet_wrap( ~ journal)\nggsave(&quot;Output\/Plots\/preprint_status_facet.png&quot;,\n       width = 3000, height = 1500, dpi = 300, units = &quot;px&quot;, bg = 
&quot;white&quot;)\n\n# generate summary stats for table (all papers with linked preprint)\nsummary_all &lt;- pure_yes_all %&gt;%\n  filter(!is.na(pretime)) %&gt;%\n  group_by(preprint_status) %&gt;%\n  summarise(papers = n()) %&gt;%\n  mutate(fraction = papers \/ sum(papers))\n<\/pre>\n\n\n<p>\u2014<\/p>\n\n\n\n<p>The post title comes from \u201cPrehistoric\u201d by Circulatory System from their \u201cCirculatory System\u201d LP. <\/p>\n\n<div style=\"border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;\">\r\n<div style=\"text-align: center;\">To <strong>leave a comment<\/strong> for the author, please follow the link and comment on their blog: <strong><a href=\"https:\/\/quantixed.org\/2024\/04\/23\/prehistoric-when-do-authors-preprint-their-papers\/\"> Rstats \u2013 quantixed<\/a><\/strong>.<\/div>\r\n<\/div>","protected":false},"excerpt":{"rendered":"<div style = \"width:60%; display: inline-block; float:left; \"> Previously, I took advantage of a dataset that linked preprints to their published counterparts to look at the fraction of papers in a journal that are preprinted. This linkage can be used to answer other interesting questions. Such as: when do authors preprint their papers relative to submission? And does &#8230;<\/div>\n<div style = \"width: 40%; display: inline-block; float:right;\"><\/div>\n<div style=\"clear: both;\"><\/div>\n","protected":false},"author":2887,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[4],"tags":[],"aioseo_notices":[],"jetpack-related-posts":[],"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/www.r-bloggers.com\/wp-json\/wp\/v2\/posts\/383303"}],"collection":[{"href":"https:\/\/www.r-bloggers.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.r-bloggers.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.r-bloggers.com\/wp-json\/wp\/v2\/users\/2887"}],"replies":[{"embeddable":true,"href":"https:\/\/www.r-bloggers.com\/wp-json\/wp\/v2\/comments?post=383303"}],"version-history":[{"count":393,"href":"https:\/\/www.r-bloggers.com\/wp-json\/wp\/v2\/posts\/383303\/revisions"}],"predecessor-version":[{"id":398029,"href":"https:\/\/www.r-bloggers.com\/wp-json\/wp\/v2\/posts\/383303\/revisions\/398029"}],"wp:attachment":[{"href":"https:\/\/www.r-bloggers.com\/wp-json\/wp\/v2\/media?parent=38
3303"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.r-bloggers.com\/wp-json\/wp\/v2\/categories?post=383303"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.r-bloggers.com\/wp-json\/wp\/v2\/tags?post=383303"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}