{"id":400597,"date":"2026-04-16T06:04:00","date_gmt":"2026-04-16T12:04:00","guid":{"rendered":"https:\/\/quantixed.org\/?p=3753"},"modified":"2026-04-16T06:04:00","modified_gmt":"2026-04-16T12:04:00","slug":"my-domain-proteome-wide-scanning-of-tmds","status":"publish","type":"post","link":"https:\/\/www.r-bloggers.com\/2026\/04\/my-domain-proteome-wide-scanning-of-tmds\/","title":{"rendered":"My Domain: proteome-wide scanning of TMDs"},"content":{"rendered":"<div style=\"border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;\">\r\n[This article was first published on  <strong><a href=\"https:\/\/quantixed.org\/2026\/04\/16\/my-domain-proteome-wide-scanning-of-tmds\/\"> Rstats \u2013 quantixed<\/a><\/strong>, and kindly contributed to <a href=\"https:\/\/www.r-bloggers.com\/\" rel=\"nofollow\">R-bloggers<\/a>].  (You can report issue about the content on this page <a href=\"https:\/\/www.r-bloggers.com\/contact-us\/\">here<\/a>)\r\n<hr>Want to share your content on R-bloggers?<a href=\"https:\/\/www.r-bloggers.com\/add-your-blog\/\" rel=\"nofollow\"> click here<\/a> if you have a blog, or <a href=\"http:\/\/r-posts.com\/\" rel=\"nofollow\"> here<\/a> if you don't.\r\n<\/div>\n\n<p>I wanted to know:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>how many proteins in the human proteome have transmembrane domains (TMDs)?<\/li>\n\n\n\n<li>of those that do, how many have 1, 2, \u2026 <em>n<\/em> TMDs?<\/li>\n<\/ol>\n\n\n\n<p>After a bit of searching, I couldn\u2019t find the answers. So I decided to use R to retrieve the necessary information from UniProt and calculate them myself. 
I thought I\u2019d post it here in case it\u2019s useful for others.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Human<\/h2>\n\n\n\n<figure data-wp-context=\"{\"imageId\":\"69e0e19542d22\"}\" data-wp-interactive=\"core\/image\" data-wp-key=\"69e0e19542d22\" class=\"wp-block-image size-large wp-lightbox-container\"><img loading=\"lazy\" fetchpriority=\"high\" decoding=\"async\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on--click=\"actions.showLightbox\" data-wp-on--load=\"callbacks.setButtonStyles\" data-wp-on-window--resize=\"callbacks.setButtonStyles\" src=\"https:\/\/i2.wp.com\/quantixed.org\/wp-content\/uploads\/2026\/04\/human_uniprot_tm_counts-1024x576.png?w=450&#038;ssl=1\" alt=\"\" class=\"wp-image-3754\" srcset_temp=\"https:\/\/i2.wp.com\/quantixed.org\/wp-content\/uploads\/2026\/04\/human_uniprot_tm_counts-1024x576.png?w=450&#038;ssl=1 1024w, https:\/\/quantixed.org\/wp-content\/uploads\/2026\/04\/human_uniprot_tm_counts-300x169.png 300w, https:\/\/quantixed.org\/wp-content\/uploads\/2026\/04\/human_uniprot_tm_counts-768x432.png 768w, https:\/\/quantixed.org\/wp-content\/uploads\/2026\/04\/human_uniprot_tm_counts-1536x864.png 1536w, https:\/\/quantixed.org\/wp-content\/uploads\/2026\/04\/human_uniprot_tm_counts.png 1600w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" data-recalc-dims=\"1\" \/><\/figure>\n\n\n\n<p>We\u2019ll start with the info I wanted. According to UniProt, there are 20,659 proteins in the human proteome. <strong>One quarter of these (5,247 proteins, 25.4%) have one or more TMDs<\/strong>. A single TMD is the most common case (45.8% of TMD-containing proteins), and there are almost 1,000 7TM proteins (all those GPCRs, I guess). There are 413 4TM and 327 2TM proteins. Every count from 1 through 17 TMDs is represented; beyond that, there are no proteins with 18 TMDs, but there are 4 with 19, 21 with 24 and 2 with 38.<\/p>\n\n\n\n<p>The analysis simply counts how many <code>TRANSMEM<\/code> features UniProt annotates for each ID in the reference proteome. I have not distinguished between helical and partial entries, and of course some of the annotations may not be quite correct.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>TMDs per protein<\/td><td>Proteins<\/td><td>Share of proteome (in %)<\/td><td>Share of TMD-containing proteins (in 
%)<\/td><\/tr><tr><td>1<\/td><td>2402<\/td><td>11.6<\/td><td>45.8<\/td><\/tr><tr><td>2<\/td><td>327<\/td><td>1.6<\/td><td>6.2<\/td><\/tr><tr><td>3<\/td><td>159<\/td><td>0.8<\/td><td>3.0<\/td><\/tr><tr><td>4<\/td><td>413<\/td><td>2.0<\/td><td>7.9<\/td><\/tr><tr><td>5<\/td><td>77<\/td><td>0.4<\/td><td>1.5<\/td><\/tr><tr><td>6<\/td><td>276<\/td><td>1.3<\/td><td>5.3<\/td><\/tr><tr><td>7<\/td><td>947<\/td><td>4.6<\/td><td>18.0<\/td><\/tr><tr><td>8<\/td><td>83<\/td><td>0.4<\/td><td>1.6<\/td><\/tr><tr><td>9<\/td><td>63<\/td><td>0.3<\/td><td>1.2<\/td><\/tr><tr><td>10<\/td><td>123<\/td><td>0.6<\/td><td>2.3<\/td><\/tr><tr><td>11<\/td><td>75<\/td><td>0.4<\/td><td>1.4<\/td><\/tr><tr><td>12<\/td><td>202<\/td><td>1.0<\/td><td>3.8<\/td><\/tr><tr><td>13<\/td><td>24<\/td><td>0.1<\/td><td>0.5<\/td><\/tr><tr><td>14<\/td><td>26<\/td><td>0.1<\/td><td>0.5<\/td><\/tr><tr><td>15<\/td><td>13<\/td><td>0.1<\/td><td>0.2<\/td><\/tr><tr><td>16<\/td><td>1<\/td><td>0.0<\/td><td>0.0<\/td><\/tr><tr><td>17<\/td><td>9<\/td><td>0.0<\/td><td>0.2<\/td><\/tr><tr><td>19<\/td><td>4<\/td><td>0.0<\/td><td>0.1<\/td><\/tr><tr><td>24<\/td><td>21<\/td><td>0.1<\/td><td>0.4<\/td><\/tr><tr><td>38<\/td><td>2<\/td><td>0.0<\/td><td>0.0<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>Having written this code, I decided to run it on some other proteomes to see how they compare.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Model organisms<\/h2>\n\n\n\n<figure class=\"wp-block-gallery has-nested-images columns-2 is-cropped wp-block-gallery-1 is-layout-flex wp-block-gallery-is-layout-flex\">\n<figure data-wp-context=\"{\"imageId\":\"69e0e19546faa\"}\" data-wp-interactive=\"core\/image\" data-wp-key=\"69e0e19546faa\" class=\"wp-block-image size-large wp-lightbox-container\"><img loading=\"lazy\" decoding=\"async\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on--click=\"actions.showLightbox\" 
data-wp-on--load=\"callbacks.setButtonStyles\" data-wp-on-window--resize=\"callbacks.setButtonStyles\" data-id=\"3757\" src=\"https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2026\/04\/drosophila_uniprot_tm_counts-1024x576.png?w=450&#038;ssl=1\" alt=\"\" class=\"wp-image-3757\" srcset_temp=\"https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2026\/04\/drosophila_uniprot_tm_counts-1024x576.png?w=450&#038;ssl=1 1024w, https:\/\/quantixed.org\/wp-content\/uploads\/2026\/04\/drosophila_uniprot_tm_counts-300x169.png 300w, https:\/\/quantixed.org\/wp-content\/uploads\/2026\/04\/drosophila_uniprot_tm_counts-768x432.png 768w, https:\/\/quantixed.org\/wp-content\/uploads\/2026\/04\/drosophila_uniprot_tm_counts-1536x864.png 1536w, https:\/\/quantixed.org\/wp-content\/uploads\/2026\/04\/drosophila_uniprot_tm_counts.png 1600w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" data-recalc-dims=\"1\" \/><\/figure>\n\n\n\n<figure data-wp-context=\"{\"imageId\":\"69e0e1954959d\"}\" data-wp-interactive=\"core\/image\" data-wp-key=\"69e0e1954959d\" class=\"wp-block-image size-large wp-lightbox-container\"><img loading=\"lazy\" decoding=\"async\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" 
data-wp-init=\"callbacks.setButtonStyles\" data-wp-on--click=\"actions.showLightbox\" data-wp-on--load=\"callbacks.setButtonStyles\" data-wp-on-window--resize=\"callbacks.setButtonStyles\" data-id=\"3756\" src=\"https:\/\/i2.wp.com\/quantixed.org\/wp-content\/uploads\/2026\/04\/worm_uniprot_tm_counts-1024x576.png?w=450&#038;ssl=1\" alt=\"\" class=\"wp-image-3756\" srcset_temp=\"https:\/\/i2.wp.com\/quantixed.org\/wp-content\/uploads\/2026\/04\/worm_uniprot_tm_counts-1024x576.png?w=450&#038;ssl=1 1024w, https:\/\/quantixed.org\/wp-content\/uploads\/2026\/04\/worm_uniprot_tm_counts-300x169.png 300w, https:\/\/quantixed.org\/wp-content\/uploads\/2026\/04\/worm_uniprot_tm_counts-768x432.png 768w, https:\/\/quantixed.org\/wp-content\/uploads\/2026\/04\/worm_uniprot_tm_counts-1536x864.png 1536w, https:\/\/quantixed.org\/wp-content\/uploads\/2026\/04\/worm_uniprot_tm_counts.png 1600w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" data-recalc-dims=\"1\" \/><\/figure>\n\n\n\n<figure data-wp-context=\"{\"imageId\":\"69e0e19549b5f\"}\" data-wp-interactive=\"core\/image\" data-wp-key=\"69e0e19549b5f\" class=\"wp-block-image size-large wp-lightbox-container\"><img loading=\"lazy\" decoding=\"async\" data-wp-class--hide=\"state.isContentHidden\" 
data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on--click=\"actions.showLightbox\" data-wp-on--load=\"callbacks.setButtonStyles\" data-wp-on-window--resize=\"callbacks.setButtonStyles\" data-id=\"3758\" src=\"https:\/\/i2.wp.com\/quantixed.org\/wp-content\/uploads\/2026\/04\/yeast_uniprot_tm_counts-1024x576.png?w=450&#038;ssl=1\" alt=\"\" class=\"wp-image-3758\" srcset_temp=\"https:\/\/i2.wp.com\/quantixed.org\/wp-content\/uploads\/2026\/04\/yeast_uniprot_tm_counts-1024x576.png?w=450&#038;ssl=1 1024w, https:\/\/quantixed.org\/wp-content\/uploads\/2026\/04\/yeast_uniprot_tm_counts-300x169.png 300w, https:\/\/quantixed.org\/wp-content\/uploads\/2026\/04\/yeast_uniprot_tm_counts-768x432.png 768w, https:\/\/quantixed.org\/wp-content\/uploads\/2026\/04\/yeast_uniprot_tm_counts-1536x864.png 1536w, https:\/\/quantixed.org\/wp-content\/uploads\/2026\/04\/yeast_uniprot_tm_counts.png 1600w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" data-recalc-dims=\"1\" \/><\/figure>\n\n\n\n<figure data-wp-context=\"{\"imageId\":\"69e0e19549fff\"}\" data-wp-interactive=\"core\/image\" data-wp-key=\"69e0e19549fff\" class=\"wp-block-image size-large wp-lightbox-container\"><img loading=\"lazy\" 
decoding=\"async\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on--click=\"actions.showLightbox\" data-wp-on--load=\"callbacks.setButtonStyles\" data-wp-on-window--resize=\"callbacks.setButtonStyles\" data-id=\"3755\" src=\"https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2026\/04\/zebrafish_uniprot_tm_counts-1024x576.png?w=450&#038;ssl=1\" alt=\"\" class=\"wp-image-3755\" srcset_temp=\"https:\/\/i0.wp.com\/quantixed.org\/wp-content\/uploads\/2026\/04\/zebrafish_uniprot_tm_counts-1024x576.png?w=450&#038;ssl=1 1024w, https:\/\/quantixed.org\/wp-content\/uploads\/2026\/04\/zebrafish_uniprot_tm_counts-300x169.png 300w, https:\/\/quantixed.org\/wp-content\/uploads\/2026\/04\/zebrafish_uniprot_tm_counts-768x432.png 768w, https:\/\/quantixed.org\/wp-content\/uploads\/2026\/04\/zebrafish_uniprot_tm_counts-1536x864.png 1536w, https:\/\/quantixed.org\/wp-content\/uploads\/2026\/04\/zebrafish_uniprot_tm_counts.png 1600w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" data-recalc-dims=\"1\" \/><\/figure>\n<\/figure>\n\n\n\n<p>In these four organisms, between 18% and 29% of the proteome is made up of proteins with TMDs. The distributions of TMD number are broadly similar to human, although yeast has no 7TM peak and the peaks at 2, 4, 6 and 12 TMDs differ from those in human.<\/p>\n\n\n\n<p>Maybe this information is out there in some database or other, but as I said, I couldn\u2019t find it easily. There may be more precise ways of determining TMDs, but I think these data are good enough to show roughly what the proportions are.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The code<\/h2>\n\n\n\n<p>I manually downloaded the fasta.gz files for the reference proteomes. They are currently linked <a href=\"https:\/\/www.uniprot.org\/proteomes?query=proteome_type%3A1\" rel=\"nofollow\" target=\"_blank\">here<\/a>.<\/p>\n\n\n\n<p>To extract all the UniProt IDs from each FASTA file (decompressed first, e.g. with <code>gunzip<\/code>), I used a shell one-liner:<\/p>\n\n\n<pre>\nawk &#039;\/^&gt;sp\\|.*\\|\/{gsub(\/^&gt;sp\\|\/,&quot;&quot;); gsub(\/\\|.*\/,&quot;&quot;); print &quot;&gt;&quot; $0; next} {print}&#039; file.fasta | grep &quot;^&gt;&quot; | sed &#039;s\/&gt;\/\/g&#039; &gt; species_uniprot.txt\n<\/pre>\n\n\n<p>Then I used this R script. The main function, <code>retrieveUniprotInfo()<\/code>, can probably be simplified: I had to add several checks to make sure I got all the data back from the API. Before posting, I tried to cut it back to make it easier to read, but I only succeeded in breaking the script! 
This is the working version.<\/p>\n\n\n<pre>\n# Packages used below; SuperPlotR is also required (called as SuperPlotR::pieplot())\nlibrary(httr)\nlibrary(stringr)\nlibrary(ggplot2)\nlibrary(dplyr)\nlibrary(cowplot)\n\n## FUNCTIONS ----\n\n# Poll the UniProt ID-mapping status endpoint. Returns TRUE once results (or\n# failed IDs) are reported, FALSE on an error message or after maxWaitSeconds.\nisJobReady &lt;- function(jobId, pollingInterval = 5, maxWaitSeconds = 3600) {\n  if (is.null(jobId) || length(jobId) == 0 || is.na(jobId) || !nzchar(jobId)) {\n    return(FALSE)\n  }\n  nTries &lt;- ceiling(maxWaitSeconds \/ pollingInterval)\n  for (i in 1:nTries) {\n    url &lt;- paste(&quot;https:\/\/rest.uniprot.org\/idmapping\/status\/&quot;, jobId, sep = &quot;&quot;)\n    r &lt;- GET(url = url, accept_json())\n    status &lt;- content(r, as = &quot;parsed&quot;)\n    if (!is.null(status[[&quot;results&quot;]]) || !is.null(status[[&quot;failedIds&quot;]])) {\n      return(TRUE)\n    }\n    if (!is.null(status[[&quot;messages&quot;]])) {\n      print(status[[&quot;messages&quot;]])\n      return(FALSE)\n    }\n    Sys.sleep(pollingInterval)\n  }\n  return(FALSE)\n}\n\n# Map UniProt accessions (or gene names) to UniProtKB entries in chunks via the\n# ID-mapping API and return the requested fields as a data frame.\n# taxId is only used when mapping gene names (IDs that are not accessions).\nretrieveUniprotInfo &lt;- function(x,\n                                chunk_size = 5000,\n                                maxWaitSeconds = 3600,\n                                taxId = &quot;9606&quot;,\n                                progress = TRUE) {\n  normalize_uniprot_ids &lt;- function(values) {\n    values &lt;- trimws(values)\n    # Accept FASTA-style headers like: sp|P12345|... 
or tr|A0A...|...\n    m &lt;- str_match(values, regex(&quot;^&gt;?\\\\s*(?:sp|tr)\\\\|([^|]+)\\\\|&quot;, ignore_case = TRUE))\n    values &lt;- ifelse(!is.na(m[, 2]), m[, 2], values)\n    values\n  }\n  \n  ids &lt;- unique(normalize_uniprot_ids(x))\n  ids &lt;- ids[!is.na(ids) &#038; nzchar(ids)]\n  if (length(ids) == 0) {\n    stop(&quot;No valid identifiers were provided to retrieveUniprotInfo().&quot;)\n  }\n  \n  fields &lt;- &quot;accession,id,protein_name,gene_names,ft_transmem,length,cc_function,cc_subcellular_location,go_p,go_c&quot;\n  acc_pattern &lt;- &quot;^[OPQ][0-9][A-Z0-9]{3}[0-9](-[0-9]+)?$|^[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}(-[0-9]+)?$&quot;\n  is_accession &lt;- str_detect(ids, acc_pattern)\n  \n  split_into_chunks &lt;- function(values, chunk_size = chunk_size) {\n    split(values, ceiling(seq_along(values) \/ chunk_size))\n  }\n  \n  get_next_link &lt;- function(link_header) {\n    if (is.null(link_header)) {\n      return(NULL)\n    }\n    links &lt;- unlist(strsplit(link_header, &quot;,\\\\s*&quot;))\n    next_link &lt;- links[str_detect(links, &quot;rel=\\\\\\&quot;next\\\\\\&quot;&quot;)]\n    if (length(next_link) == 0) {\n      return(NULL)\n    }\n    next_url &lt;- str_extract(next_link[1], &quot;(?&lt;=&lt;).+?(?=&gt;)&quot;)\n    if (is.na(next_url) || !nzchar(next_url)) {\n      return(NULL)\n    }\n    next_url\n  }\n  \n  read_tsv_response &lt;- function(resp) {\n    read.table(\n      text = content(resp, as = &quot;text&quot;, encoding = &quot;UTF-8&quot;),\n      sep = &quot;\\t&quot;,\n      header = TRUE,\n      fill = TRUE,\n      quote = &quot;&quot;,\n      comment.char = &quot;&quot;,\n      check.names = FALSE\n    )\n  }\n  \n  fetch_from_redirect &lt;- function(redirect_url) {\n    if (is.null(redirect_url) || length(redirect_url) == 0 ||\n        is.na(redirect_url) || !nzchar(redirect_url)) {\n      return(NULL)\n    }\n    \n    # The paged idmapping results endpoint is capped at size &lt;= 500.\n    # Use 
stream endpoint to retrieve the full chunk in one response.\n    stream_url &lt;- gsub(&quot;\/results\/&quot;, &quot;\/results\/stream\/&quot;, redirect_url)\n    sep &lt;- ifelse(str_detect(stream_url, &quot;\\\\?&quot;), &quot;&#038;&quot;, &quot;?&quot;)\n    stream_url &lt;- paste0(\n      stream_url,\n      sep,\n      &quot;fields=&quot;, URLencode(fields, reserved = TRUE),\n      &quot;&#038;format=tsv&quot;\n    )\n    \n    r &lt;- GET(url = stream_url)\n    if (status_code(r) &lt; 400) {\n      return(read_tsv_response(r))\n    }\n    \n    # Fallback to paged endpoint if stream is unavailable.\n    sep &lt;- ifelse(str_detect(redirect_url, &quot;\\\\?&quot;), &quot;&#038;&quot;, &quot;?&quot;)\n    url &lt;- paste0(\n      redirect_url,\n      sep,\n      &quot;fields=&quot;, URLencode(fields, reserved = TRUE),\n      &quot;&#038;format=tsv&#038;size=500&quot;\n    )\n    \n    r &lt;- GET(url = url)\n    stop_for_status(r)\n    resultsTable &lt;- read_tsv_response(r)\n    \n    next_url &lt;- get_next_link(headers(r)[[&quot;link&quot;]])\n    while (!is.null(next_url) &#038;&#038; !is.na(next_url) &#038;&#038; nzchar(next_url)) {\n      r &lt;- GET(url = next_url)\n      stop_for_status(r)\n      resultsTable &lt;- rbind(resultsTable, read_tsv_response(r))\n      next_url &lt;- get_next_link(headers(r)[[&quot;link&quot;]])\n    }\n    \n    resultsTable\n  }\n  \n  map_ids &lt;- function(values, from_db, to_db, chunk_size, taxId = NULL,\n                      label = &quot;ids&quot;) {\n    if (length(values) == 0) {\n      return(NULL)\n    }\n    \n    results_list &lt;- list()\n    chunks &lt;- split_into_chunks(values, chunk_size = chunk_size)\n    n_chunks &lt;- length(chunks)\n    for (i in seq_along(chunks)) {\n      chunk &lt;- chunks[[i]]\n      if (isTRUE(progress)) {\n        cat(sprintf(&quot;[UniProt] %s chunk %d\/%d (%d ids)\\n&quot;,\n                    label, i, n_chunks, length(chunk)))\n      }\n      \n      files &lt;- list(\n      
  ids = paste0(chunk, collapse = &quot;,&quot;),\n        from = from_db,\n        to = to_db\n      )\n      if (!is.null(taxId)) {\n        files$taxId &lt;- taxId\n      }\n      \n      r &lt;- POST(url = &quot;https:\/\/rest.uniprot.org\/idmapping\/run&quot;, body = files,\n                encode = &quot;multipart&quot;, accept_json())\n      stop_for_status(r)\n      submission &lt;- content(r, as = &quot;parsed&quot;, encoding = &quot;UTF-8&quot;)\n      \n      job_id &lt;- submission[[&quot;jobId&quot;]]\n      if (is.null(job_id) || length(job_id) == 0 || is.na(job_id) || !nzchar(job_id)) {\n        if (isTRUE(progress)) {\n          cat(sprintf(&quot;[UniProt] %s chunk %d\/%d: no jobId returned\\n&quot;,\n                      label, i, n_chunks))\n        }\n        next\n      }\n      if (!isJobReady(job_id, maxWaitSeconds = maxWaitSeconds)) {\n        if (isTRUE(progress)) {\n          cat(sprintf(&quot;[UniProt] %s chunk %d\/%d: timeout\/not ready\\n&quot;,\n                      label, i, n_chunks))\n        }\n        next\n      }\n      \n      details_url &lt;- paste(&quot;https:\/\/rest.uniprot.org\/idmapping\/details\/&quot;,\n                           job_id, sep = &quot;&quot;)\n      r &lt;- GET(url = details_url, accept_json())\n      stop_for_status(r)\n      details &lt;- content(r, as = &quot;parsed&quot;, encoding = &quot;UTF-8&quot;)\n      \n      redirect_url &lt;- details[[&quot;redirectURL&quot;]]\n      if (is.null(redirect_url) || length(redirect_url) == 0 ||\n          is.na(redirect_url) || !nzchar(redirect_url)) {\n        if (isTRUE(progress)) {\n          cat(sprintf(&quot;[UniProt] %s chunk %d\/%d: missing redirectURL\\n&quot;,\n                      label, i, n_chunks))\n        }\n        next\n      }\n      chunk_result &lt;- fetch_from_redirect(redirect_url)\n      if (is.null(chunk_result)) {\n        if (isTRUE(progress)) {\n          cat(sprintf(&quot;[UniProt] %s chunk %d\/%d: invalid redirectURL\\n&quot;,\n     
                 label, i, n_chunks))\n        }\n        next\n      }\n      \n      results_list[[length(results_list) + 1]] &lt;- chunk_result\n      if (isTRUE(progress)) {\n        cat(sprintf(&quot;[UniProt] %s chunk %d\/%d: completed\\n&quot;,\n                    label, i, n_chunks))\n      }\n    }\n    \n    if (length(results_list) == 0) {\n      return(NULL)\n    }\n    do.call(rbind, results_list)\n  }\n  \n  accession_ids &lt;- ids[is_accession]\n  gene_like_ids &lt;- ids[!is_accession]\n  \n  acc_results &lt;- map_ids(\n    values = accession_ids,\n    from_db = &quot;UniProtKB_AC-ID&quot;,\n    to_db = &quot;UniProtKB&quot;,\n    chunk_size = chunk_size,\n    label = &quot;accessions&quot;\n  )\n  \n  gene_results &lt;- map_ids(\n    values = gene_like_ids,\n    from_db = &quot;Gene_Name&quot;,\n    to_db = &quot;UniProtKB-Swiss-Prot&quot;,\n    chunk_size = chunk_size,\n    taxId = taxId,\n    label = &quot;gene_names&quot;\n  )\n  \n  results_list &lt;- list()\n  if (!is.null(acc_results)) {\n    results_list[[length(results_list) + 1]] &lt;- acc_results\n  }\n  if (!is.null(gene_results)) {\n    results_list[[length(results_list) + 1]] &lt;- gene_results\n  }\n  \n  if (length(results_list) == 0) {\n    stop(&quot;No UniProt results were returned. 
Check identifiers and taxId.&quot;)\n  }\n  \n  resultsTable &lt;- do.call(rbind, results_list)\n  if (&quot;Entry&quot; %in% colnames(resultsTable)) {\n    resultsTable &lt;- resultsTable[!duplicated(resultsTable$Entry), ]\n  }\n  return(resultsTable)\n}\n\n\n\n## SCRIPT ----\n\nspecies &lt;- c(&quot;human&quot;, &quot;zebrafish&quot;, &quot;drosophila&quot;, &quot;worm&quot;, &quot;yeast&quot;)\nsci_names &lt;- c(&quot;human&quot; = &quot;Homo sapiens&quot;, &quot;zebrafish&quot; = &quot;Danio rerio&quot;, &quot;drosophila&quot; = &quot;Drosophila melanogaster&quot;,\n               &quot;worm&quot; = &quot;Caenorhabditis elegans&quot;, &quot;yeast&quot; = &quot;Saccharomyces cerevisiae&quot;)\n\nfor (org in species) {\n  # look up scientific name of org\n  sci_name &lt;- sci_names[org]\n  output_path &lt;- paste0(&quot;Output\/Data\/&quot;, org, &quot;_uniprot.csv&quot;)\n  if(file.exists(output_path)) {\n    message(paste(&quot;File&quot;, output_path, &quot;already exists. Loading&quot;, org))\n    df &lt;- read.csv(output_path)\n  } else {\n    message(paste(&quot;Retrieving UniProt info for&quot;, org))\n    species_ids &lt;- read.delim(paste0(&quot;Data\/&quot;, org, &quot;_uniprot.txt&quot;), header = FALSE)\n    names(species_ids) &lt;- c(&quot;uniprot_id&quot;)\n    df &lt;- retrieveUniprotInfo(species_ids$uniprot_id)\n    # save this result\n    write.csv(df, output_path, row.names = FALSE)\n  }\n  \n  df$tms &lt;- str_count(df$Transmembrane, &quot;TRANSMEM&quot;)\n  tm_counts &lt;- df %&gt;%\n    group_by(tms) %&gt;%\n    summarise(count = n()) %&gt;% \n    filter(tms &gt; 0)\n  \n  p1 &lt;- ggplot(tm_counts, aes(x = tms, y = count)) +\n    geom_col(fill = &quot;#009988&quot;) +\n    labs(x = &quot;Transmembrane domains&quot;, y = &quot;Count&quot;,\n         title = sci_name) +\n    lims(x = c(0.5,NA), y = c(0,NA)) +\n    theme_bw(10)\n  \n  p2 &lt;- SuperPlotR::pieplot(x1 = c(sum(df$tms == 0), sum(df$tms &gt; 0)),\n                            cols = 
c(&quot;#bbbbbb&quot;, &quot;#009988&quot;)) +\n    # blank background and no legend\n    theme_void() +\n    theme(legend.position = &quot;none&quot;)\n  \n  # inset p2 in p1 and add information about the percentages\n  p &lt;- ggdraw() +\n    draw_plot(p1) +\n    # top right\n    draw_plot(p2, x = 0.9, y = 0.9, hjust = 1, vjust = 1, width = 0.4, height = 0.4) +\n    draw_label(paste0(&quot;Total Proteins: &quot;, nrow(df),\n                      &quot;\\nNo TM: &quot;, round(sum(df$tms == 0) \/ nrow(df) * 100, 1),\n                      &quot;%\\nWith TM(s): &quot;,\n                      round(sum(df$tms &gt; 0) \/ nrow(df) * 100, 1), &quot;%&quot;),\n               x = 0.97, y = 0.85, hjust = 1, vjust = 1, size = 8)\n  plot_path &lt;- paste0(&quot;Output\/Plots\/&quot;, org, &quot;_uniprot_tm_counts.png&quot;)\n  ggsave(plot_path, p, width = 1600, height = 900, units = &quot;px&quot;, dpi = 300)\n}\n<\/pre>\n\n\n<p>Note that I am using <code>{cowplot}<\/code> at the end to inset the pie chart and to add the text onto the main plot.<\/p>\n\n\n\n<p>\u2014<\/p>\n\n\n\n<p>The post title comes from \u201cMy Domain\u201d by Bernard Butler from his People Move On album.<\/p>\n\n<div style=\"border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;\">\r\n<div style=\"text-align: center;\">To <strong>leave a comment<\/strong> for the author, please follow the link and comment on their blog: <strong><a href=\"https:\/\/quantixed.org\/2026\/04\/16\/my-domain-proteome-wide-scanning-of-tmds\/\"> Rstats \u2013 quantixed<\/a><\/strong>.<\/div>\r\n<hr \/>\r\n<a href=\"https:\/\/www.r-bloggers.com\/\" rel=\"nofollow\">R-bloggers.com<\/a> offers <strong><a href=\"https:\/\/feedburner.google.com\/fb\/a\/mailverify?uri=RBloggers\" rel=\"nofollow\">daily e-mail updates<\/a><\/strong> about <a title=\"The R Project for Statistical Computing\" href=\"https:\/\/www.r-project.org\/\" rel=\"nofollow\">R<\/a> news and tutorials about <a title=\"R 
tutorials\" href=\"https:\/\/www.r-bloggers.com\/how-to-learn-r-2\/\" rel=\"nofollow\">learning R<\/a> and many other topics. <a title=\"Data science jobs\" href=\"https:\/\/www.r-users.com\/\" rel=\"nofollow\">Click here if you're looking to post or find an R\/data-science job<\/a>.\r\n<\/div>","protected":false},"excerpt":{"rendered":"<div style = \"width:60%; display: inline-block; float:left; \"> I wanted to know: After a little bit of searching, I couldn\u2019t find any answers. So I decided to use R to retrieve the necessary info from Uniprot and calculate it myself. I thought I\u2019d post it here in case it\u2019s useful for others. Human We\u2019ll &#8230;<\/div>\n<div style = \"width: 40%; display: inline-block; float:right;\"><\/div>\n<div style=\"clear: 
both;\"><\/div>\n","protected":false},"author":2887,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[4],"tags":[],"aioseo_notices":[],"jetpack-related-posts":[],"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/www.r-bloggers.com\/wp-json\/wp\/v2\/posts\/400597"}],"collection":[{"href":"https:\/\/www.r-bloggers.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.r-bloggers.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.r-bloggers.com\/wp-json\/wp\/v2\/users\/2887"}],"replies":[{"embeddable":true,"href":"https:\/\/www.r-bloggers.com\/wp-json\/wp\/v2\/comments?post=400597"}],"version-history":[{"count":10,"href":"https:\/\/www.r-bloggers.com\/wp-json\/wp\/v2\/posts\/400597\/revisions"}],"predecessor-version":[{"id":400774,"href":"https:\/\/www.r-bloggers.com\/wp-json\/wp\/v2\/posts\/400597\/revisions\/400774"}],"wp:attachment":[{"href":"https:\/\/www.r-bloggers.com\/wp-json\/wp\/v2\/media?parent=400597"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.r-bloggers.com\/wp-json\/wp\/v2\/categories?post=400597"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.r-bloggers.com\/wp-json\/wp\/v2\/tags?post=400597"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}