{"id":336481,"date":"2022-12-19T08:15:00","date_gmt":"2022-12-19T14:15:00","guid":{"rendered":"http:\/\/www.r-bloggers.com\/?guid=3f502d01b1fd75c54590c7df5ec074a4"},"modified":"2022-12-19T08:15:00","modified_gmt":"2022-12-19T14:15:00","slug":"hierarchical-clustering-cutting-the-tree-and-colouring-the-tree-leaves-based-on-sample-classes","status":"publish","type":"post","link":"https:\/\/www.r-bloggers.com\/2022\/12\/hierarchical-clustering-cutting-the-tree-and-colouring-the-tree-leaves-based-on-sample-classes\/","title":{"rendered":"Hierarchical clustering, cutting the tree and colouring the tree leaves based on sample classes"},"content":{"rendered":"<div style=\"border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;\">\r\n[This article was first published on  <strong><a href=\"https:\/\/gacatag.blogspot.com\/2022\/12\/hiererchical-clustering-cutting-tree.html\"> gacatag<\/a><\/strong>, and kindly contributed to <a href=\"https:\/\/www.r-bloggers.com\/\" rel=\"nofollow\">R-bloggers<\/a>].  
(You can report issues about the content on this page <a href=\"https:\/\/www.r-bloggers.com\/contact-us\/\">here<\/a>)\r\n<hr>Want to share your content on R-bloggers?<a href=\"https:\/\/www.r-bloggers.com\/add-your-blog\/\" rel=\"nofollow\"> click here<\/a> if you have a blog, or <a href=\"http:\/\/r-posts.com\/\" rel=\"nofollow\"> here<\/a> if you don't.\r\n<\/div>\n<p><\/p><div class=\"separator\" style=\"clear: both; text-align: center;\"><a href=\"https:\/\/i1.wp.com\/blogger.googleusercontent.com\/img\/b\/R29vZ2xl\/AVvXsEiVuW23NnLvKcc6FPO1APWrTQhmlxfL3AO6jzAyUy0POgKREY5XMOIOxSAVNpiAylb6kAGJAQgP8p_gQBbISxeGHoDk5NuR5RBH9mNqTWgIdh3MSQhchorKoyZGtjOBwgCb8dp6FVbAlfLVd0QTo9zr-OgpTpkK3EVRTvN63pSC5xcWTuZF7iQqAAHegQ\/s398\/icon_hclust.jpg?ssl=1\" style=\"margin-left: 1em; margin-right: 1em;\" rel=\"nofollow\" target=\"_blank\"><img loading=\"lazy\" border=\"0\" data-original-height=\"262\" data-original-width=\"398\" height=\"262\" src=\"https:\/\/i1.wp.com\/blogger.googleusercontent.com\/img\/b\/R29vZ2xl\/AVvXsEiVuW23NnLvKcc6FPO1APWrTQhmlxfL3AO6jzAyUy0POgKREY5XMOIOxSAVNpiAylb6kAGJAQgP8p_gQBbISxeGHoDk5NuR5RBH9mNqTWgIdh3MSQhchorKoyZGtjOBwgCb8dp6FVbAlfLVd0QTo9zr-OgpTpkK3EVRTvN63pSC5xcWTuZF7iQqAAHegQ\/w400-h264\/icon_hclust.jpg?resize=398%2C262&#038;ssl=1\" width=\"398\" data-recalc-dims=\"1\" \/><\/a><\/div><p><\/p><p>Unsupervised machine learning methods such as <a href=\"https:\/\/en.wikipedia.org\/wiki\/Hierarchical_clustering\" rel=\"nofollow\" target=\"_blank\">hierarchical clustering<\/a> allow us to discover trends and patterns of similarity within the data. Here, I demonstrate, using a test data matrix, how to apply hierarchical clustering to its columns. Note that, as my main focus is bioinformatics applications, I assume that the columns of the matrix represent individual samples and the rows represent genes, transcripts, or some other biological feature. 
However, as the application of clustering algorithms is not restricted to biology, the rows or columns of the matrix may represent other entities, depending on the field of research. For the distance metric, I will use the Spearman-correlation-based distance supported by the Dist function of the <a href=\"https:\/\/cran.r-project.org\/web\/packages\/amap\/index.html\" rel=\"nofollow\" target=\"_blank\">amap package<\/a>. For skewed data, it is a good idea to check the similarity of the orders of the values rather than their linear relationship (i.e. Pearson correlation) or how geometrically close the values are (i.e. Euclidean distance). For more information, you can see an example that I provided in <a href=\"https:\/\/gacatag.blogspot.com\/2021\/05\/correlation-in-r-na-friendliness.html\" rel=\"nofollow\" target=\"_blank\">one of my previous posts<\/a> on how Spearman correlation may discover associations more effectively for skewed data.<br \/><\/p><p><span style=\"color: #6aa84f;\"><span style=\"font-family: courier;\"><br \/># Simulate a test matrix: 50 rows (features) by 20 columns (samples)<br \/>values<- matrix(rnorm(1000),ncol=20)<br \/>colnames(values)<- paste(\"col\",1:20,sep=\"\")<br \/>library(amap)<br \/># Cluster the samples (columns) using Spearman-correlation-based distance<br \/>hRes<- hclust(Dist(t(values), method=\"spearman\"))<br \/>plot(hRes)<br \/><\/span><\/span><\/p><div class=\"separator\" style=\"clear: both; text-align: center;\"><span style=\"color: #6aa84f;\"><a href=\"https:\/\/i1.wp.com\/blogger.googleusercontent.com\/img\/b\/R29vZ2xl\/AVvXsEj4y0i9RYdRamm8Mlbu5UMzLpjnzPntjTHZAs2_jHJFSwsKVYGKGHeSD7DKwzhZr1ZvKYYByHTkheY1BtDb5eFdwUnAUl-xrQlunFfF6_j0ey9lwQRWW0hLBAYn-pBWJSW5uSFErABFRJO9NpJcmdM7vVmJVvBIBgi-LHwGoQKj84obpkmSv_WYUjnRdw\/s1400\/hclust_fig_1.jpg?ssl=1\" style=\"margin-left: 1em; margin-right: 1em;\" rel=\"nofollow\" target=\"_blank\"><img 
loading=\"lazy\" border=\"0\" data-original-height=\"1400\" data-original-width=\"450\" src=\"https:\/\/i1.wp.com\/blogger.googleusercontent.com\/img\/b\/R29vZ2xl\/AVvXsEj4y0i9RYdRamm8Mlbu5UMzLpjnzPntjTHZAs2_jHJFSwsKVYGKGHeSD7DKwzhZr1ZvKYYByHTkheY1BtDb5eFdwUnAUl-xrQlunFfF6_j0ey9lwQRWW0hLBAYn-pBWJSW5uSFErABFRJO9NpJcmdM7vVmJVvBIBgi-LHwGoQKj84obpkmSv_WYUjnRdw\/w400-h400\/hclust_fig_1.jpg?resize=450%2C1400&#038;ssl=1\" width=\"450\" data-recalc-dims=\"1\" \/><\/a><\/span><\/div><span style=\"color: #6aa84f;\"><br \/><span style=\"font-family: courier;\"><br \/>\u00a0<\/span><\/span><p><\/p><p><span style=\"color: #6aa84f;\"><span style=\"font-family: courier;\"><span style=\"font-family: times;\"><span style=\"color: black;\">After running hierarchical clustering, we can cut the resulting binary tree at a certain depth, or request that it be cut so as to produce a given number of clusters. Here, I request that the resulting binary tree be cut in a way that yields 2 sample clusters. Furthermore, I convert the tree to a &#8220;dendrogram&#8221; object and colour the branches and the labels of the tree to visualize the 2 clusters. 
One can use the <span style=\"font-family: courier;\"><span style=\"color: #6aa84f;\">color_branches<\/span> <\/span>and <span style=\"color: #6aa84f;\"><span style=\"font-family: courier;\">color_labels<\/span><\/span> functions to cut and colour the tree.<\/span><\/span> <br \/><\/span><\/span><\/p><p><span style=\"color: #6aa84f;\"><span style=\"font-family: courier;\">library(dendextend)<br \/><br \/># Cut the tree into 2 clusters and colour its branches and labels<br \/>hResDen<- as.dendrogram(hRes)<br \/>hResCut<- cutree(hResDen,2)<br \/>hResDen <- color_branches(hResDen, k= 2)<br \/>hResDen <- color_labels(hResDen, k= 2)<br \/>plot(hResDen)<\/span><\/span><br \/><span style=\"color: #6aa84f;\"><span style=\"font-family: courier;\"><span style=\"color: #6aa84f;\"><\/span><\/span><\/span><\/p><div class=\"separator\" style=\"clear: both; text-align: center;\"><span style=\"color: #6aa84f;\"><span style=\"font-family: courier;\"><span style=\"color: #6aa84f;\"><a href=\"https:\/\/i1.wp.com\/blogger.googleusercontent.com\/img\/b\/R29vZ2xl\/AVvXsEimklT37nz_5HFzm3GVoUFCzaVN5ThQURNFD6Bnj4KzkZrh8XnA0Vno3FVtZy8D-QsQNFRCBNG562N9eXv4gTa_d2TgDMv83fybOWf3yd_gnuWRSuCcDUFGFeZfPAj6eBndzAp1-THEs3X74Bri9PPWema1wvWYyEVYvvEaLc8B89bNpOTZj3zJ-GER1A\/s1400\/hclust_fig_2.jpg?ssl=1\" style=\"margin-left: 1em; margin-right: 1em;\" rel=\"nofollow\" target=\"_blank\"><img loading=\"lazy\" border=\"0\" data-original-height=\"1400\" data-original-width=\"450\" src=\"https:\/\/i0.wp.com\/blogger.googleusercontent.com\/img\/b\/R29vZ2xl\/AVvXsEimklT37nz_5HFzm3GVoUFCzaVN5ThQURNFD6Bnj4KzkZrh8XnA0Vno3FVtZy8D-QsQNFRCBNG562N9eXv4gTa_d2TgDMv83fybOWf3yd_gnuWRSuCcDUFGFeZfPAj6eBndzAp1-THEs3X74Bri9PPWema1wvWYyEVYvvEaLc8B89bNpOTZj3zJ-GER1A\/w400-h400\/hclust_fig_2.jpg?resize=450%2C1400&#038;ssl=1\" width=\"450\" data-recalc-dims=\"1\" \/><\/a><\/span><\/span><\/span><\/div><p><\/p><p><span style=\"color: #6aa84f;\"><span style=\"font-family: courier;\"><\/span><\/span><\/p><p><span style=\"color: #6aa84f;\"><span style=\"font-family: 
courier;\"><\/span><\/span><\/p><p><span style=\"color: #6aa84f;\"><span style=\"font-family: courier;\"><br \/>\u00a0<\/span><\/span><\/p><p><span style=\"color: #6aa84f;\"><span style=\"font-family: courier;\"><span style=\"font-family: times;\"><span style=\"color: black;\">Alternatively, one can use the <span style=\"color: #6aa84f;\"><span style=\"font-family: courier;\">color_branches<\/span><\/span> and <span style=\"color: #6aa84f;\"><span style=\"font-family: courier;\">color_labels<\/span><\/span> functions to manually define the colours of the labels and the branches of the tree. <\/span><\/span><br \/><\/span><\/span><\/p><p><span style=\"color: #6aa84f;\"><span style=\"font-family: courier;\"># Manual colouring based on the cut results<br \/>colours<- c(2,3)<br \/>hResDen<- as.dendrogram(hRes)<br \/>colOrder<- hRes$order<br \/>hResDen <- color_branches(hResDen,clusters=hResCut[colOrder],col=colours)<br \/>labelCol<- colours<br \/>names(labelCol)<- unique(hResCut[colOrder])<br \/>hResDen <- color_labels(hResDen,col=labelCol[as.character(hResCut[colOrder])])<br \/>plot(hResDen)<br \/><\/span><\/span><\/p><p><span style=\"color: #6aa84f;\"><\/span><\/p><div class=\"separator\" style=\"clear: both; text-align: center;\"><span style=\"color: #6aa84f;\"><a href=\"https:\/\/i0.wp.com\/blogger.googleusercontent.com\/img\/b\/R29vZ2xl\/AVvXsEi92ChF7EsVDmCWUrD1huOCULOs7mvNbqoR1CRpCVAXLyZc7f3qUbKzeJab1RYegI4RbBF9nV_8oi2uV5O4hhc-jeMfLn5ugbPSqW6UwsnOcgAL0cZ4CvZPNBooy2GjM-ay6wIN4SJV7xxvZhkIUOuoH5gHmd0XBaHyDSw_AFhpXCShM34wYk88qPScaQ\/s1400\/hclust_fig_3.jpg?ssl=1\" style=\"margin-left: 1em; margin-right: 
1em;\" rel=\"nofollow\" target=\"_blank\"><img loading=\"lazy\" border=\"0\" data-original-height=\"1400\" data-original-width=\"450\" src=\"https:\/\/i0.wp.com\/blogger.googleusercontent.com\/img\/b\/R29vZ2xl\/AVvXsEi92ChF7EsVDmCWUrD1huOCULOs7mvNbqoR1CRpCVAXLyZc7f3qUbKzeJab1RYegI4RbBF9nV_8oi2uV5O4hhc-jeMfLn5ugbPSqW6UwsnOcgAL0cZ4CvZPNBooy2GjM-ay6wIN4SJV7xxvZhkIUOuoH5gHmd0XBaHyDSw_AFhpXCShM34wYk88qPScaQ\/w400-h400\/hclust_fig_3.jpg?resize=450%2C1400&#038;ssl=1\" width=\"450\" data-recalc-dims=\"1\" \/><\/a><\/span><\/div><span style=\"color: #6aa84f;\"><br \/><span style=\"font-family: courier;\"><br \/><\/span><\/span><p><\/p><p><span style=\"color: #6aa84f;\"><span style=\"font-family: courier;\"><\/span><\/span><\/p><p><span style=\"color: #6aa84f;\"><span style=\"font-family: courier;\"><br \/><\/span><\/span><span style=\"color: #6aa84f;\"><span style=\"font-family: courier;\"><span style=\"color: #6aa84f;\"><span style=\"font-family: courier;\"><span style=\"font-family: times;\"><span style=\"color: black;\">But what if we want to colour the branches and the labels of the tree based on a predefined grouping of the samples?  
Here, we colour the labels and the edges leading to them to visualize the positions of the &#8220;class1&#8221;, &#8220;class2&#8221; and &#8220;class3&#8221; samples in the tree.<\/span><\/span><\/span><\/span><br \/><\/span><\/span><\/p><p><span style=\"color: #6aa84f;\"><span style=\"font-family: courier;\"># Manual colouring based on some predefined classes<br \/><br \/># One class label per sample column (5+6+9 = 20 samples)<br \/>sampleClass<- c(rep(\"class1\",5), rep(\"class2\",6), rep(\"class3\",9))<br \/>colours<- c(\"lightblue\",\"green\", \"red\")<br \/>hResDen<- as.dendrogram(hRes)<br \/>colOrder<- hRes$order<br \/>hResDen <- color_branches(hResDen,clusters=as.numeric(as.factor(sampleClass[colOrder])),col=colours)<br \/>labelCol<- colours<br \/>names(labelCol)<- unique(sampleClass[colOrder])<br \/>hResDen <- color_labels(hResDen,col=labelCol[as.character(sampleClass[colOrder])])<br \/>plot(hResDen)<\/span><br \/><\/span><\/p><div class=\"separator\" style=\"clear: both; text-align: center;\"><a href=\"https:\/\/i0.wp.com\/blogger.googleusercontent.com\/img\/b\/R29vZ2xl\/AVvXsEh0s4dxFyIRuevKoioTG0iHM9MGh7xyshcPX4277Q7G0UMgjzr2F9w0-SHQmUaoVa0whcjG_vwSeo1XatycjjS1EcPlK3MUjKHCHzQrLsoGTp8XJJ4S38muJubTCIn3FlNKzDjcKfcEyjXnQeB2JL7pHiIVcYoP7y9Rb48jznJXgLv4Q37Uz_w2xnn1Aw\/s1400\/hclust_fig_4.jpg?ssl=1\" style=\"margin-left: 1em; margin-right: 1em;\" rel=\"nofollow\" target=\"_blank\"><img loading=\"lazy\" border=\"0\" data-original-height=\"1400\" data-original-width=\"450\" src=\"https:\/\/i1.wp.com\/blogger.googleusercontent.com\/img\/b\/R29vZ2xl\/AVvXsEh0s4dxFyIRuevKoioTG0iHM9MGh7xyshcPX4277Q7G0UMgjzr2F9w0-SHQmUaoVa0whcjG_vwSeo1XatycjjS1EcPlK3MUjKHCHzQrLsoGTp8XJJ4S38muJubTCIn3FlNKzDjcKfcEyjXnQeB2JL7pHiIVcYoP7y9Rb48jznJXgLv4Q37Uz_w2xnn1Aw\/w400-h400\/hclust_fig_4.jpg?resize=450%2C1400&#038;ssl=1\" width=\"450\" data-recalc-dims=\"1\" \/><\/a><\/div><br \/><p><\/p>\n<div style=\"border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;\">\r\n<div style=\"text-align: center;\">To <strong>leave a 
comment<\/strong> for the author, please follow the link and comment on their blog: <strong><a href=\"https:\/\/gacatag.blogspot.com\/2022\/12\/hiererchical-clustering-cutting-tree.html\"> gacatag<\/a><\/strong>.<\/div>\r\n<hr \/>\r\n<a href=\"https:\/\/www.r-bloggers.com\/\" rel=\"nofollow\">R-bloggers.com<\/a> offers <strong><a href=\"https:\/\/feedburner.google.com\/fb\/a\/mailverify?uri=RBloggers\" rel=\"nofollow\">daily e-mail updates<\/a><\/strong> about <a title=\"The R Project for Statistical Computing\" href=\"https:\/\/www.r-project.org\/\" rel=\"nofollow\">R<\/a> news and tutorials about <a title=\"R tutorials\" href=\"https:\/\/www.r-bloggers.com\/how-to-learn-r-2\/\" rel=\"nofollow\">learning R<\/a> and many other topics. <a title=\"Data science jobs\" href=\"https:\/\/www.r-users.com\/\" rel=\"nofollow\">Click here if you're looking to post or find an R\/data-science job<\/a>.\r\n<\/div>","protected":false},"excerpt":{"rendered":"<div style = \"width:60%; display: inline-block; float:left; \"> Unsupervised machine learning methods such as hierarchical clustering allow us to discover the trends and patterns of similarity within the data. 
Here, I demonstrate by using a test data, how to apply the Hierarchical clustering on columns of a test da&#8230;<\/div>\n<div style = \"width: 40%; display: inline-block; float:right;\"><\/div>\n<div style=\"clear: both;\"><\/div>\n","protected":false},"author":17,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[4],"tags":[],"aioseo_notices":[],"jetpack-related-posts":[],"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/www.r-bloggers.com\/wp-json\/wp\/v2\/posts\/336481"}],"collection":[{"href":"https:\/\/www.r-bloggers.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.r-bloggers.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.r-bloggers.com\/wp-json\/wp\/v2\/users\/17"}],"replies":[{"embeddable":true,"href":"https:\/\/www.r-bloggers.com\/wp-json\/wp\/v2\/comments?post=336481"}],"version-history":[{"count":23,"href":"https:\/\/www.r-bloggers.com\/wp-json\/wp\/v2\/posts\/336481\/revisions"}],"predecessor-version":[{"id":378727,"href":"https:\/\/www.r-bloggers.com\/wp-json\/wp\/v2\/posts\/336481\/revisions\/378727"}],"wp:attachment":[{"href":"https:\/\/www.r-bloggers.com\/wp-json\/wp\/v2\/media?parent=336481"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.r-bloggers.com\/wp-json\/wp\/v2\/categories?post=336481"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.r-bloggers.com\/wp-json\/wp\/v2\/tags?post=336481"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}