{"id":222,"date":"2014-02-10T10:00:00","date_gmt":"2014-02-10T15:00:00","guid":{"rendered":"https:\/\/pyimagesearch.com\/?p=222"},"modified":"2021-04-17T16:34:58","modified_gmt":"2021-04-17T20:34:58","slug":"building-an-image-search-engine-indexing-your-dataset-step-2-of-4","status":"publish","type":"post","link":"https:\/\/pyimagesearch.com\/2014\/02\/10\/building-an-image-search-engine-indexing-your-dataset-step-2-of-4\/","title":{"rendered":"Building an Image Search Engine: Indexing Your Dataset (Step 2 of 4)"},"content":{"rendered":"<p>Last Wednesday&#8217;s blog post reviewed the first step of building an image search engine: <a href=\"https:\/\/pyimagesearch.com\/2014\/01\/29\/building-an-image-search-engine-defining-your-image-descriptor-step-1-of-4\/\" target=\"_blank\" rel=\"noopener noreferrer\">Defining Your Image Descriptor<\/a>.<\/p>\n<p>We then examined the three aspects of an image that can be easily described:<\/p>\n<ul>\n<li><strong>Color:<\/strong> Image descriptors that characterize the color of an image seek to model the distribution of the pixel intensities in each channel of the image. These methods include basic color statistics such as mean, standard deviation, and skewness, along with color histograms, both <a title=\"Clever Girl: A Guide to Utilizing Color Histograms for Computer Vision and Image Search Engines\" href=\"https:\/\/pyimagesearch.com\/2014\/01\/22\/clever-girl-a-guide-to-utilizing-color-histograms-for-computer-vision-and-image-search-engines\/\" target=\"_blank\" rel=\"noopener noreferrer\">&#8220;flat&#8221; and multi-dimensional<\/a>.<\/li>\n<li><strong>Texture:<\/strong> Texture descriptors seek to model the feel, appearance, and overall tactile quality of an object in an image. Some, but not all, texture descriptors convert the image to grayscale and then compute a Gray-Level Co-occurrence Matrix (GLCM) and compute statistics over this matrix, including contrast, correlation, and entropy, to name a few. 
More advanced texture descriptors such as Fourier and Wavelet transforms also exist, but still utilize the grayscale image.<\/li>\n<li><strong>Shape:<\/strong> Many shape descriptor methods rely on extracting the contour of an object in an image (i.e. the outline). Once we have the outline, we can then compute simple statistics to characterize the outline, which is exactly what OpenCV&#8217;s Hu Moments does. These statistics can be used to represent the shape (outline) of an object in an image.<\/li>\n<\/ul>\n<p><em><strong>Note:<\/strong> If you haven&#8217;t seen my fully working image search engine yet, head on over to my <a href=\"https:\/\/pyimagesearch.com\/2014\/01\/27\/hobbits-and-histograms-a-how-to-guide-to-building-your-first-image-search-engine-in-python\/\" target=\"_blank\" rel=\"noopener noreferrer\">How-To guide on building a simple image search engine using Lord of the Rings screenshots<\/a>. <\/em><\/p>\n<p>When selecting a descriptor to extract features from our dataset, we have to ask ourselves which aspects of the image we are interested in describing. Is the color of an image important? What about the shape? Is the tactile quality (texture) important to returning relevant results?<\/p>\n<p>Let&#8217;s take a look at a sample of the <a href=\"http:\/\/www.robots.ox.ac.uk\/~vgg\/data\/flowers\/17\/index.html\" target=\"_blank\" rel=\"noopener noreferrer\">Flowers 17 dataset<\/a>, a dataset of 17 flower species, as an example:<\/p>\n<figure id=\"attachment_224\" aria-describedby=\"caption-attachment-224\" style=\"width: 540px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/pyimagesearch.com\/wp-content\/uploads\/2014\/01\/flowers17sample.jpg\"><img decoding=\"async\" class=\" wp-image-224 \" alt=\"Figure 1 - Sample of the Flowers 17 Dataset. As we can see, some flowers might be indistinguishable using color or shape alone. 
Better results can be obtained by extracting both color and shape features.\" src=\"https:\/\/pyimagesearch.com\/wp-content\/uploads\/2014\/01\/flowers17sample.jpg\" width=\"540\" height=\"401\"><\/a><figcaption id=\"caption-attachment-224\" class=\"wp-caption-text\"><strong>Figure 1<\/strong> &#8211; A sample of the Flowers 17 Dataset. As we can see, some flowers might be indistinguishable using color or shape alone (e.g. Tulip and Cowslip have similar color distributions). Better results can be obtained by extracting both color and shape features.<\/figcaption><\/figure>\n<p>If we wanted to describe these images with the intention of building an image search engine, the first descriptor I would use is <strong>color<\/strong>. By characterizing the color of the petals of the flower, our search engine will be able to return flowers of similar color tones.<\/p>\n<p>However, the fact that our image search engine returns flowers of similar color does not mean all the results will be relevant. 
Many flowers can have the same color but be an entirely different species.<\/p>\n<p>In order to ensure more similar species of flowers are returned from our flower search engine, I would then explore describing the <strong>shape<\/strong> of the petals of the flower.<\/p>\n<p>Now we have two descriptors &#8212; <strong>color<\/strong> to characterize the different color tones of the petals, and <strong>shape<\/strong> to describe the outline of the petals themselves.<\/p>\n<p>Using these two descriptors in conjunction with one another, we would be able to build a simple image search engine for our flowers dataset.<\/p>\n<p>Of course, we need to know how to index our dataset.<\/p>\n<p>Right now we simply know what descriptors we will use to describe our images.<\/p>\n<p>But how are we going to apply these descriptors to our entire dataset?<\/p>\n<p>In order to answer that question, today we are going to explore the second step of building an image search engine: <strong>Indexing Your Dataset<\/strong>.<\/p>\n<h1>Indexing Your Dataset<\/h1>\n<blockquote>\n<p><strong>Definition: <\/strong>Indexing is the process of quantifying your dataset by applying an image descriptor to extract features from each and every image in your dataset. Normally, these features are stored on disk for later use.<\/p>\n<\/blockquote>\n<p>Using our flowers database example above, our goal is to simply loop over each image in our dataset, extract some features, and store these features on disk.<\/p>\n<p>It&#8217;s quite a simple concept in principle, but in reality, it can become very complex, depending on the size and scale of your dataset. For comparison purposes, we would say that the Flowers 17 dataset is <em>small<\/em>. It has a total of only 1,360 images (17 categories x 80 images per category). 
In contrast, image search engines such as <a href=\"http:\/\/www.tineye.com\/\" target=\"_blank\" rel=\"noopener noreferrer\">TinEye<\/a> have image datasets that number in the billions.<\/p>\n<p>Let&#8217;s start with the first step: instantiating your descriptor.<\/p>\n<h2>1. Instantiate Your Descriptor<\/h2>\n<p>In my <a href=\"https:\/\/pyimagesearch.com\/2014\/01\/27\/hobbits-and-histograms-a-how-to-guide-to-building-your-first-image-search-engine-in-python\/\" target=\"_blank\" rel=\"noopener noreferrer\">How-To guide to building an image search engine<\/a>, I mentioned that I like to abstract my image descriptors as classes rather than functions.<\/p>\n<p>Furthermore, I like to put relevant parameters (such as the number of bins in a histogram) in the constructor of the class.<\/p>\n<p>Why do I bother doing this?<\/p>\n<p>The reason for using a class (with descriptor parameters in the constructor) rather than a function is that it helps ensure that the exact same descriptor with the exact same parameters is applied to each and every image in my dataset.<\/p>\n<p>This is especially useful if I ever need to write my descriptor to disk using <code>cPickle<\/code> (the <code>pickle<\/code> module in Python 3) and load it back up again further down the line, such as when a user is performing a query.<\/p>\n<p>In order to compare two images, you need to represent them in the same manner using your image descriptor. 
It wouldn&#8217;t make sense to extract a histogram with 32 bins from one image and then a histogram with 128 bins from another image if your intent is to compare the two for similarity.<\/p>\n<p>For example, let&#8217;s take a look at the skeleton code of a generic image descriptor in Python:<\/p>\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"true\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"Generic Image Descriptor Skeleton Class in Python\" data-enlighter-group=\"0\">class GenericDescriptor:\n\tdef __init__(self, paramA, paramB):\n\t\t# store the parameters for use in the 'describe' method\n\t\tself.paramA = paramA\n\t\tself.paramB = paramB\n\n\tdef describe(self, image):\n\t\t# describe the image using self.paramA and self.paramB\n\t\t# as supplied in the constructor\n\t\tpass\n<\/pre>\n\n\n<p>The first thing you notice is the <code>__init__<\/code> method. Here I provide my relevant parameters for the descriptor.<\/p>\n<p>Next, you see the <code>describe<\/code> method. This method takes a single parameter: the <code>image<\/code> we wish to describe.<\/p>\n<p>Whenever I call the <code>describe<\/code> method, I know that the parameters stored in the constructor will be used for each and every image in my dataset. This ensures my images are described consistently with identical descriptor parameters.<\/p>\n<p>While the class vs. function argument may not seem like a big deal right now, when you start building larger, more complex image search engines that have a large codebase, using classes helps ensure that your descriptors are consistent.<\/p>\n<h2>2. 
Serial or Parallel?<\/h2>\n<p>A better title for this step might be &#8220;Single-core or Multi-core?&#8221;<\/p>\n<p>Because each image is described independently of the others, extracting features from images in a dataset is a task that can naturally be made parallel.<\/p>\n<p>Depending on the size and scale of your dataset, it might make sense to utilize multi-core processing techniques to split up the extraction of feature vectors across multiple cores\/processors.<\/p>\n<p>However, for small datasets using computationally simple image descriptors, such as color histograms, using multi-core processing is not only overkill, it also adds extra complexity to your code.<\/p>\n<p>This is especially troublesome if you are just getting started working with computer vision and image search engines.<\/p>\n<p>Why bother adding extra complexity? Debugging programs with multiple threads\/processes is substantially harder than debugging programs with only a single thread of execution.<\/p>\n<p>Unless your dataset is quite large and could greatly benefit from multi-core processing, I would stay away from splitting the indexing task up into multiple processes for the time being. It&#8217;s not worth the headache just yet. That said, in the future I will certainly have a blog post discussing best practices for making your indexing task parallel.<\/p>\n<h2>3. Writing to Disk<\/h2>\n<p>This step might seem a bit obvious. But if you&#8217;re going to go through all the effort to extract features from your dataset, it&#8217;s best to write your index to disk for later use.<\/p>\n<p>For small datasets, using a simple Python dictionary will likely suffice. The key can be the image filename (assuming that you have unique filenames across your dataset) and the value the features extracted from that image using your image descriptor. Finally, you can dump the index to a file using <code>cPickle<\/code>.<\/p>\n<p>If your dataset is larger or you plan to manipulate your features further (e.g. 
scaling, normalization, dimensionality reduction), you might be better off using <a href=\"http:\/\/www.h5py.org\/\" target=\"_blank\" rel=\"noopener noreferrer\">h5py<\/a> to write your features to disk.<\/p>\n<p>Is one method better than the other?<\/p>\n<p>It honestly depends.<\/p>\n<p>If you&#8217;re just starting off in computer vision and image search engines and you have a small dataset, I would use Python&#8217;s built-in dictionary type and <code>cPickle<\/code> for the time being.<\/p>\n<p>If you have experience in the field and have experience with NumPy, then I would suggest giving <code>h5py<\/code> a try and then comparing it to the dictionary approach mentioned above.<\/p>\n<p>For the time being, I will be using <code>cPickle<\/code> in my code examples; however, within the next few months, I&#8217;ll also start introducing <code>h5py<\/code> into my examples as well.<\/p>\n<h1>Summary<\/h1>\n<p>Today we explored how to index an image dataset. Indexing is the process of extracting features from a dataset of images and then writing the features to persistent storage, such as your hard drive.<\/p>\n<p>The first step to indexing a dataset is to determine which image descriptor you are going to use. You need to ask yourself, what aspect of the images are you trying to characterize? The color distribution? The texture and tactile quality? The shape of the objects in the image?<\/p>\n<p>After you have determined which descriptor you are going to use, you need to loop over your dataset and apply your descriptor to each and every image in the dataset, extracting feature vectors. This can be done either serially or in parallel by utilizing multi-processing techniques.<\/p>\n<p>Finally, after you have extracted features from your dataset, you need to write your index of features to file. Simple methods include using Python&#8217;s built-in dictionary type and <code>cPickle<\/code>. 
More advanced options include using <code>h5py<\/code>.<\/p>\n<p>Next week we&#8217;ll move on to the third step in building an image search engine: determining how to compare feature vectors for similarity.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Last Wednesday&#8217;s blog post reviewed the first step of building an image search engine: Defining Your Image Descriptor. 
We then examined the three aspects of an image that can be easily described: Color: Image descriptors that characterize the color of&hellip;<\/p>\n","protected":false},"author":1,"featured_media":13956,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_genesis_hide_title":false,"_genesis_hide_breadcrumbs":false,"_genesis_hide_singular_image":false,"_genesis_hide_footer_widgets":false,"_genesis_custom_body_class":"","_genesis_custom_post_class":"","_genesis_layout":"","footnotes":""},"categories":[33],"tags":[22,34,38,40,39,36,37],"class_list":{"0":"post-222","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-image-search-engine-basics","8":"tag-color","9":"tag-image-search-engine","10":"tag-indexing","11":"tag-parallel","12":"tag-serial","13":"tag-shape","14":"tag-texture","15":"entry"},"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v19.6.1 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Building an Image Search Engine: Indexing Your Dataset (Step 2 of 4) - PyImageSearch<\/title>\n<meta name=\"description\" content=\"Step 2 to building an image search engine: Indexing Your Dataset. Extract features from your image dataset and write them to disk. Step 1: Instantiate Your Descriptor. Step 2: Serial or Parallel? 
Step 3: Write to Disk.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/pyimagesearch.com\/2014\/02\/10\/building-an-image-search-engine-indexing-your-dataset-step-2-of-4\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Building an Image Search Engine: Indexing Your Dataset (Step 2 of 4) - PyImageSearch\" \/>\n<meta property=\"og:description\" content=\"Step 2 to building an image search engine: Indexing Your Dataset. Extract features from your image dataset and write them to disk. Step 1: Instantiate Your Descriptor. Step 2: Serial or Parallel? Step 3: Write to Disk.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/pyimagesearch.com\/2014\/02\/10\/building-an-image-search-engine-indexing-your-dataset-step-2-of-4\/\" \/>\n<meta property=\"og:site_name\" content=\"PyImageSearch\" \/>\n<meta property=\"article:published_time\" content=\"2014-02-10T15:00:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2021-04-17T20:34:58+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/pyimagesearch.com\/wp-content\/uploads\/2014\/02\/search_engine_indexing.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"700\" \/>\n\t<meta property=\"og:image:height\" content=\"467\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Adrian Rosebrock\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Adrian Rosebrock\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"8 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/pyimagesearch.com\/2014\/02\/10\/building-an-image-search-engine-indexing-your-dataset-step-2-of-4\/\",\"url\":\"https:\/\/pyimagesearch.com\/2014\/02\/10\/building-an-image-search-engine-indexing-your-dataset-step-2-of-4\/\",\"name\":\"Building an Image Search Engine: Indexing Your Dataset (Step 2 of 4) - PyImageSearch\",\"isPartOf\":{\"@id\":\"https:\/\/pyimagesearch.com\/#website\"},\"datePublished\":\"2014-02-10T15:00:00+00:00\",\"dateModified\":\"2021-04-17T20:34:58+00:00\",\"author\":{\"@id\":\"https:\/\/pyimagesearch.com\/#\/schema\/person\/5901b399e2f20b986362a00636181cca\"},\"description\":\"Step 2 to building an image search engine: Indexing Your Dataset. Extract features from your image dataset and write them to disk. Step 1: Instantiate Your Descriptor. Step 2: Serial or Parallel? 
Step 3: Write to Disk.\",\"breadcrumb\":{\"@id\":\"https:\/\/pyimagesearch.com\/2014\/02\/10\/building-an-image-search-engine-indexing-your-dataset-step-2-of-4\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/pyimagesearch.com\/2014\/02\/10\/building-an-image-search-engine-indexing-your-dataset-step-2-of-4\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/pyimagesearch.com\/2014\/02\/10\/building-an-image-search-engine-indexing-your-dataset-step-2-of-4\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/pyimagesearch.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Building an Image Search Engine: Indexing Your Dataset (Step 2 of 4)\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/pyimagesearch.com\/#website\",\"url\":\"https:\/\/pyimagesearch.com\/\",\"name\":\"PyImageSearch\",\"description\":\"You can master Computer Vision, Deep Learning, and OpenCV - PyImageSearch\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/pyimagesearch.com\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/pyimagesearch.com\/#\/schema\/person\/5901b399e2f20b986362a00636181cca\",\"name\":\"Adrian Rosebrock\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/pyimagesearch.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/57e1f3a95feeb9b113f80510f086d7d81b6f62badd9bd69134e51037a8b79925?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/57e1f3a95feeb9b113f80510f086d7d81b6f62badd9bd69134e51037a8b79925?s=96&d=mm&r=g\",\"caption\":\"Adrian Rosebrock\"},\"description\":\"Hi there, I\u2019m Adrian Rosebrock, PhD. 
All too often I see developers, students, and researchers wasting their time, studying the wrong things, and generally struggling to get started with Computer Vision, Deep Learning, and OpenCV. I created this website to show you what I believe is the best possible way to get your start.\",\"url\":\"https:\/\/pyimagesearch.com\/author\/adrian\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Building an Image Search Engine: Indexing Your Dataset (Step 2 of 4) - PyImageSearch","description":"Step 2 to building an image search engine: Indexing Your Dataset. Extract features from your image dataset and write them to disk. Step 1: Instantiate Your Descriptor. Step 2: Serial or Parallel? Step 3: Write to Disk.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/pyimagesearch.com\/2014\/02\/10\/building-an-image-search-engine-indexing-your-dataset-step-2-of-4\/","og_locale":"en_US","og_type":"article","og_title":"Building an Image Search Engine: Indexing Your Dataset (Step 2 of 4) - PyImageSearch","og_description":"Step 2 to building an image search engine: Indexing Your Dataset. Extract features from your image dataset and write them to disk. Step 1: Instantiate Your Descriptor. Step 2: Serial or Parallel? Step 3: Write to Disk.","og_url":"https:\/\/pyimagesearch.com\/2014\/02\/10\/building-an-image-search-engine-indexing-your-dataset-step-2-of-4\/","og_site_name":"PyImageSearch","article_published_time":"2014-02-10T15:00:00+00:00","article_modified_time":"2021-04-17T20:34:58+00:00","og_image":[{"width":700,"height":467,"url":"https:\/\/pyimagesearch.com\/wp-content\/uploads\/2014\/02\/search_engine_indexing.jpg","type":"image\/jpeg"}],"author":"Adrian Rosebrock","twitter_misc":{"Written by":"Adrian Rosebrock","Est. 
reading time":"8 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/pyimagesearch.com\/2014\/02\/10\/building-an-image-search-engine-indexing-your-dataset-step-2-of-4\/","url":"https:\/\/pyimagesearch.com\/2014\/02\/10\/building-an-image-search-engine-indexing-your-dataset-step-2-of-4\/","name":"Building an Image Search Engine: Indexing Your Dataset (Step 2 of 4) - PyImageSearch","isPartOf":{"@id":"https:\/\/pyimagesearch.com\/#website"},"datePublished":"2014-02-10T15:00:00+00:00","dateModified":"2021-04-17T20:34:58+00:00","author":{"@id":"https:\/\/pyimagesearch.com\/#\/schema\/person\/5901b399e2f20b986362a00636181cca"},"description":"Step 2 to building an image search engine: Indexing Your Dataset. Extract features from your image dataset and write them to disk. Step 1: Instantiate Your Descriptor. Step 2: Serial or Parallel? Step 3: Write to Disk.","breadcrumb":{"@id":"https:\/\/pyimagesearch.com\/2014\/02\/10\/building-an-image-search-engine-indexing-your-dataset-step-2-of-4\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/pyimagesearch.com\/2014\/02\/10\/building-an-image-search-engine-indexing-your-dataset-step-2-of-4\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/pyimagesearch.com\/2014\/02\/10\/building-an-image-search-engine-indexing-your-dataset-step-2-of-4\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/pyimagesearch.com\/"},{"@type":"ListItem","position":2,"name":"Building an Image Search Engine: Indexing Your Dataset (Step 2 of 4)"}]},{"@type":"WebSite","@id":"https:\/\/pyimagesearch.com\/#website","url":"https:\/\/pyimagesearch.com\/","name":"PyImageSearch","description":"You can master Computer Vision, Deep Learning, and OpenCV - 
PyImageSearch","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/pyimagesearch.com\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/pyimagesearch.com\/#\/schema\/person\/5901b399e2f20b986362a00636181cca","name":"Adrian Rosebrock","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/pyimagesearch.com\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/57e1f3a95feeb9b113f80510f086d7d81b6f62badd9bd69134e51037a8b79925?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/57e1f3a95feeb9b113f80510f086d7d81b6f62badd9bd69134e51037a8b79925?s=96&d=mm&r=g","caption":"Adrian Rosebrock"},"description":"Hi there, I\u2019m Adrian Rosebrock, PhD. All too often I see developers, students, and researchers wasting their time, studying the wrong things, and generally struggling to get started with Computer Vision, Deep Learning, and OpenCV. 
I created this website to show you what I believe is the best possible way to get your start.","url":"https:\/\/pyimagesearch.com\/author\/adrian\/"}]}},"_links":{"self":[{"href":"https:\/\/pyimagesearch.com\/wp-json\/wp\/v2\/posts\/222","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/pyimagesearch.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/pyimagesearch.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/pyimagesearch.com\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/pyimagesearch.com\/wp-json\/wp\/v2\/comments?post=222"}],"version-history":[{"count":4,"href":"https:\/\/pyimagesearch.com\/wp-json\/wp\/v2\/posts\/222\/revisions"}],"predecessor-version":[{"id":21481,"href":"https:\/\/pyimagesearch.com\/wp-json\/wp\/v2\/posts\/222\/revisions\/21481"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/pyimagesearch.com\/wp-json\/wp\/v2\/media\/13956"}],"wp:attachment":[{"href":"https:\/\/pyimagesearch.com\/wp-json\/wp\/v2\/media?parent=222"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/pyimagesearch.com\/wp-json\/wp\/v2\/categories?post=222"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/pyimagesearch.com\/wp-json\/wp\/v2\/tags?post=222"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}