Image Similarity Search Using a Vector Database
Table of Contents
- What is a Vector Database
- Why Use a Vector Database
- Getting Started with Vector Databases
- Setting Up Weaviate Locally
- Demonstrating the Prototype
- Conclusion
Hi. This is kongs from Cloud Solution Group 2.
With recent advancements in AI, vector databases have become increasingly prevalent due to their ability to store data as high-dimensional vectors, enabling efficient querying based on similarity.
Some commonly stored data types include images, textual data, audio files and even videos.
Today, I will walk you through how to set up a locally hosted image search engine using a vector database.
What is a Vector Database
A vector database is, as the name suggests, a database that stores vectors.
However, that explanation glosses over why we would need a vector database in the first place.
To put it simply, many types of data can be expressed as a collection of vectors.
These vectors can have many dimensions (far more than the three dimensions of the world we can easily picture).
As an easy example, imagine each image represented as a point in a multi-dimensional space, with each dimension representing a feature of the image, such as color, shape, or texture.
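To make the idea concrete, here is a minimal Python sketch that treats each image as a point in a three-dimensional space and measures how close two points are with cosine similarity. The three features and all of the numbers are made up for illustration; real embeddings are produced by a model and typically have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical feature vectors: (color, shape, texture). Values are invented.
peacock_a = [0.9, 0.8, 0.7]
peacock_b = [0.8, 0.9, 0.6]
car = [0.1, 0.2, 0.9]

print(cosine_similarity(peacock_a, peacock_b))  # close to 1: very similar
print(cosine_similarity(peacock_a, car))        # noticeably lower
```

The two peacock vectors point in nearly the same direction, so their score is close to 1, while the car vector scores lower against either of them.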
Why Use a Vector Database
Have you ever tried those games where you find the differences between two pictures?
(The above 2 images are used as an example to compare differences)
While finding the differences between two nearly identical pictures is not that difficult, rating how similar two pictures are is much harder, especially if you are asked to give a concrete number.
The problem is further complicated because it cannot be solved simply by comparing the color of each individual pixel.
(An example of two similar images where a conventional pixel-by-pixel comparison will not work.)
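A tiny experiment shows why pixel-by-pixel comparison breaks down. The sketch below builds a hypothetical 8x4 striped "image" as a list of rows, shifts it one pixel to the right, and counts how many pixel positions differ. The helper names are my own and are only for illustration.

```python
def pixel_difference(img_a, img_b):
    """Return the fraction of pixel positions whose values differ."""
    total = 0
    differing = 0
    for row_a, row_b in zip(img_a, img_b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += 1
            if pixel_a != pixel_b:
                differing += 1
    return differing / total


def shift_right(img, fill=0):
    """Shift every row one pixel to the right, padding the left edge with `fill`."""
    return [[fill] + row[:-1] for row in img]


# Vertical stripes: each row is 0, 1, 0, 1, ...
stripes = [[x % 2 for x in range(8)] for _ in range(4)]
shifted = shift_right(stripes)

print(pixel_difference(stripes, stripes))  # 0.0
print(pixel_difference(stripes, shifted))  # 0.875
```

To a person, the shifted picture is still the same stripes, yet 87.5% of its pixels no longer match. A useful similarity measure has to look at features of the image rather than raw pixel positions.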
This problem exists for many types of data, such as video, text, audio and many more.
However, by storing these files in a vector format in a vector database, we gain an objective metric for how similar two images are: the distance between their vectors.
Unlike traditional databases, which rely on exact matches, vector databases enable approximate nearest neighbor (ANN) searches, making them ideal for similarity-based queries.
This makes it feasible to build a search engine over these kinds of data files.
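As a rough illustration of a similarity-based query, the following Python sketch performs an exact, brute-force nearest neighbor search over a handful of made-up vectors. An ANN index aims to return nearly the same answer without comparing the query against every stored vector, which is what keeps searches fast on large collections. The file names and values here are hypothetical.

```python
import math


def nearest_neighbors(query, items, k=1):
    """Return the names of the k stored vectors closest to `query`.

    `items` maps a name to its vector. Every vector is compared, so this is
    an exact search, not an approximate one.
    """
    scored = [(math.dist(query, vector), name) for name, vector in items.items()]
    scored.sort()
    return [name for _, name in scored[:k]]


# Hypothetical image library with invented 3-dimensional embeddings.
library = {
    "peacock.jpg": [0.9, 0.8, 0.7],
    "parrot.jpg": [0.7, 0.9, 0.5],
    "car.jpg": [0.1, 0.2, 0.9],
}

query = [0.85, 0.8, 0.65]
print(nearest_neighbors(query, library, k=2))  # ['peacock.jpg', 'parrot.jpg']
```

Note that no stored vector equals the query exactly, so an exact-match lookup would return nothing, while the distance-based search still ranks the closest images first.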
Getting Started with Vector Databases
The number of services that offer vector database hosting has increased over the years. Some popular services include Weaviate, Pinecone and Milvus. Even conventional databases like PostgreSQL and MongoDB now provide vector search integration.
Today, we will try hosting our own vector database using Weaviate, an open-source vector database designed for storing and searching high-dimensional data. In our case, that data is images, represented in the form of embeddings. Weaviate lets you deploy a locally hosted vector database for prototyping. The fact that it is both open-source and highly customizable makes it an attractive choice for developers who want to experiment freely and try out new ideas.
Setting Up Weaviate Locally
Before getting started, it is important to note that hosting a vector database, especially when using it as an image search tool, requires a PC with a good amount of RAM and, if possible, a GPU.
We will also be using Docker to host Weaviate in a container, so we recommend having Docker Desktop installed.
Accessing the above link will bring you to the homepage of Weaviate (shown below).
Click on “Documentation” and navigate to the “Installation” section.
Select “How to install” from the dropdown menu on the left side, then the “Docker Compose” option in the “Installation methods” provided.
On the next page, when you scroll down, you will find a “Configurator”, an easy-to-use guide for choosing the settings you wish to configure.
Select the following settings (recommended):
- For “Weaviate Version”, simply select the latest one that is provided.
- For “Persistent Volume”, select “Persistent volume with named volume”.
- For “Standalone Or Modules”, select “With modules”.
- For “Vectorizers & Retrievers Media Type”, select “Images”. (Note that for image search and categorization, combining images and text will provide better accuracy in the long run.)
- For “Image2Vec Model(Vectorizers & Retrievers)”, select “ResNet50(Pytorch)”. (Currently, only this option supports GPU-related features.)
- For “Ref2Vec-Centroid Vectorizer Module”, select “Disabled”.
- In this segment, the “Configurator” will ask if you want to include any popular AI model integration features. Select “Disabled” for all of them. (You can try them out if you are interested)
- Lastly, for “Select Your Desired Runtime”, choose “Docker Compose”.
After selecting the required options, you will be presented with the following:
Copy the command above and execute it in the desired directory. The command will create a detailed docker-compose.yml file with the previously selected configuration.
You can then create and spin up the container with the following command:
docker-compose up -d
Note: This will download the files required to create the container, roughly 8GB in total. Make sure you have enough free storage space.
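Once the container is up, it can take a little while before Weaviate accepts requests. The Python sketch below checks Weaviate's readiness endpoint. It assumes the container is exposed at http://localhost:8080, which you should confirm against the ports listed in your generated docker-compose.yml. The function name and the `opener` parameter are my own; `opener` exists only so the check can be exercised without a running server.

```python
import urllib.request


def is_weaviate_ready(base_url="http://localhost:8080",
                      opener=urllib.request.urlopen,
                      timeout=5):
    """Return True if the Weaviate readiness endpoint answers with HTTP 200.

    base_url is an assumption: adjust it to the port mapping in your
    docker-compose.yml.
    """
    url = base_url.rstrip("/") + "/v1/.well-known/ready"
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers refused connections, timeouts, and HTTP error responses
        # (urllib's URLError and HTTPError are subclasses of OSError).
        return False
```

Calling `is_weaviate_ready()` right after `docker-compose up -d` may return False at first; retrying after a few seconds is usually enough.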
Demonstrating the Prototype
Using Vue, I have coded a barebones page to demonstrate how Weaviate can be used as an image search engine.
The “Upload” button uploads the selected image through the page, and the image is then vectorized and stored in the Weaviate database.
The “Search” button also vectorizes the uploaded image, but instead of storing it, it searches the database for the most similar image and presents that image back to the user.
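The prototype's own code is not shown here, but the two buttons roughly correspond to two kinds of requests against Weaviate's HTTP API. The Python sketch below only builds the request bodies, under several assumptions: a class named "Image" with "image" and "filename" properties (names I made up), the img2vec-neural module selected in the Configurator, and a Weaviate version that reports `distance` for nearImage queries. Treat it as an illustration, not as the prototype's implementation.

```python
import base64
import json

CLASS_NAME = "Image"  # hypothetical class name


def build_schema():
    """Body for POST /v1/schema: a class whose "image" blob is vectorized."""
    return {
        "class": CLASS_NAME,
        "vectorizer": "img2vec-neural",
        "moduleConfig": {"img2vec-neural": {"imageFields": ["image"]}},
        "properties": [
            {"name": "image", "dataType": ["blob"]},
            {"name": "filename", "dataType": ["text"]},
        ],
    }


def encode_image(raw_bytes):
    """Blob properties are sent as base64-encoded strings."""
    return base64.b64encode(raw_bytes).decode("ascii")


def build_upload(filename, raw_bytes):
    """Body for POST /v1/objects ("Upload"): store an image to be vectorized."""
    return {
        "class": CLASS_NAME,
        "properties": {"filename": filename, "image": encode_image(raw_bytes)},
    }


def build_search(raw_bytes, limit=1):
    """Body for POST /v1/graphql ("Search"): find the most similar images."""
    encoded = json.dumps(encode_image(raw_bytes))  # adds the surrounding quotes
    query = (
        "{ Get { %s(nearImage: {image: %s}, limit: %d) "
        "{ filename _additional { distance } } } }" % (CLASS_NAME, encoded, limit)
    )
    return {"query": query}
```

Each body would be sent as JSON to the endpoint named in its docstring, for example http://localhost:8080/v1/objects if the container uses the default port. A smaller `distance` in the search response means a closer match.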
We will be using the following set of images:
Let’s start by uploading the first image, the peacock.
We will then use the second image to search.
As you can see, the database has fetched the most similar image, which is the following:
Searching with the third image returns the same result, showing that the first image is again the most similar.
Conclusion
With the above prototype demonstration, we have shown how vector databases can be used to perform image similarity search, and thus to create our very own image search engine. Vector databases suit many other use cases as well, from managing vast image libraries and e-commerce catalogs to powering AI-driven applications. If you’re exploring ways to integrate such systems, feel free to contact us. Let’s start a conversation on how to realize the solution that is right for your business.
