Search Anything Model: Combining Vision and Natural Language in Search

Frederik Hvilshøj
June 20, 2023

In the current AI boom, one thing is certain: data is king. 

Data is at the heart of the production and development of new models, and yet the processing and structuring required to get data into a form that modern AI can consume are often overlooked.

One of the most primordial elements of intelligence that can be leveraged to facilitate this is search. Search is crucial to understanding data: the more ways to search and group data, the more insights you can extract. The greater the insights, the more structured the data becomes. 

Historically, search capabilities have been limited to uni-modal approaches: models used for images or videos in vision use cases have been distinct from those used for textual data in natural language processing. With GPT-4’s ability to process both images and text, we are only now starting to see the potential impacts of performant multi-modal models that span various forms of data.

Embracing the future of multi-modal data, we propose the Search Anything Model. The unified framework combines natural language, visual property, similarity, and metadata search together in a single package. Leveraging computer vision processing, multi-modal embeddings, LLMs, and traditional search characteristics, Search Anything allows for multiple forms of structured data querying using natural language.

If you want to find all bright images with multiple cats that look similar to a particular reference image, Search Anything will match over multiple index types to retrieve data of the requisite form and conditions.
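A minimal sketch of how such a combined query could be evaluated, assuming every image already has a precomputed brightness score, a cat count, and an embedding vector. The `ImageRecord` fields, the thresholds, and the `search` function are hypothetical illustrations and not Encord's API.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class ImageRecord:
    """Hypothetical per-image index entry; all fields are assumed precomputed."""
    name: str
    brightness: float   # mean pixel intensity scaled to [0, 1]
    cat_count: int      # number of cats found by a detector or in the labels
    embedding: list     # vector produced by some embedding model


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(records, reference, min_brightness=0.7, min_cats=2, top_k=5):
    """Filter on the scalar indexes first, then rank survivors by similarity."""
    candidates = [
        r for r in records
        if r.brightness >= min_brightness and r.cat_count >= min_cats
    ]
    candidates.sort(
        key=lambda r: cosine_similarity(r.embedding, reference), reverse=True
    )
    return candidates[:top_k]


records = [
    ImageRecord("a.jpg", 0.9, 3, [1.0, 0.0]),
    ImageRecord("b.jpg", 0.8, 2, [0.0, 1.0]),
    ImageRecord("c.jpg", 0.2, 4, [1.0, 0.1]),   # too dark, filtered out
    ImageRecord("d.jpg", 0.95, 1, [1.0, 0.0]),  # only one cat, filtered out
]
results = search(records, reference=[1.0, 0.05])
print([r.name for r in results])
```

Filtering on the cheap scalar conditions before ranking by similarity keeps the more expensive vector comparison to the images that can still match.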

What is Natural Language Search?

Natural Language Search (NLS) uses human-like language to query and retrieve information from databases, datasets, or documents. Unlike traditional keyword-based searches, NLS algorithms employ Natural Language Processing (NLP) techniques to understand the context, semantics, and intent behind user queries.

By interpreting the query’s meaning, NLS systems provide more accurate and relevant search results, mimicking how humans communicate. Computer vision needs a similar general understanding of data content, one that does not depend on metadata being available for the visuals.

💡Encord is a data-centric computer vision company. With Encord Active, you can use the Search Anything Model to explore, curate, and debug your datasets.

What Can You Use the Search Anything Model for?

Let’s dive into some examples of computer vision uses for the Search Anything Model.

Data Exploration

Search Anything simplifies data exploration by allowing users to ask questions in plain language and receive valuable insights. 

Instead of manually formulating complex queries and algorithms that may require pre-existing metadata, you can pose questions such as:

“Which images are blurry?” 


“How is my model performing on images with multiple labels?” 

Search Anything interprets these queries and returns visualizations or summaries of the data, so you can gain insights quickly.
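To make a query like “Which images are blurry?” concrete, here is a small sketch of one common blur heuristic, the variance of the Laplacian. The article does not say which metric Search Anything uses, so the metric choice, the `laplacian_variance` function, and the `BLUR_THRESHOLD` value are all assumptions for illustration.

```python
def laplacian_variance(image):
    """Variance of the 4-neighbour Laplacian over interior pixels.

    `image` is a 2D list of grayscale intensities. A low variance means
    few sharp edges, which is a common heuristic signal for blur.
    """
    height, width = len(image), len(image[0])
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            lap = (
                image[y - 1][x] + image[y + 1][x]
                + image[y][x - 1] + image[y][x + 1]
                - 4 * image[y][x]
            )
            responses.append(lap)
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


# A checkerboard has strong edges; a flat grey patch has none.
sharp = [[255 if (x + y) % 2 else 0 for x in range(8)] for y in range(8)]
flat = [[128] * 8 for _ in range(8)]

BLUR_THRESHOLD = 100.0  # assumed cut-off; would need tuning per dataset
for name, img in [("sharp", sharp), ("flat", flat)]:
    score = laplacian_variance(img)
    print(name, "blurry" if score < BLUR_THRESHOLD else "not blurry", score)
```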


Data Curation

Search Anything streamlines data curation, making the process highly efficient and user-friendly. Filter, sort, or aggregate data using only natural language commands.

For example, you can request the following: 

“Remove all the very bright images from my dataset” 


“Add an ‘unannotated’ tag to all the data that has not been annotated yet.” 

Search Anything processes these commands, automatically performs the requested actions, and presents the curated data, all without complex coding or SQL queries.
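For comparison, the two commands above could reduce to filter and tag operations like the following once they have been interpreted. This is only a sketch: the `dataset` layout, the `mean_brightness` helper, and the `VERY_BRIGHT` threshold are hypothetical and do not reflect how Encord Active stores or processes data.

```python
def mean_brightness(pixels):
    """Mean grayscale intensity scaled to [0, 1]; `pixels` is a flat list of 0-255 values."""
    return sum(pixels) / (255 * len(pixels))


# Hypothetical dataset rows: pixel data plus whatever labels exist so far.
dataset = {
    "img_001": {"pixels": [250, 245, 255, 240], "labels": ["cat"], "tags": set()},
    "img_002": {"pixels": [40, 60, 55, 30], "labels": [], "tags": set()},
    "img_003": {"pixels": [120, 130, 110, 125], "labels": ["dog"], "tags": set()},
}

VERY_BRIGHT = 0.9  # assumed threshold for "very bright"

# "Remove all the very bright images from my dataset"
curated = {
    name: row for name, row in dataset.items()
    if mean_brightness(row["pixels"]) < VERY_BRIGHT
}

# "Add an 'unannotated' tag to all the data that has not been annotated yet."
for row in curated.values():
    if not row["labels"]:
        row["tags"].add("unannotated")

print(sorted(curated))
print(curated["img_002"]["tags"])
```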


Using Encord Active to filter out bright images in the COCO dataset. Use the bulk tagging feature to tag all the data.

Data Debugging

Search Anything expedites the process of identifying and resolving data issues. 

To investigate anomalies or inconsistencies, ask questions or issue commands such as:

 “Are there any missing values for the image difficulty quality metric?” 


 “Find records that are labeled ‘cat’ but don’t look like a typical cat.”

Once again, Search Anything analyzes the data, detects discrepancies, and provides actionable insights to assist you in identifying and rectifying data problems efficiently.
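One way to approximate a query like “labeled ‘cat’ but don’t look like a typical cat” is to flag items whose embedding lies far from the centroid of their class. The sketch below assumes embeddings are already available; the `atypical_items` function, the median-based cut-off, and the toy vectors are illustrative assumptions and not a description of Search Anything's internals.

```python
from math import dist


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def atypical_items(embeddings, factor=2.0):
    """Return names whose distance to the class centroid exceeds
    `factor` times the median distance (an assumed heuristic)."""
    center = centroid(list(embeddings.values()))
    distances = {name: dist(vec, center) for name, vec in embeddings.items()}
    ordered = sorted(distances.values())
    median = ordered[len(ordered) // 2]
    return [name for name, d in distances.items() if d > factor * median]


# Toy 2D embeddings for items labeled "cat"; real ones would come from a model.
cat_embeddings = {
    "cat_1.jpg": [1.0, 1.0],
    "cat_2.jpg": [1.1, 0.9],
    "cat_3.jpg": [0.9, 1.1],
    "cat_4.jpg": [1.0, 0.9],
    "cat_5.jpg": [5.0, 5.0],  # far from the rest: possibly mislabeled
}
print(atypical_items(cat_embeddings))
```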

Cataloging Data for E-commerce

Search Anything can also enhance the cataloging process for e-commerce platforms. By understanding product photos and descriptions, Search Anything enables users to search and categorize products efficiently. For example, users can ask:

“Locate the green and sparkly shoes.” 

Search Anything interprets this query, matches the desired criteria with the product images and descriptions, and displays the relevant products, facilitating improved product discovery and customer experience.

How to Use Search Anything Model with Encord?

At Encord, we are building an end-to-end visual data engine for computer vision. Our latest release, Encord Active, empowers users to interact with visual data using only natural language.

Let’s dive into a few use cases: 

Use Case 1: Data Exploration

User Query: “red dress,” “denim jeans,” and “black shirts” 

Encord Active identifies the images in the dataset that most accurately correspond to the query.
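A minimal sketch of this kind of text-to-image retrieval, assuming the query text and the images have been embedded into a shared vector space by a multi-modal model. The vectors below are hand-made stand-ins, and `rank_images` is a hypothetical helper, not part of Encord Active.

```python
from math import sqrt


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_images(query_vector, image_vectors, top_k=2):
    """Return the `top_k` image names most similar to the query vector."""
    scored = sorted(
        image_vectors.items(),
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [name for name, _ in scored[:top_k]]


# Hand-made stand-ins for embeddings; a real system would obtain both the
# text and image vectors from the same multi-modal model.
image_vectors = {
    "red_dress.jpg": [0.9, 0.1, 0.0],
    "denim_jeans.jpg": [0.1, 0.9, 0.1],
    "black_shirt.jpg": [0.0, 0.2, 0.9],
}
text_vectors = {
    "red dress": [1.0, 0.0, 0.0],
    "denim jeans": [0.0, 1.0, 0.0],
}

for query, vector in text_vectors.items():
    print(query, "->", rank_images(vector, image_vectors, top_k=1))
```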

Use Case 2: Data Curation

User Query: “Display the very bright images” 

Encord Active displays filtered results from the dataset based on the specified criterion.

Use Case 3: Data Debugging

User Query: “Find all the non-singular images” 

Encord Active detects duplicated images and displays those that are not unique within the dataset.
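For exact duplicates, a simple approach is to group files by a hash of their contents. The sketch below illustrates that idea only; it is an assumption about how duplicates could be found, and it would miss near-duplicates such as resized or re-encoded copies, which need a perceptual hash or embedding comparison instead.

```python
import hashlib
from collections import defaultdict


def find_duplicates(images):
    """Group image names by the SHA-256 digest of their raw bytes and
    return only the groups that contain more than one image."""
    groups = defaultdict(list)
    for name, data in images.items():
        groups[hashlib.sha256(data).hexdigest()].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Byte strings stand in for file contents read from disk.
images = {
    "a.png": b"\x89PNG-pixels-1",
    "b.png": b"\x89PNG-pixels-2",
    "a_copy.png": b"\x89PNG-pixels-1",
}
print(find_duplicates(images))
```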

Can I Use My Own Model?

Yes, Encord Active allows you to leverage your own models. By fine-tuning or integrating custom embedding models, you can tailor the search capabilities to your specific needs and improve the relevance of the results.

💡At Encord, we are actively researching how to fine-tune LLMs for the purpose of searching Encord Active projects efficiently. Get in touch if you would like to get involved.



Natural Language Search is revolutionizing the way we interact with data, enabling intuitive and efficient exploration, curation, and debugging. 

By harnessing the power of NLP and computer vision models, our Search Anything Model allows you to pose queries, issue commands, and obtain actionable insights using human-like language. Whether you are an ML engineer, a data scientist, or an e-commerce professional, incorporating NLS into your workflow can significantly enhance productivity and unlock the full potential of your data.

Written by Frederik Hvilshøj
Frederik is the Machine Learning Lead at Encord. He has an extensive computer vision and deep learning background and has completed a Ph.D. in Explainable Deep Learning and Generative Models at Aarhus University, and published research in Efficient Counterfactuals from Invertible Neural Ne...