Author
Akruti Acharya
Akruti is a data scientist and technical content writer with an M.Sc. in Machine Learning & Artificial Intelligence from the University of Birmingham. She enjoys exploring new things, applying her technical and analytical skills to solve challenging problems, and sharing her knowledge and experience with her readers.

All blogs by Akruti Acharya

NVLM 1.0: NVIDIA's Open-Source Multimodal AI Model

Automate Text Labeling for Your Image Dataset: A Step-by-Step Guide

Exploring the RarePlanes Dataset

The Complete Guide to Object Tracking [Tutorial]
15 m

Fine-Tuning VLM: Enhancing Geo-Spatial Embeddings

Meta-Transformer: Framework for Multimodal Learning

How To Detect Data Drift on Datasets

How to Choose the Right Data for Your Computer Vision Project
12 m

Object Classification with Caltech 101
7 m

How SAM 2 and Encord Transforms Video Annotation

Visual Foundation Models vs. State-of-the-Art: Exploring Zero-Shot Object Segmentation with Grounding-DINO and SAM

Meta AI's CoTracker: It is Better to Track Together for Video Motion Prediction

The Complete Guide to Image Annotation for Computer Vision
7 m

6 Steps to Build Better Computer Vision Models
12 m

FastViT: Hybrid Vision Transformer with Structural Reparameterization

Text2Cinemagraph: Synthesizing Artistic Cinemagraphs

Med-PaLM: Google Research’s Medical LLM | Explained

TAPIR: Tracking Any Point with per-frame Initialization and temporal Refinement | Explained

Tracking Everything Everywhere All at Once | Explained

OpenAI o1: A New Era of AI Reasoning

MEGABYTE, Meta AI’s New Revolutionary Model Architecture, Explained

Dual-Stream Diffusion Net for Text-to-Video Generation

Top 10 Open Source Datasets for Machine Learning

Llama 2: Meta AI's Latest Open Source Large Language Model

Slicing Aided Hyper Inference (SAHI) for Small Object Detection | Explained

What is Vector Similarity Search?
5 m

Apple’s MM1.5 Explained

Vision Fine-Tuning with OpenAI's GPT-4: A Step-by-Step Guide
5 m

CVPR 2024: Top Artificial Intelligence and Computer Vision Papers Accepted

Improving Training Data with Outlier Detection

How to Build Your First Machine Learning Model
14 m

DINOv3 Explained: Scaling Self-Supervised Vision Transformers

Improving Data Quality Using End-to-End Data Pre-Processing Techniques in Encord Active
10 m

Gemini Robotics: Advancing Physical AI with Vision-Language-Action Models

Intralogistics: Optimizing Internal Supply Chains with Automation

How to use SAM to Automate Data Labeling in Encord
8 m

What is Data Labeling? The Ultimate Guide [2024]
8 m

Google Launches Gemini, Its New Multimodal AI Model

Comparative Analysis of YOLOv9 and YOLOv8 Using Custom Dataset on Encord Active
8 m

How Poor Data is Killing Your Models and How to Fix It

9 Ways to Balance Your Computer Vision Dataset
15 m

Data Classification 101: Structuring the Building Blocks of Machine Learning

The Python Developer's Toolkit for PDF Processing

YOLOv9: SOTA Object Detection Model Explained
8 m

Introduction to Vision Transformers (ViT)

What to Expect From OpenAI’s GPT-Vision vs. Google’s Gemini

OpenAI Releases New Text-to-Video Model, Sora
3 m

Meta’s V-JEPA: Video Joint Embedding Predictive Architecture Explained
8 m

Meta AI’s I-JEPA, Image-based Joint-Embedding Predictive Architecture, Explained
10 m

Exploring GPT-4 Vision: First Impressions

Florence-2: Microsoft's New Foundation Model Explained

Qwen-VL and Qwen-VL-Chat: Introduction to Alibaba’s AI Models
8 m

LLaVA, LLaVA-1.5, and LLaVA-NeXT(1.6) Explained

Segment Anything Model 2 (SAM 2) & SA-V Dataset from Meta AI

Meta’s Llama 3.1 Explained

GPT-4 Vision vs LLaVA

Instance Segmentation in Computer Vision: A Comprehensive Guide
7 m

How to Use GPT-4o for Model Development with Encord

Exploring Vision-based Robotic Arm Control with 6 Degrees of Freedom
8 m

Mistral Large Explained
5 m

Overfitting in Machine Learning: How to Detect and Avoid Overfitting in Computer Vision?
8 m

Ray-Ban Meta Smart Glasses are Getting an Upgrade with Multimodal AI
5 m

Phi-3: Microsoft’s Mini Language Model is Capable of Running on Your Phone
8 m

Diffusion Transformer (DiT) Models: A Beginner’s Guide
8 m

MM1: Apple’s Multimodal Large Language Models (MLLMs)
10 m

Stable Diffusion 3: Multimodal Diffusion Transformer Model Explained
10 m

YOLO World Zero-shot Object Detection Model Explained
10 m

Claude 3 | AI Model Suite: Introducing Opus, Sonnet, and Haiku
10 m

Apple Vision PRO - Extending Reality to Radiology
8 m

Mistral 7B: Mistral AI's Open Source Model

MiniGPT-v2 Explained