Author

Akruti Acharya

Akruti is a data scientist and technical content writer with an M.Sc. in Machine Learning & Artificial Intelligence from the University of Birmingham. She enjoys exploring new things, applying her technical and analytical skills to challenging problems, and sharing her knowledge and experience with her readers.

All blogs by Akruti Acharya

NVLM 1.0: NVIDIA's Open-Source Multimodal AI Model
Automate Text Labeling for Your Image Dataset: A Step-by-Step Guide
Exploring the RarePlanes Dataset
The Complete Guide to Object Tracking [Tutorial]

15 m

Fine-Tuning VLM: Enhancing Geo-Spatial Embeddings
Meta-Transformer: Framework for Multimodal Learning
How To Detect Data Drift on Datasets
How to Choose the Right Data for Your Computer Vision Project

12 m

Object Classification with Caltech 101

7 m

How SAM 2 and Encord Transforms Video Annotation
Visual Foundation Models vs. State-of-the-Art: Exploring Zero-Shot Object Segmentation with Grounding-DINO and SAM
Meta AI's CoTracker: It is Better to Track Together for Video Motion Prediction
The Complete Guide to Image Annotation for Computer Vision

7 m

6 Steps to Build Better Computer Vision Models

12 m

FastViT: Hybrid Vision Transformer with Structural Reparameterization
Text2Cinemagraph: Synthesizing Artistic Cinemagraphs
Med-PaLM: Google Research’s Medical LLM | Explained
TAPIR: Tracking Any Point with per-frame Initialization and temporal Refinement | Explained
Tracking Everything Everywhere All at Once | Explained
OpenAI o1: A New Era of AI Reasoning
MEGABYTE, Meta AI’s New Revolutionary Model Architecture, Explained
Dual-Stream Diffusion Net for Text-to-Video Generation
Top 10 Open Source Datasets for Machine Learning
Llama 2: Meta AI's Latest Open Source Large Language Model
Slicing Aided Hyper Inference (SAHI) for Small Object Detection | Explained
What is Vector Similarity Search?

5 m

Apple’s MM1.5 Explained
Vision Fine-Tuning with OpenAI's GPT-4: A Step-by-Step Guide

5 m

CVPR 2024: Top Artificial Intelligence and Computer Vision Papers Accepted
Improving Training Data with Outlier Detection
How to Build Your First Machine Learning Model

14 m

DINOv3 Explained: Scaling Self-Supervised Vision Transformers
Improving Data Quality Using End-to-End Data Pre-Processing Techniques in Encord Active

10 m

Gemini Robotics: Advancing Physical AI with Vision-Language-Action Models
Intralogistics: Optimizing Internal Supply Chains with Automation
How to use SAM to Automate Data Labeling in Encord

8 m

What is Data Labeling? The Ultimate Guide [2024]

8 m

Google Launches Gemini, Its New Multimodal AI Model
Comparative Analysis of YOLOv9 and YOLOv8 Using Custom Dataset on Encord Active

8 m

How Poor Data is Killing Your Models and How to Fix It
9 Ways to Balance Your Computer Vision Dataset

15 m

Data Classification 101: Structuring the Building Blocks of Machine Learning
The Python Developer's Toolkit for PDF Processing
YOLOv9: SOTA Object Detection Model Explained

8 m

Introduction to Vision Transformers (ViT)
What to Expect From OpenAI’s GPT-Vision vs. Google’s Gemini
OpenAI Releases New Text-to-Video Model, Sora

3 m

Meta’s V-JEPA: Video Joint Embedding Predictive Architecture Explained

8 m

Meta AI’s I-JEPA, Image-based Joint-Embedding Predictive Architecture, Explained

10 m

Exploring GPT-4 Vision: First Impressions
Florence-2: Microsoft's New Foundation Model Explained
Qwen-VL and Qwen-VL-Chat: Introduction to Alibaba’s AI Models

8 m

LLaVA, LLaVA-1.5, and LLaVA-NeXT(1.6) Explained
Segment Anything Model 2 (SAM 2) & SA-V Dataset from Meta AI
Meta’s Llama 3.1 Explained
GPT-4 Vision vs LLaVA
Instance Segmentation in Computer Vision: A Comprehensive Guide

7 m

How to Use GPT-4o for Model Development with Encord
Exploring Vision-based Robotic Arm Control with 6 Degrees of Freedom

8 m

Mistral Large Explained

5 m

Overfitting in Machine Learning: How to Detect and Avoid Overfitting in Computer Vision?

8 m

Ray-Ban Meta Smart Glasses are Getting an Upgrade with Multimodal AI

5 m

Phi-3: Microsoft’s Mini Language Model is Capable of Running on Your Phone

8 m

Diffusion Transformer (DiT) Models: A Beginner’s Guide

8 m

MM1: Apple’s Multimodal Large Language Models (MLLMs)

10 m

Stable Diffusion 3: Multimodal Diffusion Transformer Model Explained

10 m

YOLO World Zero-shot Object Detection Model Explained

10 m

Claude 3 | AI Model Suite: Introducing Opus, Sonnet, and Haiku

10 m

Apple Vision PRO - Extending Reality to Radiology

8 m

Mistral 7B: Mistral AI's Open Source Model
MiniGPT-v2 Explained