
The Hidden Costs of Internal AI Data Tools: Why SwingVision Switched 

September 9, 2025 | 5 min read

For many engineering leaders, building internal tools rather than buying a solution seems like a no-brainer when internal engineering resources are available. When early-stage AI organizations begin developing their models, building in-house seems cheaper and more flexible. But as the team at SwingVision discovered after three years of maintaining their own AI data infrastructure, the true costs of in-house tools add up. The real cost was not only the engineering time spent, but also the way the open-source tooling struggled to keep up as demand for the product grew quickly. While your team is busy fixing bugs and maintaining infrastructure, competitors using purpose-built platforms are deploying models faster.

In this article, we dive into how the world’s leading AI-powered racket sports app realized that internal tooling was costing them more than it was worth, and why they ultimately bought a scalable data platform. It is a decision many CTOs and engineering leaders face when evaluating build-versus-buy tradeoffs.

The Allure of Internal Tooling

When SwingVision was in its early stages, building in-house made sense. As CEO and Co-Founder Swupnil Sahai explained, funding was limited, the engineering team was small, and open-source tools like CVAT seemed like a good way to get annotation workflows off the ground. Adopting CVAT felt like a smart bootstrap decision: it let the team use internal engineering resources to build the data infrastructure behind their proprietary model. For the first two years, the open-source route worked. It handled annotations, and the team could develop new features while keeping costs low. But as the company scaled, cracks in the in-house tooling began to show.

Where In-House Solutions Could Not Keep Up

Maintenance Burden and Technical Debt

SwingVision’s forked version of an open-source annotation tool became increasingly difficult to maintain and update, to the point that engineers from other teams had to help with basic functionality. This became especially apparent as they looked to expand into other racket sports like racketball. With the in-house solution, it was difficult to add new ontologies or change the data infrastructure when expanding into new markets demanded that flexibility.

Limited Data Visibility and Understanding

As their dataset grew, it became more important for the SwingVision team to visualize their data. With their in-house tooling, they couldn't effectively analyze the growing dataset to understand model performance across different scenarios or identify problematic examples. In sports applications like tennis, where many edge cases can crop up, the team needed to curate data across different environments (e.g., sunny vs. dark conditions, clay courts vs. other court types) and train the model accordingly.

Inability to Scale Data Analysis

Their infrastructure limitations meant they could only view data they were actively labeling, which kept them from understanding the full composition of their dataset and spotting issues like deprecated camera angles. Yet identifying edge cases of this kind is key to improving model performance at scale.

Failed Internal Development Projects

Their attempts to build internal data visualization and analysis tools consumed significant development time but never reached a state where they were actually adopted and used across the company, representing a failed investment of engineering resources.

Together, these issues prevented SwingVision from effectively identifying edge cases, understanding their data distribution, and iterating quickly on model improvements: all critical capabilities for scaling their AI system.

Recognizing the Tipping Point

So when should a company stop building and start buying? SwingVision’s experience highlights the tipping points every engineering leader should watch for:

  • Product-market fit achieved and customer expectations rising
  • Data volumes too large to manually inspect
  • Increasing complexity of features (e.g., supporting multiple domains)
  • Accuracy requirements too strict for “good enough” workflows
  • Engineering time spent on tooling growing rapidly

Once these conditions emerge, AI leaders should start evaluating external platforms that can scale, improve efficiency, and provide robust data curation capabilities.

The Impact of Switching

First, after switching to Encord, SwingVision was able to visualize their entire dataset and sort it by various criteria and metrics, something that had been extremely difficult to build in-house.

As Swupnil Sahai put it: "I think just being able to visualize your data is, like, so key... just being able to sort by different criteria, different metrics, that's just massive. For us in particular, that was a big part of it."

Encord also immediately revealed composition issues in their dataset that they had been unaware of, allowing them to prune problematic data. This significantly improved model accuracy, which is crucial when the app is calling wins and losses.

Additionally, they could identify edge cases, like clay courts, by analyzing model embeddings and finding natural clusters, without having to manually label court surfaces or train separate classification models. 
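To make the clustering idea concrete, here is a minimal sketch. It is not Encord's implementation: it assumes per-frame embeddings are already available as plain lists of floats, and the function names (`kmeans`, `cluster_sizes`) are hypothetical. It groups embeddings with plain k-means and reports cluster sizes, so an unusually small cluster (for example, a handful of clay-court frames among mostly hard-court footage) stands out as a candidate edge case without any surface labels.

```python
from collections import Counter
from typing import Dict, List, Sequence

Vector = Sequence[float]


def _dist2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _nearest(p: Vector, centroids: List[List[float]]) -> int:
    """Index of the centroid closest to embedding p."""
    return min(range(len(centroids)), key=lambda j: _dist2(p, centroids[j]))


def kmeans(points: List[Vector], k: int, max_iters: int = 50) -> List[int]:
    """Plain k-means; returns one cluster id per embedding.

    Centroids are seeded with deterministic farthest-point initialisation,
    so repeated runs on the same data give the same clusters.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing centroid.
        far = max(points, key=lambda p: min(_dist2(p, c) for c in centroids))
        centroids.append(list(far))

    labels: List[int] = []
    for _ in range(max_iters):
        # Assignment step: every embedding joins its nearest centroid.
        new_labels = [_nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # converged: no embedding changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def cluster_sizes(labels: List[int]) -> Dict[int, int]:
    """Number of embeddings per cluster; small clusters are edge-case candidates."""
    return dict(Counter(labels))


# Toy 2-D "embeddings": six frames from one condition, two from another.
frames = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (0.0, 0.0), (0.2, 0.2), (0.1, 0.1),
          (5.0, 5.1), (5.2, 4.9)]
labels = kmeans(frames, k=2)
print(cluster_sizes(labels))  # → {0: 6, 1: 2}
```

Real embeddings have hundreds of dimensions and k is not known in advance, so in practice a team would likely reach for a vectorized library and try several values of k; the point here is only that "natural clusters" fall out of distances between embeddings.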

On the workflow side, Encord eliminated collision issues between annotators and provided proper task prioritization through queues, replacing their previous "hacky" workarounds. As AI teams scale and demand for their models grows, they need a data platform that supports not only their data but also a growing team working efficiently.
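A priority queue with an atomic claim step is one common way to keep two annotators from picking up the same task. The sketch below is a hypothetical, single-process illustration; the `AnnotationQueue` class and its methods are invented for this example and are not Encord's API. Each `claim` pops the most urgent task and records its owner under one lock, so a task can be handed out only once.

```python
import heapq
import itertools
import threading
from typing import Dict, List, Optional, Tuple


class AnnotationQueue:
    """Hypothetical task queue that hands each task to exactly one annotator."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._seq = itertools.count()       # tie-breaker: FIFO within a priority
        self._owners: Dict[str, str] = {}   # task_id -> annotator who claimed it
        self._lock = threading.Lock()

    def add(self, task_id: str, priority: int) -> None:
        """Enqueue a task; lower priority numbers are served first."""
        with self._lock:
            heapq.heappush(self._heap, (priority, next(self._seq), task_id))

    def claim(self, annotator: str) -> Optional[str]:
        """Atomically pop the most urgent task and record its owner."""
        with self._lock:
            if not self._heap:
                return None
            _, _, task_id = heapq.heappop(self._heap)
            self._owners[task_id] = annotator
            return task_id

    def owner(self, task_id: str) -> Optional[str]:
        """Annotator who claimed task_id, or None if it is unclaimed."""
        with self._lock:
            return self._owners.get(task_id)


q = AnnotationQueue()
q.add("clip-1", priority=2)
q.add("clip-2", priority=0)  # most urgent
q.add("clip-3", priority=2)
print(q.claim("alice"), q.claim("bob"), q.claim("carol"), q.claim("dave"))
# → clip-2 clip-1 clip-3 None
```

The sketch assumes task ids are unique and everything runs in one process; a shared, multi-user platform would get the same one-owner guarantee from its database or service layer rather than from an in-process lock.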

Finally, they gained the ability to systematically identify and analyze high-loss examples and understand model performance across different scenarios, giving them clear direction for improvement efforts.
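Ranking examples by loss and averaging loss per scenario is a standard way to do this kind of analysis. The sketch below assumes a classifier that outputs class probabilities and uses cross-entropy as the per-example loss; the `Example` record, its `scenario` tag, and the helper names are hypothetical, chosen only to illustrate the idea.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Example:
    example_id: str
    scenario: str           # hypothetical metadata tag, e.g. "clay" or "hard"
    probs: Sequence[float]  # model's predicted class probabilities
    label: int              # ground-truth class index


def cross_entropy(ex: Example, eps: float = 1e-12) -> float:
    """Per-example cross-entropy loss; eps guards against log(0)."""
    return -math.log(max(ex.probs[ex.label], eps))


def top_losses(examples: List[Example], n: int) -> List[Example]:
    """The n highest-loss examples, worst first: candidates for review."""
    return sorted(examples, key=cross_entropy, reverse=True)[:n]


def loss_by_scenario(examples: List[Example]) -> Dict[str, float]:
    """Mean loss per scenario tag, to compare performance across conditions."""
    losses: Dict[str, List[float]] = defaultdict(list)
    for ex in examples:
        losses[ex.scenario].append(cross_entropy(ex))
    return {s: sum(v) / len(v) for s, v in losses.items()}


examples = [
    Example("a", "hard", [0.9, 0.1], 0),
    Example("b", "hard", [0.8, 0.2], 0),
    Example("c", "clay", [0.3, 0.7], 0),
    Example("d", "clay", [0.6, 0.4], 0),
]
print([ex.example_id for ex in top_losses(examples, 2)])  # → ['c', 'd']
print({s: round(v, 3) for s, v in loss_by_scenario(examples).items()})
# → {'hard': 0.164, 'clay': 0.857}
```

In this toy data both high-loss examples come from the clay scenario, and its mean loss is several times higher: exactly the kind of signal that points improvement efforts at a specific slice of the dataset.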

These benefits enabled SwingVision to move from having "vague intuitions" about their dataset to having concrete, actionable insights that directly informed their model development and retraining strategies.

Lessons for CTOs and Engineering Leaders

SwingVision’s journey underscores that tooling isn’t just a technical choice; it’s a strategic business decision. The cheapest path upfront can become the most expensive at scale.

Key lessons for leaders:

  • Track the true cost of internal tools, including opportunity cost
  • Define adoption and time allocation thresholds that trigger external evaluations
  • Plan migration strategies before hitting crisis points
  • Treat data tooling as a business investment, not a side project

Ultimately, the companies that win with AI are those that keep their best engineers focused on core innovation, not reinventing infrastructure. By switching away from in-house, open-source tooling, SwingVision was able to double down on what truly mattered: delivering the world’s most accurate and trusted AI-powered sports experiences.
