New Study Finds Over 96% of Computer Vision (CV) Teams Already Using Synthetic Data for Training and Testing of Visual Machine Learning Models

A survey by Datagen reveals widespread adoption of synthetic data throughout the CV field to advance AI/ML applications

TEL AVIV, Israel, Dec. 21, 2021 (GLOBE NEWSWIRE) -- Datagen, the leader in synthetic data generation on a mission to bring data simulation to every computer vision engineer, today announced the release of a new research study, “Synthetic Data: Key to Production-Ready AI in 2022,” exploring training data in the field of Computer Vision (CV). The study reveals a once-fragmented field beginning to coalesce around the promise of synthetic data to help mitigate frequent project delays and cancellations.

The study emphasizes that training data has become a significant stumbling block for computer vision professionals, who cited a number of data-related complications hindering their organization’s progress in CV. Among the data-related issues experienced, the most prevalent were:

  • Wasted time and/or resources caused by a need to retrain the system often (52%)
  • Poor annotation resulting in quality issues (48%)
  • Poor data coverage of the intended application’s domain (47%)
  • Lack of sufficient amount of data (44%)

Any of these four problems can seriously jeopardize a project’s progress, so their prevalence is a significant concern for CV teams. As a result, the overwhelming majority of computer vision teams struggle with frequent, lengthy project delays and even outright cancellations. Inadequate training data has led to an environment in which:

  • 99% of respondents have experienced project cancellations
  • 80% have experienced project delays lasting at least 3 months
  • 33% have experienced project delays lasting 7 months or more

The frequency, length, and ubiquity of data-related project disruptions in the field of computer vision are immense. However, the study also identified several trends that indicate a growing appetite for synthetic data. Most notably, a staggering 96% of computer vision teams reported already using synthetic data in the training and testing of their computer vision models.

Based on the survey findings, this surge in synthetic data adoption can be attributed to the fact that its many benefits are both broadly understood and broadly experienced by the computer vision community. For example, when asked about the primary motivation behind their organization’s use of synthetic data, CV teams cited testing, training, and addressing edge cases in near-equal measure. Similarly, when asked about their first-hand experience, respondents reported the following benefits of synthetic data:

  • Reduced time-to-production (40%)
  • Elimination of privacy concerns (46%)
  • Reduced bias (46%)
  • Fewer annotation and labeling errors (53%)
  • Improvements in predictive modeling (56%)

“Synthetic data is the future of data. This is the new way to control and consume the data our AI systems need,” said Ofir Chakon, founder and CEO of Datagen. “As simulation gets better over time, with all its benefits, it will take over the place of labor-intensive manual data collection that is no longer scalable at the speed the world is evolving.”

The survey, which was commissioned by Datagen and conducted by Wakefield Research, polled 300 computer vision professionals from 300 unique organizations across a variety of industries. The survey set out to better understand how computer vision teams obtain and use AI/ML training data for computer vision systems and applications, and how these choices impact their work. The accompanying report also features commentary and insights from leading industry experts and innovators. To access the full report:

About Datagen
Datagen is leading the AI revolution by generating synthetic data to train computer vision systems, with expertise in creating data for human-centric computer vision applications. We developed a self-serve synthetic data generation technology that delivers visual data with unmatched domain coverage and high variance. Using our platform, CV teams generate high-fidelity 3D data with associated ground truth in a seamless and scalable way. Datagen customers include Fortune 100 companies across a variety of industries, including AR/VR, Security, Automotive, Robotics, and more. Founded in 2018, Datagen is led by recognized AI experts and is backed by AI industry luminaries. For more information, visit

Media Contact:
Kelsey Bates
Scratch Marketing + Media for Datagen