NEW YORK, Oct. 21, 2025 (GLOBE NEWSWIRE) -- A comprehensive new study by Springboards, an AI platform inspiring creativity in advertising, found that popular AI tools like ChatGPT, Gemini, Claude and others perform much more similarly on creative tasks than many people think. Creativity Benchmark, conducted in collaboration with the 4As, ACA, APG, D&AD, IAA, IPA, and The One Club for Creativity, challenges the idea that there's a single "best" AI tool for creative work and shows agencies need more efficient ways to test AI tools for their specific needs.

Sixteen different AI systems – from OpenAI, Google, Anthropic, Meta, DeepSeek, Alibaba and others – were tested on real marketing challenges across 100 notable brands. Over 600 creative professionals from ad agencies, marketing teams, and strategy firms made over 11,000 comparisons to see which systems worked best. The biggest surprise? There was no clear winner. The differences between the "best" and "worst" AI tools were much smaller than expected.

"Everyone assumes some AI tools are way better than others for creative work," said Pip Bingemann, CEO and co-founder of Springboards. "But our tests showed the results were pretty close. Why? Because these models are machines designed to recognize patterns and give you the most probable answer – and 'probable' has never been called 'creative.' Keeping humans in the loop and optimizing for a wider range of varied ideas is crucial."

The study looked at three types of creative challenges: finding surprising insights about consumers, creating big campaign ideas, and coming up with bold, attention-grabbing concepts.

Key Findings:

Different AI Tools Win at Different Tasks: No single AI system was best at everything. Some were better at strategic thinking, others at wild, creative ideas. This means agencies might want to use different tools for different jobs.

Variety of Ideas Matters Most: Some AI tools generated lots of different creative options for the same brief. Others kept suggesting similar ideas over and over. For real creative work, having many different options is just as important as having good ones.

AI Can't Judge Creative Work Well: When researchers had AI systems evaluate creative ideas, the systems' scores differed sharply from those of human experts. This means agencies can't rely on AI to pick the best creative concepts – they still need human judgment.

Standard Creativity Tests Don't Work for Marketing: Traditional creativity tests used in psychology don't predict which AI will be better at marketing-specific creative tasks. Brand work requires its own way of measuring creativity.

Creative Preferences Vary by Location: Creative professionals in different countries preferred different AI tools, suggesting that cultural differences affect what people consider good creative work.



“LLMs aren’t a one-size-fits-all solution – they’re general purpose tools that require human creativity to unlock breakthrough outcomes,” said Jeremy Lockhorn, SVP, Creative Technologies & Innovation, 4As. “These findings suggest agencies and brands should continue to evaluate which models are best suited for creative work – and that a multi-model approach may well be the best path forward.”

“This study highlights that creativity isn’t about which AI you use, it’s about how you use it,” remarked Tony Hale, CEO, Advertising Council Australia. “The results reinforce what we see across the industry: the human spark remains essential to transforming good ideas into great ones. For agencies, the real opportunity is learning how to collaborate with these systems to expand, not replace, creative thinking.”

Methodology

The study involved 678 advertising professionals of diverse backgrounds, who participated in blind A/B idea judgments, likened to a "Tinder for Ideas." The data, collected over four weeks starting June 10, 2025, comprised 11,012 human comparisons across various brands, prompts, and models. The comparisons were analyzed using Bradley-Terry modeling to rank the models, and cosine distance was used for diversity scoring.
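
For readers unfamiliar with Bradley-Terry modeling, the sketch below shows one common way to turn blind pairwise judgments into a strength score per model. It is an illustration only, not the study's code: the function names, the model labels and the choice of a simple iterative (minorization-maximization) fit are all assumptions made for this example.

```python
def fit_bradley_terry(comparisons, n_iter=200):
    """Estimate Bradley-Terry strengths from (winner, loser) pairs.

    Illustrative sketch only. Assumes every model has at least one
    win and one loss; otherwise its strength drifts toward 0 or 1.
    Returns a dict of strengths normalized to sum to 1.
    """
    items = sorted({name for pair in comparisons for name in pair})
    wins = {name: 0 for name in items}
    pair_counts = {}  # number of times each unordered pair was compared
    for winner, loser in comparisons:
        wins[winner] += 1
        key = frozenset((winner, loser))
        pair_counts[key] = pair_counts.get(key, 0) + 1

    strength = {name: 1.0 for name in items}
    for _ in range(n_iter):
        updated = {}
        for i in items:
            denom = 0.0
            for j in items:
                if i == j:
                    continue
                n_ij = pair_counts.get(frozenset((i, j)), 0)
                if n_ij:
                    denom += n_ij / (strength[i] + strength[j])
            # Keep the old value if this model was never compared.
            updated[i] = wins[i] / denom if denom > 0 else strength[i]
        total = sum(updated.values())
        strength = {name: value / total for name, value in updated.items()}
    return strength


def win_probability(strength, a, b):
    """Modeled probability that a is preferred over b."""
    return strength[a] / (strength[a] + strength[b])


# Hypothetical judgments: model_a preferred three times, model_b once.
votes = [("model_a", "model_b")] * 3 + [("model_b", "model_a")]
scores = fit_bradley_terry(votes)
print(round(win_probability(scores, "model_a", "model_b"), 3))  # 0.75
```

Under this model, two systems with close strength scores are expected to win roughly half of their head-to-head comparisons against each other, which is what a narrow gap between "best" and "worst" looks like in practice.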

The research used four different ways to test AI creativity:

Real Creative Professionals Made the Calls: Nearly 700 people working in advertising, marketing, and strategy compared AI-generated ideas side-by-side. They didn't know which AI created which idea, so they couldn't play favorites. The study covered ideas for 100 major brands across 12 different business categories.

Tested How Many Different Ideas AI Can Create: Researchers asked each AI system to create 10 different responses to the same creative brief, then measured how different those responses were from each other. Some AI tools generated very similar ideas every time, while others came up with lots of variety.

Checked If AI Can Judge Its Own Work: The team had three leading AI systems evaluate the same creative ideas that humans had already scored, to see if AI judges agreed with human experts. They didn't.

Tried Standard Creativity Tests: The AI systems took adapted versions of creativity tests that psychologists use on humans, measuring things like how many ideas they generate and how original those ideas are.
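
The variety measurement described above can be sketched as an average cosine distance across all pairs of responses to one brief. This is a minimal illustration under stated assumptions: each response is assumed to have already been converted to a numeric vector (an embedding), the release does not say which embedding method the researchers used, and the function names here are hypothetical.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def diversity_score(vectors):
    """Mean pairwise cosine distance across one model's responses.

    Higher values mean the responses point in more different
    directions; 0 means they are all effectively the same idea.
    """
    pairs = list(combinations(vectors, 2))
    if not pairs:
        raise ValueError("need at least two responses to score diversity")
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)


# Toy vectors standing in for embeddings of three responses to one brief.
repetitive = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
varied = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
print(diversity_score(repetitive) < diversity_score(varied))  # True
```

With 10 responses per brief, as in the study, a score like this would average 45 pairwise distances for each model and brief.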

All tests used the same settings and compared current AI systems from companies like OpenAI, Google, Anthropic, and Meta.

To access the full research white paper, visit https://arxiv.org/abs/2509.09702.

