WellSaid Labs Invents a Game-Changing AI Voice Model for Content Creators

Latest AI Voice Model Considers Context, Not Just Sounds


SEATTLE, Aug. 11, 2022 (GLOBE NEWSWIRE) -- WellSaid Labs, the leading AI text-to-speech technology company, has invented the most natural speech markup language available to content creators. WellSaid Labs’ entirely new respelling system lets content creators give the AI precise instructions, providing more control over word pronunciation and emphasis. Because the more intuitive AI captures the natural human performance in a voice actor’s delivery, it can more freely predict how the actual voice actor would have read the content, saving companies and content creators significant time.

Improved pronunciation, intonation, and user controls

Until now, the text-to-speech (TTS) industry has relied only on a phonetic layer that dictates how to pronounce words. However, voice actors don’t read phonemes; they read graphemes. The WellSaid Labs model now reads graphemes too, while also having a pronunciation layer. Relying on phonetic transcription alone can limit a model’s breadth of knowledge and, in turn, its ability to predict the pronunciation and delivery of new and unique words. It also makes it difficult to give users a consistent system for guiding a voice avatar to pronounce words according to their preferences, such as with the correct vowel sounds and syllabic emphasis. WellSaid Labs has made an enormous breakthrough in overcoming these limitations.

“Customer feedback on using our new voice model is incredible,” says Rhyan Johnson, WellSaid Labs Senior Voice Data Engineer. “Using our new respelling system, content creators love the fact that words are being pronounced the way they choose, with the right intonation or regional preference to meet their brand’s voice identity. You say tomato, I say tomahto. And, so do the WellSaid Labs' voice avatars.”

New model focuses on improving the voice avatar’s correctness

“More words are pronounced correctly, more often. Sentence intonation is generally more natural, including questions, which are tough for other systems. We’ve also created our own text verbalization model to empower the AI to be smarter with non-standard words such as a dollar amount, a year, or a phone number. And it also does better with specialized text and speaking URLs, acronyms, or abbreviations,” explained Johnson. 

WellSaid Labs’ Voice Avatars all come from real voice actors. Content creators now have even greater ability to ensure that pronunciation and tone are exactly what they want, whether for narration, promotional, or conversational content, or for a unique custom character. For example, users can now type $30M or the year 2022, and the system should read the text as “thirty million dollars” or “twenty twenty-two” instead of “dollar thirty M” or “two thousand and twenty-two.” Other text verbalization support includes:

  • Ordinals - 1st, 2nd, 10th, 30th
  • Times - 10:34 am, 10PM
  • Phone Numbers - (890) 345-1234, 1-888-CALL-NOW
  • Number Ranges - 1-3 as ‘one to three’
  • Percent, Number Signs - 12.3% as ‘twelve point three percent’, #1, #2 as ‘number one, number two’
  • URLs - wellsaidlabs.com, https://www.myurl.com/reference, name@company.com
  • Acronyms - NASA, OPEC 
  • Initialisms - ESPN, NSA
  • Abbreviations:
    • Measurements (1 ft., 2 in., 4 oz.)
    • Titles (‘Mr.’ as ‘Mister’, ‘Sr.’ as ‘Senior’)
    • Others (‘St.’ as ‘street’, ‘Apt.’ as ‘apartment’, ‘etc.’ as ‘et cetera’)
  • Generic Numbers and Symbols
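
To make the idea of text verbalization concrete, here is a minimal rule-based sketch in Python covering a few of the categories above (scaled dollar amounts, percentages, number signs, number ranges, and years). It is an illustration only and not WellSaid Labs' model, whose internals are not described in this release. The names `number_to_words` and `verbalize`, the regular expressions, and the choice to read four-digit years as two pairs are all assumptions made for the example.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]
SCALES = {"K": "thousand", "M": "million", "B": "billion"}


def number_to_words(n: int) -> str:
    """Spell out an integer from 0 to 999 (kept small for this sketch)."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only handles 0..999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words + (" " + number_to_words(rest) if rest else "")


def _percent(m: re.Match) -> str:
    """Read '12.3%' as whole part, 'point', then digit-by-digit fraction."""
    whole = number_to_words(int(m.group(1)))
    if m.group(2):
        digits = " ".join(ONES[int(d)] for d in m.group(2))
        return f"{whole} point {digits} percent"
    return f"{whole} percent"


def verbalize(text: str) -> str:
    """Hypothetical helper: expand a few non-standard tokens into words."""
    # Scaled currency: "$30M" becomes "thirty million dollars".
    text = re.sub(
        r"\$(\d{1,3})([KMB])\b",
        lambda m: f"{number_to_words(int(m.group(1)))} "
                  f"{SCALES[m.group(2)]} dollars",
        text)
    # Percentages, with an optional decimal part.
    text = re.sub(r"\b(\d{1,3})(?:\.(\d+))?%", _percent, text)
    # Number signs: "#1" becomes "number one".
    text = re.sub(
        r"#(\d{1,3})\b",
        lambda m: "number " + number_to_words(int(m.group(1))),
        text)
    # Number ranges: "1-3" becomes "one to three".
    text = re.sub(
        r"\b(\d{1,3})-(\d{1,3})\b",
        lambda m: f"{number_to_words(int(m.group(1)))} to "
                  f"{number_to_words(int(m.group(2)))}",
        text)
    # Years 1010..2099 whose last two digits are 10 or more, read as pairs
    # (assumption: any such four-digit number is treated as a year).
    text = re.sub(
        r"\b(1\d|20)([1-9]\d)\b",
        lambda m: number_to_words(int(m.group(1))) + " "
                  + number_to_words(int(m.group(2))),
        text)
    return text


print(verbalize("We raised $30M in the year 2022."))
```

A fixed rule set like this cannot use context, for example to tell a year from a plain quantity, which is the kind of ambiguity a learned verbalization model is meant to handle.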

WellSaid Labs powers the synthetic media industry, serving more than 7,000 customers across dozens of industries with a faster, more accurate way to turn words into voice. Customers value WellSaid Labs’ highly realistic human-voice capability and rely on its mission-critical enterprise infrastructure. Now they will have an even easier experience using WellSaid Labs’ voice avatars.

WellSaid Labs’ Voice Avatar library provides access to 50 AI voices that companies can use for their productions. Many WellSaid Labs customers also choose to create their own AI Voice Avatars to spec, capturing the likeness, style, and uniqueness of the voice needed to tell their stories in exactly the right way.

Last July, WellSaid Labs announced its Series A round.

About WellSaid Labs

WellSaid Labs is the leading AI text-to-speech technology company and the first synthetic media service to achieve human parity in voice. Creators, product developers, and brands alike power up their stories and digital experiences with a wide variety of voice styles, accents, and languages at scale. WellSaid Labs was named to the 2022 AI 100, CB Insights’ annual ranking of the most promising private startups in the AI space, and landed on the Intelligent Applications Top 40 (#IA40, www.ia40.com). For more information, go to www.wellsaidlabs.com.

Additional Quote:

“For the most difficult phrases and words, WellSaid Studio customers have gained value from using the Pronunciation Library to save commonly used vernacular,” says Rhyan Johnson, Senior Voice Data Engineer. “Now, Studio has an even more powerful pronunciation tool: Respelling. Some AI voice providers ask customers to learn the complicated and archaic International Phonetic Alphabet of symbols to specify pronunciation. WellSaid Labs went in a different direction. Now, content creators have a Respelling catalog of English sound combinations to use that is approachable and intuitive.”

Attachment

 
WellSaid Labs AI Voice Model Considers Context
