Emergent Abilities of Amazon’s New 980M Parameter Large Language Model (LLM)
Ryan is a senior editor at TechForge Media with over a decade of experience covering the latest technology and interviewing leading industry figures. He can often be sighted at tech conferences with a strong coffee in one hand and a laptop in the other. If it’s geeky, he’s probably into it. Find him on Twitter (@Gadget_Ry) or Mastodon (@gadgetry@techhub.social).
Researchers at Amazon have trained a new large language model (LLM) for text-to-speech that they claim exhibits “emergent” abilities.
The 980 million parameter model, called BASE TTS, is the largest text-to-speech model yet created. The researchers trained models of various sizes on up to 100,000 hours of public domain speech data to see if they would observe the same performance leaps that occur in natural language processing models once they grow past a certain scale.
They found that their medium-sized 400 million parameter model – trained on 10,000 hours of audio – showed a marked improvement in versatility and robustness on tricky test sentences.
The test sentences contained complex lexical, syntactic, and paralinguistic features, such as compound nouns, emotions, foreign words, and punctuation, that traditionally trip up text-to-speech systems. Although BASE TTS did not perform the task perfectly, it made significantly fewer errors in stress, intonation, and pronunciation than existing models.
“These sentences are purposely filled with challenging tasks that BASE TTS has not been directly trained to execute,” said the researchers.
Interestingly, the largest version of this model with its 980 million parameters, which was trained using 100,000 hours of audio, did not show any abilities beyond those of the 400 million parameter version.
Although the work is experimental, BASE TTS suggests that text-to-speech models can cross new thresholds of versatility as they scale, which is an encouraging sign for conversational AI. The researchers plan further work to identify the optimal model size for emergent abilities.
The model is also designed to be lightweight and streamable, packaging emotional and prosodic data separately. This could allow the natural-sounding spoken audio to be transmitted across low-bandwidth connections.
You can find the full BASE TTS paper on arXiv here.