Bland AI has released **Bland TTS**, billing it as **the first product to cross the “uncanny valley”**.
- Uncanny valley: the unease people feel when an AI voice or face is close to human but not quite right. Bland TTS claims to have crossed it, making AI voices **almost impossible to distinguish from a real person**.
Bland TTS needs only **a short audio sample** to:
At its core, it uses a large language model (LLM) to generate speech directly, rather than relying on the traditional multi-stage pipeline. Bland says the system offers unprecedented emotional expressiveness, style control, multi-speaker understanding, and non-verbal sound generation, and achieves more realistic, controllable, and context-aware speech synthesis through its audio token system (SNAC).
# Feature highlights
## 1. Style transfer
## 2. Sound generation
It can synthesize not only speech but also **sound effects**, such as:
- Simulated laughter;
- A dog barking.

Provide text with the sounds marked in it, plus matching audio examples, and the model learns the correspondence.
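The pairing of marked-up text with audio examples can be pictured as a simple data structure. This is a hypothetical sketch: the `<laugh>`-style tag syntax, the `AudioExample` type, and `pair_tags_with_examples` are illustrative names, not part of Bland's API. It only shows how marked sounds line up with reference clips, and which marked sounds still lack an example.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical markup: non-verbal sounds are written inline as <laugh>, <bark>, ...
TAG_PATTERN = re.compile(r"<([a-z_]+)>")


@dataclass
class AudioExample:
    """A reference clip for one non-verbal sound (hypothetical type)."""
    tag: str
    path: str


def pair_tags_with_examples(
    text: str, examples: List[AudioExample]
) -> Tuple[List[Tuple[str, AudioExample]], List[str]]:
    """Match each sound tag in `text` to its reference clip.

    Returns (paired, missing): tags that have an example, in order of
    appearance, and the sorted set of tags that have none.
    """
    by_tag = {ex.tag: ex for ex in examples}
    tags = TAG_PATTERN.findall(text)
    paired = [(t, by_tag[t]) for t in tags if t in by_tag]
    missing = sorted({t for t in tags if t not in by_tag})
    return paired, missing


text = "That's hilarious <laugh> and then the dog went <bark>"
examples = [AudioExample("laugh", "clips/laugh.wav")]
paired, missing = pair_tags_with_examples(text, examples)
print([tag for tag, _ in paired])  # ['laugh']
print(missing)                     # ['bark']
```

In this sketch a tag without a clip is simply reported, since the model would have no example to learn that sound from.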
## 3. Voice blending
Given several voice samples, the system automatically “blends” them into a new voice that retains the identity of each source speaker while keeping a consistent tone. Use cases include:
- Brand voice design;
- Consistent multilingual output;
- Virtual avatar and character creation.
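Bland does not describe how blending works internally. One common technique in voice-cloning systems, swapped in here purely as an illustration, is a weighted average of speaker embeddings (fixed-length vectors that summarize a voice). The sketch assumes such embeddings already exist; `blend_voices` is a hypothetical helper, not Bland's method.

```python
import math
from typing import List, Optional, Sequence


def _unit(v: Sequence[float]) -> List[float]:
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def blend_voices(
    embeddings: Sequence[Sequence[float]], weights: Optional[Sequence[float]] = None
) -> List[float]:
    """Hypothetical helper: mix several speaker embeddings into one.

    Each embedding is normalized first so no voice dominates by magnitude,
    then a weighted average is taken and normalized again.
    """
    if weights is None:
        weights = [1.0] * len(embeddings)  # equal mix by default
    total = sum(weights)
    units = [_unit(e) for e in embeddings]
    dim = len(units[0])
    mixed = [sum(w / total * u[i] for w, u in zip(weights, units)) for i in range(dim)]
    return _unit(mixed)


# Toy 2-D "embeddings"; real speaker embeddings typically have hundreds of dimensions.
even = blend_voices([[1.0, 0.0], [0.0, 2.0]])
print([round(x, 4) for x in even])  # [0.7071, 0.7071]
```

Passing unequal weights, e.g. `weights=[3.0, 1.0]`, pulls the blended vector toward the first voice.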
## 4. Context-aware tone
Rather than reading text word for word, the system adjusts its tone to fit the context:
- Technical content sounds more measured;
- Comforting content sounds warmer;
- Questions and answers sound more natural.
# Core technology: rethinking the traditional TTS pipeline
**The problem with traditional TTS**
Traditional TTS works like an assembly line:
Text → phonemes → prosody → waveform → synthesized speech
Errors can creep in at every step, and the result often sounds flat and disjointed. Traditional methods **first analyze the content and then “assemble” the voice**, which makes it hard to convey tone and emotion naturally.
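To see why errors at each stage add up, assume every stage succeeds independently with some probability; the chance that the whole pipeline gets everything right is then the product of those probabilities. The per-stage rates below are made-up numbers for illustration, not measurements of any real system.

```python
import math
from typing import Sequence


def pipeline_success_rate(stage_rates: Sequence[float]) -> float:
    """Chance that every stage of a cascaded pipeline gets its part right.

    Assumes stages fail independently, so the per-stage rates multiply.
    """
    return math.prod(stage_rates)


# Illustrative rates for four cascaded stages.
stages = [0.95, 0.95, 0.95, 0.95]
print(round(pipeline_success_rate(stages), 4))  # 0.8145
```

Under these assumptions, even 95% accuracy per stage leaves roughly one output in five with an error somewhere along the chain.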
**Bland's approach: end-to-end modelling**
Bland's new technology merges the entire pipeline into a single step, using **a large language model to predict sound directly**:
Text input → model directly outputs “audio tokens” → tokens are decoded back into real sound
In other words, you tell the model what to say, and it produces a voice carrying the tone and emotion it infers from the text, instead of a chain of components “translating” it step by step.
## A breakthrough at the data level: a thousand times higher
Data quality is the foundation of any generative system. The Bland team believes public voice datasets are insufficient, especially for modelling real conversations.
They built a large-scale voice dataset **for the industry** with the following characteristics:

# Core architecture: from text LLM to voice LLM
## The conventional LLM approach
A conventional text LLM:
Splits text into tokens → learns to predict the next token → decodes the tokens back into full sentences
Bland's approach:
Takes text → predicts the corresponding “audio tokens” → decodes them into a speech waveform
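Both approaches run the same next-token loop; only the vocabulary differs. The sketch below assumes, as some open speech LLMs do, that audio-token IDs share one vocabulary with text tokens and start at a fixed offset. `AUDIO_OFFSET`, `END_OF_AUDIO`, and the lookup-table “model” are hypothetical stand-ins for a trained network, not Bland's implementation.

```python
from typing import Callable, Dict, List

# Hypothetical layout: text tokens use IDs below AUDIO_OFFSET, audio tokens above it.
AUDIO_OFFSET = 1000
END_OF_AUDIO = 999  # hypothetical stop token


def generate(
    prompt: List[int], next_token: Callable[[List[int]], int], max_new: int = 50
) -> List[int]:
    """Autoregressive decoding with a deterministic (greedy) next-token choice."""
    seq = list(prompt)
    for _ in range(max_new):
        tok = next_token(seq)  # the model conditions on everything so far
        if tok == END_OF_AUDIO:
            break
        seq.append(tok)
    return seq[len(prompt):]  # return only the newly generated tokens


def to_codec_codes(tokens: List[int]) -> List[int]:
    """Strip the offset so an audio codec decoder could turn codes into sound."""
    return [t - AUDIO_OFFSET for t in tokens if t >= AUDIO_OFFSET]


# Toy stand-in for a trained model: a lookup on the last token only.
TABLE: Dict[int, int] = {7: 1003, 1003: 1001, 1001: 1004, 1004: END_OF_AUDIO}


def toy_model(seq: List[int]) -> int:
    return TABLE.get(seq[-1], END_OF_AUDIO)


text_tokens = [5, 7]  # pretend tokenization of the input text
audio_tokens = generate(text_tokens, toy_model)
print(audio_tokens)                  # [1003, 1001, 1004]
print(to_codec_codes(audio_tokens))  # [3, 1, 4]
```

A text LLM would stop at text IDs; here the text is only the prompt, and everything generated after it lives in the audio range.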
In Bland's approach, an **audio token** is a discrete code produced by SNAC encoding, capturing both:
- Macro rhythm (e.g. speaking rate, pauses);
- Micro details (e.g. pronunciation, timbre).

This lets the model learn content and expression at the same time, and get both right.
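SNAC (Multi-Scale Neural Audio Codec) encodes audio as several code sequences at different time resolutions: a coarse level with few codes per second and finer levels with more. To feed them to a language model, the levels must be flattened into a single token stream. The sketch assumes a three-level hierarchy in which each coarse code spans 2 mid-level and 4 fine-level codes, giving 7 tokens per frame; the interleaving order is one possible choice, not necessarily the one Bland uses. Reading the coarse level as “macro” and the fine level as “micro” is one plausible interpretation of the split described above.

```python
from typing import List, Tuple


def flatten_frames(coarse: List[int], mid: List[int], fine: List[int]) -> List[int]:
    """Interleave three code levels into one stream, 7 tokens per coarse step.

    Assumes each coarse code spans 2 mid codes and 4 fine codes.
    """
    n = len(coarse)
    if len(mid) != 2 * n or len(fine) != 4 * n:
        raise ValueError("expected len(mid) == 2*len(coarse) and len(fine) == 4*len(coarse)")
    out: List[int] = []
    for i in range(n):
        out.append(coarse[i])               # coarsest level: lowest time resolution
        out.extend(mid[2 * i:2 * i + 2])    # middle level: 2 codes per frame
        out.extend(fine[4 * i:4 * i + 4])   # finest level: highest time resolution
    return out


def unflatten_frames(tokens: List[int]) -> Tuple[List[int], List[int], List[int]]:
    """Inverse of flatten_frames: split 7-token frames back into the three levels."""
    if len(tokens) % 7:
        raise ValueError("token count must be a multiple of 7")
    coarse: List[int] = []
    mid: List[int] = []
    fine: List[int] = []
    for j in range(0, len(tokens), 7):
        frame = tokens[j:j + 7]
        coarse.append(frame[0])
        mid.extend(frame[1:3])
        fine.extend(frame[3:7])
    return coarse, mid, fine


coarse, mid, fine = [10, 11], [20, 21, 22, 23], [30, 31, 32, 33, 34, 35, 36, 37]
flat = flatten_frames(coarse, mid, fine)
print(flat)  # [10, 20, 21, 30, 31, 32, 33, 11, 22, 23, 34, 35, 36, 37]
```

Keeping each frame's codes adjacent means the model predicts a whole time slice (coarse structure first, then detail) before moving on to the next one.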
# **Use cases and target users**
## 1. Creators
- Turn text into realistic AI voices or sound effects
- **Fine-grained control of style and emotion**
- Suited to podcasts, audio shows, audiobooks, film, and similar content
## 2. Developers
- Integrate it into your application via the API
- Build products with custom voice features (e.g. voice assistants, educational products, broadcast systems)
## 3. Enterprise users
- Build commercial voice services such as **AI customer-service systems and phone assistants**
- Voices sound so natural that customers may even save them as a contact
- **You can try talking to the AI directly on the website**
Official announcement: https://www.bland.ai/blogs/new-tts-announcement
Quick-start links:
- Developer portal: https://t.co/qBpGkJh2Gp
- Enterprise portal: https://t.co/Szf9KNwfHs