Text-to-Speech (TTS) Feature
Overview
The TTS feature lets users convert text messages into audio using a choice of AI voices. It is integrated into the chat interface and supports both real-time streaming and downloadable audio files.
Features
Multiple Voice Options
- Zephyr: Clear, professional voice
- Puck: Friendly, casual tone
- Charon: Deep, authoritative voice
- Kore: Default balanced voice
- Fenrir: Dynamic, energetic voice
- Leda: Smooth, elegant voice
Audio Formats
- Streaming Audio: Real-time audio playback
- WAV Downloads: High-quality downloadable files
- Progressive Loading: Audio streams as it's generated
Integration Points
- Chat Interface: Convert assistant responses to speech
- Frontend API: /api/tts-stream/ endpoint for streaming
- Backend Service: Direct TTS processing at localhost:1956
User Interface
TTS Controls in Chat
The TTS feature appears as audio controls in the chat interface:
- Play Button: Convert message to speech and play
- Voice Selector: Choose from available voices
- Volume Control: Adjust playback volume
- Download Option: Save audio as WAV file
Voice Selection
Users can select different voices for different use cases:
- Presentations: Use Charon for authority
- Casual Chat: Use Puck for friendliness
- Default: Kore provides balanced tone
Technical Implementation
Frontend Integration
// TTS streaming endpoint
const response = await fetch('/api/tts-stream/', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    text: messageText,
    voice: selectedVoice
  })
});
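To play or buffer audio progressively, the response body from the request above can be read chunk by chunk. The following is a minimal sketch, not the project's actual implementation: `ChunkReader` and `readAudioStream` are hypothetical names, and the reader shape is an assumption modeled on what `response.body.getReader()` returns in browsers.

```typescript
// Minimal structural type for a stream reader (assumed shape, modeled on
// the reader returned by response.body.getReader() in browsers).
interface ChunkReader {
  read(): Promise<{ done: boolean; value?: Uint8Array }>;
}

// Reads every chunk, reports progress as bytes arrive, and returns the
// concatenated audio bytes once the stream ends.
async function readAudioStream(
  reader: ChunkReader,
  onChunk?: (chunk: Uint8Array, totalBytes: number) => void
): Promise<Uint8Array> {
  const chunks: Uint8Array[] = [];
  let total = 0;
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    if (value && value.length > 0) {
      chunks.push(value);
      total += value.length;
      if (onChunk) onChunk(value, total);
    }
  }
  // Join the collected chunks into one contiguous buffer.
  const joined = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    joined.set(chunk, offset);
    offset += chunk.length;
  }
  return joined;
}
```

In a browser this might be called as `readAudioStream(response.body.getReader(), (_, total) => showProgress(total))`, where `showProgress` is a placeholder for whatever UI update the chat interface uses.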
Backend Processing
The backend provides multiple endpoints:
- /tts/health - Service health check
- /tts/voices - Available voice list
- /tts - Generate downloadable WAV
- /tts/stream - Real-time audio streaming
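The four backend endpoints could be wrapped in a small request builder so callers do not repeat paths and headers. This is only a sketch: `TtsEndpoint`, `PreparedRequest`, and `prepareTtsRequest` are hypothetical names, and the HTTP methods are inferred from the curl commands in this document rather than from the service source.

```typescript
type TtsEndpoint = "health" | "voices" | "generate" | "stream";
type HttpMethod = "GET" | "POST";

// Paths and methods as used by the curl examples in this document.
const ENDPOINTS: Record<TtsEndpoint, { method: HttpMethod; path: string }> = {
  health: { method: "GET", path: "/tts/health" },
  voices: { method: "GET", path: "/tts/voices" },
  generate: { method: "POST", path: "/tts" },
  stream: { method: "POST", path: "/tts/stream" },
};

interface TtsPayload {
  text: string;
  voice?: string; // omitted means the backend default is used
}

interface PreparedRequest {
  url: string;
  method: HttpMethod;
  headers: Record<string, string>;
  body?: string;
}

// Builds the URL, method, headers, and JSON body for one endpoint.
function prepareTtsRequest(
  baseUrl: string,
  endpoint: TtsEndpoint,
  payload?: TtsPayload
): PreparedRequest {
  const { method, path } = ENDPOINTS[endpoint];
  const url = baseUrl.replace(/\/+$/, "") + path; // drop trailing slashes
  if (method === "GET") {
    return { url, method, headers: {} };
  }
  if (!payload) {
    throw new Error(`${endpoint} requires a payload with a text field`);
  }
  return {
    url,
    method,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  };
}
```

The result can be passed to `fetch(req.url, req)`, since `method`, `headers`, and `body` are standard request options.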
Audio Pipeline
- Text Processing: Clean and prepare text for TTS
- Voice Synthesis: Generate audio using selected voice
- Streaming: Progressive audio delivery to frontend
- Playback: Browser audio playback with controls
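The document does not say what the text processing stage actually does. As one possible illustration, the sketch below strips a few common markdown markers and collapses whitespace before synthesis. `prepareTextForTts` is a hypothetical helper, and the specific cleanup rules are assumptions, not the service's real behavior.

```typescript
// Hypothetical cleanup step: makes chat markdown friendlier to a TTS engine.
function prepareTextForTts(raw: string): string {
  return raw
    .replace(/\[([^\]]+)\]\([^)]*\)/g, "$1") // keep link text, drop the URL
    .replace(/[*`#]/g, "")                   // strip emphasis, code, and heading markers
    .replace(/\s+/g, " ")                    // collapse runs of whitespace and newlines
    .trim();
}
```

For example, a heading line followed by bold text and a link is reduced to a single plain sentence-like string that can be sent as the `text` field.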
API Reference
Get Available Voices
curl -X GET http://localhost:1956/tts/voices
Generate Speech
curl -X POST http://localhost:1956/tts \
-H "Content-Type: application/json" \
-d '{"text": "Hello world", "voice": "Zephyr"}' \
--output speech.wav
Stream Audio
curl -X POST http://localhost:1956/tts/stream \
-H "Content-Type: application/json" \
-d '{"text": "Streaming audio", "voice": "Kore"}' \
--output stream.wav
Configuration
Voice Settings
Default voice can be configured per user:
const defaultVoice = 'Kore';
const userPreferredVoice = getUserPreference('tts-voice') || defaultVoice;
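A stored preference may name a voice that is no longer offered, so it can help to validate it against the list from /tts/voices before use. The sketch below assumes a hypothetical `resolveVoice` helper and assumes that matching names case-insensitively is acceptable; the backend itself may be case-sensitive, which is why the canonical spelling from the list is returned.

```typescript
const DEFAULT_VOICE = "Kore";

// Returns the canonical voice name if the preference matches an available
// voice (ignoring case and surrounding spaces), otherwise the fallback.
function resolveVoice(
  preferred: string | null | undefined,
  available: readonly string[],
  fallback: string = DEFAULT_VOICE
): string {
  if (!preferred) return fallback;
  const wanted = preferred.trim().toLowerCase();
  const match = available.find((v) => v.toLowerCase() === wanted);
  return match ?? fallback;
}
```

With the six voices listed in this document, `resolveVoice("puck", voices)` yields "Puck", while an unknown or missing preference yields "Kore".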
Quality Settings
Audio quality parameters:
- Sample Rate: 22050 Hz (standard)
- Bit Depth: 16-bit
- Format: WAV (uncompressed)
- Channels: Mono (single channel)
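These parameters fix the data rate of the output, which makes file sizes easy to estimate. The sketch below works through the arithmetic. The function names are hypothetical, and the 44-byte header is an assumption based on the canonical PCM WAV layout; files with extra metadata chunks will be slightly larger.

```typescript
const SAMPLE_RATE_HZ = 22050;
const BITS_PER_SAMPLE = 16;
const CHANNELS = 1;
const WAV_HEADER_BYTES = 44; // assumed canonical PCM WAV header size

// 22050 samples/s * 2 bytes/sample * 1 channel = 44100 bytes per second.
const BYTES_PER_SECOND = SAMPLE_RATE_HZ * (BITS_PER_SAMPLE / 8) * CHANNELS;

// Approximate size in bytes of a WAV file with the given duration.
function estimateWavBytes(durationSeconds: number): number {
  return WAV_HEADER_BYTES + Math.round(durationSeconds * BYTES_PER_SECOND);
}

// Approximate duration in seconds of a WAV file with the given size.
function estimateDurationSeconds(fileBytes: number): number {
  return Math.max(0, fileBytes - WAV_HEADER_BYTES) / BYTES_PER_SECOND;
}
```

Under these assumptions, ten seconds of speech is about 441,044 bytes, or roughly 43 KB per second of audio.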
Usage Examples
Converting Chat Messages
// Convert assistant response to speech
// (generateTTS and playAudio are application helpers not shown in this document)
const convertToSpeech = async (messageText: string, voice: string) => {
  const audio = await generateTTS(messageText, voice);
  playAudio(audio);
};
Downloading Audio Files
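// Save message as audio file
// (downloadFile is an application helper not shown in this document)
const downloadSpeech = async (text: string, voice: string) => {
  const response = await fetch('/api/tts-stream/', {
    method: 'POST',
    // Declare the JSON body, as in the frontend integration example
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text, voice })
  });
  if (!response.ok) {
    throw new Error(`TTS request failed with status ${response.status}`);
  }
  const audioBlob = await response.blob();
  downloadFile(audioBlob, 'speech.wav');
};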
Performance Considerations
Optimization
- Text Chunking: Long messages split into manageable chunks
- Caching: Frequently used phrases cached for faster delivery
- Progressive Loading: Audio streams while generating
- Compression: Efficient audio encoding for faster transmission
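Text chunking can be sketched as splitting at sentence boundaries and packing sentences into chunks up to a size limit. This is an illustrative approach, not necessarily how the service chunks text: `chunkText` is a hypothetical helper, and the default limit of 500 characters is taken from the per-request maximum mentioned in this document.

```typescript
// Splits text into chunks of at most maxChars characters, preferring
// sentence boundaries. A single sentence longer than the limit is cut
// at the limit, which may fall mid-word.
function chunkText(text: string, maxChars: number = 500): string[] {
  if (maxChars <= 0) throw new Error("maxChars must be positive");
  // Each match is one sentence with its punctuation and trailing spaces,
  // or the unpunctuated remainder at the end of the text.
  const sentences = text.match(/[^.!?]*[.!?]+\s*|[^.!?]+$/g) ?? [];
  const chunks: string[] = [];
  let current = "";
  for (const sentence of sentences) {
    let piece = sentence;
    // Flush the current chunk if adding this sentence would exceed the limit.
    if (current.trim().length > 0 && (current + piece).trim().length > maxChars) {
      chunks.push(current.trim());
      current = "";
    }
    // Hard-split any sentence that is itself longer than the limit.
    while (piece.trim().length > maxChars) {
      const head = piece.slice(0, maxChars).trim();
      if (head.length > 0) chunks.push(head);
      piece = piece.slice(maxChars);
    }
    current += piece;
  }
  if (current.trim().length > 0) chunks.push(current.trim());
  return chunks;
}
```

Each chunk can then be sent as its own request and the resulting audio played in order.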
Limitations
- Text Length: Maximum ~500 characters per request
- Rate Limiting: Requests are rate limited to prevent API abuse
- Voice Availability: Some voices may have regional restrictions
- Network Dependent: Requires stable internet for streaming
Troubleshooting
Common Issues
No Audio Playback
- Check browser audio permissions
- Verify volume settings
- Test with a different voice
Slow Generation
- Check network connection
- Try shorter text segments
- Use the default voice for faster processing
Voice Not Available
- Verify voice name spelling
- Check the available voices endpoint (/tts/voices)
- Fall back to the default voice
API Test Commands
1. Health Check
curl -X GET http://localhost:1956/tts/health
2. Get Available Voices
curl -X GET http://localhost:1956/tts/voices
3. Generate Speech (Download WAV file)
# Basic test with default voice (Kore)
curl -X POST http://localhost:1956/tts \
-H "Content-Type: application/json" \
-d '{"text": "Hello, this is a test of the text to speech system"}' \
--output test_speech.wav
# Test with specific voice
curl -X POST http://localhost:1956/tts \
-H "Content-Type: application/json" \
-d '{"text": "Hello world, this is Zephyr speaking", "voice": "Zephyr"}' \
--output zephyr_speech.wav
# Test with longer text
curl -X POST http://localhost:1956/tts \
-H "Content-Type: application/json" \
-d '{"text": "The quick brown fox jumps over the lazy dog. This is a longer sentence to test the text to speech capabilities.", "voice": "Puck"}' \
--output long_speech.wav
4. Stream Audio (Get raw audio data)
# Stream audio directly
curl -X POST http://localhost:1956/tts/stream \
-H "Content-Type: application/json" \
-d '{"text": "This is streamed audio", "voice": "Kore"}' \
--output stream_test.wav
# Stream with different voice
curl -X POST http://localhost:1956/tts/stream \
-H "Content-Type: application/json" \
-d '{"text": "Streaming with Charon voice", "voice": "Charon"}' \
--output charon_stream.wav
5. Error Testing
# Test missing text field
curl -X POST http://localhost:1956/tts \
-H "Content-Type: application/json" \
-d '{"voice": "Kore"}'
# Test invalid voice
curl -X POST http://localhost:1956/tts \
-H "Content-Type: application/json" \
-d '{"text": "Testing invalid voice", "voice": "InvalidVoice"}'
# Test empty JSON
curl -X POST http://localhost:1956/tts \
-H "Content-Type: application/json" \
-d '{}'
# Test no JSON body
curl -X POST http://localhost:1956/tts
6. Batch Testing Different Voices
# Test multiple voices quickly
# (the ${voice,,} lowercase expansion requires Bash 4 or later)
voices=("Zephyr" "Puck" "Charon" "Kore" "Fenrir" "Leda")
for voice in "${voices[@]}"; do
  echo "Testing voice: $voice"
  curl -X POST http://localhost:1956/tts \
    -H "Content-Type: application/json" \
    -d "{\"text\": \"Hello, this is $voice speaking\", \"voice\": \"$voice\"}" \
    --output "${voice,,}_test.wav"
  echo "Saved to ${voice,,}_test.wav"
done
7. Performance Testing
# Time the request
time curl -X POST http://localhost:1956/tts \
-H "Content-Type: application/json" \
-d '{"text": "Performance test message", "voice": "Kore"}' \
--output performance_test.wav
# Test with verbose output to see response headers
curl -v -X POST http://localhost:1956/tts \
-H "Content-Type: application/json" \
-d '{"text": "Verbose test", "voice": "Kore"}' \
--output verbose_test.wav
Expected Responses
Successful Voice List Response:
{
"voices": ["Zephyr", "Puck", "Charon", ...],
"default": "Kore"
}
Successful Health Check:
{
"status": "healthy",
"service": "TTS API"
}
Error Response (Invalid Voice):
{
"error": "Invalid voice. Available voices: ['Zephyr', 'Puck', ...]"
}
Error Response (Missing Text):
{
"error": "Text field is required"
}
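On the client side, the response shapes shown here can be described with types and checked at runtime before use. The sketch below is an assumption built only from the example payloads in this document; the interface and function names are hypothetical, and the real service may return additional fields.

```typescript
interface VoiceListResponse {
  voices: string[];
  default: string;
}

interface HealthResponse {
  status: string;
  service: string;
}

interface TtsErrorResponse {
  error: string;
}

// True when the parsed JSON body looks like an error payload.
function isTtsError(body: unknown): body is TtsErrorResponse {
  return (
    typeof body === "object" &&
    body !== null &&
    typeof (body as { error?: unknown }).error === "string"
  );
}

// True when the parsed JSON body looks like a voice list payload.
function isVoiceList(body: unknown): body is VoiceListResponse {
  if (typeof body !== "object" || body === null) return false;
  const candidate = body as { voices?: unknown; default?: unknown };
  return (
    Array.isArray(candidate.voices) &&
    candidate.voices.every((v) => typeof v === "string") &&
    typeof candidate.default === "string"
  );
}

// True when the parsed JSON body reports a healthy service.
function isHealthy(body: unknown): body is HealthResponse {
  return (
    typeof body === "object" &&
    body !== null &&
    (body as { status?: unknown }).status === "healthy"
  );
}
```

A caller might check `isTtsError(body)` first and surface `body.error` to the user before attempting to treat the response as audio or a voice list.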
Notes:
- Replace localhost:1956 with your actual server address if different
- WAV files will be saved to your current directory
- Use the --silent flag to suppress curl progress output: curl --silent -X POST ...
- Add the --fail flag to make curl return a non-zero exit code on HTTP errors
- The generated WAV files should be playable in any audio player