Modulate - Developer APIs

Transcription - Low Cost, Low WER

Multilingual batch and streaming transcription at just $0.03 / hr. Includes emotion detection, PII redaction, accent detection and specialized vocabulary for medical, geographic, and political terms.

Deepfake Detection

#1 Ranked Deepfake Detection Model on 🤗 Hugging Face's Deepfake Speech Arena Leaderboard. Just $0.25 / hr, over 100x lower cost than other providers.

Conversation Understanding

Coming soon: the only voice-native model that delivers true conversation understanding by combining emotion detection, behavior identification, intent signals, and much more in a single API call.

Built for Developers

REST and WebSocket APIs
Simple API key authentication
Credit-based pricing with free tier
Usage dashboard and billing controls
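
As a rough illustration of the hourly rates quoted above, the following Python sketch estimates list-price cost for a given volume of audio. The function name and dictionary keys are hypothetical, and the calculation ignores free-tier credits and any credit conversion, since those details are not given here.

```python
from decimal import Decimal

# Hourly rates quoted on this page (USD per hour of audio).
# The dictionary keys are illustrative labels, not API identifiers.
RATES_USD_PER_HOUR = {
    "transcription": Decimal("0.03"),
    "deepfake_detection": Decimal("0.25"),
}


def estimate_cost(product, audio_hours):
    """Return the list-price cost in USD, ignoring free-tier credits."""
    return RATES_USD_PER_HOUR[product] * Decimal(str(audio_hours))


print(estimate_cost("transcription", 1000))       # 30.00
print(estimate_cost("deepfake_detection", 1000))  # 250.00
```

At these rates, 1,000 hours of audio comes to $30 for transcription and $250 for deepfake detection.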

Example API call

curl -X POST \
  https://modulate-developer-apis.com/api/velma-2-stt-batch \
  -H "X-API-Key: YOUR_API_KEY" \
  -F "upload_file=@audio.mp3" \
  -F "speaker_diarization=true" \
  -F "emotion_signal=true"
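
The same request can be sketched in Python using only the standard library. The URL, the X-API-Key header, and the form field names are copied from the curl example; the function names build_request and transcribe are hypothetical, and the response is returned as raw text because its format is not described here.

```python
import mimetypes
import os
import urllib.request
import uuid

# Endpoint taken from the curl example above.
API_URL = "https://modulate-developer-apis.com/api/velma-2-stt-batch"


def build_request(api_key, filename, audio_bytes,
                  speaker_diarization=True, emotion_signal=True):
    """Build (but do not send) the multipart POST from the curl example."""
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"

    # Plain form fields, mirroring the -F flags in the curl call.
    fields = {
        "speaker_diarization": "true" if speaker_diarization else "false",
        "emotion_signal": "true" if emotion_signal else "false",
    }

    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode()
        )

    # File part, equivalent to -F "upload_file=@audio.mp3".
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="upload_file"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n".encode()
        + audio_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())

    return urllib.request.Request(
        API_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            "X-API-Key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def transcribe(api_key, path):
    """Send the request and return the raw response body as text (unparsed)."""
    with open(path, "rb") as f:
        audio = f.read()
    request = build_request(api_key, os.path.basename(path), audio)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling transcribe("YOUR_API_KEY", "audio.mp3") would send the upload. Keeping build_request separate makes it possible to inspect the headers and body without making a network call.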

Learn More

Modulate.ai

Learn about Modulate's mission and technology

Ensemble Listening Models

Multi-model voice analysis for conversations

Velma Preview

Try Modulate's voice analysis in the browser

Build with Velma-2 Models