Explore Modulate's Velma-2 models — Deepfake Detection, Speech-to-text Transcription, Emotion Detection, and more.
Multilingual batch and streaming transcription at just $0.03 / hr. Includes emotion detection, PII redaction, accent detection, and specialized vocabulary for medical, geographic, and political terms.
#1 Ranked Deepfake Detection Model on 🤗 Hugging Face's Deepfake Speech Arena Leaderboard. Just $0.25 / hr, over 100x lower cost than other providers.
Coming soon: the only voice-native model that delivers true conversation understanding by combining emotion detection, behavior identification, intent signals, and much more in a single API call.
Example API call
curl -X POST \
  "https://modulate-developer-apis.com/api/velma-2-stt-batch" \
  -H "X-API-Key: YOUR_API_KEY" \
  -F "upload_file=@audio.mp3" \
  -F "speaker_diarization=true" \
  -F "emotion_signal=true"
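
For scripted use, the same request can be wrapped in a small shell function. This is a minimal sketch, not an official client: the function name `transcribe_batch` and the `MODULATE_API_KEY` environment variable are hypothetical names chosen for illustration, and only the endpoint, header, and form fields come from the example above. The response format is not described here, so the sketch simply passes the response body through to standard output.

```shell
# Sketch only: wraps the example request in a reusable function.
# transcribe_batch and MODULATE_API_KEY are hypothetical names,
# not part of any documented Modulate tooling.
transcribe_batch() {
  local audio_file="$1"

  # Fail early with a clear message instead of sending a bad request.
  if [ -z "${MODULATE_API_KEY:-}" ]; then
    echo "error: MODULATE_API_KEY is not set" >&2
    return 1
  fi
  if [ ! -f "$audio_file" ]; then
    echo "error: file not found: $audio_file" >&2
    return 1
  fi

  # --fail makes curl exit non-zero on HTTP 4xx/5xx responses;
  # --silent --show-error hides the progress meter but keeps error text.
  curl --fail --silent --show-error -X POST \
    "https://modulate-developer-apis.com/api/velma-2-stt-batch" \
    -H "X-API-Key: ${MODULATE_API_KEY}" \
    -F "upload_file=@${audio_file}" \
    -F "speaker_diarization=true" \
    -F "emotion_signal=true"
}
```

With the key exported in the environment, `transcribe_batch audio.mp3 > response.txt` saves the raw response for inspection. Reading the key from the environment keeps it out of shell history and out of scripts checked into version control.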