Build with Velma-2 Models

Explore Modulate's Velma-2 models — Deepfake Detection, Speech-to-text Transcription, Emotion Detection, and more.

Transcription - Low Cost, Low WER

Multilingual batch and streaming transcription at just $0.03 / hr. Includes emotion detection, PII redaction, accent detection, and specialized vocabulary for medical, geographic, and political terms.

Deepfake Detection

#1-ranked deepfake detection model on 🤗 Hugging Face's Deepfake Speech Arena Leaderboard. Just $0.25 / hr, over 100x lower cost than other providers.

Conversation Understanding

Coming soon: the only voice-native model that delivers true conversation understanding by combining emotion detection, behavior identification, intent signals, and much more in a single API call.

Built for Developers

  • REST and WebSocket APIs
  • Simple API key authentication
  • Credit-based pricing with free tier
  • Usage dashboard and billing controls
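
To get a feel for the pricing, here is a rough cost estimate in Python based on the per-hour prices quoted on this page. It assumes billing scales linearly with audio duration, and it ignores the free tier and however credits convert to dollars, since neither is specified here. The function name and service keys are illustrative, not part of the API.

```python
from decimal import Decimal

# Per-hour prices quoted on this page (USD per hour of audio).
PRICE_PER_HOUR = {
    "transcription": Decimal("0.03"),
    "deepfake_detection": Decimal("0.25"),
}


def estimate_cost(audio_hours, services):
    """Estimate USD cost of running each service over audio_hours of audio.

    Assumption: cost is simply price-per-hour times hours, with no free-tier
    credit, rounding, or minimum charge applied.
    """
    hours = Decimal(str(audio_hours))
    return sum((PRICE_PER_HOUR[name] * hours for name in services), Decimal("0"))
```

Under these assumptions, 1,000 hours of audio comes to about $30 for transcription alone, or about $280 with deepfake detection added.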

Example API call

curl -X POST \
  https://modulate-developer-apis.com/api/velma-2-stt-batch \
  -H "X-API-Key: YOUR_API_KEY" \
  -F "upload_file=@audio.mp3" \
  -F "speaker_diarization=true" \
  -F "emotion_signal=true"
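
The same request can be sketched in Python using only the standard library. The endpoint, the X-API-Key header, and the three form fields are taken from the curl call above; the helper names (build_stt_request, transcribe) are hypothetical, and because the response schema is not described here, the sketch returns the raw response body rather than parsing it.

```python
import mimetypes
import os
import urllib.request
import uuid

# Endpoint copied from the curl example above.
STT_BATCH_URL = "https://modulate-developer-apis.com/api/velma-2-stt-batch"


def _field(boundary, name, value):
    """Encode one plain-text multipart/form-data field."""
    return (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
        f"{value}\r\n"
    ).encode("utf-8")


def build_stt_request(audio_path, api_key, audio_bytes=None):
    """Build (without sending) a multipart POST that mirrors the curl call.

    audio_bytes is optional; when omitted, the file at audio_path is read.
    """
    if audio_bytes is None:
        with open(audio_path, "rb") as f:
            audio_bytes = f.read()

    boundary = uuid.uuid4().hex
    filename = os.path.basename(audio_path)
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"

    file_part = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="upload_file"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8") + audio_bytes + b"\r\n"

    body = (
        file_part
        + _field(boundary, "speaker_diarization", "true")
        + _field(boundary, "emotion_signal", "true")
        + f"--{boundary}--\r\n".encode("utf-8")
    )

    return urllib.request.Request(
        STT_BATCH_URL,
        data=body,
        method="POST",
        headers={
            "X-API-Key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def transcribe(audio_path, api_key):
    """Send the request and return (HTTP status, raw response bytes)."""
    req = build_stt_request(audio_path, api_key)
    with urllib.request.urlopen(req) as resp:
        return resp.status, resp.read()
```

Calling transcribe("audio.mp3", api_key) with a real key would send the upload. Keeping request construction separate from sending makes it easy to inspect the headers and body before making a billable call.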

Ready to Build?

Create a free account and start making API calls in minutes.

Get Started