Velma-2 Models
Velma-2's synthetic voice detection models achieve state-of-the-art performance in detecting speech deepfakes on single-speaker audio (98.9% average accuracy, ranked #1 on Speech DF Arena).
| | Batch English Fast | Batch Multilingual | Streaming Multilingual |
|---|---|---|---|
| Description | High-throughput English batch processing with >200x real-time speed | Multilingual batch transcription in 70+ languages with full feature set | Real-time streaming transcription in 70+ languages via WebSocket |
| Endpoint | /api/velma-2-stt-batch-english-vfast | /api/velma-2-stt-batch | /api/velma-2-stt-streaming |
| Type | Batch | Batch | Streaming |
| API | REST API | REST API | WebSocket |
| Accepted Files | AAC, AIFF, FLAC, MOV, MP3, MP4, OGG, Opus, WAV, WebM | AAC, AIFF, FLAC, MOV, MP3, MP4, OGG, Opus, WAV, WebM | AAC, AIFF, FLAC, MOV, MP3, MP4, OGG, Opus, WAV, WebM |
| Pricing | $0.025 / hour | $0.03 / hour | $0.06 / hour |
| Built-in Features | |||
| Transcription | |||
| Auto Capitalization | |||
| Auto Punctuation | |||
| Language | English | 70+ languages | 70+ languages |
| Real-Time | |||
| Optional Features | |||
| Speaker Diarization | |||
| Emotion Detection | |||
| Accent Identification | |||
| PII/PHI Tagging | |||

| | Deepfake Batch | Deepfake Streaming |
|---|---|---|
| Description | Speech deepfake prediction for audio files with a single speaker | Real-time speech deepfake detection for streaming audio with a single speaker |
| Accuracy | 98.9% average — #1 on Speech DF Arena | 98.9% average — #1 on Speech DF Arena |
| Endpoint | /api/velma-2-synthetic-voice-detection-batch | /api/velma-2-synthetic-voice-detection-streaming |
| Type | Deepfake Batch | Deepfake Streaming |
| API | REST API | WebSocket |
| Accepted Files | AAC, AIFF, FLAC, MOV, MP3, MP4, OGG, Opus, WAV, WebM | Raw 16kHz mono PCM (int16 LE) |
| Pricing | $0.25 / hour | $0.25 / hour |
| Built-in Features | ||
| Per-Window Prediction | ||
| Confidence Scoring | ||
| Silence Detection | ||
| Flexible Chunk Size | ||
| Real-Time | ||
| First Prediction | After full file upload | At 500ms of audio |
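
The streaming input format and the 500 ms first-prediction point come down to simple byte arithmetic. The sketch below uses only the Python standard library to pack float samples into raw 16 kHz mono int16 little-endian PCM and split the result into chunks. The helper names (`bytes_for_duration`, `floats_to_pcm16le`, `chunk_pcm`) and the 100 ms default chunk size are illustrative assumptions, not part of the Velma-2 API. How chunks are framed on the WebSocket is not described here, so that step is left out.

```python
import struct

SAMPLE_RATE_HZ = 16_000   # from the table: raw 16 kHz mono PCM
BYTES_PER_SAMPLE = 2      # int16


def bytes_for_duration(ms: int) -> int:
    """Bytes needed to hold `ms` milliseconds of 16 kHz mono int16 audio."""
    return SAMPLE_RATE_HZ * ms // 1000 * BYTES_PER_SAMPLE


def floats_to_pcm16le(samples) -> bytes:
    """Convert floats in [-1.0, 1.0] to little-endian int16 PCM (out-of-range values are clipped)."""
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def chunk_pcm(pcm: bytes, chunk_ms: int = 100):
    """Split PCM bytes into chunks of `chunk_ms` each (hypothetical size); the last chunk may be shorter."""
    size = bytes_for_duration(chunk_ms)
    return [pcm[i:i + size] for i in range(0, len(pcm), size)]
```

Under these assumptions, `bytes_for_duration(500)` is 16,000 bytes, which is how much audio the stream must carry before the first prediction listed in the table can be produced.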
Authentication & Rate Limits
Authentication
REST endpoints require an API key via the X-API-Key header. The WebSocket endpoints use an api_key query parameter.
REST: X-API-Key: your_api_key_here
WebSocket: wss://...?api_key=your_api_key_here
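
The two authentication styles can be sketched as a pair of small helpers. This is a minimal illustration using only the Python standard library. The function names are hypothetical, and the host is deliberately left as a `base_url` parameter because it is elided above.

```python
from urllib.parse import urlencode


def rest_headers(api_key: str) -> dict:
    """Headers for REST requests: the key travels in the X-API-Key header."""
    return {"X-API-Key": api_key}


def websocket_url(base_url: str, path: str, api_key: str) -> str:
    """WebSocket URL with the key as an api_key query parameter.

    `base_url` is whatever wss:// host your account uses (not specified here).
    """
    return f"{base_url.rstrip('/')}{path}?{urlencode({'api_key': api_key})}"
```

Using `urlencode` keeps keys containing reserved characters from corrupting the query string. Because the WebSocket key is part of the URL, it is worth keeping such URLs out of logs.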
Rate Limits & Billing
- Per-model concurrency and monthly usage quotas
- Credit-based billing with free tier included
- Usage tracked in real time via the Usage Dashboard
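
For rough budgeting, the per-hour prices from the model tables can be turned into a cost estimate. The sketch below assumes cost scales linearly with audio duration and ignores free-tier credits, rounding, and any minimum charges, none of which are specified here. The `estimate_cost_usd` helper is hypothetical.

```python
from decimal import Decimal

# Per-hour prices in USD, copied from the model tables.
PRICE_PER_HOUR_USD = {
    "/api/velma-2-stt-batch-english-vfast": Decimal("0.025"),
    "/api/velma-2-stt-batch": Decimal("0.03"),
    "/api/velma-2-stt-streaming": Decimal("0.06"),
    "/api/velma-2-synthetic-voice-detection-batch": Decimal("0.25"),
    "/api/velma-2-synthetic-voice-detection-streaming": Decimal("0.25"),
}


def estimate_cost_usd(endpoint: str, audio_seconds: int) -> Decimal:
    """Estimated cost, assuming linear per-hour pricing (an assumption, not documented billing logic)."""
    hours = Decimal(audio_seconds) / Decimal(3600)
    return PRICE_PER_HOUR_USD[endpoint] * hours
```

For example, two hours of audio through `/api/velma-2-stt-batch` comes to $0.06 under these assumptions. Actual charges are whatever the Usage Dashboard reports.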
Example Projects Coming Soon
We're preparing example projects to help you get started with the Velma-2 API. Check back soon for ready-to-use code samples and integration guides.