Overview
Smallest AI provides real-time speech-to-text transcription through a WebSocket-based integration with their Waves API. The service uses the Pulse model to stream audio continuously and receive interim and final transcription results with low latency.

- Smallest AI STT API Reference: complete API reference for all parameters and methods
- Example Implementation: complete example with WebSocket streaming
Installation
Prerequisites
- Smallest AI Account: Sign up at Smallest AI
- API Key: Generate an API key from your account dashboard
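Pipecat ships vendor integrations as optional dependency extras. Assuming the Smallest AI integration lives in a `smallest` extra (the extra name is an assumption; check your Pipecat version), installation would look like:

```shell
# Install Pipecat with the Smallest AI extra (extra name is an assumption)
pip install "pipecat-ai[smallest]"
```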
Configuration
- API key: Smallest AI API key for authentication.
- WebSocket URL: base WebSocket URL for the Smallest API. Override for custom or proxied deployments.
- Encoding: audio encoding format.
- Sample rate: audio sample rate in Hz. When `None`, uses the pipeline's configured sample rate.
- Settings: runtime-configurable settings. See Settings below.
- P99 latency: P99 latency from speech end to final transcript in seconds. Used for processing metrics.
Settings
Runtime-configurable settings are passed via the `settings` constructor argument using `SmallestSTTService.Settings(...)`. They can be updated mid-conversation with `STTUpdateSettingsFrame`. See Service Settings for details.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `model` | str | `pulse` | Model identifier. Currently only `pulse` is supported. |
| `language` | Language \| str | `Language.EN` | Language code for transcription. |
| `word_timestamps` | bool | `False` | Include word-level timestamps in transcription results. |
| `full_transcript` | bool | `False` | Include cumulative transcript in results. |
| `sentence_timestamps` | bool | `False` | Include sentence-level timestamps in transcription results. |
| `redact_pii` | bool | `False` | Redact personally identifiable information from transcripts. |
| `redact_pci` | bool | `False` | Redact payment card information from transcripts. |
| `numerals` | str | `auto` | Convert spoken numerals to digits. Options: `auto`, `always`, or `none`. |
| `diarize` | bool | `False` | Enable speaker diarization to identify different speakers. |
Usage
Basic Setup
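A minimal setup sketch. The `api_key` argument and default settings (Pulse model, English) come from the sections above; the import path is an assumption based on Pipecat's usual vendor module layout:

```python
import os

# Import path is an assumption; check your Pipecat version's module layout.
from pipecat.services.smallest.stt import SmallestSTTService

# Basic setup: API key from the environment, defaults for everything else
# (pulse model, English, no timestamps or redaction).
stt = SmallestSTTService(api_key=os.getenv("SMALLEST_API_KEY"))
```

The service is then placed between the transport input and downstream processors in the pipeline, like any other Pipecat STT service.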
With Advanced Features
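A sketch enabling several of the settings documented in the table above (word timestamps, diarization, PII redaction, numeral conversion). Import paths are assumptions; the `Settings` fields match the Settings table:

```python
import os

# Import paths are assumptions based on Pipecat's usual layout.
from pipecat.services.smallest.stt import SmallestSTTService
from pipecat.transcriptions.language import Language

# Enable word-level timestamps, speaker diarization, and PII redaction,
# and always convert spoken numerals to digits.
stt = SmallestSTTService(
    api_key=os.getenv("SMALLEST_API_KEY"),
    settings=SmallestSTTService.Settings(
        language=Language.EN,
        word_timestamps=True,
        diarize=True,
        redact_pii=True,
        numerals="always",
    ),
)
```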
Updating Settings at Runtime
Transcription settings can be changed mid-conversation using `STTUpdateSettingsFrame`:
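A sketch of a mid-call update, assuming `STTUpdateSettingsFrame` takes a settings mapping and is queued on the pipeline task (the frame name comes from this section; the exact constructor signature and import path are assumptions):

```python
# Import path is an assumption; check your Pipecat version.
from pipecat.frames.frames import STTUpdateSettingsFrame

# Switch the transcription language and enable word timestamps mid-call.
# The keys mirror the Settings table above.
await task.queue_frame(
    STTUpdateSettingsFrame(settings={"language": "hi", "word_timestamps": True})
)
```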
Notes
- WebSocket streaming: The service uses WebSocket connections for real-time streaming. The connection is automatically managed and will reconnect if interrupted.
- VAD integration: Uses Pipecat’s VAD to detect when the user stops speaking and sends a finalize message to flush the final transcript.
- Keepalive: The service sends periodic keepalive messages (every 5 seconds) to prevent idle timeouts on the WebSocket connection.
- Language support: Supports 32 languages including Bulgarian, Bengali, Czech, Danish, German, English, Spanish, Estonian, Finnish, French, Gujarati, Hindi, Hungarian, Italian, Kannada, Lithuanian, Latvian, Malayalam, Marathi, Maltese, Dutch, Odia, Punjabi, Polish, Portuguese, Romanian, Russian, Slovak, Swedish, Tamil, Telugu, and Ukrainian.
Event Handlers
Smallest AI STT supports the standard service connection events:

| Event | Description |
|---|---|
| `on_connected` | Connected to the Smallest AI WebSocket |
| `on_disconnected` | Disconnected from the Smallest AI WebSocket |
| `on_connection_error` | WebSocket connection error occurred |