Cambridge speech AI with 55+ language transcription and voice agent API
Speechmatics is a Cambridge-based speech AI company providing automatic speech recognition, real-time transcription, and voice agent technology across 55+ languages and dialects. Its Flow API enables developers to build voice-interactive AI agents with sub-second latency and enterprise-grade accuracy.
Headquarters: Cambridge, United Kingdom
Founded: 2006
Employees: 201-500
EU Data Hosting: Yes
Pricing: Free, Pay-as-you-go, Contact Sales
Billing: pay-as-you-go, annual
The automatic speech recognition market is dominated by American companies. Deepgram, AssemblyAI, and OpenAI's Whisper API have captured developer mindshare with aggressive pricing and English-first accuracy. Speechmatics, founded in Cambridge in 2006, has quietly built an alternative: one with deeper language coverage, a voice agent API that none of those US providers matches in capability, and deployment options that keep data entirely within sovereign infrastructure.
The company traces its roots to Dr. Tony Robinson's research at Cambridge, where recurrent neural networks were first applied to speech recognition in the late 1980s. That academic lineage has shaped the company's technical culture: Speechmatics has consistently prioritised accuracy and language breadth over marketing spend. The company is registered as Speechmatics Limited (Companies House 07037524) and has raised $90.6 million in total, including a $62 million Series B in June 2022 led by Susquehanna Growth Equity.
The customer base reflects this positioning. Broadcasters, healthcare systems, and enterprise software platforms rely on Speechmatics for production transcription. In 2025, AI-Media processed over 79 million caption minutes through Speechmatics infrastructure. The medical speech-to-text model, launched in September 2025, reached 93% accuracy on real-world clinical audio, a figure that matters when transcription errors affect patient care.
Most speech-to-text providers support transcription in a handful of dominant languages and treat dialects as an afterthought. Speechmatics takes a different view. The 55+ language portfolio includes regional dialect variants that broader tools miss — accents that cause transcription failures in competitor systems produce consistent results in Speechmatics. A broadcaster covering local news in multiple European markets, or a call centre handling multilingual customer interactions, encounters fewer failure cases as a result.
Two model tiers, Enhanced and Standard, let developers trade accuracy against speed and cost. Enhanced delivers the higher accuracy of the two; Standard processes faster at lower cost. For batch transcription of archival media, Standard may be sufficient. For real-time captioning where accuracy is broadcast-critical, Enhanced is the appropriate choice.
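The tier choice described above can be sketched as a small configuration helper. This is a minimal sketch, not the official client: the field names `transcription_config` and `operating_point` and the values `enhanced` and `standard` are assumptions based on a reading of the Speechmatics job configuration and should be checked against the current API reference. The function name and the `accuracy_critical` flag are hypothetical.

```python
import json


def build_transcription_config(language: str, accuracy_critical: bool) -> dict:
    """Build a job config that picks Enhanced or Standard for a workload.

    Assumption: the tier is selected through an "operating_point" field
    inside "transcription_config". Verify against the official API docs.
    """
    tier = "enhanced" if accuracy_critical else "standard"
    return {
        "type": "transcription",
        "transcription_config": {
            "language": language,
            "operating_point": tier,
        },
    }


if __name__ == "__main__":
    # Live captioning: accuracy matters more than cost.
    print(json.dumps(build_transcription_config("en", accuracy_critical=True), indent=2))
    # Archive backfill: Standard may be sufficient.
    print(json.dumps(build_transcription_config("de", accuracy_critical=False), indent=2))
```

Keeping the decision in one function makes it easy to switch a whole workload between tiers when cost or accuracy requirements change.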
Flow is the product that positions Speechmatics beyond the transcription category. The API combines real-time ASR, a large language model, and text-to-speech into a complete speech-to-speech pipeline. Developers build voice-interactive AI agents — customer service bots, clinical documentation assistants, in-car interfaces — without assembling separate STT, LLM, and TTS services and handling the latency of passing data between them.
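To make the integration burden concrete, here is a rough sketch of the hand-assembled pipeline that a combined speech-to-speech API is meant to replace. It does not call Speechmatics or any real service: `fake_stt`, `fake_llm`, and `fake_tts` are hypothetical stand-ins, and the only point is that each hop between separate services adds its own latency to the total.

```python
import time
from typing import Callable, Dict, Tuple


def run_hand_assembled_pipeline(
    audio: bytes,
    stt: Callable[[bytes], str],
    llm: Callable[[str], str],
    tts: Callable[[str], bytes],
) -> Tuple[bytes, Dict[str, float]]:
    """Run STT -> LLM -> TTS in sequence and record seconds spent per stage."""
    timings: Dict[str, float] = {}
    started = time.perf_counter()

    t0 = time.perf_counter()
    transcript = stt(audio)
    timings["stt"] = time.perf_counter() - t0

    t0 = time.perf_counter()
    reply = llm(transcript)
    timings["llm"] = time.perf_counter() - t0

    t0 = time.perf_counter()
    speech = tts(reply)
    timings["tts"] = time.perf_counter() - t0

    timings["total"] = time.perf_counter() - started
    return speech, timings


# Hypothetical stand-ins for three separately hosted services.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


if __name__ == "__main__":
    speech, timings = run_hand_assembled_pipeline(b"hello", fake_stt, fake_llm, fake_tts)
    print(speech, timings)
```

In a real deployment each stage would be a network call with its own authentication, retries, and failure modes, which is the assembly work the paragraph above refers to.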
Flow handles the conversational complexity developers would otherwise need to build themselves: interruptions, multiple simultaneous speakers, background noise suppression, speaker-aware responses (addressing people by name, ignoring background voices). Python, React, and JavaScript SDKs are available. The API connects to LangGraph agents and MCP servers for teams building more complex AI orchestration pipelines.
The 2024 launch of Flow marks Speechmatics' shift from an ASR provider to a voice AI platform. That distinction matters for developers evaluating long-term infrastructure choices.
In September 2025, Speechmatics demonstrated 93% accuracy on general medical speech-to-text, running on NVIDIA hardware. Alongside that result, healthcare customers reported returning 30 million minutes to the clinical workforce through automated documentation. For health IT teams, the time-saved figure is the number that unlocks procurement decisions: it is a measured workflow impact rather than a marketing claim.
Specialised vocabulary support (custom dictionary and formatting preferences) extends this principle across other domains. Legal transcription, financial call recording, and technical support conversations all have vocabulary that generic models mishandle. Custom dictionary configuration addresses this without requiring a full custom model training engagement.
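A custom dictionary of this kind is typically passed as a list of terms with optional pronunciation hints. The sketch below assumes the field names `additional_vocab`, `content`, and `sounds_like`, which match a reading of the Speechmatics transcription config but should be confirmed in the official documentation. The helper name and the example terms are illustrative only.

```python
from typing import Dict, List


def build_custom_dictionary(terms: Dict[str, List[str]]) -> List[dict]:
    """Convert {term: [pronunciation hints]} into vocabulary entries.

    Assumption: each entry has a "content" key and an optional
    "sounds_like" list. Terms without hints omit "sounds_like".
    """
    entries = []
    for content, sounds_like in terms.items():
        entry: dict = {"content": content}
        if sounds_like:
            entry["sounds_like"] = list(sounds_like)
        entries.append(entry)
    return entries


if __name__ == "__main__":
    # Illustrative financial and legal vocabulary.
    vocab = build_custom_dictionary({
        "EBITDA": ["ee bit dah"],
        "estoppel": [],
    })
    transcription_config = {"language": "en", "additional_vocab": vocab}
    print(transcription_config)
```

Because the dictionary is plain configuration, it can be versioned per domain (legal, finance, support) and swapped per job without any model retraining.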
On-premise deployment is the feature that separates Speechmatics from Deepgram and AssemblyAI at the enterprise level. Enterprise customers can deploy Speechmatics entirely within their own infrastructure, including air-gapped environments where data never travels to an external network. Multi-region cloud deployment options satisfy data residency requirements for organisations operating under EU, UK, or sector-specific data sovereignty mandates.
Deepgram does not offer comparable on-premise deployment. AssemblyAI is US-hosted by default. For a defence contractor, a national health service, or a financial institution operating under strict data localisation policies, the on-premise option shifts Speechmatics from "comparable product" to "only viable option."
Speechmatics' documentation is comprehensive, well-maintained, and covers both basic REST API integration and complex voice agent architectures. The free tier provides 480 minutes of speech-to-text per month plus 1 million text-to-speech characters — enough for meaningful development and testing work before a paid commitment. No credit card is required to start.
The SDK coverage (Python, React, JavaScript) targets the languages most developers actually use for AI integration work. WebSocket support enables real-time streaming applications without polling.
Speechmatics operates a usage-based model after the free tier. Pro pricing starts at $0.24/hour, with an automatic 20% discount above 500 hours per month. The 480 free minutes per month persist on paid accounts, so small-volume users effectively receive a partial subsidy on their monthly bills.
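The pricing rules above can be turned into a rough monthly estimator. Two details are not stated in the text and are assumptions here: the 480 free minutes are deducted before anything is billed, and the 20% discount applies only to billable hours beyond the first 500 rather than to the whole bill. Treat the output as a ballpark figure, not a quote.

```python
FREE_MINUTES_PER_MONTH = 480        # from the article
PRO_RATE_PER_HOUR = 0.24            # USD, from the article
DISCOUNT_THRESHOLD_HOURS = 500      # from the article
DISCOUNT_RATE = 0.20                # from the article


def estimate_monthly_cost(hours: float) -> float:
    """Estimate a Pro-tier monthly bill in USD for the given usage.

    Assumptions (not confirmed by the source): free minutes come off
    first, and the discount is marginal, applying only to billable
    hours above the threshold.
    """
    if hours < 0:
        raise ValueError("hours must be non-negative")
    billable = max(0.0, hours - FREE_MINUTES_PER_MONTH / 60)
    full_price_hours = min(billable, DISCOUNT_THRESHOLD_HOURS)
    discounted_hours = max(0.0, billable - DISCOUNT_THRESHOLD_HOURS)
    cost = (
        full_price_hours * PRO_RATE_PER_HOUR
        + discounted_hours * PRO_RATE_PER_HOUR * (1 - DISCOUNT_RATE)
    )
    return round(cost, 2)


if __name__ == "__main__":
    for usage in (8, 108, 1008):
        print(f"{usage} hours -> ${estimate_monthly_cost(usage):.2f}")
```

Under these assumptions, a team using 108 hours in a month pays for 100 of them, which is where the partial-subsidy effect for small-volume users comes from.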
At $0.24/hour, Speechmatics is mid-range in the ASR market. Deepgram's Nova-2 model can reach lower per-minute costs for English-only workloads at high volume. For multilingual workloads or voice agent development where Flow replaces three separate API subscriptions, the Speechmatics cost structure becomes competitive on a feature-adjusted basis.
Enterprise pricing is custom and includes volume discounts at 24,000+ hours annually, dedicated Customer Success Manager support, Solutions Engineer access, and on-premise deployment. The Pro tier is capped at 6,000 hours/month — organisations exceeding that threshold move to Enterprise automatically.
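The tier boundaries in this section can be expressed as a simple lookup. The thresholds come from the figures quoted above; the function names are hypothetical, and the logic is only a sketch of how a team might sanity-check which tier its projected volume lands in.

```python
FREE_MINUTES_PER_MONTH = 480                  # free allowance, from the article
PRO_CAP_HOURS_PER_MONTH = 6_000               # Pro tier cap, from the article
ENTERPRISE_DISCOUNT_HOURS_PER_YEAR = 24_000   # volume discount threshold, from the article


def recommend_tier(monthly_hours: float) -> str:
    """Return the tier a given monthly volume falls into."""
    if monthly_hours < 0:
        raise ValueError("monthly_hours must be non-negative")
    if monthly_hours * 60 <= FREE_MINUTES_PER_MONTH:
        return "Free"
    if monthly_hours <= PRO_CAP_HOURS_PER_MONTH:
        return "Pro"
    return "Enterprise"


def meets_enterprise_volume_threshold(monthly_hours: float) -> bool:
    """Check whether annualised usage reaches the 24,000-hour mark."""
    return monthly_hours * 12 >= ENTERPRISE_DISCOUNT_HOURS_PER_YEAR


if __name__ == "__main__":
    for volume in (5, 2_000, 6_500):
        print(volume, recommend_tier(volume), meets_enterprise_volume_threshold(volume))
```

One consequence of the quoted numbers: 24,000 hours a year is 2,000 hours a month, well under the 6,000-hour Pro cap, so a team can reach the Enterprise volume threshold while still fitting inside Pro.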
The Startup Program offers $50,000 in API credits with dedicated onboarding support, which is meaningful for well-funded startups building voice AI products on top of Speechmatics infrastructure.
Speechmatics is a UK company operating under UK GDPR, which closely mirrors EU GDPR requirements. Its ISO 27001 certification means the company maintains a documented, independently audited information security management system. Data processing agreements are available, satisfying the contractual requirements of regulated-sector procurement.
The on-premise deployment option is the strongest compliance feature in the catalogue. No other speech AI provider at this scale offers comparable sovereign deployment flexibility. For organisations processing healthcare audio, legal proceedings, or classified government communications, on-premise deployment eliminates the fundamental data sovereignty concern rather than mitigating it.
Post-Brexit, Speechmatics' UK jurisdiction may require additional assessment for EU-based organisations operating under strict data transfer regulations. The company offers multi-region cloud options, including EU-hosted deployment, which addresses this for most use cases.
If you are building voice-interactive AI agents and need a single API for the full speech-to-speech pipeline, Flow removes significant integration complexity. Deepgram and AssemblyAI offer ASR; Speechmatics offers the complete voice agent stack.
If your application requires multilingual transcription including regional dialects, Speechmatics' 55+ language portfolio outperforms competitors on coverage and consistency. English-first tools fall short on European language breadth.
If data sovereignty is non-negotiable — defence, healthcare, government — on-premise deployment makes Speechmatics the only enterprise-grade option in the European speech AI market.
If you are a developer evaluating ASR APIs, the free tier's 480 minutes per month is among the most generous in the category and requires no credit card commitment.
Speechmatics has spent nearly two decades building an ASR system that handles human speech in its actual, messy, multilingual reality. The addition of Flow as a voice agent API in 2024 transformed the product from a transcription service into a platform. On-premise deployment, ISO 27001 certification, and genuine 55+ language accuracy make it the strongest European option for enterprise speech AI. The limitations are real: it is a developer product with no consumer interface, its per-hour pricing is mid-range for English-only workloads, and its Pro tier cap requires Enterprise conversations sooner than some teams expect. For the use cases where Speechmatics is the right fit, the fit is very good.
Speechmatics operates under UK GDPR, which closely mirrors EU GDPR requirements. The company holds ISO 27001 certification and offers on-premise deployment for complete data sovereignty. EU-region cloud hosting is available for EU-based organisations with data transfer concerns.
Speechmatics supports 55+ languages and dialects, including regional accent variants. In September 2025, its medical speech-to-text model achieved 93% accuracy on real-world clinical audio. Accuracy is highest in core languages (English, Spanish, German, French); regional variants and less common languages perform at varying accuracy levels.
Flow is a speech-to-speech API combining real-time ASR, LLM reasoning, and text-to-speech in a single pipeline. It is designed for developers building voice-interactive AI agents — customer service systems, clinical documentation tools, or any application requiring natural spoken conversation. Python, React, and JavaScript SDKs are available.
Speechmatics Pro starts at $0.24/hour with a 20% discount above 500 hours/month. Deepgram's Nova-2 model can be cheaper for high-volume English-only transcription. For multilingual workloads or voice agent development (where Flow replaces three separate services), Speechmatics' feature-adjusted cost is competitive. The free tier of 480 minutes/month compares favourably to Deepgram's offerings.
On-premise deployment is available. Enterprise customers can deploy Speechmatics entirely within their own infrastructure, including air-gapped environments with no external network access. This option is not available on Deepgram or AssemblyAI at comparable scale, making Speechmatics the de facto choice for regulated industries with strict data sovereignty requirements.