How We Built Nagovori: Architecture, Security, and Scale
We regularly get questions: "Where are my recordings processed?", "Who can access my data?", "Why is it so fast?" This post answers those questions with a candid look at our architecture — enough detail to understand how things work without exposing implementation specifics.
High-Level Architecture
Nagovori is a web application with a job queue. When you upload a file, here's what happens:
- Upload — the file is transmitted over an encrypted connection (HTTPS/TLS 1.3)
- Queue — the file enters a processing queue. You see your position and estimated wait time
- Processing — a worker picks the file, converts it to the required format, and runs it through the speech recognition model
- Result — the transcript is saved and appears in your dashboard
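To make the flow above concrete, here is a minimal Python sketch of a FIFO job queue that reports a job's position and a rough wait estimate. It is illustrative only and not the production code. The names `Job`, `JobQueue`, and `recognize` are hypothetical, and the default of 5 processing minutes per audio hour is an assumption taken from the upper end of the 2 to 5 minute range quoted later in this post.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Status(Enum):
    QUEUED = "queued"
    PROCESSING = "processing"
    DONE = "done"


@dataclass
class Job:
    job_id: str
    audio_minutes: float
    status: Status = Status.QUEUED
    transcript: Optional[str] = None


class JobQueue:
    """FIFO queue that reports a job's position and a rough wait estimate."""

    def __init__(self, minutes_per_audio_hour: float = 5.0):
        # Assumed processing speed; not a measured figure.
        self.minutes_per_audio_hour = minutes_per_audio_hour
        self._pending = deque()

    def submit(self, job: Job) -> int:
        """Add a job to the back of the queue and return its 1-based position."""
        self._pending.append(job)
        return len(self._pending)

    def position(self, job_id: str) -> Optional[int]:
        """Return the 1-based position of a queued job, or None if not queued."""
        for index, job in enumerate(self._pending, start=1):
            if job.job_id == job_id:
                return index
        return None

    def estimated_wait_minutes(self, job_id: str, workers: int = 1) -> Optional[float]:
        """Estimate the wait from the audio queued ahead of this job."""
        pos = self.position(job_id)
        if pos is None:
            return None
        ahead = list(self._pending)[: pos - 1]
        audio_hours_ahead = sum(job.audio_minutes for job in ahead) / 60
        return audio_hours_ahead * self.minutes_per_audio_hour / workers

    def process_next(self, recognize: Callable[[Job], str]) -> Optional[Job]:
        """Take the oldest job, run recognition on it, and store the transcript."""
        if not self._pending:
            return None
        job = self._pending.popleft()
        job.status = Status.PROCESSING
        # Format conversion would happen here, before recognition.
        job.transcript = recognize(job)
        job.status = Status.DONE
        return job
```

The estimate divides the queued audio evenly across workers, which is a simplification: a real scheduler would also account for jobs that are already in progress.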
The system is built on a microservice architecture: the web frontend, API server, processing workers, and storage are separate components that scale independently.
Recognition Models
We use multiple models depending on the task:
- Primary model — optimized for Russian, delivers 95%+ accuracy on clean audio
- Multilingual model — for English and other supported languages
- Lightweight model — for short voice messages where speed matters more than marginal accuracy gains
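The choice between these three models can be pictured as a small routing function. This is a sketch under stated assumptions, not the actual selection logic: the post does not say what counts as a "short" voice message, so the 60-second threshold is hypothetical, as are the `Model` enum, the `choose_model` function, and the use of `"ru"` as a language code.

```python
from enum import Enum


class Model(Enum):
    PRIMARY = "primary"            # optimized for Russian
    MULTILINGUAL = "multilingual"  # English and other supported languages
    LIGHTWEIGHT = "lightweight"    # short voice messages, speed over accuracy


# Assumption: the real cutoff for a "short" message is not stated in the post.
SHORT_MESSAGE_SECONDS = 60.0


def choose_model(language: str, duration_seconds: float,
                 is_voice_message: bool = False) -> Model:
    """Pick a model from the language and the kind of input."""
    if is_voice_message and duration_seconds <= SHORT_MESSAGE_SECONDS:
        return Model.LIGHTWEIGHT
    if language == "ru":
        return Model.PRIMARY
    return Model.MULTILINGUAL
```

For example, `choose_model("ru", 3600)` selects the primary model, while a 20-second voice message in any language goes to the lightweight one.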
Models run on NVIDIA GPU servers, which lets us process one hour of audio in 2–5 minutes.
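The 2 to 5 minute figure translates into a speed-up over real time, and it gives a rough way to estimate processing time for files of other lengths. The helper names below are hypothetical, and the linear scaling is an assumption: real processing time also depends on audio quality and on which model is used.

```python
# Range quoted in this post: minutes of processing per hour of audio.
FAST_MINUTES_PER_HOUR = 2.0
SLOW_MINUTES_PER_HOUR = 5.0


def speedup(minutes_per_audio_hour: float) -> float:
    """How many times faster than real time the processing runs."""
    return 60.0 / minutes_per_audio_hour


def estimate_processing_minutes(audio_minutes: float) -> tuple:
    """Return (best case, worst case) processing time, assuming linear scaling."""
    hours = audio_minutes / 60.0
    return (hours * FAST_MINUTES_PER_HOUR, hours * SLOW_MINUTES_PER_HOUR)
```

Under these assumptions, processing runs at 12 to 30 times real-time speed, and a 90-minute recording would take roughly 3 to 7.5 minutes.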
Data Security
Encryption
- All data in transit uses HTTPS (TLS 1.3)
- Files at rest are encrypted
- Internal service-to-service communication is secured
Storage
- Servers are located in Russia
- Files are retained for processing and remain accessible in user history
- Users can delete their data at any time through their dashboard
Access Control
- Only the account owner can access their transcriptions and files
- System administrators do not have access to user audio content or transcripts
- Authentication is handled through a dedicated identity service (OIDC-based)
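The owner-only rule above can be illustrated with a minimal authorization check. This is a sketch, not the real access layer: `Transcript`, `AccessDenied`, and `get_transcript` are hypothetical names, the in-memory dictionary stands in for real storage, and we assume `requester_id` comes from an already verified identity token (for example, the standard OIDC `sub` claim).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    transcript_id: str
    owner_id: str
    text: str


class AccessDenied(Exception):
    """Raised when the requester may not read the requested transcript."""


def get_transcript(store: dict, requester_id: str, transcript_id: str) -> Transcript:
    """Return a transcript only if the requester owns it."""
    record = store.get(transcript_id)
    # Missing and foreign records raise the same error, so a caller
    # cannot probe which transcript IDs exist.
    if record is None or record.owner_id != requester_id:
        raise AccessDenied(transcript_id)
    return record
```

Treating "not found" and "not yours" identically is a deliberate design choice in this sketch: it avoids leaking the existence of other users' records.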
Compliance
- Personal data processing complies with Russian Federal Law No. 152-FZ
- Privacy policy and terms of service are publicly available
- Data is not shared with third parties for marketing purposes
Messenger Integrations
Bots for Telegram, VK, and Max follow the same pattern:
- User forwards a voice message to the bot
- The bot receives the file via the messenger's API
- The file enters the same processing queue as web uploads
- The result is returned to the user in the chat
Important: bots don't store messages beyond processing, don't read conversations, and don't have access to chats where they haven't been added.
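One way to picture "no storage beyond processing" is a handler that keeps the audio in a temporary directory that disappears as soon as the transcript is ready. This is a hedged sketch, not the bots' actual code: `handle_voice_message` and its three callables are hypothetical, and `transcribe` stands in for submitting the file to the shared processing queue and waiting for the result. No real messenger API is called here.

```python
import tempfile
from pathlib import Path
from typing import Callable


def handle_voice_message(
    chat_id: str,
    download: Callable[[], bytes],
    transcribe: Callable[[Path], str],
    reply: Callable[[str, str], None],
) -> None:
    """Download a voice message, transcribe it, reply, and keep nothing."""
    with tempfile.TemporaryDirectory() as workdir:
        audio_path = Path(workdir) / "voice_message"
        audio_path.write_bytes(download())
        text = transcribe(audio_path)
    # The temporary directory and the audio file are deleted at this point.
    reply(chat_id, text)
```

Passing the messenger-specific parts in as callables keeps the same handler usable for Telegram, VK, and Max, which matches the "same pattern" described above.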
Text-to-Speech (TTS)
TTS works in reverse: the user enters text, the system sends it to a synthesis model, and the model returns an audio file. Multiple professional-quality voices are available. TTS and transcription share the same minute balance: one account, one pool of minutes.
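The shared balance can be sketched as a single counter that both services draw from. This is an illustration under assumptions: `MinuteBalance` and `InsufficientMinutes` are hypothetical names, and the post does not say how TTS usage is converted into minutes, so the sketch simply assumes each request arrives with a duration already attached.

```python
class InsufficientMinutes(Exception):
    """Raised when a request would exceed the remaining balance."""


class MinuteBalance:
    """One pool of minutes shared by transcription and TTS."""

    def __init__(self, minutes: float):
        self.minutes = minutes
        self.usage = {}  # minutes spent per service, for reporting

    def charge(self, service: str, minutes: float) -> float:
        """Deduct minutes for a service and return the remaining balance."""
        if minutes < 0:
            raise ValueError("minutes must be non-negative")
        if minutes > self.minutes:
            raise InsufficientMinutes(service)
        self.minutes -= minutes
        self.usage[service] = self.usage.get(service, 0.0) + minutes
        return self.minutes
```

With a 60-minute balance, transcribing a 45-minute recording leaves 15 minutes, and those same 15 minutes are what remains available for speech synthesis.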
Scaling
The job queue absorbs traffic spikes. If 100 users upload files simultaneously, the system doesn't crash: files wait in the queue and are processed in order as workers become free. When load increases, we add more workers. This approach is simpler and more reliable than trying to process everything in real time.
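The queue-plus-workers pattern described here can be shown in a few lines with Python's standard `queue` and `threading` modules. This is a minimal sketch of the pattern, not the production system: `run_workers` and `handler` are hypothetical names, and threads in one process stand in for separate worker services.

```python
import queue
import threading


def run_workers(jobs, handler, workers: int) -> dict:
    """Drain a burst of jobs with a fixed number of workers.

    Jobs must be hashable, because results are keyed by job.
    """
    pending = queue.Queue()
    for job in jobs:
        pending.put(job)

    results = {}
    lock = threading.Lock()

    def worker() -> None:
        while True:
            try:
                job = pending.get_nowait()
            except queue.Empty:
                return  # nothing left to do
            outcome = handler(job)
            with lock:
                results[job] = outcome

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

A burst of 100 jobs does not need 100 workers: the jobs wait in `pending`, and raising `workers` only shortens the time it takes to drain the queue.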
Current processing capacity handles thousands of hours of audio per day. During peak periods, average wait times stay under 5 minutes for a one-hour file.
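A back-of-the-envelope calculation shows how daily capacity relates to the number of workers. The assumptions here are loud ones: each worker handles one file at a time, stays busy around the clock, and runs at the 2 to 5 minutes per audio hour quoted earlier. The 2,000 hours per day in the usage note is a hypothetical load chosen for illustration, not a reported figure.

```python
import math


def audio_hours_per_worker_day(minutes_per_audio_hour: float) -> float:
    """Audio hours one fully busy worker can process in 24 hours."""
    return 24 * 60 / minutes_per_audio_hour


def workers_needed(daily_audio_hours: float, minutes_per_audio_hour: float) -> int:
    """Smallest number of fully busy workers that covers the daily load."""
    per_worker = audio_hours_per_worker_day(minutes_per_audio_hour)
    return math.ceil(daily_audio_hours / per_worker)
```

Under these assumptions a single worker covers 288 to 720 audio hours per day, so a hypothetical load of 2,000 hours per day would need between 3 and 7 workers. Real sizing would add headroom for peaks, since load is not spread evenly over the day.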
What We're Building Next
Our roadmap includes:
- Faster processing — targeting 1 minute for a 1-hour file
- Speaker diarization — identifying who said what
- Developer API — RESTful API for integrating transcription into third-party products
- Improved punctuation — better handling of complex sentence structures
Technology Choices
A few notable decisions:
- Next.js for the frontend — server-side rendering for SEO, React for interactivity
- Go for the backend — concurrency model fits the queue-based architecture
- PostgreSQL + ClickHouse — relational data in Postgres, analytics and metrics in ClickHouse
- Docker and Kubernetes — containerized deployment with automated scaling
Conclusion
Nagovori's architecture prioritizes simplicity for the user and security for their data. Upload a file, wait briefly, use the text — everything else happens behind the scenes. If you have specific questions about our security practices, reach out through our support channels.