Build a Telegram Transcription Bot in 30 Minutes

In this tutorial, you'll build a Telegram bot that transcribes voice messages and audio files using the Nagovori API. The bot receives a voice message, sends it to Nagovori for transcription, and replies with the text.

Prerequisites

Python 3.10+
A Telegram bot token from @BotFather
A Nagovori API key (create one in your Profile)

Setup

Install dependencies:

pip install python-telegram-bot requests

Set environment variables:

export TELEGRAM_BOT_TOKEN="your-telegram-bot-token"
export NAGOVORI_API_KEY="nag_your_api_key"
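
bot.py reads these variables with os.environ[...], which raises a bare KeyError if one is missing. A small helper can make that failure easier to diagnose. The following is a minimal sketch; require_env is a hypothetical name, not part of any library.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        # Name the missing variable so the fix is obvious from the traceback
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

In bot.py this could replace the direct lookups, for example TELEGRAM_TOKEN = require_env("TELEGRAM_BOT_TOKEN").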

The Code

Create bot.py:

import os
import time
import tempfile
import requests
from telegram import Update
from telegram.ext import Application, MessageHandler, filters, ContextTypes

TELEGRAM_TOKEN = os.environ["TELEGRAM_BOT_TOKEN"]
API_KEY = os.environ["NAGOVORI_API_KEY"]
BASE_URL = "https://api.nagovori.ru/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}


async def handle_voice(update: Update, context: ContextTypes.DEFAULT_TYPE):
    """Handle voice messages and audio files."""
    message = update.message
    if not message:
        return

    # Get the file from Telegram
    if message.voice:
        file = await message.voice.get_file()
        filename = "voice.ogg"
        content_type = "audio/ogg"
    elif message.audio:
        file = await message.audio.get_file()
        filename = message.audio.file_name or "audio.mp3"
        content_type = message.audio.mime_type or "audio/mpeg"
    elif message.document and message.document.mime_type and \
         message.document.mime_type.startswith("audio/"):
        file = await message.document.get_file()
        filename = message.document.file_name or "audio.mp3"
        content_type = message.document.mime_type
    else:
        return

    await message.reply_text("Transcribing... please wait.")

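    # Download from Telegram into a temporary file.
    # Close the handle first so the path can be reopened on any platform.
    with tempfile.NamedTemporaryFile(suffix=".ogg", delete=False) as tmp:
        tmp_path = tmp.name
    try:
        await file.download_to_drive(tmp_path)
        file_size = os.path.getsize(tmp_path)

        # 1. Presign upload
        response = requests.post(
            f"{BASE_URL}/uploads/presign",
            headers=HEADERS,
            json={
                "filename": filename,
                "content_type": content_type,
                "size_bytes": file_size,
            },
        )
        response.raise_for_status()  # stop early on an HTTP error
        presign = response.json()

        # 2. Upload to Nagovori
        with open(tmp_path, "rb") as audio:
            requests.put(
                presign["upload_url"],
                data=audio,
                headers={"Content-Type": content_type},
            ).raise_for_status()
    finally:
        # Always remove the temporary file, even if a request fails
        os.unlink(tmp_path)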

    # 3. Create transcription
    job = requests.post(
        f"{BASE_URL}/transcriptions",
        headers=HEADERS,
        json={
            "object_key": presign["object_key"],
            "filename": filename,
            "content_type": content_type,
            "size_bytes": file_size,
            "language": "auto",
        },
    ).json()

    # 4. Poll for result (time.sleep blocks the event loop; acceptable for a demo)
    while job["status"] in ("queued", "processing"):
        time.sleep(2)
        job = requests.get(
            f"{BASE_URL}/transcriptions/{job['id']}",
            headers=HEADERS,
        ).json()

    if job["status"] == "completed":
        # Fall back to a placeholder so an empty transcript still gets a reply
        text = job.get("transcript_text") or "(No speech detected.)"
        # Telegram messages have a 4096 character limit
        for i in range(0, len(text), 4000):
            await message.reply_text(text[i:i + 4000])
    else:
        # error_message may be missing or null
        error = job.get("error_message") or "Unknown error"
        await message.reply_text(f"Transcription failed: {error}")


def main():
    app = Application.builder().token(TELEGRAM_TOKEN).build()
    app.add_handler(MessageHandler(
        filters.VOICE | filters.AUDIO | filters.Document.AUDIO,
        handle_voice,
    ))
    print("Bot started. Listening for voice messages...")
    app.run_polling()


if __name__ == "__main__":
    main()
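
The handler cuts long transcripts every 4000 characters, which can split a word across two messages. A possible refinement is to break at the last whitespace inside each chunk. The sketch below is one way to do that; split_text and TELEGRAM_LIMIT are illustrative names, not part of python-telegram-bot.

```python
TELEGRAM_LIMIT = 4096  # maximum characters in one Telegram text message


def split_text(text: str, limit: int = TELEGRAM_LIMIT) -> list[str]:
    """Split text into chunks of at most `limit` characters,
    breaking at the last space or newline in each chunk when possible."""
    chunks = []
    rest = text.strip()
    while len(rest) > limit:
        # Find the last space or newline at or before position `limit`
        cut = max(rest.rfind(" ", 0, limit + 1), rest.rfind("\n", 0, limit + 1))
        if cut <= 0:
            cut = limit  # no whitespace found: fall back to a hard cut
        chunks.append(rest[:cut].rstrip())
        rest = rest[cut:].lstrip()
    if rest:
        chunks.append(rest)
    return chunks
```

In the handler, the reply loop could then become: for chunk in split_text(text): await message.reply_text(chunk).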

Running

python bot.py

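Send a voice message to your bot. It first replies "Transcribing... please wait." and then sends the transcribed text once the job completes; how long that takes depends on the length of the audio.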

Deployment

For production, consider:

Webhooks instead of polling for lower latency
Async polling with asyncio instead of time.sleep
Error handling with retry logic for API failures
Docker container for easy deployment
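
The async polling and retry items above can be sketched with the standard library alone. This is a rough outline under stated assumptions, not a definitive implementation: poll_job and with_retries are hypothetical helpers, fetch_job stands for any blocking function that performs the GET request and returns the parsed job, and the timeout and delay values are arbitrary. Retrying on OSError covers requests' exceptions, since requests.RequestException derives from it.

```python
import asyncio
from typing import Any, Callable


async def poll_job(
    fetch_job: Callable[[], dict[str, Any]],
    interval: float = 2.0,
    timeout: float = 600.0,
) -> dict[str, Any]:
    """Poll until the job leaves the queued/processing states or time runs out."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        # Run the blocking HTTP call in a worker thread
        job = await asyncio.to_thread(fetch_job)
        if job["status"] not in ("queued", "processing"):
            return job
        if loop.time() >= deadline:
            raise TimeoutError("Transcription did not finish in time")
        # Yield to the event loop instead of blocking it with time.sleep
        await asyncio.sleep(interval)


async def with_retries(
    call: Callable[[], Any],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: tuple[type[BaseException], ...] = (OSError,),
) -> Any:
    """Run a blocking call in a thread, retrying with exponential backoff."""
    for attempt in range(attempts):
        try:
            return await asyncio.to_thread(call)
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            await asyncio.sleep(base_delay * 2 ** attempt)
```

Inside the handler, the polling loop could then be replaced by a single await poll_job(...) call, with fetch_job wrapping the requests.get call from step 4.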

Docker Example

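# Assumes a requirements.txt next to bot.py listing python-telegram-bot and requests
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY bot.py .
# Pass TELEGRAM_BOT_TOKEN and NAGOVORI_API_KEY at run time (for example with docker run -e)
CMD ["python", "bot.py"]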

Extending the Bot

Ideas for improvements:

Add /lang command to set preferred language
Support video messages (extract audio track)
Add /summary command to use AI postprocessing
Store transcription history in a database
Add inline mode for transcribing forwarded voice messages
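
As a starting point for the /lang idea, the per-chat preference can be kept in a plain dictionary. This is only a sketch: set_language, get_language, and chat_languages are hypothetical names, the in-memory store is lost on restart, and the assumption that the API accepts two-letter language codes alongside "auto" is not confirmed by this tutorial.

```python
import re

DEFAULT_LANGUAGE = "auto"  # the value the tutorial passes as "language"
# Assumption: two-letter codes such as "en" or "ru" are valid values
LANG_CODE = re.compile(r"[a-z]{2}")

# In-memory store: chat id -> preferred language code
chat_languages: dict[int, str] = {}


def set_language(chat_id: int, args: list[str]) -> str:
    """Process the arguments of a /lang command and return the reply text."""
    if len(args) != 1:
        return "Usage: /lang <code>, for example /lang en or /lang auto"
    code = args[0].lower()
    if code != DEFAULT_LANGUAGE and not LANG_CODE.fullmatch(code):
        return f"Unrecognized language code: {args[0]}"
    chat_languages[chat_id] = code
    return f"Language set to {code}"


def get_language(chat_id: int) -> str:
    """Return the stored preference, or the default if none was set."""
    return chat_languages.get(chat_id, DEFAULT_LANGUAGE)
```

With python-telegram-bot, a CommandHandler("lang", ...) callback could pass update.effective_chat.id and context.args to set_language, and handle_voice could send get_language(...) instead of the fixed "auto".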

Cost

Each transcription draws on your Nagovori minute balance: a 1-minute voice message uses about 1 minute. Check your remaining minutes in your Profile.
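
To estimate usage before sending a file, the bot can convert the audio duration to minutes. Telegram reports the duration of voice and audio messages in seconds. The sketch below uses hypothetical helper names and does not round, because this tutorial does not say how Nagovori rounds partial minutes; treat the result as an approximation.

```python
def estimated_minutes(duration_seconds: float) -> float:
    """Convert an audio duration in seconds to minutes, without rounding."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    return duration_seconds / 60


def has_enough_balance(balance_minutes: float, duration_seconds: float) -> bool:
    """Check whether the estimated usage fits within the remaining balance."""
    return estimated_minutes(duration_seconds) <= balance_minutes
```

For a voice message, the duration would come from message.voice.duration; the balance would have to be tracked or looked up separately.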