8 min · Nagovori

Build a Telegram Transcription Bot in 30 Minutes

telegram, tutorial, python, bot


In this tutorial, you'll build a Telegram bot that transcribes voice messages and audio files using the Nagovori API. The bot receives a voice message, sends it to Nagovori for transcription, and replies with the text.

Prerequisites

  • Python 3.10+
  • A Telegram bot token from @BotFather
  • A Nagovori API key (create one in your Profile)

Setup

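Install dependencies. The code uses the async API introduced in python-telegram-bot 20, so require at least that version:

pip install "python-telegram-bot>=20" requests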

Set environment variables:

export TELEGRAM_BOT_TOKEN="your-telegram-bot-token"
export NAGOVORI_API_KEY="nag_your_api_key"
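Because bot.py reads both variables with os.environ[...], forgetting one fails with a bare KeyError at startup. If you want a clearer message, a small helper can do the lookup instead. This is a sketch; `require_env` is a hypothetical name, not part of python-telegram-bot or the Nagovori API.

```python
import os


def require_env(name: str) -> str:
    """Return a non-empty environment variable or exit with a readable message."""
    # Hypothetical helper: unset and blank values are treated the same way
    value = os.environ.get(name, "").strip()
    if not value:
        raise SystemExit(
            f"Missing environment variable {name}. Export it before starting the bot."
        )
    return value
```

You would then write TELEGRAM_TOKEN = require_env("TELEGRAM_BOT_TOKEN") and API_KEY = require_env("NAGOVORI_API_KEY").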

The Code

Create bot.py:

import asyncio
import os
import time

import requests
from telegram import Update
from telegram.error import TelegramError
from telegram.ext import Application, ContextTypes, MessageHandler, filters

TELEGRAM_TOKEN = os.environ["TELEGRAM_BOT_TOKEN"]
API_KEY = os.environ["NAGOVORI_API_KEY"]
BASE_URL = "https://api.nagovori.ru/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

POLL_INTERVAL = 2       # seconds between status checks
POLL_TIMEOUT = 15 * 60  # stop waiting after 15 minutes
CHUNK_SIZE = 4000       # Telegram text messages are capped at 4096 characters


def api_call(method: str, path: str, **kwargs) -> dict:
    """Blocking call to the Nagovori API; raises on HTTP errors."""
    response = requests.request(
        method, f"{BASE_URL}{path}", headers=HEADERS, timeout=30, **kwargs
    )
    response.raise_for_status()
    return response.json()


def start_transcription(data: bytes, filename: str, content_type: str) -> dict:
    """Upload the audio and create a transcription job (blocking)."""
    size_bytes = len(data)

    # 1. Presign upload
    presign = api_call("POST", "/uploads/presign", json={
        "filename": filename,
        "content_type": content_type,
        "size_bytes": size_bytes,
    })

    # 2. Upload to Nagovori
    upload = requests.put(
        presign["upload_url"],
        data=data,
        headers={"Content-Type": content_type},
        timeout=120,
    )
    upload.raise_for_status()

    # 3. Create transcription
    return api_call("POST", "/transcriptions", json={
        "object_key": presign["object_key"],
        "filename": filename,
        "content_type": content_type,
        "size_bytes": size_bytes,
        "language": "auto",
    })


async def handle_voice(update: Update, context: ContextTypes.DEFAULT_TYPE):
    """Handle voice messages and audio files."""
    message = update.message
    if not message:
        return

    # Pick the attachment and describe it for the API
    if message.voice:
        attachment = message.voice
        filename = "voice.ogg"
        content_type = "audio/ogg"
    elif message.audio:
        attachment = message.audio
        filename = message.audio.file_name or "audio.mp3"
        content_type = message.audio.mime_type or "audio/mpeg"
    elif message.document and (message.document.mime_type or "").startswith("audio/"):
        attachment = message.document
        filename = message.document.file_name or "audio.mp3"
        content_type = message.document.mime_type
    else:
        return

    await message.reply_text("Transcribing... please wait.")

    try:
        # Download from Telegram into memory (no temporary file to clean up)
        tg_file = await attachment.get_file()
        data = bytes(await tg_file.download_as_bytearray())

        # requests is blocking, so run it in a worker thread to keep the bot responsive
        job = await asyncio.to_thread(start_transcription, data, filename, content_type)

        # 4. Poll for result without blocking the event loop
        deadline = time.monotonic() + POLL_TIMEOUT
        while job["status"] in ("queued", "processing"):
            if time.monotonic() > deadline:
                await message.reply_text("Transcription is taking too long. Please try again later.")
                return
            await asyncio.sleep(POLL_INTERVAL)
            job = await asyncio.to_thread(api_call, "GET", f"/transcriptions/{job['id']}")
    except (requests.RequestException, TelegramError) as exc:
        print(f"Transcription error: {exc!r}")
        await message.reply_text("Transcription failed. Please try again later.")
        return

    if job["status"] == "completed":
        text = job.get("transcript_text") or ""
        if not text.strip():
            await message.reply_text("The transcript is empty.")
            return
        # Send long transcripts in chunks that fit Telegram's limit
        for i in range(0, len(text), CHUNK_SIZE):
            await message.reply_text(text[i:i + CHUNK_SIZE])
    else:
        error = job.get("error_message") or "Unknown error"
        await message.reply_text(f"Transcription failed: {error}")


def main():
    # concurrent_updates lets the bot handle several voice messages at once
    app = (
        Application.builder()
        .token(TELEGRAM_TOKEN)
        .concurrent_updates(True)
        .build()
    )
    app.add_handler(MessageHandler(
        filters.VOICE | filters.AUDIO | filters.Document.AUDIO,
        handle_voice,
    ))
    print("Bot started. Listening for voice messages...")
    app.run_polling()


if __name__ == "__main__":
    main()
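The chunking loop slices the transcript every 4000 characters, which can cut a word in half. Telegram caps a text message at 4096 characters, so 4000 leaves some headroom. A gentler alternative is to break at the last newline or space that fits. The helper below is a hypothetical sketch, not part of python-telegram-bot:

```python
def split_message(text: str, limit: int = 4000) -> list[str]:
    """Split text into chunks of at most `limit` characters,
    preferring to break at a newline, then at a space."""
    text = text.strip()
    chunks = []
    while len(text) > limit:
        window = text[:limit]
        # Prefer the last newline inside the window, then the last space
        cut = window.rfind("\n")
        if cut <= 0:
            cut = window.rfind(" ")
        if cut <= 0:
            cut = limit  # no natural break: fall back to a hard cut
        chunks.append(text[:cut].rstrip())
        text = text[cut:].lstrip()
    if text:
        chunks.append(text)
    return chunks
```

In handle_voice you would then loop with for chunk in split_message(text): await message.reply_text(chunk) in place of the slicing loop.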

Running

python bot.py

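Send a voice message to your bot. It first replies "Transcribing... please wait." and then, once the job completes, sends the transcript; for short messages this usually takes only seconds. Note that the Telegram Bot API only lets bots download files up to 20 MB, so larger audio files fail at the download step.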

Deployment

For production, consider:

  • Webhooks (Application.run_webhook) instead of long polling, for lower latency
  • A native async HTTP client such as httpx instead of requests, so no API call blocks the event loop
  • Retry logic with backoff for transient API failures
  • A Docker container for easy deployment
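For the retry idea, a common pattern is exponential backoff: retry a failing call a few times, doubling the delay each time. The sketch below is generic; `call_with_retries` is a hypothetical helper, not part of requests. Which exceptions count as transient is your choice. Note that requests raises its own classes, so you would pass retry_on=(requests.ConnectionError, requests.Timeout) rather than rely on the built-in defaults. Be careful retrying POST /transcriptions: if the first request did reach the server, a retry could create a duplicate job.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[[], T],
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(), retrying on the given exceptions with exponential backoff.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts
    and re-raises the last error once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("unreachable")
```

Because it waits with time.sleep, call it from a worker thread (for example via asyncio.to_thread) rather than directly inside an async handler.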

Docker Example

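FROM python:3.12-slim
# Flush print() output straight to docker logs
ENV PYTHONUNBUFFERED=1
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY bot.py .
CMD ["python", "bot.py"]

Here requirements.txt lists the two dependencies, python-telegram-bot>=20 and requests. Pass the credentials at run time instead of baking them into the image:

docker build -t transcription-bot .
docker run -e TELEGRAM_BOT_TOKEN -e NAGOVORI_API_KEY transcription-bot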

Extending the Bot

Ideas for improvements:

  • Add /lang command to set preferred language
  • Support video messages (extract audio track)
  • Add /summary command to use AI postprocessing
  • Store transcription history in a database
  • Add inline mode for transcribing forwarded voice messages
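For the history idea, the standard-library sqlite3 module is enough to start. A minimal sketch with a hypothetical table layout: store the Telegram user ID, the Nagovori job ID (job["id"] in the handler), and the transcript text, then fetch a user's most recent entries. The function and table names are illustrative only.

```python
import sqlite3


def open_history(path: str = "history.db") -> sqlite3.Connection:
    """Open (or create) the history database."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS transcripts (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               user_id INTEGER NOT NULL,
               job_id TEXT NOT NULL,
               text TEXT NOT NULL,
               created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    return conn


def save_transcript(conn: sqlite3.Connection, user_id: int, job_id: str, text: str) -> None:
    """Insert one transcript; the with-block commits or rolls back."""
    with conn:
        conn.execute(
            "INSERT INTO transcripts (user_id, job_id, text) VALUES (?, ?, ?)",
            (user_id, job_id, text),
        )


def recent_transcripts(conn: sqlite3.Connection, user_id: int, limit: int = 5) -> list[str]:
    """Return a user's latest transcripts, newest first."""
    rows = conn.execute(
        "SELECT text FROM transcripts WHERE user_id = ? ORDER BY id DESC LIMIT ?",
        (user_id, limit),
    ).fetchall()
    return [row[0] for row in rows]
```

In handle_voice, after a completed job, you might call save_transcript(conn, message.from_user.id, str(job["id"]), text). sqlite3 calls are blocking, which is usually fine for a small bot because each query is quick.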

Cost

Every transcription is deducted from your Nagovori minute balance according to the audio's length, so a 1-minute voice message uses 1 minute of your balance. You can check your remaining minutes in your Profile.
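If you want to cap spending per message, one option is to reject long recordings before uploading them. The Telegram Bot API reports the duration of Voice and Audio objects in seconds (documents carry no duration), so the check is simple arithmetic. This is a hypothetical helper, and it assumes billing is roughly proportional to audio length, as the example above suggests; the 10-minute default is an arbitrary placeholder.

```python
from typing import Optional

MAX_MINUTES_PER_MESSAGE = 10  # hypothetical limit; choose your own


def within_budget(duration_seconds: Optional[int],
                  max_minutes: float = MAX_MINUTES_PER_MESSAGE) -> bool:
    """Return True if a recording is short enough to transcribe.

    A missing duration (for example, audio sent as a document) is treated
    as unknown and allowed through.
    """
    if duration_seconds is None:
        return True
    return duration_seconds / 60 <= max_minutes
```

In the handler you would pass the duration in seconds from the Voice or Audio object (or None for documents) and reply with a short refusal when the check fails.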