Unlock the Power of Optical Character Recognition with a Telegram Chatbot

Imagine a chatbot that extracts the text from images and videos you send it. In this tutorial, we’ll build a Telegram chatbot that performs Optical Character Recognition (OCR) using Node.js, the Telegraf bot framework, the Tesseract OCR engine, and FFmpeg.

Getting Started

We’ll use the following modules to build our bot:

  • Telegraf: A Telegram bot framework for Node.js
  • Node-Tesseract-OCR: A Node.js wrapper around the Tesseract OCR command-line tool
  • Fluent-FFmpeg: A Node.js API for the FFmpeg command-line tool, used here to grab frames from videos
  • Dotenv: A module for loading environment variables from a .env file
  • Axios: A promise-based HTTP client for Node.js

Understanding Our Bot Logic

Our bot will have two independent scenes: imageScene and videoScene. The imageScene will handle extracting text from images, while the videoScene will handle extracting text from frames in videos.
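Images can reach the bot either as compressed photos or as files sent "as a document", so the bot needs to inspect each incoming message to decide which scene applies. The sketch below is a hypothetical helper, not part of the original bot. The field names photo, video, document, and mime_type come from the Telegram Bot API Message object; the function name mediaKind and its return values are assumptions.

```javascript
// Hypothetical helper: classify a Telegram message as 'image', 'video', or null.
// Field names follow the Telegram Bot API Message object.
function mediaKind(message) {
  if (!message) return null;
  // Compressed photos arrive as a non-empty array of PhotoSize objects.
  if (Array.isArray(message.photo) && message.photo.length > 0) return 'image';
  // Videos sent normally arrive in message.video.
  if (message.video) return 'video';
  // Files sent "as a document" keep their MIME type, so inspect it.
  const mime = message.document && message.document.mime_type;
  if (typeof mime === 'string') {
    if (mime.startsWith('image/')) return 'image';
    if (mime.startsWith('video/')) return 'video';
  }
  return null;
}

console.log(mediaKind({ photo: [{ file_id: 'a', width: 90, height: 90 }] })); // image
console.log(mediaKind({ document: { file_id: 'b', mime_type: 'video/mp4' } })); // video
console.log(mediaKind({ text: 'hello' })); // null
```

A scene could use a check like this to reply with a hint when the user sends the wrong kind of media.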

Creating Our Working Directory

Let’s create a new directory for our bot and install the necessary dependencies:

```shell
mkdir ocr-bot
cd ocr-bot
npm init -y
npm install telegraf node-tesseract-ocr fluent-ffmpeg dotenv axios
```
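The npm packages do not bundle the engines they drive: node-tesseract-ocr starts the tesseract program and fluent-ffmpeg starts ffmpeg, so both must be installed separately (for example, through your operating system’s package manager). The following is a small startup check, hypothetical and not part of the original tutorial, that reports a missing binary before the first OCR request fails. The function name canRun is an assumption.

```javascript
const { spawnSync } = require('child_process');

// Returns true if `command` can be started and exits with status 0.
function canRun(command, args) {
  const result = spawnSync(command, args, { stdio: 'ignore' });
  // result.error is set (e.g. ENOENT) when the binary is not on the PATH.
  return !result.error && result.status === 0;
}

// tesseract accepts --version; ffmpeg uses the single-dash -version flag.
const checks = [
  ['tesseract', ['--version']],
  ['ffmpeg', ['-version']],
];

for (const [command, args] of checks) {
  console.log(`${command}: ${canRun(command, args) ? 'found' : 'NOT found'}`);
}
```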

Registering Our Bot

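To register our bot, we’ll message @BotFather, Telegram’s official bot for creating new bot accounts and managing existing ones. Send it the /newbot command, follow the prompts to choose a name and username, and copy the access token it returns. Store the token in a .env file in the project root as BOT_TOKEN=<your token>, since main.js reads it from process.env.BOT_TOKEN.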
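If the token is missing or mistyped, the bot fails later with a less obvious error. A hypothetical guard like the one below, called after dotenv.config() in main.js, fails fast instead. The function name readBotToken is an assumption, and the regular expression is only a loose sanity check based on the "<numeric id>:<secret>" shape of BotFather tokens.

```javascript
// Hypothetical startup guard: return BOT_TOKEN or throw a descriptive error.
// The pattern only checks the "<numeric bot id>:<secret>" shape, not validity.
const TOKEN_PATTERN = /^\d+:[A-Za-z0-9_-]+$/;

function readBotToken(env = process.env) {
  const token = (env.BOT_TOKEN || '').trim();
  if (!token) {
    throw new Error('BOT_TOKEN is not set. Add BOT_TOKEN=<token from BotFather> to your .env file.');
  }
  if (!TOKEN_PATTERN.test(token)) {
    throw new Error('BOT_TOKEN does not look like a BotFather token (expected "<id>:<secret>").');
  }
  return token;
}
```

In main.js this would replace the direct read, e.g. new Telegraf(readBotToken()).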

Creating the Main File

In this step, we’ll create our main bot file, main.js. This file will import the necessary modules and create a new bot instance:
```javascript
const { Telegraf } = require('telegraf');
const dotenv = require('dotenv');

// Load variables such as BOT_TOKEN from the .env file into process.env
dotenv.config();

const bot = new Telegraf(process.env.BOT_TOKEN);

// … (rest of the code)
```
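The “Extract from 🖼️” button that enters the imageScene has to be sent by main.js, typically in the /start reply. Below is a hypothetical sketch of that menu written as a raw Telegram Bot API InlineKeyboardMarkup object. The video button label (“Extract from 🎬”), the callback_data values, and the sceneForCallback helper are assumptions, not the tutorial’s code.

```javascript
// Hypothetical mapping from button callback_data to scene IDs.
const SCENE_BY_CALLBACK = {
  extract_image: 'imageScene',
  extract_video: 'videoScene',
};

// Raw Bot API InlineKeyboardMarkup: one row with two buttons.
const startKeyboard = {
  inline_keyboard: [[
    { text: 'Extract from 🖼️', callback_data: 'extract_image' },
    { text: 'Extract from 🎬', callback_data: 'extract_video' },
  ]],
};

// Return the scene for a button press, or null for unknown data.
function sceneForCallback(data) {
  return Object.prototype.hasOwnProperty.call(SCENE_BY_CALLBACK, data)
    ? SCENE_BY_CALLBACK[data]
    : null;
}
```

With Telegraf v4, main.js could register both scenes via bot.use(session()) and bot.use(new Scenes.Stage([imageScene, videoScene]).middleware()), reply to /start with ctx.reply(text, { reply_markup: startKeyboard }), and enter a scene from a bot.action handler with ctx.scene.enter(sceneForCallback(ctx.callbackQuery.data)). Wizard scenes need the session middleware to remember which step the user is on.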
Creating the Image Scene

In this step, we’ll create the imageScene.js file, which will handle extracting text from images:
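```javascript
const { Scenes } = require('telegraf');
const fileManager = require('./fileManager');
const ocr = require('./ocr');

// In Telegraf v4, WizardScene is exposed through the Scenes namespace
const imageScene = new Scenes.WizardScene(
  'imageScene',
  async (ctx) => {
    // … (rest of the code)
  }
);

module.exports = imageScene;
```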
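When a user sends a compressed photo, Telegram delivers it as an array of PhotoSize objects (each with file_id, file_unique_id, width, and height) covering several resolutions. Choosing the largest one usually gives Tesseract the most legible input. This is a hypothetical helper; the name pickLargestPhoto is an assumption.

```javascript
// Hypothetical helper: pick the PhotoSize with the most pixels, or null if none.
function pickLargestPhoto(photoSizes) {
  if (!Array.isArray(photoSizes) || photoSizes.length === 0) return null;
  return photoSizes.reduce((best, size) =>
    size.width * size.height > best.width * best.height ? size : best
  );
}

const sizes = [
  { file_id: 'small', file_unique_id: 's', width: 90, height: 60 },
  { file_id: 'large', file_unique_id: 'l', width: 1280, height: 853 },
  { file_id: 'medium', file_unique_id: 'm', width: 320, height: 213 },
];
console.log(pickLargestPhoto(sizes).file_id); // large
```

Inside the scene step, the chosen photo’s download URL could then be obtained with ctx.telegram.getFileLink(photo.file_id), which resolves to a URL object in Telegraf v4, before handing it to the file manager.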
Creating the Video Scene

In this step, we’ll create the videoScene.js file, which will handle extracting text from frames in videos:
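```javascript
const { Scenes } = require('telegraf');
const fileManager = require('./fileManager');
const ocr = require('./ocr');

// In Telegraf v4, WizardScene is exposed through the Scenes namespace
const videoScene = new Scenes.WizardScene(
  'videoScene',
  async (ctx) => {
    // … (rest of the code)
  }
);

module.exports = videoScene;
```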
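The videoOCR(videoPath, frame) signature implies the scene must obtain a frame position from somewhere. One possibility, assumed here rather than taken from the tutorial, is a wizard step that asks the user for a timestamp such as "42" or "1:05". The hypothetical helper below parses that input and checks it against the video’s duration (the Bot API’s Video.duration, in whole seconds).

```javascript
// Hypothetical helper: parse "ss" or "m:ss" into seconds, or return null
// for malformed or out-of-range input so the scene can ask again.
function parseFrameSecond(text, durationSeconds) {
  const match = /^\s*(?:(\d+):)?(\d+)\s*$/.exec(String(text));
  if (!match) return null;
  const minutes = match[1] ? Number(match[1]) : 0;
  const seconds = Number(match[2]);
  // In "m:ss" form the seconds part must be below 60 ("1:75" is rejected).
  if (match[1] && seconds >= 60) return null;
  const total = minutes * 60 + seconds;
  // Timestamps at or past the end have no frame; second 0 is always allowed,
  // even for clips Telegram reports as 0 s long.
  return total < Math.max(durationSeconds, 1) ? total : null;
}
```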
Creating the File Manager

In this step, we’ll create the fileManager.js file, which will handle downloading and deleting files sent by the user:
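```javascript
const axios = require('axios');
const fs = require('fs');
const path = require('path');

// Downloads a file sent by the user
const downloadFile = async (fileUrl, fileUniqueId) => {
  // … (rest of the code)
};

// Deletes a downloaded file once it has been processed
const deleteFile = async (filePath) => {
  // … (rest of the code)
};

module.exports = { downloadFile, deleteFile };
```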
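The file manager’s bodies are elided above. As a hypothetical sketch, not the tutorial’s implementation, a local file name can be derived from the file’s unique ID plus the extension in its Telegram download URL (https://api.telegram.org/file/bot<token>/<file_path>), and deletion can tolerate files that are already gone. The download folder and the localPathFor name are assumptions.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical download folder; the tutorial's actual location is not shown.
const DOWNLOAD_DIR = path.join(os.tmpdir(), 'ocr-bot');

// Build <dir>/<fileUniqueId><ext>, keeping the extension from the URL path.
function localPathFor(fileUrl, fileUniqueId, dir = DOWNLOAD_DIR) {
  const ext = path.extname(new URL(fileUrl).pathname) || '.bin';
  return path.join(dir, `${fileUniqueId}${ext}`);
}

// Delete a downloaded file; a file that no longer exists is not an error.
async function deleteFile(filePath) {
  try {
    await fs.promises.unlink(filePath);
  } catch (err) {
    if (err.code !== 'ENOENT') throw err;
  }
}
```

downloadFile could then create the folder with fs.promises.mkdir(DOWNLOAD_DIR, { recursive: true }) and write the response of axios.get(fileUrl, { responseType: 'stream' }) into fs.createWriteStream(localPathFor(fileUrl, fileUniqueId)).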
Creating the OCR File

In this step, we’ll create the ocr.js file, which will handle extracting text from images and frames in videos:
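```javascript
const tesseract = require('node-tesseract-ocr');
const ffmpeg = require('fluent-ffmpeg');

// Runs Tesseract on a single image file
const extractText = async (imagePath) => {
  // … (rest of the code)
};

// Extracts text from a frame of a video
const videoOCR = async (videoPath, frame) => {
  // … (rest of the code)
};

module.exports = { extractText, videoOCR };
```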
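Telegram’s sendMessage accepts at most 4096 characters of text, and OCR on a text-heavy image can return more than that, often with blank lines and a trailing form feed from Tesseract. The hypothetical helper below, not part of the tutorial, tidies the output and splits it into message-sized chunks, preferring line boundaries. It measures length in JavaScript string units, which is close enough for a sketch.

```javascript
const TELEGRAM_MAX_MESSAGE_LENGTH = 4096;

// Hypothetical helper: clean OCR output and split it into chunks <= limit.
function toTelegramChunks(ocrText, limit = TELEGRAM_MAX_MESSAGE_LENGTH) {
  // Trim each line, strip form feeds, and drop blank lines.
  const lines = ocrText
    .split(/\r?\n/)
    .map((line) => line.replace(/\f/g, '').trim())
    .filter((line) => line.length > 0);

  const chunks = [];
  let current = '';
  for (const line of lines) {
    // Hard-split any single line longer than the limit.
    for (let start = 0; start < line.length; start += limit) {
      const piece = line.slice(start, start + limit);
      const candidate = current ? `${current}\n${piece}` : piece;
      if (candidate.length <= limit) {
        current = candidate;
      } else {
        chunks.push(current);
        current = piece;
      }
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

A scene could send each chunk in turn with ctx.reply(chunk), and reply with a "no text found" message when the array is empty.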
Running Our Bot

Finally, let’s run our bot and interact with it on Telegram:

```shell
node main.js
```

Open your Telegram client, find the bot by the username you gave it in BotFather, and start a conversation by sending /start or tapping the Start button. Click the “Extract from 🖼️” button to enter the imageScene, then send an image to extract its text. Repeat the process with the video option to enter the videoScene and send a video.

With this tutorial, you’ve learned how to build a Telegram chatbot capable of extracting text from images and videos using Node.js and several powerful libraries.
