Transcribing and Translating Subtitles from a YouTube Video
Introduction
Recently, I worked on a personal project to transcribe and translate subtitles for a YouTube video. I wanted to watch a Chinese video that had no subtitles, and it doubled as an educational exercise to learn more about tools like yt-dlp, Whisper, and ffmpeg. Here’s how I did it, step by step.
Step 1: Downloading the Video
First, I used yt-dlp, a command-line tool, to download the video from YouTube.
yt-dlp <youtube_video_url>
Replace <youtube_video_url> with the URL of the video you want to download.
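By default yt-dlp names the downloaded file after the video title, while the later steps assume a file called video.mp4. As a minimal sketch (not part of my original workflow), the download could be driven from Node.js with a fixed output name. The helper names buildYtDlpArgs and downloadVideo are hypothetical; -o and --remux-video are standard yt-dlp options, and the sketch assumes yt-dlp is installed and on your PATH.

```javascript
const { spawnSync } = require("child_process");

// Hypothetical helper: builds the yt-dlp argument list for a fixed output name.
function buildYtDlpArgs(url, baseName = "video") {
  return [
    "-o", `${baseName}.%(ext)s`, // output template: <baseName>.<extension>
    "--remux-video", "mp4",      // remux into an MP4 container if needed
    url,
  ];
}

// Hypothetical helper: runs yt-dlp and throws if it fails.
function downloadVideo(url, baseName = "video") {
  const result = spawnSync("yt-dlp", buildYtDlpArgs(url, baseName), {
    stdio: "inherit",
  });
  if (result.error) throw result.error;
  if (result.status !== 0) {
    throw new Error(`yt-dlp exited with code ${result.status}`);
  }
}

// Usage: downloadVideo("<youtube_video_url>");
```

Keeping the argument list in its own function makes it easy to inspect or log the exact command before anything is downloaded.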
Step 2: Transcribing the Video
To transcribe the audio, I used Whisper. Since the video was in Mandarin, I specified the language and task:
whisper video.mp4 --language Mandarin --task transcribe
This generates a .srt subtitle file containing the Mandarin transcript, named after the input file (video.srt here).
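For reference, an .srt file is a series of cues separated by blank lines: an index, a timing line of the form start --> end, and one or more lines of text. The sketch below parses that structure in Node.js to make the format concrete. The parseSrt helper and the sample cues are made up for illustration, and the code assumes well-formed input.

```javascript
// Hypothetical helper: parses SRT text into { index, start, end, text } cues.
// Assumes every cue has an index line, a timing line, and at least one text line.
function parseSrt(srtText) {
  return srtText
    .replace(/\r\n/g, "\n") // normalise Windows line endings
    .trim()
    .split(/\n{2,}/)        // cues are separated by blank lines
    .map((block) => {
      const [indexLine, timingLine, ...textLines] = block.split("\n");
      const [start, end] = timingLine.split(" --> ");
      return {
        index: Number(indexLine),
        start,
        end: end.trim(),
        text: textLines.join("\n"),
      };
    });
}

// Sample data invented for this example.
const sample = [
  "1",
  "00:00:00,000 --> 00:00:02,500",
  "大家好",
  "",
  "2",
  "00:00:02,500 --> 00:00:05,000",
  "欢迎收看",
  "",
].join("\n");

console.log(parseSrt(sample));
```

Only the text lines need translating; the index and timing lines must be carried over unchanged so the subtitles stay in sync.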
Step 3: Translating the Subtitles
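Next, I translated the subtitles using a Node.js script and a translation API (e.g., Google Translate or DeepL). The script reads the original audio.srt (rename Whisper's .srt output or adjust the path in the script to match), translates each line of subtitle text into English, and writes the result to audio_translated.srt. Run it with: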
node translateSrt.js
You can find the full Node.js script in this repository, which includes detailed instructions for use.
const fs = require("fs");
const path = require("path");
const translate = require("google-translate-api-x");
const cliProgress = require("cli-progress");

(async () => {
  try {
    // Read your SRT file
    const inputFilePath = path.join(__dirname, "audio.srt");
    const outputFilePath = path.join(__dirname, "audio_translated.srt");
    const data = fs.readFileSync(inputFilePath, "utf-8");
    const lines = data.split("\n");
    const translatedLines = [];

    // Set up the progress bar
    const progressBar = new cliProgress.SingleBar(
      {
        format: "Translating |{bar}| {value}/{total} lines",
      },
      cliProgress.Presets.shades_classic
    );

    // Start the bar at 0 with the total set to the number of lines
    progressBar.start(lines.length, 0);

    for (let i = 0; i < lines.length; i++) {
      const line = lines[i];
      const trimmedLine = line.trim();

      if (
        trimmedLine.match(/^\d+$/) ||
        trimmedLine.includes("-->") ||
        trimmedLine === ""
      ) {
        // If it's an index line, timing line, or empty line, just keep it
        translatedLines.push(line);
      } else {
        // Attempt to translate the line
        try {
          const res = await translate(trimmedLine, { from: "auto", to: "en" });
          translatedLines.push(res.text);
        } catch (err) {
          console.error("Translation error:", err);
          // Fall back to original text if there's an error
          translatedLines.push(line);
        }
      }

      // Update progress
      progressBar.update(i + 1);
    }

    // Stop the progress bar
    progressBar.stop();

    // Write out the translated file
    fs.writeFileSync(outputFilePath, translatedLines.join("\n"), "utf-8");
    console.log("Translation complete:", outputFilePath);
  } catch (err) {
    console.error("Error:", err);
  }
})();
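The loop above sends one request per subtitle line, which can be slow and may run into rate limits on long videos. One possible refinement, sketched here and not something the script above does, is to group the text lines into batches bounded by character count before translating. The chunkByLength helper and the limits used are hypothetical, and whether a whole batch can be sent in a single call depends on the translation library, so treat array input as an assumption to check against its documentation.

```javascript
// Hypothetical helper: groups strings into batches whose combined length
// stays at or below maxChars. A single string longer than maxChars still
// gets a batch of its own rather than being split.
function chunkByLength(texts, maxChars = 1000) {
  const batches = [];
  let current = [];
  let currentLength = 0;

  for (const text of texts) {
    // Close the current batch if adding this string would exceed the limit.
    if (current.length > 0 && currentLength + text.length > maxChars) {
      batches.push(current);
      current = [];
      currentLength = 0;
    }
    current.push(text);
    currentLength += text.length;
  }

  if (current.length > 0) batches.push(current);
  return batches;
}

console.log(chunkByLength(["aaaa", "bbbb", "cc", "dddddd"], 8));
// prints [ [ 'aaaa', 'bbbb' ], [ 'cc', 'dddddd' ] ]
```

Each batch could then be translated in one request (or joined with a separator the API preserves), with the results written back to the matching line positions.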
Step 4: Adding Translated Subtitles to the Video
I used ffmpeg to burn the translated subtitles into the video, using the audio_translated.srt file written by the script:
ffmpeg -i video.mp4 -vf subtitles=audio_translated.srt output_video.mp4
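The subtitles filter draws the text onto the picture itself, which means the video is re-encoded and the subtitles cannot be switched off. An alternative, sketched below and not part of my original workflow, is to add the subtitles as a separate track that players can toggle. The buildFfmpegArgs helper and its mode names are hypothetical; -vf subtitles=, -c copy, and -c:s mov_text are standard ffmpeg options, with mov_text being the subtitle codec used in MP4 containers.

```javascript
// Hypothetical helper: builds ffmpeg arguments for either burned-in ("hard")
// subtitles or a separate, toggleable ("soft") subtitle track in an MP4.
function buildFfmpegArgs(video, subtitles, output, mode = "hard") {
  if (mode === "hard") {
    // Re-encodes the video and draws the subtitles onto each frame.
    return ["-i", video, "-vf", `subtitles=${subtitles}`, output];
  }
  if (mode === "soft") {
    // Copies the audio/video streams as-is and stores the subtitles as mov_text.
    return ["-i", video, "-i", subtitles, "-c", "copy", "-c:s", "mov_text", output];
  }
  throw new Error(`Unknown mode: ${mode}`);
}

const softArgs = buildFfmpegArgs(
  "video.mp4",
  "audio_translated.srt",
  "output_video.mp4",
  "soft"
);
console.log(["ffmpeg", ...softArgs].join(" "));
// prints ffmpeg -i video.mp4 -i audio_translated.srt -c copy -c:s mov_text output_video.mp4
```

The soft variant is much faster because nothing is re-encoded, but it relies on the player supporting subtitle tracks; burned-in subtitles work everywhere.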
Conclusion
This process was purely for educational purposes and allowed me to explore various tools. If you attempt this, ensure you respect copyright laws and use videos you have permission to modify.