Add Live Subtitles and Translation to your Livestreams! (OpenAI's Whisper AI)

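💫 Summary

This video shows how to add live subtitles and translation to livestreams using OpenAI's Whisper AI. It covers installation, settings, and a dual-PC configuration in detail, and offers practical tips for adjusting the subtitles' appearance and delay.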

✨ Highlights

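✦ How to install and set up the real-time translation and subtitle program.
00:00
The author shared a prototype of real-time translation and subtitles and decided to turn it into an easy-to-use program.
The video provides a guide to installation and a dual-PC setup, and notes that the CUDA driver must be installed.
In the settings, users can choose among models ranging from fastest to slowest, and pick a model for their language based on the benchmarks on the website.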

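✦ When streaming with the large model, you can choose between translating and transcribing.
02:08
The large model gave the best results for Slovak and is run on a dual-PC setup.
You can choose to transcribe only or to translate, and enable context to feed previously spoken sentences to the model.
Subtitles can be sent to Twitch so viewers can toggle captions on and off themselves.
The subtitle delay can be adjusted to better match lip movement.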
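The context option is described only as a simple algorithm that feeds the model sentences spoken within a context time. The following is a minimal sketch of how such a rolling buffer might work. The class name `ContextBuffer`, the `context_seconds` parameter, and the idea of joining the sentences into one prompt string are assumptions for illustration, not the program's actual code.

```python
from collections import deque


class ContextBuffer:
    """Keeps recently spoken sentences and drops those older than a time window."""

    def __init__(self, context_seconds: float):
        self.context_seconds = context_seconds  # hypothetical "context time" setting
        self._items = deque()  # (timestamp, sentence) pairs, oldest first

    def add(self, sentence: str, now: float) -> None:
        self._items.append((now, sentence))
        self._prune(now)

    def _prune(self, now: float) -> None:
        # Discard sentences that fall outside the context window.
        while self._items and now - self._items[0][0] > self.context_seconds:
            self._items.popleft()

    def prompt(self, now: float) -> str:
        """Return the sentences still inside the window, joined as one string."""
        self._prune(now)
        return " ".join(sentence for _, sentence in self._items)


buf = ContextBuffer(context_seconds=10.0)
buf.add("Hello everyone.", now=0.0)
buf.add("Today we are testing subtitles.", now=6.0)
print(buf.prompt(now=12.0))  # the first sentence is older than 10 s and is dropped
```

A string like this could then be handed to the model together with the next audio chunk (the openai-whisper package accepts such text through its `initial_prompt` argument), though whether the program does it this way is not stated in the video.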

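✦ Steps for setting up OBS to display the live subtitles.
04:13
First, turn off Quick Edit Mode in the Windows command prompt to keep the translation from pausing.
In OBS, create a new browser source and enter the IP address and port shown by the program.
Visual settings such as the font, size, maximum number of words shown, and background can be customized.
OBS remembers these settings afterwards, and the settings panel can be hidden so that only the subtitles are shown.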

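✦ Steps for setting up subtitles on a dual-PC stream.
06:22
Right-click the video source and add Render Delay filters to keep the video in sync with the delayed audio.
Install the OBS plugin and the NDI runtime on the gaming PC to push audio to the inference PC.
On the inference PC, use NDI Tools to set up the webcam and the audio device.
This allows real-time audio streaming from the gaming PC to the inference PC.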
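The video notes that a single Render Delay filter tops out at 500 milliseconds, so a longer video delay means stacking several filters. The arithmetic can be sketched as below; the function name `render_delay_filters` is hypothetical, and the 500 ms limit is taken from the video rather than checked against OBS itself.

```python
MAX_DELAY_PER_FILTER_MS = 500  # per-filter limit as stated in the video


def render_delay_filters(total_delay_ms: int) -> list[int]:
    """Split a total video delay into per-filter delays of at most 500 ms each."""
    if total_delay_ms < 0:
        raise ValueError("delay must not be negative")
    full, rest = divmod(total_delay_ms, MAX_DELAY_PER_FILTER_MS)
    delays = [MAX_DELAY_PER_FILTER_MS] * full
    if rest:
        delays.append(rest)  # one last filter covers the remainder
    return delays


print(render_delay_filters(2000))  # [500, 500, 500, 500]
```

For the two-second delay mentioned in the video, that works out to four Render Delay filters on the video source.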

00:00 Hey everyone! Last week I shared my prototype of real-time translation and subtitles on Twitter and

00:07 it had pretty good reception so I decided to kind of make it into an easy to use program

00:14 and share it with you. In this video I will show you how to get it, how to install it and how to

00:21 set up a dual pc setup if you would prefer running it on a dual pc setup. Also the whole time I will

00:28 be using the real-time subtitles and I can also speak Slovak and the subtitles will be in English.

00:41 So yeah, let's get started. You are going to be able to find more information about my program

00:46 on my website which is going to be in the description or right here on the video. It is

00:51 accessible to my tier 2 supporters on Ko-fi so if you subscribe on the tier 2 there you're going to

00:58 be able to get a role on Discord which will give you permissions for the program forum where you

01:05 will be able to get all the relevant links. After you've downloaded them, open them up,

01:13 install the CUDA driver, which is essential for running the inference, and open the program folder

01:19 (after you've unzipped it, of course). Find settings.exe. There you will be able to set

01:27 up some important settings. You are able to choose the model. You can go from fastest to slowest. Of

01:36 course the slowest one is going to be the best but it's going to be pretty slow and you're going to

01:40 need a pretty good GPU. Ideally using it on a dual pc setup. You can also choose the models according

01:48 to some resources I've put on my website. You can click on the benchmarks and it's going to show you

01:54 some benchmarks of some of the supported languages. You could also go to OpenAI's resources and find

02:02 some benchmarks for all the languages as well. I speak Slovak which is somewhere in the middle

02:08 and I found the large model works the best so at the moment when I'm streaming in my native

02:14 language I'm using it on a dual pc setup. If you would only be speaking and transcribing English

02:20 you could probably go with the tiny or base model and that might even work on your CPU.
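As a rough illustration of the model choice, Whisper's published sizes run from tiny to large, and every size except large also comes in an English-only `.en` variant. The helper below is a hypothetical sketch of how a settings tool might turn those two choices into a model name. The function `model_name` is not from the program shown in the video.

```python
# Whisper model sizes, ordered from fastest to slowest.
MODEL_SIZES = ["tiny", "base", "small", "medium", "large"]


def model_name(size: str, english_only: bool = False) -> str:
    """Return a Whisper model name, using the '.en' variant where one exists."""
    if size not in MODEL_SIZES:
        raise ValueError(f"unknown model size: {size!r}")
    # English-only variants are published for every size except large.
    if english_only and size != "large":
        return f"{size}.en"
    return size


print(model_name("base", english_only=True))  # base.en
print(model_name("large"))                    # large
```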

02:26 Here you can also choose whether to use an English-only model, which is smaller to download.

02:32 Then you can choose whether to translate or just transcribe. If I speak

02:38 Slovak and only choose to transcribe, I get Slovak subtitles, but if I turn on

02:45 translate, it will translate my Slovak speech to English. Then you are able to turn on context,

02:51 which is a simple algorithm I've written for the ability to give the AI previous context from

02:59 sentences you've said before within the context time. You can also choose if you want to use a

03:06 GPU or CPU: the float16 and int8 options use the GPU, and the CPU option uses the CPU. If you'd like to speed up the computation

03:16 you could use int8, but you might lose some precision. Choose the language you are using,

03:22 for example I would speak Slovak so I would choose Slovak but right now I'm gonna speak

03:28 English so I'm gonna go for English. You are also able to send the closed captions to Twitch

03:33 which gives your viewers the ability to turn on or off the captions in the Twitch player which is

03:41 a nice feature to have. You can enable it here, but you also have to go to Tools in OBS, click

03:49 WebSocket and enable the WebSocket server. You can turn off authentication so you don't need to worry

03:55 about the password, but only do that on your home network. Then everything should work.
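For readers curious what might travel over that WebSocket connection: version 5 of the obs-websocket protocol defines a `SendStreamCaption` request with a `captionText` field. Whether the program uses exactly this request is an assumption. The sketch below only builds the JSON message and sends nothing, and the helper name `build_caption_request` is hypothetical.

```python
import json
import uuid


def build_caption_request(text: str) -> str:
    """Build an obs-websocket v5 request message that carries one caption line."""
    message = {
        "op": 6,  # opcode 6 marks a Request in the obs-websocket v5 protocol
        "d": {
            "requestType": "SendStreamCaption",
            "requestId": str(uuid.uuid4()),  # any unique string chosen by the client
            "requestData": {"captionText": text},
        },
    }
    return json.dumps(message)


print(build_caption_request("Hello, chat!"))
```

A real client would first have to connect to the server and complete the protocol's Hello and Identify handshake before sending a message like this.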

04:01 You can also delay the Twitch subtitles by a number of seconds to better match your lips

04:08 as you're speaking for example. You can also choose to censor the subtitles if that's something

04:13 you need. I will save the settings and then I can turn on the program.

04:24 When you first turn on the program it will download the model and then it's going to tell

04:28 you it's ready to go. An important thing here is that on some instances of Windows you need to

04:36 change one setting in the command prompt: right-click the top left corner, go to Properties,

04:41 and make sure Quick Edit Mode is turned off. If you don't do that, it could

04:50 sometimes pause the translation. That's just a quirk of Windows, sadly. As you see, the transcription

04:58 has already started and at the top we can see some IPs that are relevant for our use. Remember the

05:04 first one and open your OBS and create a new browser source. Write the IP in the URL

05:12 including the port. Set the size to match your OBS canvas and click OK,

05:21 and there you will be able to see the settings. As you can see, the subtitles have already synced.

05:29 You can right click and click interact to be able to set all relevant visual settings that

05:38 you would like to change. You can choose a font from Google fonts, you can change the size,

05:46 you can change max words that are shown, you can do all kinds of stuff and change the background etc.
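One plausible reading of the max words setting is that the overlay keeps only the most recent words of the running transcript. The sketch below assumes exactly that. The function `last_words` is hypothetical, and the real overlay may trim its text differently.

```python
def last_words(text: str, max_words: int) -> str:
    """Keep only the most recent max_words words of a caption string."""
    if max_words <= 0:
        return ""
    words = text.split()
    return " ".join(words[-max_words:])


print(last_words("so yeah let's get started with the setup", 4))  # started with the setup
```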

05:52 After you're done you can close this. OBS is going to remember the settings which is nice

05:58 and then you can hold alt and just hide the settings and only the subtitles are going to be

06:06 visible. You might also want to turn on a delay if that's something you feel like is needed.

06:14 To do that for audio, you can click on this and go to Advanced Audio Properties. For example, I'm

06:22 using two seconds of delay on audio. To do the same for video, you can right-click your source,

06:29 go to Filters and add Render Delay. The max delay for one instance is 500 milliseconds,

06:38 so you can just duplicate and add as many as you need. For me when I'm using the large model

06:44 on a dual pc setup two seconds is perfect. To run the subtitles on a dual pc setup

06:50 you can also download the dual pc file from my discord. It's going to have

06:57 two folders one for your gaming pc that you will use for gaming and one for the pc that's doing the

07:04 inference. So first install the OBS plugin and the runtime for NDI on your pc that you are using the

07:13 microphone on and install NDI tools on the pc that's running the inference. After that on your

07:21 gaming pc go to your OBS, click on filters but make sure you click on it on your microphone,

07:32 add a filter that's called dedicated NDI output, give it a name and click on apply changes. Now on

07:39 your pc that's going to do the inference, open NDI Tools, click on Webcam, find it in the toolbar,

07:47 click on it there, click on the cog, find your pc, and click on the name you've

07:56 set previously. After you've done that find your audio settings and set the webcam as your default

08:04 device. That should allow you to real-time stream audio from your gaming pc to your inference pc.

08:11 That's it. Hopefully that was comprehensive enough.
