Add Live Subtitles and Translation to your Livestreams! (OpenAI's Whisper AI)
5.5K views | 2 years ago
💫 Summary
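This video explains how to use OpenAI's Whisper AI to add live subtitles and translation to livestreams. It covers installation, setup, and a dual-PC configuration step by step, and offers practical tips for adjusting the subtitles' appearance and delay.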
✦
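How to install and set up the real-time translation and subtitle program.
00:00 The author shared a prototype of real-time translation and subtitles and decided to turn it into an easy-to-use program.
Covers installation and a dual-PC setup, and notes that the CUDA driver must be installed.
In the settings, users can choose among models from fastest to slowest, using the benchmarks on the website to pick a model for their language.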
✦
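When streaming with the large model, you can choose between translation and transcription.
02:08 The large model works best for Slovak, which the author runs on a dual-PC setup.
You can choose to only transcribe or to translate, and enable context so the model receives previously spoken sentences.
Subtitles can be sent to Twitch, where viewers can turn them on or off.
The Twitch subtitle delay can be adjusted to better match lip movements.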
✦
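Steps for setting up OBS to display the real-time subtitles.
04:13 First, turn off Quick Edit mode in the Windows command prompt to keep the translation from pausing.
In OBS, create a new Browser source and enter the relevant IP and port.
You can customize visual settings such as the subtitle font, size, maximum number of words shown, and background.
OBS remembers these settings, and you can then hide the settings panel so only the subtitles are shown.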
✦
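Steps for setting up subtitles on a dual-PC stream.
06:22 Add an audio delay in Advanced Audio Properties and Render Delay filters on the video source to keep audio and video in sync.
On the gaming PC, install the OBS plugin and the NDI runtime, and push the microphone audio to the inference PC.
On the inference PC, use NDI Tools (Webcam) to receive the stream and set it as the audio input device.
This enables real-time audio streaming from the gaming PC to the inference PC.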
00:00 Hey everyone! Last week I shared my prototype of real-time translation and subtitles on Twitter and
00:07 it got a pretty good reception, so I decided to make it into an easy-to-use program
00:14 and share it with you. In this video I will show you how to get it, how to install it and how to
00:21 set it up on a dual-PC setup if you prefer running it that way. Also, the whole time I will
00:28 be using the real-time subtitles, and I can also speak Slovak and the subtitles will be in English.
00:41 So yeah, let's get started. You can find more information about my program
00:46 on my website, which is linked in the description or right here in the video. It is
00:51 available to my tier 2 supporters on Ko-fi: if you subscribe to tier 2 there, you will
00:58 get a role on Discord that gives you access to the program forum, where you
01:05 can find all the relevant links. After you've downloaded the files, open them,
01:13 install the CUDA driver, which you need to run the inference, unzip the program,
01:19 and open its folder to find settings.exe. There you can set
01:27 up some important settings. You can choose the model, going from fastest to slowest. Of
01:36 course, the slowest one gives the best results, but it is pretty slow and you'll
01:40 need a pretty good GPU, ideally on a dual-PC setup. You can also choose the model according
01:48 to some resources I've put on my website. If you click on the benchmarks, it shows you
01:54 benchmarks for some of the supported languages. You can also go to OpenAI's resources to find
02:02 benchmarks for all the languages. I speak Slovak, which is somewhere in the middle,
02:08 and I found the large model works best, so at the moment, when I'm streaming in my native
02:14 language, I run it on a dual-PC setup. If you will only be speaking and transcribing English,
02:20 you could probably go with the tiny or base model, and that might even work on your CPU.
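The model trade-off described above can be written down as a small rule-of-thumb helper. This is a hypothetical sketch, not part of the program. The size names are the standard Whisper model sizes. `suggest_model`, `next_faster`, and the `medium` fallback for a machine without a strong GPU are assumptions added for illustration.

```python
# Standard Whisper model sizes, ordered from fastest (least accurate) to slowest (most accurate).
WHISPER_SIZES = ["tiny", "base", "small", "medium", "large"]


def suggest_model(english_only: bool, has_strong_gpu: bool) -> str:
    """Hypothetical helper encoding the advice from the video."""
    if english_only:
        # English-only speech: tiny or base is often enough, possibly even on a CPU.
        return "base"
    if has_strong_gpu:
        # Languages such as Slovak: large gives the best results on a strong GPU,
        # ideally on a dedicated inference PC.
        return "large"
    # Assumption (not from the video): a mid-sized fallback without a strong GPU.
    return "medium"


def next_faster(size: str) -> str:
    """Step one size toward the fast end if the current model can't keep up."""
    index = WHISPER_SIZES.index(size)
    return WHISPER_SIZES[max(index - 1, 0)]


if __name__ == "__main__":
    choice = suggest_model(english_only=False, has_strong_gpu=True)
    print(choice)               # large
    print(next_faster(choice))  # medium
```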
02:26 Here you can also choose whether to use an English-only model, which is smaller to download.
02:32 Then you can choose whether to translate or just transcribe. For example, if I speak
02:38 Slovak and choose only to transcribe, I get Slovak subtitles, but if I turn on
02:45 translate, it translates my Slovak speech into English. Then you can turn on context,
02:51 a simple algorithm I've written that gives the AI context from
02:59 sentences you said earlier, within the context time. You can also choose whether to run on the
03:06 GPU or the CPU: float16 and int8 run on the GPU, and CPU runs on the CPU. If you'd like to speed up the computation,
03:16 you could use int8, but you might lose some precision. Choose the language you are speaking:
03:22 for example, when I speak Slovak I choose Slovak, but right now I'm going to speak
03:28 English, so I'll choose English. You can also send the closed captions to Twitch,
03:33 which lets your viewers turn the captions on or off in the Twitch player. That's
03:41 a nice feature to have. You can enable it here, but you also have to go to Tools in OBS, click
03:49 WebSocket Server Settings, and enable the WebSocket server. You can turn off authentication so you don't need to worry
03:55 about the password, but only do that on your home network. Then everything should work.
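The video doesn't show how the context option works internally. The following is only a hypothetical sketch of the general idea. It keeps the sentences recognized within the last `context_seconds` seconds and joins them into a text prompt for the next chunk of audio. The openai-whisper package's `transcribe` accepts such text through its `initial_prompt` parameter. The `ContextBuffer` class and its trimming rule are assumptions, not the author's algorithm.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass
class Sentence:
    text: str
    end_time: float  # seconds since the stream started, when the sentence ended


class ContextBuffer:
    """Hypothetical rolling window of recent sentences used as prompt context."""

    def __init__(self, context_seconds: float) -> None:
        self.context_seconds = context_seconds
        self._sentences: deque[Sentence] = deque()

    def add(self, text: str, end_time: float) -> None:
        """Record a finished sentence and drop any that fell out of the window."""
        self._sentences.append(Sentence(text, end_time))
        self._trim(end_time)

    def prompt(self, now: float) -> str:
        """Join the sentences that ended within the last context_seconds."""
        self._trim(now)
        return " ".join(s.text for s in self._sentences)

    def _trim(self, now: float) -> None:
        # Sentences are appended in time order, so the oldest is always on the left.
        while self._sentences and now - self._sentences[0].end_time > self.context_seconds:
            self._sentences.popleft()


if __name__ == "__main__":
    buffer = ContextBuffer(context_seconds=10.0)
    buffer.add("Hello everyone.", end_time=1.0)
    buffer.add("Today we set up subtitles.", end_time=5.0)
    buffer.add("First, open OBS.", end_time=14.0)
    # "Hello everyone." ended 13 s before 14.0, so it has left the 10 s window.
    print(buffer.prompt(now=14.0))  # Today we set up subtitles. First, open OBS.
```

In use, the returned string would be passed as the prompt for the next transcription call. With openai-whisper, that would look something like `model.transcribe(audio, initial_prompt=buffer.prompt(now))`.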
04:01 You can also delay the Twitch subtitles by a number of seconds, for example to better match your lips
04:08 as you're speaking. You can also choose to censor the subtitles if that's something
04:13 you need. I'll save the settings, and then I can start the program.
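The Twitch captions travel through OBS's WebSocket server. Version 5 of the obs-websocket protocol is built into OBS 28 and later and listens on port 4455 by default. It has a `SendStreamCaption` request that sends caption text along with the stream.

The sketch below shows one hypothetical way to layer the delay and censoring options on top of that request:

- Captions wait in a queue for `delay_s` seconds.
- Banned words are masked.
- Each caption becomes a Request message (op 6).

`DelayedCaptions`, `censor`, and the word list are illustrative assumptions. The WebSocket connection itself, including the Hello/Identify handshake and sending, is left out.

```python
from __future__ import annotations

import json
import re
import uuid
from collections import deque


def censor(text: str, banned: set[str]) -> str:
    """Mask each banned word (whole words, case-insensitive) with asterisks."""
    def mask(match: re.Match[str]) -> str:
        word = match.group(0)
        return "*" * len(word) if word.lower() in banned else word

    return re.sub(r"\w+", mask, text)


def caption_request(text: str) -> str:
    """Build an obs-websocket v5 Request message (op 6) for SendStreamCaption."""
    return json.dumps({
        "op": 6,
        "d": {
            "requestType": "SendStreamCaption",
            "requestId": str(uuid.uuid4()),
            "requestData": {"captionText": text},
        },
    })


class DelayedCaptions:
    """Hypothetical queue that releases captions delay_s seconds after they arrive."""

    def __init__(self, delay_s: float, banned: set[str] | None = None) -> None:
        self.delay_s = delay_s
        self.banned = {w.lower() for w in (banned or set())}
        # (release_time, text) pairs; push() must be called with non-decreasing `now`.
        self._queue: deque[tuple[float, str]] = deque()

    def push(self, text: str, now: float) -> None:
        self._queue.append((now + self.delay_s, text))

    def due(self, now: float) -> list[str]:
        """Pop every caption whose release time has passed, with banned words masked."""
        ready = []
        while self._queue and self._queue[0][0] <= now:
            _, text = self._queue.popleft()
            ready.append(censor(text, self.banned))
        return ready


if __name__ == "__main__":
    captions = DelayedCaptions(delay_s=2.0, banned={"heck"})
    captions.push("What the heck", now=0.0)
    print(captions.due(now=1.0))  # []
    for text in captions.due(now=2.0):
        # A real client would send this string over its WebSocket connection.
        print(caption_request(text))
```

A plain FIFO queue is enough here because every caption gets the same delay, so release times come out in arrival order.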
04:24 When you first start the program, it downloads the model and then tells
04:28 you it's ready to go. One important thing: on some Windows installations you need to
04:36 change one setting in the command prompt. Right-click the top-left corner of the window, go to Properties,
04:41 and make sure Quick Edit Mode is turned off. If you don't, it can
04:50 sometimes pause the translation; sadly, that's just a quirk of Windows. As you can see, the transcription
04:58 has already started, and at the top we can see a few IP addresses we'll need. Note the
05:04 first one, open OBS, and create a new Browser source. Enter the IP in the URL field,
05:12 including the port. Set the size to match your OBS canvas and click OK,
05:21 and you'll see the settings. As you can see, the subtitles have already synced.
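About the Quick Edit step above: while Quick Edit mode is on, clicking inside the console window starts a text selection. Windows then blocks the program's console output until the selection is cleared, which looks like the translation has paused.

The Properties toggle is the manual fix. A program could also clear the flag itself at startup through the Win32 `GetConsoleMode`/`SetConsoleMode` functions. The sketch below assumes that approach; it is not something the program is known to do. The flag values come from the Win32 console documentation, and the `ctypes` calls only run on Windows.

```python
import ctypes
import sys

# Console input mode flags from the Win32 SetConsoleMode documentation.
ENABLE_QUICK_EDIT_MODE = 0x0040
ENABLE_EXTENDED_FLAGS = 0x0080
STD_INPUT_HANDLE = -10


def without_quick_edit(mode: int) -> int:
    """Clear the Quick Edit bit; ENABLE_EXTENDED_FLAGS is required for it to take effect."""
    return (mode & ~ENABLE_QUICK_EDIT_MODE) | ENABLE_EXTENDED_FLAGS


def disable_quick_edit() -> bool:
    """Turn off Quick Edit for this process's console; False if that isn't possible."""
    if sys.platform != "win32":
        return False  # Quick Edit only exists in the Windows console
    from ctypes import wintypes  # imported here because it is meant for Windows

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetStdHandle.restype = wintypes.HANDLE
    kernel32.GetConsoleMode.argtypes = [wintypes.HANDLE, ctypes.POINTER(wintypes.DWORD)]
    kernel32.SetConsoleMode.argtypes = [wintypes.HANDLE, wintypes.DWORD]

    handle = kernel32.GetStdHandle(STD_INPUT_HANDLE)
    mode = wintypes.DWORD()
    if not kernel32.GetConsoleMode(handle, ctypes.byref(mode)):
        return False  # e.g. input is redirected and there is no console
    return bool(kernel32.SetConsoleMode(handle, without_quick_edit(mode.value)))


if __name__ == "__main__":
    print(disable_quick_edit())
```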
05:29 Right-click the source and click Interact to change any of the visual settings
05:38 you'd like. You can choose a font from Google Fonts, change the size,
05:46 change the maximum number of words shown, change the background, and more.
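One way to picture the max-words option is as keeping only the newest words of the running transcript on screen. The overlay's real logic isn't shown in the video, so this is a hypothetical sketch. `visible_words` and `max_words` are illustrative names.

```python
def visible_words(transcript: str, max_words: int) -> str:
    """Return at most the last `max_words` words of the transcript."""
    if max_words <= 0:
        return ""  # guard: words[-0:] would return every word, not none
    words = transcript.split()
    return " ".join(words[-max_words:])


if __name__ == "__main__":
    print(visible_words("so yeah let's get started", max_words=3))  # let's get started
```

Trimming by words rather than characters keeps whole words on screen.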
05:52 When you're done, you can close this. OBS remembers the settings, which is nice,
05:58 and then you can hold Alt and drag the source's edge to crop out the settings, so only the subtitles are
06:06 visible. You might also want to add a delay if you feel it's needed.
06:14 To do that for audio, click on this, go to Advanced Audio Properties, and set the Sync Offset; for example, I'm
06:22 using a two-second delay on audio. To do the same for video, right-click your source,
06:29 go to Filters, and add a Render Delay filter. The maximum delay for one instance is 500 milliseconds,
06:38 so just duplicate it and add as many as you need. For me, when I'm using the large model
06:44 on a dual-PC setup, two seconds is perfect. To run the subtitles on a dual-PC setup,
06:50 you can also download the dual-PC file from my Discord. It has
06:57 two folders: one for the PC you game on and one for the PC that's doing the
07:04 inference. First, install the OBS plugin and the NDI runtime on the PC your
07:13 microphone is connected to, and install NDI Tools on the PC that's running the inference. After that, on your
07:21 gaming PC, open OBS and click Filters, making sure you do this on your microphone source,
07:32 add a filter called Dedicated NDI Output, give it a name, and click Apply Changes. Now, on
07:39 the PC that will do the inference, open NDI Tools and click Webcam. Find its icon in the system tray,
07:47 click it, click the cog, find your gaming PC, and click the name you
07:56 set previously. After that, open your audio settings and set the webcam as your default
08:04 input device. That should let you stream audio in real time from your gaming PC to your inference PC.
08:11 That's it. Hopefully that was comprehensive enough.