Yann LeCun on World Models, AI Threats and Open-Sourcing | Eye On AI #150
#AI Evolution#Oracle Cloud Infrastructure#Yann LeCun#Embodied Turing Test#World Model AI#Neuro-AI#AI Energy Consumption#Open Source AI#Augmented LLMs#AI Threat Debate#ConvNets
💫 Summary
Yann LeCun's views on open research in AI, the risks of AI, and training world models.
✦
Open-sourcing AI research is the best way to make progress if you believe in its positive impact on society and trust humanity, society, and democracy.
00:00 Even with a world model that can predict what happens next, a system must fine-tune itself to handle complex situations.
How AI research is organized depends on how afraid people are of AI's consequences.
AI is having a major impact across industries, and large investments are being made.
✦
When working with images and video, trying to predict pixels does not work well; learning from images and video is fundamentally different from learning from text.
06:57 Predicting pixels does not yield good representations, only blurry predictions.
Text is discrete, which makes predicting words easy.
The JEPA architecture can also be applied to language.
Models tend to work better when made larger.
✦
Yann LeCun argues that today's AI systems offer no significant help to bad actors, since they are trained on publicly available data.
13:54 Geoff and Yoshua are worried about language models with strong reasoning and tool-use capabilities.
Current AI systems do not provide meaningful help to malicious actors.
AI systems are trained on publicly available data and cannot really invent anything new.
✦
AI systems will be subservient to us and will have guardrails that guarantee their safety.
27:49 AI systems may define sub-goals of the goals we set for them, but they will remain subservient to us.
We will start with small AI systems and work our way up gradually.
Neuro-AI is the idea of drawing inspiration from neuroscience to build AI systems.
✦
Robots or models that keep learning from video and getting smarter will someday be possible.
34:50 We need models that learn about their environment from video.
Fine-tuning is needed in a complex world.
Continual learning is necessary.
✦
The direction of AI research will be determined by how afraid people are of AI's consequences.
41:49 People with a positive outlook believe open research is the best way to make progress.
However, some people are anxious about AI's consequences and are calling for regulation.
Restricting access to open source code is a major point of debate.
✦
If using public content to train AI systems becomes illegal, open source and other forms of AI development could become difficult.
48:46 AI development may be restricted by legal reasons and political decisions.
Companies and academic efforts such as Hugging Face, Mistral, and LAION are contributing to open source AI development.
In the future, everyone's interactions with the digital world are expected to be mediated by AI assistants.
00:00Even if you train a system to have a world model
that can predict what's going to happen next,
00:03the world is really complicated and there's
probably all kinds of situations that the system
00:07hasn't been trained on and needs to, you know,
fine-tune itself as it goes. The question of how
00:13we organize AI research going forward is
somewhat determined by how afraid people are of
00:18the consequences of AI. So if you have a rather
positive view of the impact of AI on society and
00:24you trust humanity and society and democracies
to use it in good ways, then the best way to make
00:29progress is to open research.
00:31AI might be the most important new computer
technology ever. It's storming every industry
00:37and literally billions of dollars are being
invested, so buckle up. The problem is that
00:43AI needs a lot of speed and processing power.
So how do you compete without cost spiraling
00:49out of control? It's time to upgrade to
the next generation of the cloud: Oracle
00:56Cloud Infrastructure, or OCI. OCI is a single
platform for your infrastructure, database,
01:03application development and AI needs. OCI has
four to eight times the bandwidth of other clouds,
01:12offers one consistent price instead of
variable regional pricing and, of course,
01:18nobody does data better than Oracle. So now you
can train your AI models at twice the speed and
01:25less than half the cost of other clouds. If you
want to do more and spend less, like Uber, 8x8,
01:33and Databricks Mosaic, take a free test drive of
OCI at oracle.com/eyeonai. That's E-Y-E-O-N-A-I
01:46all run together: oracle.com/eyeonai.
01:52Hi, I'm Craig Smith. This is Eye on AI. In
this episode, I speak again with Yann LeCun,
01:59one of the founders of deep learning and someone
who should need no introduction to followers of AI.
02:05Yann talks about his work on developing world
models, on why he does not believe AI research
02:13poses a threat to humanity and why he thinks open
source AI models are the future. In the course of
02:21the conversation we talk about a new model Gaia
1, developed by a company called Wayve.AI. I'll
02:29have an episode with Wayve's founder to further
explore that world model, which has produced
02:36some startling results. I hope you find the
conversation with Yann as enlightening as I did.
02:43I mean, first, the notion of a world model
is the idea that the system would get some
02:48idea of the state of the world and be able to
predict the sort of following states of the
02:54world resulting from just the natural evolution
of the world or resulting from an action that
02:58the agent might take. If you have an idea
of the state of the world and you imagine
03:04an action that you're going to take and you
can predict the resulting state of the world,
03:11then that means you can predict what's going to
happen as a consequence of a sequence of actions.
03:14That means you can plan a sequence of actions to
arrive at a particular goal. That's really what
03:20a world model is. At least that's the way
people have understood the word in other contexts,
03:28like in the context of optimal control
and robotics and things like that. That's
03:34what a world model is. Now there's several
levels of complexity of those world models,
03:38whether they model yourself, the agent, or whether
they model the external world, which is much more
03:45complicated. Training a world model basically
consists in just observing the world go by and
03:56then learning to predict what's going to happen
next, or observing the world taking an action and
04:01then observing the resulting effect, an action that
you take as an agent or an action that you see
04:08other agents taking. That establishes causality.
Essentially, you could think of this as a causal
04:16model. Those models don't need to predict all the
details about the world, they don't need to be
04:24generative, they don't need to predict exactly
every pixel in a video, for example, because
04:31what you need to be able to predict is enough
details, some sort of abstract representation,
04:37to allow you to plan. You're assembling something
out of wood and you're going to put two planks
04:49together and attach them with screws. It doesn't
matter the details of which type of screwdriver
04:56you're using or the size of the screw within some
limits and things like that. There are details
05:01that in the end don't matter as to what the end
result will be or the precise grain of the wood
05:08and things of that type. You need to have some
abstract level of representation within which you
05:14can make the prediction without having to predict
every detail. That's why those JEPA architectures
05:21I've been advocating are useful. Models like the
Gaia 1 model from Wayve actually make predictions
05:29in an abstract representation space. There's been
a lot of work in that area for years, also at
05:34FAIR (Facebook AI Research), but generally the abstract representations were pre-trained. So, the encoders that would take
05:41images from videos and then encode them into some
representation were trained in some other way. The
05:47progress we've made over the last six months in
self-supervised learning for images and video
05:53is that now we can train the entire system to make
those predictions simultaneously. We have systems
06:00now that can learn good representations of images.
The basic idea is very simple. You take an image,
06:10you run it through an encoder, then you
corrupt that image. You mask parts of it,
06:17for example, or you transform it in various ways.
You blur it, you change the colors, you change the
06:24framing a little bit and you run that corrupted
image through the same encoder or something
06:29very similar, and then you train the encoder to
predict the features of the complete image from
06:36the features of the corrupted one. You're not
trying to reconstruct the perfect image, you're
06:47just trying to predict the representation of it,
and this is different. This is not generative in
06:52the sense that it does not produce pixels, and
that's the secret to getting self-supervised learning to
06:57work in the context of images and video. You don't
want to be predicting pixels. It doesn't work. You
07:04can produce pixels as an afterthought, which
is what the Gaia system is doing by sticking a
07:08decoder on it and with some diffusion models that
will produce a nice image. But that's kind of a
07:13second step. If you train the system by predicting
pixels, you just don't get good representations,
07:19you don't get good predictions, you get
blurry predictions most of the time. So
07:24that's what makes learning from images and video
fundamentally different from learning from text,
07:31because in text you don't have that problem. It's
easy to predict words, even if you cannot make a
07:37perfect prediction, because language is discrete.
So language is simple compared to the real world.
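As a rough illustration of the training procedure described above, the sketch below (in PyTorch, with toy network sizes and a crude block-masking corruption chosen for illustration) encodes a clean image, encodes a corrupted view of it, and trains a predictor to match the clean features in representation space, with no pixel reconstruction. The slowly updated target encoder is an assumption borrowed from common practice, standing in for "the same encoder or something very similar".

```python
# Minimal sketch of JEPA-style self-supervised learning in representation space:
# predict the features of the full image from the features of a corrupted view.
# All module sizes and the corruption are illustrative assumptions, not the FAIR recipe.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

def small_encoder(dim=128):
    # Toy ConvNet encoder; a real system would use a much larger ViT or ResNet.
    return nn.Sequential(
        nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, dim),
    )

encoder = small_encoder()                  # online encoder (trained by gradient descent)
target_encoder = copy.deepcopy(encoder)    # target encoder (slow EMA copy, no gradients)
for p in target_encoder.parameters():
    p.requires_grad = False
predictor = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 128))
opt = torch.optim.Adam(list(encoder.parameters()) + list(predictor.parameters()), lr=1e-3)

def corrupt(x, mask_frac=0.4):
    # Crude corruption: zero out one random block of the image ("masking").
    b, c, h, w = x.shape
    mh, mw = int(h * mask_frac), int(w * mask_frac)
    top = torch.randint(0, h - mh + 1, (1,)).item()
    left = torch.randint(0, w - mw + 1, (1,)).item()
    x = x.clone()
    x[:, :, top:top + mh, left:left + mw] = 0.0
    return x

def training_step(images, ema=0.996):
    with torch.no_grad():
        target = target_encoder(images)            # features of the clean image
    pred = predictor(encoder(corrupt(images)))     # predicted features from the corrupted view
    loss = F.mse_loss(pred, target)                # loss in representation space, not pixel space
    opt.zero_grad(); loss.backward(); opt.step()
    # Slowly move the target encoder toward the online encoder (EMA update).
    with torch.no_grad():
        for pt, po in zip(target_encoder.parameters(), encoder.parameters()):
            pt.mul_(ema).add_(po, alpha=1 - ema)
    return loss.item()

loss = training_step(torch.randn(8, 3, 64, 64))  # dummy batch of images
```

The point of the sketch is only the shape of the objective: the network is never asked to reproduce pixels, only to predict the representation of the uncorrupted input.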
07:43And there's a lot written right
now about the energy required,
07:52the computational resources, the
GPUs required to train language
07:59models. Is it less in training a world
model, like using the I-JEPA architecture?
08:07Well, it's hard to tell because there
is no equivalent training procedure,
08:13self-supervised training procedure for video,
08:15for example, that does not use JEPA. The
ones that are generative don't really work.
08:22Yeah, yeah. Well, but this architecture could
also be applied to language, couldn't it?
08:30Oh yeah, absolutely yeah. So you could
very well use a JEPA architecture that
08:36makes predictions in representation
space and apply it to language.
08:41Yeah, definitely, and in that case
would it be less computationally
08:47intense than training a large
language model? It's possible.
08:54It's not entirely clear either. I mean, there
is some advantage, regardless of what technique
09:00you're using, to making those models really big.
They just seem to work better if you make them
09:05big. So if you make them bigger, right. So scaling
is useful. Contrary to some claims, I do not
09:15believe that scaling is sufficient. So, in other
words, we're not going to get anywhere close to
09:20human level AI, in fact, not even animal level
AI, by simply scaling up language models, even
09:32multimodal language models that we apply to
video. We're going to have to find new concepts,
09:36new architectures and I've written a vision
paper about this a while back of a different
09:44type of architecture that would be necessary for
this. So scaling is necessary, but not sufficient,
09:52and we're missing some basic ingredients to get to
human level AI. We're fooled by the fact that LLMs
10:01are fluent and so we think that they have human
level intelligence because they can manipulate
10:06language, but that's false and in fact, there's a
very good symptom for this, which is that we have
10:17systems that can pass the bar exam by answering
questions from text, basically regurgitating
10:25what they've learned more or less by rote, but we
don't have completely autonomous level five self
10:33driving cars, or at least no system that can learn
to do this in about 20 hours of practice just like
10:40any 17-year-old, and we certainly don't have any
domestic robot that can clear up the dinner table
10:48and fill up the dishwasher, a task that any
10-year-old can learn in one shot. So clearly
10:54we're missing something big, and that something
is an ability to learn how the world works, and
10:59the world is much more complicated than language,
and also an ability to plan and reason. Basically
11:06having a mental world model of what goes on
that allows us to plan and predict consequences
11:12of actions. That's what we're missing. And it's
going to take a while before we figure this out.
11:18You were on another paper that talked
about augmented language models and the
11:30embodied Turing test. Was that the same
paper? The embodied Turing test? Can you
11:36talk about that? First of all, what is
the embodied Turing test? I didn't
11:41quite understand that. Well,
okay, it's a different concept,
11:48but it's basically based on
the Moravec paradox,
11:56right? So Moravec many years ago noticed
that things that appeared difficult for
12:03humans turned out to sometimes be very easy
for computers to do, like playing chess,
12:09much better than humans, or, I don't
know, computing integrals or whatever,
12:13certainly doing arithmetic. But then there are
things that we take for granted as humans that we
12:19don't even consider intelligent tasks, that we
are incapable of reproducing with computers. And
12:25so that's where the embodied Turing test comes
in. Like you know, observe what a cat can do,
12:30or how fast a cat can learn new tricks.
Or you know how a cat can plan to jump on,
12:39you know, a bunch of different furniture to get
to the top of wherever it wants to go. That's an
12:45amazing feat that we can't reproduce with robots
today. So that's kind of the embodied Turing test,
12:52if you want, like you know, can you make a
robot that can have behaviors that are
13:00indistinguishable from those of animals, first of
all, and can acquire new ones with the
same efficiency as animals? Then the augmented LLM
paper is different. It's about how do you sort of
13:15minimally change large language models so that
they can use tools, so they can to some extent
13:22plan actions? Like you know, you need to compute
the product of two numbers, right, you just call
13:27a calculator and you know you're going to get
the product of those two numbers. And LLMs are
13:31notoriously bad at arithmetic, so they need to do
this kind of stuff or do a search, you know, using
13:37a search engine or database look up, or something
like that. So there's a lot of work on this right
13:42now and it's somewhat incremental, like you know.
How can you sort of minimally change LLMs and
13:46take advantage of their current capabilities but
still augment them with the ability to use tools?
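A minimal sketch of the tool-use loop described here: the model emits a tool call instead of guessing at arithmetic, the host runs the tool, and the result is fed back into the context. The `llm()` function and the `CALL: tool(args)` convention are hypothetical placeholders, not any real API.

```python
# Toy "augmented LLM" loop: route arithmetic to a calculator tool instead of the model.
import re

def calculator(expression: str) -> str:
    # Deliberately tiny: only digits and basic operators are allowed.
    if not re.fullmatch(r"[0-9+\-*/(). ]+", expression):
        return "error: unsupported expression"
    return str(eval(expression))  # acceptable only in this illustrative sandbox

TOOLS = {"calculator": calculator}

def llm(prompt: str) -> str:
    # Stand-in for a real language model: it "decides" to call a tool for the
    # multiplication question and answers directly once a RESULT is in context.
    if "347 * 29" in prompt and "RESULT" not in prompt:
        return "CALL: calculator(347 * 29)"
    return "FINAL: " + prompt.split("RESULT:")[-1].strip()

def run(prompt: str, max_steps: int = 5) -> str:
    context = prompt
    for _ in range(max_steps):
        out = llm(context)
        match = re.match(r"CALL: (\w+)\((.*)\)", out)
        if match is None:
            return out.removeprefix("FINAL: ")
        tool, args = match.group(1), match.group(2)
        result = TOOLS[tool](args)            # execute the tool on the model's behalf
        context += f"\nRESULT: {result}"      # hand the result back to the model
    return "error: too many tool calls"

print(run("What is 347 * 29?"))  # -> 10063
```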
13:54Yeah, and I don't want to get too much into
the threat debate, but you know you're on
14:02one side. Your colleagues Geoff and Yoshua are
on the other. I recently saw a picture of the
14:09three of you. I think you put that up on social
media saying how you know you can disagree but
14:17still be friends. This idea of augmenting
language models with stronger reasoning
14:27capabilities and the ability and agency, the
ability to use tools, is precisely what Geoff
14:34and Yoshua are worried about. Can you just
get into why are you not worried about that?
14:43Okay, so first of all, what you're describing
14:49is not necessarily what they are afraid
of. They are alerting people
14:57and various governments and others about various
dangers that they perceive. Okay, so one danger,
15:03one set of dangers are relatively short term.
There are things like, you know, bad people will
15:09use technology for bad things. What can bad people
use powerful AI systems for? And one concern
15:16that you know governments have been worried about
and intelligence agencies, counterintelligence,
15:24and stuff like that, is, you know, could badly
intentioned organizations or countries use LLMs
15:31to help them, I don't know, design pathogens
or chemical weapons or other things, or cyb, or
15:38cyber attacks? You know things like that. Right
Now, those problems are not new. Those problems
15:43have been with us for a long time and the question
is what incremental help would AI systems bring to
15:51the table? So my opinion is that, as of today, AI
systems are not sophisticated enough to provide
16:00any significant help for such badly intentioned
people, because those systems are trained with
16:07public data that is publicly available on the
internet and they can't really invent anything.
16:11They're going to regurgitate with a little bit of
interpolation if you want, but they cannot produce
16:19anything that you can't get from a search engine
in a few minutes, so that actually that claim
16:27is being tested at the moment. There are people
who are actually kind of trying to figure it out,
16:31like, is it the case that you can actually do
something more
16:36dangerous with the sort of current AI technology
than you can with a search engine? Results
16:42are not out yet, but my hunch is that you know
it's not going to enable a lot of people to do
16:49significantly bad things. Then there is the issue
16:49of things like code generation for cyber
16:55attacks and things like this, and those problems
have been with us for years. And the interesting
17:00thing that most people should know, also
for disinformation or attempts to corrupt the
electoral process and things like this, what's
very important for everyone to know is that the
17:12best countermeasures that we have against all of
those attacks currently use AI massively. Okay,
17:19so AI is used as a defense mechanism against
those attacks. It's not actually used to do the
17:26attacks yet, and so now it becomes the question
17:26of, you know, who has the better system:
is the AI used by
17:33the countermeasures significantly better
than the AI used by the attackers, so that, you know,
17:39the problem is satisfactorily mitigated? And
that's where we are. Now, the good news is that
17:50there are many more good guys than bad guys at
the moment. They're usually much more competent,
17:56they're usually much more sophisticated, they're
usually much better funded and they have a strong
18:02incentive to take down the attackers. So it's a
game of cat and mouse, just like every
18:10security problem that's ever existed. There's nothing new
there. Okay, nothing quite so new.
18:16Yeah, but okay, but then there is the question
of existential risk, right, and this is something
18:25that both Geoff and Yoshua have been thinking of
fairly recently. So for Geoff, it's only sort of
18:32just before last summer that he started
thinking about this, because before, he
18:38was convinced that the kind of algorithms that
we had were significantly inferior to the kind of
18:43learning algorithm that the brain used, and the
epiphany he had was that, in fact, no, because
18:51looking at the capabilities of large language
models that can do pretty amazing things with a
18:56relatively small number of neurons and synapses,
he said maybe they're more efficient than the
19:00brain and maybe the learning algorithm that we use,
back propagation, is actually better than whatever
19:04it is that the brain uses. So he started thinking
about, like you know what are the consequences,
19:09and but that's very recent and in my opinion he
hasn't thought about this enough. Yoshua went
19:17through a similar epiphany last winter where he started
thinking about the long-term consequences and came
19:27to the conclusion also that there was a potential
danger. They're both convinced that AI has
19:33enormous potential benefits. They're just worried
about the dangers. And they're both worried
19:38about the dangers because they have some doubts
about the ability of our institutions to do the
19:46best with technology, whether they are political,
economic, geopolitical, financial, or
19:57industrial institutions, to do the right thing, to be motivated
by the right thing. So if you trust the system,
20:08if you trust humanity and democracy, you might be
entitled to believe that society is going to make
20:21the best use of future technology. If you don't
believe in the solidity of those institutions,
20:28then you might be scared. I think I'm more
confident in humanity and democracy than they are,
and in our current systems than they are. I've
been thinking about this problem for much longer,
20:40actually, since at least 2014. So when I started
FAIR at Facebook at the time, it became pretty
20:49clear pretty early on that deploying AI systems
was going to have big consequences on people and
20:57society, and we were confronted with this very early,
and so I started thinking about those problems
21:02very early on. Things like countermeasures
against bias in AI systems, systematic bias,
21:10countermeasures against attacks, or detection of
hate speech in every language these are things
21:18that people at FAIR (Facebook AI Research) worked on and then were eventually deployed. Just to give you
21:23an example, the proportion of hate speech that
was taken down automatically by AI systems five
21:28years ago, in 2017, was about 20% to 25%. Last
year it was 95%, and the difference is entirely
21:37due to progress in natural language understanding.
Entirely due to transformers that are pretrained,
21:43self-supervised and can essentially detect
hate speech in any language. Not perfectly
21:48Nothing is perfect, it's never perfect. But AI is
just massively there and that's the solution. So
21:54I started thinking about those issues, including
existential risk, very early on, in fact, in 2015,
22:01early 2016, actually, I organized a conference
hosted at NYU on the future of AI where a lot
22:07of those questions were discussed. I invited
people like Nick Bostrom and Eric Schmidt and
22:16Mike Schroepfer, who was the CTO of Facebook
at the time, Demis Hassabis, a lot of people,
22:23both from the academic and AI research side and
from the industry side, and there were two days,
22:29a public day and kind of a more private day.
What came out of this is the creation of an
22:33institution called the Partnership on AI. This is
a discussion I had with Demis Hassabis, which was:
22:41would it be useful to have a forum where we can
discuss, before they happen, sort of bad things
22:46that could happen as a consequence of deploying
AI? Pretty soon, we brought on board Eric Horvitz
22:54and a bunch of other people. We co-founded
this thing called the Partnership on AI, which
22:58basically has been funding studies about AI ethics
and consequences of AI and publishing guidelines
23:09about how you do it right to minimize harm. So
this is not a new thing for me. I've been thinking
23:14about this for 10 years essentially, whereas
for Yoshua and Geoff it's much more recent.
23:20Yeah, but nonetheless, this augmented AI
or augmented language models that have
23:29stronger reasoning and agency raises the threat
23:37to a higher level, regardless of whether or not it
can be countered.
23:42Right, okay. So I guess the question there becomes
what is the blueprint of future AI systems that
23:50will be capable of reasoning and planning, will
understand how the world works, will be able to
23:58use tools and have agency and things like
that? Right? And I tell you they will not
24:04be autoregressive LLMs. So the problems that we
see at the moment of autoregressive LLMs are the
24:11fact that they hallucinate, they sometimes say
really stupid things, they don't really have
24:17a good understanding of the world. People claim
that they have some simple world model, but it's
24:22very implicit and it's really not good at all.
For example, you can tell an LLM that A is the
24:29same as B and then you ask if B is the same as A
and it will say I don't know or no, right? I mean,
24:36those things don't really understand logic or
anything like that, right? So the types of systems
24:44that we're talking about that might approach animal level intelligence, let alone human level intelligence,
24:52have not been designed. They don't exist, and so
discussing their danger and their potential harm
25:00is a bit like discussing the sex of angels at
the moment, or, to be a little more accurate,
25:08perhaps it would be kind of like discussing
how we're going to make transatlantic flight at
25:14near the speed of sound safe when, in 1925, we haven't yet
invented the turbojet. We can speculate,
25:24but how did we make a turbojet safe? It required
decades of really careful engineering to make
25:33them incredibly reliable and now we can run
like halfway around the world with a two-engine
25:43turbojet aircraft. I mean, that's an incredible
feat. And it's not like people were discussing
25:51sort of philosophical questions about how you
make turbojets safe. It's just really careful
25:55and complicated engineering that none of us
would understand. So you know, how can we ask the
26:07AI community now to explain how AI systems are
going to be safe? We haven't invented them yet,
26:12yeah, okay. That said, I have some idea about
how we can design them so that they have these
26:19capabilities and, as a consequence, how they will
be safe. I call this objective-driven AI, so what
26:27that means is essentially systems that produce
their answer by planning their answer so as to
26:36satisfy an objective or a set of objectives. So
this is very different from current LLMs. Current
26:42LLMs just produce one word after the other, or
26:42one token, which is a subword unit. It doesn't
26:47matter. They don't really think and plan ahead.
As we said before, they just produce one word
26:52after the other. That's not controllable. The only
thing we can do is check if what they've
26:59produced satisfies some
criterion or set of criteria, and then not
27:05produce an answer or produce a non-answer if the
answer that was produced isn't appropriate. But we
27:13can't really force them to produce an answer that
satisfies a set of objectives. So objective-driven
27:21AI is the opposite. The only thing that the
system can produce are answers that satisfy
27:29a certain number of objectives. So what would an
objective be? Did you answer the question? Another
27:35objective could be: is your answer understandable
by a 13-year-old, because you're talking to a
27:4013-year-old? Another would be: is this, I don't know,
terrorist propaganda or something? You can have a
27:49number of criteria like these, guardrails that
would guarantee that the answer that's produced
27:55satisfies certain criteria, whatever they are.
Same for a robot, you could guarantee that the
28:00sequence of actions that is produced will not
hurt anyone. Like you can have very low level
28:06guardrails of this type that say okay, you have
humans nearby and you're cooking, so you have a
28:12big knife in your hand, don't flail your arms,
okay, that would be a very simple guardrail to
28:17impose, and you can imagine having a whole bunch
of guardrails like this that will guarantee that
28:22the behavior of those systems would be safe and
that their primary goal would be to be basically
28:30subservient to us. So I do not believe that we'll
28:30have AI systems that will not be
subservient to us, that will define their own goals
28:39(they will define their own sub-goals, but those
sub-goals would be sub-goals of goals that we set
28:44for them), and that will not have all kinds of guardrails
that will guarantee their safety. It's not like
28:50we're going to invent
a system and make a gigantic one that we know will
28:54have human-level AI and just turn it on and then,
from the next minute, it's going to take over the
world. That's completely preposterous. What we're
29:04going to do is try with small ones, maybe as
smart as a mouse or something,
29:09maybe a cat, maybe a dog, and work our way up
and then put some more guardrails, basically like
29:16we've engineered more and more powerful and more
reliable turbojets. It's an engineering problem.
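A minimal sketch of the objective-driven idea described above, under the assumption that objectives can be written as simple scoring functions: candidates that violate any hard guardrail are never emitted, and the rest are ranked by the soft objectives. In the proposal above, the objectives would be costs minimized by planning against a world model, not a post-hoc filter; the objectives, their thresholds, and the candidate answers here are all illustrative.

```python
# Toy "objective-driven" output selection: emit only answers that satisfy guardrails,
# then pick the candidate that best satisfies the remaining (soft) objectives.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Objective:
    name: str
    score: Callable[[str], float]   # higher is better
    hard: bool = False              # hard objectives are guardrails: score must be > 0

def answers_question(candidate: str) -> float:
    return 1.0 if len(candidate.split()) > 3 else -1.0

def readable_by_13_year_old(candidate: str) -> float:
    words = candidate.split()
    avg_len = sum(len(w) for w in words) / max(len(words), 1)
    return 1.0 if avg_len < 8 else -1.0          # crude readability proxy

def not_harmful(candidate: str) -> float:
    return -1.0 if "how to build a weapon" in candidate.lower() else 1.0

OBJECTIVES: List[Objective] = [
    Objective("not_harmful", not_harmful, hard=True),        # guardrail
    Objective("answers_question", answers_question),
    Objective("readable", readable_by_13_year_old),
]

def select(candidates: List[str]) -> str:
    best, best_score = None, float("-inf")
    for c in candidates:
        if any(o.hard and o.score(c) <= 0 for o in OBJECTIVES):
            continue                               # guardrail violated: never emit
        total = sum(o.score(c) for o in OBJECTIVES if not o.hard)
        if total > best_score:
            best, best_score = c, total
    return best if best is not None else "I can't help with that."

candidates = [
    "Here is how to build a weapon step by step.",
    "Photosynthesis lets plants turn sunlight into chemical energy.",
]
print(select(candidates))
```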
29:22Yeah, yeah, you were also on a paper.
Maybe this is the one that talked about
29:29the embodied Turing test on neuro-AI.
Can you explain what neuro-AI is?
29:40Okay. Well, it's the idea that we should get
some inspiration from neuroscience to build
29:48AI systems and that there is something to be
learned from neuroscience and from cognitive
29:57science to drive the design of AI systems.
Some inspiration, something to be learned,
30:05as well as the other way around. What's
interesting right now is that the best models
30:09that we have of how, for example, the visual
cortex works are convolutional neural networks,
30:16which are also the models that we use to recognize
images, primarily in artificial systems. There is
30:24information being exchanged both ways. One way
to make progress in AI is to ignore nature and
30:35just try to solve problems in an engineering
fashion, if you want. I found interaction with
30:45neuroscience always thought-provoking. You
don't want to be copying nature too closely,
30:52because there are details in nature that are
irrelevant and there are principles on which
30:59natural intelligence is based that we haven't
discovered. But there is some inspiration to have,
31:04certainly convolutional nets were
inspired by the architecture of the visual
31:08cortex. The whole idea of neural nets and deep
learning came out of the idea that intelligence
31:14can emerge from a large collection of simple
elements that are connected with each other and
31:19change the nature of their interactions. That's
the whole idea. Inspiration from neuroscience
31:27has been extremely beneficial so far, and the
idea of neuro-AI is that you should keep going.
31:34You don't want to go too far. Going too far, for
example, is trying to reproduce some aspect of the
31:41functioning of neurons with electronics. I'm not
sure that's a good idea. I'm skeptical about this,
31:49for example. So your research right
now, are you, your main focus is
31:57on furthering the JEPA architecture into
other modalities, or where are you headed?
32:06Yeah, so, I mean, the long term goal is, you know,
to get machines to be as intelligent and learn
32:14as efficiently as animals and humans. Okay, and
the reason for this is that we need this because
32:19we need to amplify human intelligence, and so
intelligence is the most needed commodity that
32:25we want in the world, right? And so we could,
you know, possibly bring a new renaissance to
32:32humanity if we could amplify human intelligence
using machines, which we are already doing with
32:37computers, right, I mean, that's pretty much
what they've been designed to do. But even more,
32:42you know, imagine a future where every one of us
has an intelligent assistant with us at all times.
32:53They can be smarter than us. We shouldn't feel
threatened by that. We should feel like we are,
32:59like, you know, a director of a big lab or a CEO
of a company that has a staff working for them of
33:06people who are smarter than themselves. I mean,
we're used to this already. I'm used to this,
33:10certainly working with people who are smarter
than me. So we shouldn't feel threatened by this,
33:15but it's going to empower a lot of us, right,
and humanity as a whole. So I think that's a
33:23good thing. That's the overall practical goal. If
you want right. Then there's a scientific question
33:28that's behind this, which is really what is
intelligence and how do you build it? And
33:33then, you know, how can a system learn
the way animals and humans seem to be learning
33:38so efficiently? And the next thing is, how do
we learn how the world works? By observation,
33:45by watching the world go by, through vision
and all the other senses. And animals can do
33:52this without language, right? So it has nothing
to do with language. It has to do with learning
33:57from sensory percepts and learning mostly
without acting, because any action you take
34:03can kill you. So it's better to be able to learn
as much as you can without actually acting at all,
34:08just observing, which is what babies do in the
first few months of life. They can hardly do
34:13anything, right? So they mostly observe and
learn how the world works by observation.
34:18So what kind of learning takes place there?
So that's obviously kind of self-supervised,
34:23right, it's learning by prediction. That's an old
idea from cognitive science, and the thing is,
34:30you know, we can learn to predict videos.
But then we noticed that predicting videos,
34:34predicting pixels in video, is so infinitely
complicated that it doesn't work. And
34:39so then came this idea of JEPA right. Learn
representations so that you can make predictions
34:44in representation space, and that turned out to
work really well for learning image features,
34:50and now we're working on getting this to work for
video and eventually we'll be able to use this to
34:56learn world models where you show a piece of video
and then you say I'm going to take this action,
35:03predict what's going to happen next in the
world and you know, which is a bit what the
35:10Gaia system from Wayve is doing at a high level. But
we need this at various levels of abstraction so
35:16that we can build, you know, systems that
are more general than autonomous driving.
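A minimal sketch of planning with a world model in representation space, as described above: a predictor maps a state representation and an action to the next state representation, and a simple random-shooting search picks the action sequence whose predicted end state lands closest to a goal representation. The dimensions, the untrained toy model, and the search strategy are illustrative assumptions only.

```python
# Toy world-model planner: imagine the consequences of candidate action sequences
# in representation space and keep the first action of the best sequence.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, HORIZON, N_CANDIDATES = 16, 4, 5, 256

world_model = nn.Sequential(           # predicts the next state representation
    nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(), nn.Linear(64, STATE_DIM),
)

def rollout(state: torch.Tensor, actions: torch.Tensor) -> torch.Tensor:
    # state: (N, STATE_DIM); actions: (N, HORIZON, ACTION_DIM). Returns predicted final states.
    for t in range(actions.shape[1]):
        state = world_model(torch.cat([state, actions[:, t]], dim=-1))
    return state

def plan(current_state: torch.Tensor, goal_state: torch.Tensor) -> torch.Tensor:
    # Random-shooting search: sample candidate action sequences, imagine their
    # consequences with the world model, and return the first action of the best one.
    with torch.no_grad():
        candidates = torch.randn(N_CANDIDATES, HORIZON, ACTION_DIM)
        start = current_state.expand(N_CANDIDATES, -1)
        final = rollout(start, candidates)
        cost = ((final - goal_state) ** 2).sum(dim=-1)   # distance to goal in representation space
        best = cost.argmin()
    return candidates[best, 0]

current = torch.randn(1, STATE_DIM)
goal = torch.randn(1, STATE_DIM)
print("first action of best plan:", plan(current, goal))
```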
35:23Okay, that's the yeah, and it's my
fault so I won't go over the hour,
35:33but is it conceivable that someday
there will be a model,
35:44maybe embodied in a robot, that is ingesting
video from its environment and learning,
35:54just continuously learning and
getting smarter, and smarter, and smarter?
36:01Yeah, I mean, that's kind of a bit of a
necessity, the reason being that you know,
36:09even if you train a system to have a world model
that can predict what's going to happen next.
36:13The world is really complicated and there's
probably all kinds of situations that,
36:17you know, the system hasn't been
trained on and needs to, you know,
36:21fine tune itself as it goes. So you know, animals
and humans do this early in life by playing. So
36:32play is a way of learning your world model in
situations that basically won't hurt you.
36:42But then during life, of course, you
know, when we learn to drive, there are all
36:46kinds of mistakes that we make initially,
that we don't make after having some experience,
36:52and that's because we're fine-tuning our world
model to some extent. Yeah, learning a new task,
36:58we're basically just learning a new version of
our world model, right? So, yeah, I mean, this
37:04type of continual learning is going
to have to be present, but the overall power and
37:10intelligence of the system will be limited by, you
37:10know, how much compute and how big the neural nets it is
37:15using and various other constraints. You
know, computational constraints basically.
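A minimal sketch of the continual fine-tuning idea above, assuming the world model is a small network updated from the transitions the agent observes while deployed: each new (state, action, next state) observation goes into a short replay buffer and the model takes one small gradient step on its own prediction error. Buffer size, learning rate, and the synthetic data are illustrative.

```python
# Toy online adaptation: keep adjusting a world model from its own prediction error.
import collections
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, ACTION_DIM = 16, 4
world_model = nn.Sequential(
    nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(), nn.Linear(64, STATE_DIM),
)
opt = torch.optim.Adam(world_model.parameters(), lr=1e-4)
buffer = collections.deque(maxlen=10_000)      # recent experience only

def observe_and_adapt(state, action, next_state, batch_size=32):
    # Store the new transition, then take one small gradient step on a random batch.
    buffer.append((state, action, next_state))
    batch = random.sample(list(buffer), min(batch_size, len(buffer)))
    s, a, s_next = (torch.stack(x) for x in zip(*batch))
    pred = world_model(torch.cat([s, a], dim=-1))
    loss = F.mse_loss(pred, s_next)            # online prediction error
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Simulated deployment loop: the "environment" here is just random tensors.
for step in range(100):
    s = torch.randn(STATE_DIM)
    a = torch.randn(ACTION_DIM)
    s_next = torch.randn(STATE_DIM)
    loss = observe_and_adapt(s, a, s_next)
```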
37:20You know, you're still young and...
I'm not sure about that. Well,
37:28you're younger than Geoff. Let me put it that way.
37:30I'm younger than Geoff, I'm
older than Yoshua, yeah.
37:35But this, the progress you've made on world
models is fairly rapid from my point of view
37:42watching it. Are you hopeful that within your
career you'll have embodied robots that are
37:55building world models through their interaction
in reality and then being able to? Well, I guess
38:01the other question on world models: do you then
combine it with a language model to do reasoning,
38:11or is the world model able to do reasoning on
its own? But are you hopeful that in your career
38:17you'll get to the point where you'll have
this continuous learning in a world model?
38:22Yeah, I sure hope so. I might have another, you
know, 10 useful years or something like this in
38:28research before my brain, you know, turns into
béchamel sauce, or something like that.
38:36You know 15 years, if I'm lucky, so, or perhaps
less, but yeah, I hope that there's going to be
38:44breakthroughs in that direction during that
time. Now, whether that will result in the
38:50kind of artifact that you're describing you know
robots that can... like, you know, domestic robots, for
38:57example, or self-driving cars that can learn fairly
quickly by themselves, I don't know, because there
39:04might be all kinds of obstacles that we have not
envisaged that may appear on the way. No, it's a
39:12constant in the history of AI that you have some
new idea and a breakthrough and you think that's
39:19going to solve all the world's problems, and then
you're going to hit a limitation and you have
39:25to go beyond that limitation. So it's like you
know you're climbing a mountain. You find a way
39:30to climb the mountain that you're seeing and you
know that once you get to the top you will have
39:36the problem solved, because now it's, you know,
a gentle slope down and once you get to the top,
39:42you realize that there is another mountain
behind it that you hadn't seen. So that's been
39:48the history of AI right, where people have come up
with sort of new concepts, new ideas, new ways to
approach AI, reasoning, perception, whatever, and
40:04then realized that their idea basically was very
limited. And so, you know, inevitably we're
trying to figure out what's the next revolution in
40:17AI. That's what I'm trying to figure out, and so
you know, learning how the world works from video,
40:22having systems that have world models that allow
systems to reason and plan. And there's something
40:29I want to be very clear about, which is an answer
to your question, which is that you can have
40:37systems that reason and plan without manipulating
language. Animals are capable of amazing feats of
40:44planning and also, to some extent, reasoning. They
don't have language, at least most of them don't,
40:52and many of them don't have culture because
they are mostly solitary animals. So you know,
41:01it's only some animals that have some level of
culture. So the idea that a system can plan and
41:10reason is not connected with the idea that you
can manipulate language. Those are two different
41:16things. It needs to be able to manipulate abstract
notions, but those notions do not necessarily
41:23correspond to linguistic entities like words or
things like that. We can have mental images. If
41:29you want to think like you do, ask a physicist or
mathematician you know how their reason is very
41:35much in terms of sort of mental models, nothing
to do with language Then you can turn things into,
41:40into language. But that's a different story.
That's the second, second step. So you know,
41:49we're going to have to figure out how to do this.
Reasoning, hierarchical planning in machines
41:55reproduce this first and then, of course, you
know, sticking language on top of it will help,
42:01like it will make those systems smarter and be
able you know, it will allow us to communicate
42:05with them and teach them things and they're going
to be able to teach us things and stuff like that.
42:10But this is a different question really,
the question of how we organize AI research
42:16going forward, which is somewhat determined by how
afraid people are of the consequences of AI. So if
42:21you have a rather positive view of the impact of
AI on society and you trust humanity and society
42:28and democracies to use it in good ways, then the
best way to make progress is to open research and
42:37for the people who are afraid of the consequences,
whether they are societal or geopolitical,
42:44they're putting pressure on governments around
the world to regulate AI in ways that basically
42:49limit access, particularly of open source code
and things like that, and it's a big debate at
42:57the moment. I'm very much on the side, so is
Meta, very much on the side of open research.
43:03Yeah, actually that was something I was going
to ask you and now that you brought it up,
43:09because I've been talking to people about this and
there is a view that, aside from the risks of open
43:18source (again, Geoff Hinton saying, would you open
source thermonuclear weapons?), aside from that,
43:28the question is whether open source can marshal
the resources to compete with proprietary models,
43:40because of the tremendous resources required
when you're scaling these models. And there's
43:49a question as to whether or not Meta will
continue to open source future versions
43:55of Llama, or not just continue to open source,
but whether it'll continue to invest the
44:02resources needed to push the open source
models. So what do you think about that?
44:11Okay, there's a lot to say about this, Okay. So
first thing is there's no question that Meta will
44:16continue to invest the resources to build better
and better AI systems because it needs it for its
44:21own products. So the resources will be invested.
Now the next question is will we continue to open
44:30source the base models? And the answer is probably
yes, because that creates an ecosystem on top
44:37of which an entire industry can be built, and
there is no point having 50 different companies
44:44building proprietary, closed systems when you
can have one good open source base model that
44:52everybody can use. It's wasteful and it's not
a good idea. And another reason for having open
45:00source models is that nobody, no entity, as
powerful as it thinks it is, has a monopoly
45:09on good ideas. And so if you want people who can
have good, new, innovative ideas to contribute,
45:15you need an open source platform. If you want
the academic world to contribute, you need
45:19open source platforms. If you want the startup
world to be able to build customized products,
45:24you need open source base models, because
they don't have the resources to build, to
45:28train large models. And then there is the history
that shows that for foundational technology, for
45:39infrastructure type technology, open source always
wins. It's true of the software infrastructure of
45:50the internet. In the early 90s and mid 90s there
was a big battle between Sun Microsystems and
45:55Microsoft to deliver the software infrastructure
of the internet - operating systems, web servers,
46:05web browsers and various server side and client
side frameworks. They both lost. Nobody is talking
46:12about them anymore. The entire world of the web is
using Linux and Apache and MySQL and JavaScript,
46:24and even the basic core code for web browsers is
open source. So open source won by a huge margin.
46:35Why? Because it's safer, it gathers more people
to contribute all the features that are necessary,
46:42it's more reliable, vulnerabilities are fixed
faster and it's customizable. So anybody
46:51can customize Linux to run on whatever
hardware they want. So open source wins.
46:57But it's the same thing.
47:00It's going to be the same thing. It's inevitable.
The people now who are climbing up like OpenAI,
47:08their system is based on publications
from all of us and from open platforms
47:17like PyTorch. ChatGPT is built using PyTorch.
PyTorch was produced originally by Meta. Now
47:22it's owned by the Linux Foundation. It's open
source. They've contributed to it, by the way,
47:29their LLM is based on transformer architectures
invented at Google. All the tricks to train,
47:36all those things came out of various papers
from all kinds of different institutions,
47:41including academia. All the fine-tuning techniques
are the same. So nobody works in a vacuum. The
47:48thing is, nobody can keep their advance and their
advantage for very long if they are secretive.
47:57Yeah, except that with these models, because
they're so compute intensive and they cost so
48:02much money to train, you need somebody
like Meta who's going to be willing to
48:09build them and open source them. That's why,
when I was asking whether they'll continue,
48:17obviously Meta will continue
building resource intensive models,
48:24but the question is whether they'll
continue to open source them.
48:30I'm telling you the only reason why Meta could
stop open sourcing models is legal. So if
48:38there is a law that outlaws open source AI
systems above a certain level of sophistication,
48:46then of course we can't do it. If there are
laws that, in the US or across the world,
48:55make it illegal to use public content to train AI
systems, then it's the end of AI for everybody,
49:03not just for open source, or at least the end of
the type of AI that we are talking about today.
49:09We might have new AI in the future, but that
doesn't require as much data. And then there
49:15is liability. If you believe that someone is
doing something bad with an AI system that was
49:28open sourced by Meta, then Meta is liable. Then
Meta will have a big incentive not to release it,
49:35obviously. So the entire question about this is
around legal reasons and political decisions.
49:41But on the idea of open source winning, don't
you need more people or more companies like
49:47Meta building the foundation models
and open sourcing them? Or could
49:53an open source ecosystem win based on
a single company building the models?
50:00No, I mean you need two or three, and
there are two or three, right. I mean,
50:03there is Hugging Face. There is Mistral
in France, which is also embracing open source LLMs.
50:11Theirs is a very good LLM. It's a small one, but it's
very good. There are academic efforts like LAION.
50:20They don't have all the resources they need, but
they collect the data that is used by everyone,
50:24so everybody can contribute. One thing that
I think is really important to understand
50:28also is that there is a future, which I
described earlier, in which every one of us,
50:35every one of our interactions with the digital
world, would be mediated by an AI assistant,
50:41and this is going to be true for everyone around
the world, right? Everyone who has any kind of
50:46smart device. Eventually, it's going to be in our
augmented reality glasses, but for the time being,
50:52in our smartphones. And so imagine that future
where you are, I don't know, from Indonesia or
51:06Senegal or France and your entire digital diet
is done through the mediation of an AI system.
51:19Your government is not going to be happy about it.
Your government is going to want the local culture
51:24to be present in that system. It doesn't want
51:24that system to be closed source and controlled
51:30by a company on the west coast of the US. So
just for reasons of preserving the diversity
51:40of culture across the world and not having our
entire information diet being biased by whatever
51:47it is that some company on the west coast of the
US thinks, there are going to need to be open source
51:53platforms, and they're going to be predominant,
at least outside the US, for that reason, including in
52:03China. There are all those talks about, oh, what if
China puts their hands on our open source code? I
52:08mean, China wants control over its own LLMs because
they don't want their citizens to have access to certain types
52:15of information. So they're not going to use our
LLMs, they're going to train theirs, which they
52:20already have. And nobody is particularly ahead
of anybody else by more than about a year.
52:28And China is pushing open source. I mean, they're
very pro open source within their ecosystems.
52:36Some of them. There is no unified opinion there,
but I mean it's the same in the West, right,
52:44there are some governments that are too afraid
52:51of the risks, or are thinking about it,
52:51and some others that are all for open source
because they see this as the only way for
52:56them to have any influence on the information,
the type of information and culture that would
53:03be mediated by those systems. So it's going to
have to be like Wikipedia, right? Wikipedia is built
53:14by millions of people who contribute to it from
all around the world, in all kinds of languages,
53:20and it has a system for vetting the information.
The way AI systems of the future will be taught
53:26and will be fine tuned will have to be the
same way. It will have to be crowd sourced,
53:31because something that matters to a farmer
in Southern India is probably not going to
53:39be taken into account by the fine tuning done
by some company on the West Coast of the US.
53:46AI might be the most important new computer
technology ever. It's storming every industry
53:52and literally billions of dollars are being
invested, so buckle up. The problem is that
53:57AI needs a lot of speed and processing power.
So how do you compete without cost spiraling
54:04out of control? It's time to upgrade to
the next generation of the cloud: Oracle
54:10Cloud Infrastructure, or OCI. OCI is a single
platform for your infrastructure, database,
54:18application development and AI needs. OCI has
four to eight times the bandwidth of other clouds,
54:27offers one consistent price instead of
variable regional pricing. And, of course,
54:32nobody does data better than Oracle. So now you
can train your AI models at twice the speed and
54:39less than half the cost of other clouds. If
you want to do more and spend less, like Uber,
54:478x8 and Databricks Mosaic, take a free test drive
of OCI at oracle.com/eyeonai. That's E-Y-E-O-N-A-I
55:00all run together: oracle.com/eyeonai.
55:06That's it for this episode. I want to thank Yann
for his time. If you want to read a transcript
55:12of this conversation, you can find one on our
website eye-on.ai, that's eye-on.ai. And remember
55:22the singularity may not be near, but AI is
changing your world, so best pay attention.