💫 Summary
Yann LeCun's views on open research in AI and the risks of AI, and a discussion of training world models.
✨ Highlights
Open sourcing in AI research advances fastest when you believe in its positive impact on society and trust people, society, and democracy.
00:00
Even if a system has a world model that can predict what will happen, it still has to fine-tune itself to handle the complicated situations the world presents.
How AI research is organized is determined by how afraid people are of AI's consequences.
AI is having a major impact on other industries as well, and heavy investment is flowing into it.
When working with images and video, trying to predict pixels is a bad idea; learning from images and video is fundamentally different from learning from text.
06:57
Predicting pixels yields blurry predictions and poor representations.
Text is discrete, so predicting words is easy.
The JEPA architecture can also be applied to language.
Models tend to work better when you make them bigger.
Yann LeCun argues that AI systems currently offer no significant help to bad actors, because they are trained on publicly available data.
13:54
Geoff and Yoshua are worried about language models with strong reasoning abilities and the ability to use tools.
For now, AI systems do not give malicious actors any meaningful advantage.
AI systems are trained on publicly available data and cannot create anything new.
AI systems will be subservient to us and will have guardrails that guarantee safety.
27:49
AI systems may define sub-goals of the goals we set, but they will remain subservient to us.
We will start with small AI systems and scale them up gradually.
Neuro-AI is the idea of building AI systems with inspiration from neuroscience.
Robots or models that keep learning from video and keep getting smarter may someday be possible.
34:50
We need models that learn about the environment from video.
In a complicated world, fine-tuning is necessary.
Continual learning is necessary.
The direction of AI research will be determined by how afraid people are of AI's consequences.
41:49
People with a positive outlook believe that opening up research is the best way to make progress.
Some people, however, are anxious about AI's consequences and are calling for regulation.
Restricting access to open source code is a major point of debate.
If using public content to train AI systems becomes illegal, open source and other forms of AI development could become difficult.
48:46
AI development could be restricted by legal reasons and political decisions.
Companies and academic efforts such as Hugging Face, Mistral, and LAION are participating in open source AI development.
In the future, everyone's interactions with the digital world are expected to be mediated by AI assistants.
00:00Even if you train a system to have a world model  that can predict what's going to happen next,  
00:03the world is really complicated and there's  probably all kinds of situations that the system  
00:07hasn't been trained on and needs to, you know, fine-tune itself as it goes. The question of how
00:13we organize AI research going forward, which is  somewhat determined by how afraid people are of  
00:18the consequences of AI. So if you have a rather  positive view of the impact of AI on society and  
00:24you trust humanity and society and democracies  to use it in good ways, then the best way to make  
00:29progress is to open research.   
00:31AI might be the most important new computer  technology ever. It's storming every industry  
00:37and literally billions of dollars are being  invested, so buckle up. The problem is that  
00:43AI needs a lot of speed and processing power.  So how do you compete without cost spiraling  
00:49out of control? It's time to upgrade to  the next generation of the cloud Oracle  
00:56Cloud Infrastructure, or OCI. OCI is a single  platform for your infrastructure, database,  
01:03application development and AI needs. OCI has  four to eight times the bandwidth of other clouds,  
01:12offers one consistent price instead of  variable regional pricing and, of course,  
01:18nobody does data better than Oracle. So now you  can train your AI models at twice the speed and  
01:25less than half the cost of other clouds. If you want to do more and spend less, like Uber, 8x8,
01:33and Databricks Mosaic, take a free test drive of  OCI at oracle.com/eyeonai. That's E-Y-E-O-N-A-I  
01:46all run together oracle.com/eyeonai.   
01:52Hi, I'm Craig Smith. This is Eye on AI. In  this episode, I speak again with Yann LeCun,  
01:59one of the founders of deep learning and someone  who followers of AI should need no introduction  
02:05to. Yann talks about his work on developing world  models, on why he does not believe AI research  
02:13poses a threat to humanity and why he thinks open  source AI models are the future. In the course of  
02:21the conversation we talk about a new model, Gaia 1, developed by a company called Wayve.AI. I'll
02:29have an episode with Wayve's founder to further  explore that world model, which has produced  
02:36some startling results. I hope you find the  conversation with Yann as enlightening as I did.
02:43I mean, first, the notion of a world model  is the idea that the system would get some  
02:48idea of the state of the world and be able to  predict the sort of following states of the  
02:54world resulting from just the natural evolution  of the world or resulting from an action that  
02:58the agent might take. If you have an idea  of the state of the world and you imagine  
03:04an action that you're going to take and you  can predict the resulting state of the world,  
03:11then that means you can predict what's going to  happen as a consequence of a sequence of actions.  
03:14That means you can plan a sequence of actions to  arrive at a particular goal. That's really what  
03:20a world model is. At least that's what the Wayve  people have understood the word in other contexts,  
03:28like in the context of optimal control  and robotics and things like that. That's  
03:34what a world model is. Now there's several  levels of complexity of those world models,  
03:38whether they model yourself, the agent, or whether  they model the external world, which is much more  
03:45complicated. Training a world model basically  consists in just observing the world go by and  
03:56then learning to predict what's going to happen  next, or observing the world taking an action and  
04:01then observing the resulting effect, an action that  you take as an agent or an action that you see  
04:08other agents taking. That establishes causality.  Essentially, you could think of this as a causal  
04:16model. Those models don't need to predict all the  details about the world, they don't need to be  
04:24generative, they don't need to predict exactly every pixel in a video, for example, because
04:31what you need to be able to predict is enough  details, some sort of abstract representation,  
04:37to allow you to plan. You're assembling something  out of wood and you're going to put two planks  
together and attach them with screws. It doesn't matter which type of screwdriver
04:56you're using or the size of the screw, within some limits, and things like that. There are details
05:01that in the end don't matter as to what the end  result will be or the precise grain of the wood  
05:08and things of that type. You need to have some  abstract level of representation within which you  
05:14can make the prediction without having to predict  every detail. That's why those JEPA architectures  
05:21I've been advocating are useful. Models like the  Gaia 1 model from Wayve actually make predictions  
05:29in an abstract representation space. There's been  a lot of work in that area for years, also at  
05:34FAIR (Facebook AI Research), but generally the abstract representations were pre-trained. So, the encoders that would take  
05:41images from videos and then encode them into some  representation were trained in some other way. The  
05:47progress we've made over the last six months in  self-supervised learning for images and video  
05:53is that now we can train the entire system to make  those predictions simultaneously. We have systems  
06:00now that can learn good representations of images.  The basic idea is very simple. You take an image,  
06:10you run it through an encoder, then you  corrupt that image. You mask parts of it,  
06:17for example, or you transform it in various ways.  You blur it, you change the colors, you change the  
06:24framing a little bit and you run that corrupted  image through the same encoder or something  
06:29very similar, and then you train the encoder to  predict the features of the complete image from  
06:36the features of the corrupted one. You're not  trying to reconstruct the perfect image, you're  
06:47just trying to predict the representation of it,  and this is different. This is not generative in  
06:52the sense that it does not produce pixels, and  that's the secret to getting self-supervised learning to  
06:57work in the context of images and video. You don't  want to be predicting pixels. It doesn't work. You  
07:04can produce pixels as an afterthought, which  is what the Gaia system is doing by sticking a  
07:08decoder on it and with some diffusion models that  will produce a nice image. But that's kind of a  
07:13second step. If you train the system by predicting  pixels, you just don't get good representations,  
07:19you don't get good predictions, you get  blurry predictions most of the time. So  
07:24that's what makes learning from images and video  fundamentally different from learning from text,  
07:31because in text you don't have that problem. It's  easy to predict words, even if you cannot make a  
07:37perfect prediction, because language is discrete.  So language is simple compared to the real world.
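To make the recipe above concrete, here is a minimal, hypothetical sketch of training an encoder to predict in representation space rather than pixel space. It illustrates the general joint-embedding idea, not Meta's I-JEPA code: the tiny encoder, the half-image masking corruption, and the EMA target update are all placeholder choices.

```python
# Minimal sketch of predicting in representation space rather than pixel space
# (the joint-embedding idea described above; not Meta's I-JEPA code). The tiny
# encoder, the crude masking corruption and the EMA update are placeholders.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(                     # stand-in for a real vision backbone
    nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 128),
)
target_encoder = copy.deepcopy(encoder)      # slowly-updated copy used for the targets
predictor = nn.Linear(128, 128)              # predicts clean features from corrupted ones
opt = torch.optim.Adam(list(encoder.parameters()) + list(predictor.parameters()), lr=1e-4)

def corrupt(x):
    # crude corruption for illustration: mask out the right half of the image
    x = x.clone()
    x[..., :, x.shape[-1] // 2:] = 0.0
    return x

def train_step(images):                      # images: (batch, 3, H, W)
    with torch.no_grad():
        target = target_encoder(images)      # features of the complete image
    pred = predictor(encoder(corrupt(images)))
    loss = F.mse_loss(pred, target)          # predict representations, not pixels
    opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():                    # EMA update of the target encoder (one common
        for p, tp in zip(encoder.parameters(), target_encoder.parameters()):
            tp.mul_(0.99).add_(0.01 * p)     # way to avoid representation collapse)
    return loss.item()

# e.g. train_step(torch.randn(8, 3, 64, 64))
```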
07:43And there's a lot written right  now about the energy required  
07:52in the computational resources, the  GPUs required to train language  
07:59models. Is it less in training a world  model, like using I-JEPA architecture?
08:07Well, it's hard to tell because there  is no equivalent training procedure,  
08:13self-supervised training procedure for video,  
08:15for example, that does not use JEPA. The  ones that are generative don't really work.
08:22Yeah, yeah Well, but this architecture could  also be applied to language, couldn't it?
08:30Oh yeah, absolutely yeah. So you could  very well use a JEPA architecture that  
08:36makes predictions in representation  space and apply it to language.
08:41Yeah, definitely, and in that case  would it be less computationally  
08:47intense than training a large language model? It's possible.
08:54It's not entirely clear either. I mean, there  is some advantage, regardless of what technique  
09:00you're using, to making those models really big.  They just seem to work better if you make them  
09:05big. So if you make them bigger, right. So scaling  is useful. Contrary to some claims, I do not  
09:15believe that scaling is sufficient. So, in other  words, we're not going to get anywhere close to  
09:20human level AI, in fact, not even animal level  AI, by simply scaling up language models, even  
09:32multimodal language models that we apply to video. We're going to have to find new concepts,
09:36new architectures and I've written a vision  paper about this a while back of a different  
09:44type of architecture that would be necessary for  this. So scaling is necessary, but not sufficient,  
09:52and we're missing some basic ingredients to get to  human level AI. We're fooled by the fact that LLMs  
10:01are fluent and so we think that they have human  level intelligence because they can manipulate  
10:06language, but that's false and in fact, there's a  very good symptom for this, which is that we have  
10:17systems that can pass the bar exam, answering questions from text by basically regurgitating
10:25what they've learned more or less by rote, but we  don't have completely autonomous level five self  
10:33driving cars, or at least no system that can learn  to do this in about 20 hours of practice just like  
10:40any 17-year-old, and we certainly don't have any  domestic robot that can clear up the dinner table  
10:48and fill up the dishwasher, a task that any 10-year-old can learn in one shot. So clearly
10:54we're missing something big, and that something  is an ability to learn how the world works and  
10:59the world is much more complicated than language  and also being able to plan and reason. Basically  
11:06having a mental world model of what goes on  that allows us to plan and predict consequences  
11:12of actions. That's what we're missing. And it's  going to take a while before we figure this out.
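As a concrete picture of why such a mental world model enables planning, here is a generic sketch of a random-shooting planner. The `world_model` and `goal_cost` functions are assumed placeholders (a learned predictor and a goal-distance score), and the approach shown is a generic one, not the specific architecture proposed in the vision paper mentioned above.

```python
# Minimal sketch of planning with a learned world model: roll candidate action
# sequences through the predictor and keep the one whose predicted outcome
# best satisfies the goal. `world_model` and `goal_cost` are assumed
# placeholders, not any specific published system.
import torch

def plan(world_model, goal_cost, state, horizon=10, n_candidates=256, action_dim=4):
    """state: 1-D tensor holding the current abstract state representation."""
    # Sample random candidate action sequences: (n_candidates, horizon, action_dim).
    actions = torch.randn(n_candidates, horizon, action_dim)
    states = state.expand(n_candidates, -1)          # same start state for every rollout
    total_cost = torch.zeros(n_candidates)
    for t in range(horizon):
        states = world_model(states, actions[:, t])  # predict the next abstract state
        total_cost += goal_cost(states)              # how far is each rollout from the goal?
    best = total_cost.argmin()
    return actions[best, 0]                          # execute the first action, then replan
```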
11:18You were on another paper that talked  about augmented language models and the  
11:30embodied Turing test. Was that the same  paper? The embodied Turing test? Can you  
11:36talk about that? First of all, what is the embodied Turing test? I didn't quite understand that.
11:41Well, okay, it's a different concept,
11:48but it's basically based on the Moravec paradox,
11:56right? So Moravec many years ago noticed  that things that appeared difficult for  
12:03humans turned out to sometimes be very easy  for computers to do, like playing chess,  
12:09much better than humans, or, I don't  know, computing integrals or whatever,  
12:13certainly doing arithmetic. But then there are things that we take for granted as humans that we
12:19don't even consider intelligent tasks, that we are incapable of reproducing with computers. And
12:25so that's where the embodied Turing test comes  in. Like you know, observe what a cat can do,  
12:30or how fast a cat can learn new tricks. Or you know how a cat can plan to jump on,
12:39you know, a bunch of different furniture to get  to the top of wherever it wants to go. That's an  
12:45amazing feat that we can't reproduce with robots  today. So that's kind of the embodied Turing test,  
12:52if you want, like you know, can you make a  robot that can behave, have behaviors that are  
13:00indistinguishable from those of animals first of all, and can acquire new ones with the
13:06same efficiency as animals. Then the augmented LLM  paper is different. It's about how do you sort of  
13:15minimally change large language models so that  they can use tools, so they can to some extent  
13:22plan actions? Like you know, you need to compute  the product of two numbers, right, you just call  
13:27a calculator and you know you're going to get  the product of those two numbers. And LLMs are  
13:31notoriously bad at arithmetic, so they need tools to do this kind of stuff or do a search, you know, using
13:37a search engine or database look up, or something  like that. So there's a lot of work on this right  
13:42now and it's somewhat incremental, like you know. How can you sort of minimally change LLMs and
13:46take advantage of their current capabilities but  still augment them with the ability to use tools?
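For a concrete picture of the tool-use idea, here is a toy sketch in which arithmetic is routed to a calculator rather than left to the model. The `call_llm` stub and the crude regex routing are hypothetical stand-ins, not the method of the augmented-LLM work being described.

```python
# Toy illustration of tool augmentation (not the method of any paper mentioned
# here): arithmetic is routed to a calculator tool instead of being left to the
# language model. `call_llm` and the regex routing are hypothetical stand-ins.
import re

def calculator(expression: str) -> str:
    # extremely restricted evaluator: digits, whitespace and + - * / ( ) . only
    if not re.fullmatch(r"[\d\s+\-*/().]+", expression):
        raise ValueError("unsupported expression")
    return str(eval(expression))     # acceptable here because the input is whitelisted

def answer(prompt: str, call_llm) -> str:
    match = re.search(r"what is ([\d\s+\-*/().]+)\?", prompt.lower())
    if match:
        return calculator(match.group(1))   # exact arithmetic from the tool
    return call_llm(prompt)                 # free-form text from the model

# answer("What is 1234 * 5678?", call_llm=lambda p: "...") returns "7006652"
```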
13:54Yeah, and I don't want to get too much into  the threat debate, but you know you're on  
14:02one side. Your colleagues Geoff and Yoshua are  on the other. I recently saw a picture of the  
14:09three of you. I think you put that up on social  media saying how you know you can disagree but  
14:17still be friends. This idea of augmenting  language models with stronger reasoning  
14:27capabilities, agency, and the ability to use tools, is precisely what Geoff
14:34and Yoshua are worried about. Can you just  get into why are you not worried about that?
14:43Okay, so first of all, what you're describing
14:49is not necessarily what they are afraid of. They are alerting people
14:57and various governments and others about various  dangers that they perceive. Okay, so one danger, 
15:03one set of dangers, is relatively short term. There are things like, you know, bad people will
15:09use technology for bad things. What can bad people  use powerful AI systems for? And one concern  
that, you know, governments have been worried about, and intelligence agencies and counterintelligence
15:24and stuff like that is, you know, could badly  intentioned organizations or countries use LLMs  
15:31to help them, I don't know, design pathogens or chemical weapons or other things, or
15:38cyber attacks? You know, things like that, right? Now, those problems are not new. Those problems
15:43have been with us for a long time and the question  is what incremental help would AI systems bring to  
15:51the table? So my opinion is that, as of today, AI  systems are not sophisticated enough to provide  
16:00any significant help for such badly intentioned  people, because those systems are trained with  
16:07public data that is publicly available on the  internet and they can't really invent anything.  
16:11They're going to regurgitate with a little bit of  interpolation if you want, but they cannot produce  
16:19anything that you can't get from a search engine in a few minutes. So actually, that claim
16:27is being tested at the moment. There are people  who are actually kind of trying to figure it out,  
16:31like, is it the case that you can actually do something more
16:36dangerous with the sort of current AI technology  that you can't do with a search engine? Results  
16:42are not out yet, but my hunch is that you know  it's not going to enable a lot of people to do  
16:49significantly bad things. Then there is the issue of things like code generation for cyber
16:55attacks and things like this, and those problems  have been with us for years. And the interesting  
17:00thing that most people should know, like you know  also, for disinformation or attempts to corrupt the  
17:06electoral process and things like this, and what's  very important for everyone to know is that the  
17:12best countermeasures that we have against all of  those attacks currently use AI massively. Okay,  
17:19so AI is used as a defense mechanism against  those attacks. It's not actually used to do the  
17:26attacks yet, and so now it becomes the question of, you know, who has the better system? Like with other
17:33countermeasures, is the AI used by the countermeasures significantly better
17:39than the AI used by the attackers so that, you know,
17:43the problem is satisfactorily mitigated, and  that's where we are. Now, the good news is that  
17:50there are many more good guys than bad guys at  the moment. They're usually much more competent,  
17:56they're usually much more sophisticated, they're  usually much better funded and they have a strong  
18:02incentive to take down the attackers. So it's a game of cat and mouse, just like every
18:10security problem that's ever existed. There's nothing new there. Okay, no, nothing quite so new.
18:16Yeah, but okay, but then there is the question  of existential risk, right, and this is something  
18:25that both Geoff and Yoshua have been thinking of  fairly recently. So for Geoff, it's only sort of  
18:32just before last summer that he started thinking about this, because before, he
18:38was convinced that the kind of algorithms that  we had were significantly inferior to the kind of  
18:43learning algorithm that the brain used, and the  epiphany he had was that, in fact, no, because  
18:51looking at the capabilities of large language  models that can do pretty amazing things with a  
18:56relatively small number of neurons and synapses,  he said maybe they're more efficient than the  
19:00brain and maybe the learning algorithm that we use,  back propagation, is actually better than whatever  
19:04it is that the brain uses. So he started thinking  about, like you know what are the consequences,  
19:09but that's very recent and in my opinion he hasn't thought about this enough. Yoshua came
19:17to a similar epiphany last winter where he started  thinking about the long-term consequences and came  
19:27to the conclusion also that there was a potential  danger. They're both convinced that AI has  
19:33enormous potential benefits. They're just worried  about the dangers. And they're both worried  
19:38about the dangers because they have some doubts  about the ability of our institutions to do the  
19:46best with technology, whether they are political, economic, geopolitical, financial or
19:57industrial institutions, to do the right thing, to be motivated by the right thing. So if you trust the system,
20:08if you trust humanity and democracy, you might be  entitled to believe that society is going to make  
20:21the best use of future technology. If you don't  believe in the solidity of those institutions,  
20:28then you might be scared. I think I'm more confident in humanity and democracy,
20:35and in our current systems, than they are. I've been thinking about this problem for much longer,
20:40actually, since at least 2014. So when I started  FAIR at Facebook at the time, it became pretty  
20:49clear pretty early on that deploying AI systems  was going to have big consequences on people and  
20:57society, and we were confronted with this very early, and so I started thinking about those problems
21:02very early on. Things like countermeasures  against bias in AI systems, systematic bias,  
21:10countermeasures against attacks, or detection of  hate speech in every language these are things  
that people at FAIR (Facebook AI Research) worked on and then were eventually deployed. Just to give you
21:23an example, the proportion of hate speech that  was taken down automatically by AI systems five  
21:28years ago, in 2017, was about 20% to 25%. Last  year it was 95%, and the difference is entirely  
21:37due to progress in natural language understanding.  Entirely due to transformers that are pretrained,  
21:43self-supervised and can essentially detect  hate speech in any language. Not perfectly  
21:48Nothing is perfect, it's never perfect. But AI is  just massively there and that's the solution. So  
21:54I started thinking about those issues, including  existential risk, very early on, in fact, in 2015,  
22:01early 2016, actually, I organized a conference  hosted at NYU on the future of AI where a lot  
22:07of those questions were discussed. I invited  people like Nick Bostrom and Eric Schmidt and  
22:16Mike Schroepfer, who was the CTO of Facebook at the time, Demis Hassabis, a lot of people,
22:23both from the academic and AI research side and  from the industry side, and there were two days,  
22:29a public day and kind of a more private day.  What came out of this is the creation of an  
22:33institution called the Partnership on AI. It came out of a discussion I had with Demis Hassabis, which was:
22:41would it be useful to have a forum where we can  discuss, before they happen, sort of bad things  
22:46that could happen as a consequence of deploying  AI? Pretty soon, we brought on board Eric Horvitz  
22:54and a bunch of other people. We co-founded  this thing called the Partnership on AI, which  
22:58basically has been funding studies about AI ethics  and consequences of AI and publishing guidelines  
23:09about how you do it right to minimize harm. So  this is not a new thing for me. I've been thinking  
23:14about this for 10 years essentially, whereas  for Yoshua and Geoff it's much more recent.
23:20Yeah, but nonetheless, this augmented AI  or augmented language models that have  
23:29stronger reasoning and agency raises the threat,  
23:37regardless of whether or not it can be countered, to a higher level.
23:42Right, okay. So I guess the question there becomes  what is the blueprint of future AI systems that  
23:50will be capable of reasoning and planning, will  understand how the world works, will be able to  
23:58use tools and have agency and things like  that? Right? And I tell you they will not  
24:04be autoregressive LLMs. So the problems that we  see at the moment of autoregressive LLMs are the  
24:11fact that they hallucinate, they sometimes say  really stupid things, they don't really have  
24:17a good understanding of the world. People claim  that they have some simple world model, but it's  
24:22very implicit and it's really not good at all.  For example, you can tell an LLM that A is the  
24:29same as B and then you ask if B is the same as A  and it will say I don't know or no, right? I mean,  
24:36those things don't really understand logic or anything like that, right? So the types of systems
that we're talking about, that might approach animal-level intelligence, let alone human-level intelligence,
24:52have not been designed. They don't exist, and so  discussing their danger and their potential harm  
25:00is a bit like discussing the sex of angels at  the moment, or, to be a little more accurate,  
25:08perhaps it would be kind of like discussing  how we're going to make transatlantic flight at  
25:14near the speed of sound safe in 1925, when we haven't yet invented the turbojet. We can speculate,
25:24but how did we make a turbojet safe? It required  decades of really careful engineering to make  
25:33them incredibly reliable and now we can fly halfway around the world with a two-engine
25:43turbojet aircraft. I mean, that's an incredible  feat. And it's not like people were discussing  
25:51sort of philosophical questions about how you  make turbojet safe. It's just really careful  
25:55and complicated engineering that none of us would understand. So you know, how can we ask the
26:07AI community now to explain how AI systems are  going to be safe? We haven't invented them yet,  
26:12yeah, okay. That said, I have some idea about  how we can design them so that they have these  
26:19capabilities and, as a consequence, how they will  be safe. I call this objective-driven AI, so what  
26:27that means is essentially systems that produce  their answer by planning their answer so as to  
26:36satisfy an objective or a set of objectives. So  this is very different from current LLMs. Current  
26:42LLMs just produce one word after the other, or one token, which is a subword unit. It doesn't
26:47matter. They don't really think and plan ahead.  As we said before, they just produce one word  
26:52after the other. That's not controllable. The only  thing we can do is see if what they've produced,  
26:59check if what they've produced satisfies some  criterion or set of criteria, and then not  
27:05produce an answer or produce a non-answer if the  answer that was produced isn't appropriate. But we  
27:13can't really force them to produce an answer that  satisfies a set of objectives. So objective-driven  
27:21AI is the opposite. The only thing that the  system can produce are answers that satisfy  
27:29a certain number of objectives. So what would an objective be? Did you answer the question? Another
27:35objective could be: is your answer understandable by a 13-year-old, because you're talking to a
27:4013-year-old? Another would be: is this, I don't know, terrorist propaganda or something? You can have a
27:49number of criteria like these, guardrails that would guarantee that the answer that's produced
27:55satisfies certain criteria, whatever they are. Same for a robot: you could guarantee that the
28:00sequence of actions that is produced will not  hurt anyone. Like you can have very low level  
28:06guardrails of this type that say okay, you have  humans nearby and you're cooking, so you have a  
28:12big knife in your hand, don't flail your arms, okay, that would be a very simple guardrail to
28:17impose, and you can imagine having a whole bunch  of guardrails like this that will guarantee that  
28:22the behavior of those systems would be safe and  that their primary goal would be to be basically  
28:30subservient to us. So I do not believe that we'll have AI systems that will not be
subservient to us, that will define their own goals (they will define their own sub-goals, but those
28:44sub-goals would be sub-goals of goals that we set them), and that will not have all kinds of guardrails
28:50that will guarantee their safety. And it's not like we're going to invent
28:54a system and make a gigantic one that we know will  have human-level AI and just turn it on and then,  
28:59from the next minute, is going to take over the  world. That's completely preposterous. What we're  
29:04going to do is try with small ones, maybe as smart as a mouse or something,
29:09maybe a cat, maybe a dog, and work our way up and then put some more guardrails, basically like
29:16we've engineered more and more powerful and more  reliable turbojets. It's an engineering problem.
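A schematic of the guardrail idea described here, with made-up checks: a candidate answer is released only if it satisfies every objective, and withheld otherwise. The individual checks below are illustrative placeholders, not real classifiers.

```python
# Schematic of the "objective-driven" / guardrail idea described above.
# The individual checks are made-up stand-ins, not real classifiers.
from typing import Callable, Iterable, List, Optional

def readability_grade(text: str) -> int:
    # crude stand-in: pretend longer answers mean a higher grade level
    return min(12, max(1, len(text.split()) // 10))

def is_harmful(text: str) -> bool:
    # stand-in for a real safety classifier
    return any(bad in text.lower() for bad in ("propaganda", "build a weapon"))

Guardrail = Callable[[str], bool]

guardrails: List[Guardrail] = [
    lambda text: len(text) > 0,                 # did you answer the question at all?
    lambda text: readability_grade(text) <= 8,  # understandable by a 13-year-old?
    lambda text: not is_harmful(text),          # not harmful content
]

def objective_driven_answer(candidates: Iterable[str],
                            checks: List[Guardrail]) -> Optional[str]:
    for candidate in candidates:
        if all(check(candidate) for check in checks):
            return candidate   # first candidate that satisfies every objective
    return None                # decline rather than emit an unchecked answer

# e.g. objective_driven_answer(["", "A short, safe answer."], guardrails)
```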
29:22Yeah, yeah, you were also on a paper.  Maybe this is the one that talked about  
29:29the embodied Turing test on neuro-AI.  Can you explain what neuro-AI is?
29:40Okay. Well, it's the idea that we should get  some inspiration from neuroscience to build  
29:48AI systems and that there is something to be  learned from neuroscience and from cognitive  
29:57science to drive the design of AI systems.  Some inspiration, something to be learned,  
30:05as well as the other way around. What's  interesting right now is that the best models  
30:09that we have of how, for example, the visual  cortex works is convolutional neural networks,  
30:16which are also the models that we use to recognize  images, primarily in artificial systems. There is  
30:24information being exchanged both ways. One way  to make progress in AI is to ignore nature and  
30:35just try to solve problems in an engineering  fashion, if you want. I found interaction with  
30:45neuroscience always thought-provoking. You  don't want to be copying nature too closely,  
30:52because there are details in nature that are  irrelevant and there are principles on which  
30:59natural intelligence is based that we haven't  discovered. But there is some inspiration to have,  
31:04certainly convolutional nets were  inspired by the architecture of the visual  
31:08cortex. The whole idea of neural nets and deep  learning came out of the idea that intelligence  
31:14can emerge from a large collection of simple  elements that are connected with each other and  
31:19change the nature of their interactions. That's  the whole idea. Inspiration from neuroscience  
31:27has been extremely beneficial so far, and the idea of neuro-AI is that you should keep going.
31:34You don't want to go too far. Going too far, for  example, is trying to reproduce some aspect of the  
31:41functioning of neurons with electronics. I'm not  sure that's a good idea. I'm skeptical about this,
31:49for example. So your research right now, is your main focus
31:57on furthering the JEPA architecture into  other modalities, or where are you headed?
32:06Yeah, so, I mean, the long term goal is, you know,  to get machines to be as intelligent and learn  
32:14as efficiently as animals and humans. Okay, and  the reason for this is that we need this because  
32:19we need to amplify human intelligence, and so  intelligence is the most needed commodity that  
32:25we want in the world, right? And so we could,  you know, possibly bring a new renaissance to  
32:32humanity if we could amplify human intelligence  using machines, which we are already doing with  
32:37computers, right, I mean, that's pretty much  what they've been designed to do. But even more,  
32:42you know, imagine a future where every one of us  has an intelligent assistant with us at all times.  
32:53They can be smarter than us. We shouldn't feel  threatened by that. We should feel like we are,  
32:59like, you know, a director of a big lab or a CEO  of a company that has a staff working for them of  
33:06people who are smarter than themselves. I mean,  we're used to this already. I'm used to this,  
33:10certainly working with people who are smarter  than me. So we shouldn't feel threatened by this,  
33:15but it's going to empower a lot of us, right,  and humanity as a whole. So I think that's a  
33:23good thing. That's the overall practical goal, if you want. Then there's a scientific question
33:28that's behind this, which is really what is  intelligence and how do you build it? And  
33:33then, you know, how can a system learn the way animals and humans seem to be learning
33:38so efficiently? And the next thing is, how do  we learn how the world works? By observation,  
33:45by watching the world go by, through vision  and all the other senses. And animals can do  
33:52this without language, right? So it has nothing  to do with language. It has to do with learning  
33:57from sensory percepts and learning mostly  without acting, because any action you take  
34:03can kill you. So it's better to be able to learn  as much as you can without actually acting at all,  
34:08just observing, which is what babies do in the first few months of life. They can hardly do
34:13anything, right? So they mostly observe and  learn how the world works by observation.  
34:18So what kind of learning takes place there?  So that's obviously kind of self-supervised,  
34:23right, it's learning by prediction. That's an old  idea from cognitive science, and the thing is,  
34:30you know, we can learn to predict videos.  But then we noticed that predicting videos,  
34:34predicting pixels in video, is so infinitely  complicated that it doesn't work. And  
34:39so then came this idea of JEPA right. Learn  representations so that you can make predictions  
34:44in representation space, and that turned out to  work really well for learning image features,  
34:50and now we're working on getting this to work for  video and eventually we'll be able to use this to  
34:56learn world models where you show a piece of video  and then you say I'm going to take this action,  
35:03predict what's going to happen next in the world, which is a bit of what the
35:10Gaia system from Wayve is doing at a high level. But  we need this at various levels of abstraction so  
35:16that we can build, you know, systems that  are more general than autonomous driving.
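As an illustration of what an action-conditioned predictor in representation space could look like, here is a toy sketch in the spirit of that description; it is not GAIA-1 or any released Meta code, and it omits the machinery (for example, a separate slowly-updated target encoder) that would be needed in practice to prevent representation collapse.

```python
# Toy sketch of an action-conditioned world model trained in representation
# space, in the spirit of the description above (not GAIA-1 or any released
# Meta code). Encoder, predictor, dimensions and data are all placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

FEAT, ACT = 128, 4
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, FEAT))   # toy frame encoder
predictor = nn.Sequential(nn.Linear(FEAT + ACT, 256), nn.ReLU(), nn.Linear(256, FEAT))
opt = torch.optim.Adam(list(encoder.parameters()) + list(predictor.parameters()), lr=1e-4)

def world_model_step(frame_t, action_t, frame_next):
    """frame_*: (batch, 3, 64, 64) video frames; action_t: (batch, ACT) actions."""
    z_t = encoder(frame_t)
    with torch.no_grad():
        z_next = encoder(frame_next)               # target: representation of the next frame
    z_pred = predictor(torch.cat([z_t, action_t], dim=-1))
    loss = F.mse_loss(z_pred, z_next)              # predict the next state, not its pixels
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# e.g. world_model_step(torch.randn(8, 3, 64, 64), torch.randn(8, ACT), torch.randn(8, 3, 64, 64))
```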
35:23Okay, yeah, and it's my fault, so I won't go over the hour,
35:33but is it conceivable that someday there will be a model,
35:44maybe embodied in a robot, that is ingesting video from its environment
35:54and just continuously learning and getting smarter, and smarter, and smarter?
36:01Yeah, I mean, that's kind of a bit of a  necessity, the reason being that you know,  
36:09even if you train a system to have a world model that can predict what's going to happen next,
36:13the world is really complicated and there's probably all kinds of situations that,
36:17you know, the system hasn't been trained on and needs to, you know,
36:21fine tune itself as it goes. So you know, animals  and humans do this early in life by playing. So  
36:32play is a way of learning your world model in situations that basically won't hurt you.
36:42But then during life, of course, you know, when we learn to drive, there's all
36:46kinds of mistakes that we make initially that we don't make after having some experience,
36:52and that's because we're fine tuning our world  model to some extent. Yeah, learning a new task,  
36:58we're basically just learning a new version of  our world model, right? So, yeah, I mean, this  
37:04type of continual learning is going to have to be present, but the overall power and
37:10intelligence of the system will be limited by, you know, how much compute and how big a neural net it is
37:15using and various other constraints. You  know, computational constraints basically.
37:20You know you're still young and  this. I'm not sure about that. Well,  
37:28you're younger than Geoff. Let me put it that way.
37:30I'm younger than Geoff, I'm  older than Yoshua, yeah.
37:35But this, the progress you've made on world  models is fairly rapid from my point of view  
37:42watching it. Are you hopeful that within your  career you'll have embodied robots that are  
37:55building world models through their interaction  in reality and then being able to? Well, I guess  
38:01the other question on world models: do you then  combine it with a language model to do reasoning,  
38:11or is the world model able to do reasoning on  its own? But are you hopeful that in your career  
38:17you'll get to the point where you'll have  this continuous learning in a world model?
38:22Yeah, I sure hope so. I might have another, you  know, 10 useful years or something like this in  
38:28research before my brain, you know, turns into béchamel sauce, or something like that.
38:36You know, 15 years if I'm lucky, or perhaps less, but yeah, I hope that there's going to be
38:44breakthroughs in that direction during that  time. Now, whether that will result in the  
38:50kind of artifact that you're describing, you know, robots that can, like, you know, domestic robots, for
38:57example, or self-driving cars that can learn fairly  quickly by themselves, I don't know, because there  
39:04might be all kinds of obstacles that we have not  envisaged that may appear on the way. No, it's a  
39:12constant in the history of AI that you have some  new idea and a breakthrough and you think that's  
39:19going to solve all the world's problems, and then  you're going to hit a limitation and you have  
39:25to go beyond that limitation. So it's like you  know you're climbing a mountain. You find a way  
39:30to climb the mountain that you're seeing and you  know that once you get to the top you will have  
39:36the problem solved, because now it's, you know,  a gentle slope down and once you get to the top,  
39:42you realize that there is another mountain  behind it that you hadn't seen. So that's been  
39:48the history of AI right, where people have come up  with sort of new concepts, new ideas, new ways to  
39:55approach AI, reasoning, whatever perception, and  then realize that their idea basically was very  
40:04limited. And so you know this. Inevitably we're  trying to figure out what's the next revolution in  
40:17AI. That's what I'm trying to figure out, and so  you know, learning how the world works from video,  
40:22having systems that have world models that allow systems to reason and plan. And there's something
40:29I want to be very clear about, which is an answer  to your question, which is that you can have  
40:37systems that reason and plan without manipulating  language. Animals are capable of amazing feats of  
40:44planning and also, to some extent, reasoning. They  don't have language, at least most of them don't  
40:52and so many of them don't have culture because  they are mostly solitary animals. So you know,  
41:01it's only the social animals that have some level of culture. So the idea that a system can plan and
41:10reason is not connected with the idea that you  can manipulate language. Those are two different  
41:16things. It needs to be able to manipulate abstract  notions, but those notions do not necessarily  
41:16correspond to linguistic entities like words or things like that. We can have mental images, if
41:23you want, to think. Like, you know, ask a physicist or mathematician how they reason: it's very
41:29much in terms of sort of mental models, nothing to do with language. Then you can turn things
41:35into language. But that's a different story. That's the second step. So you know,
41:49we're going to have to figure out how to do this reasoning and hierarchical planning in machines,
41:55reproduce this first, and then, of course, you know, sticking language on top of it will help,
42:01like it will make those systems smarter and be  able you know, it will allow us to communicate  
42:05with them and teach them things and they're going  to be able to teach us things and stuff like that.  
42:10But this is a different question really,  the question of how we organize AI research  
42:16going forward, which is somewhat determined by how  afraid people are of the consequences of AI. So if  
42:21you have a rather positive view of the impact of  AI on society and you trust humanity and society  
42:28and democracies to use it in good ways, then the best way to make progress is to open research and
42:37for the people who are afraid of the consequences,  whether they are societal or geopolitical,  
42:44they're putting pressure on governments around  the world to regulate AI in ways that basically  
42:49limit access, particularly of open source code  and things like that, and it's a big debate at  
42:57the moment. I'm very much on the side of open research, and so is Meta.
43:03Yeah, actually that was something I was going  to ask you and now that you brought it up,  
43:09because I've been talking to people about this and  there is a view that, aside from the risks of open  
43:18source again Geoff Hinton saying, would you open  source thermonuclear weapons? Aside from that,  
43:28the question is whether open source can marshal  the resources to compete with proprietary models  
43:40and because of the tremendous resources required  for when you're scaling these models. And there's  
43:49a question as to whether or not Meta will  continue to open source future versions  
43:55of Llama, and whether it'll continue to invest the
44:02resources needed to push the open source  models. So what do you think about that?
44:11Okay, there's a lot to say about this, Okay. So  first thing is there's no question that Meta will  
44:16continue to invest the resources to build better  and better AI systems because it needs it for its  
44:21own products. So the resources will be invested.  Now the next question is will we continue to open  
44:30source the base models? And the answer is probably  yes, because that creates an ecosystem on top  
44:37of which an entire industry can be built, and  there is no point having 50 different companies  
44:44building proprietary, closed systems when you  can have one good open source base model that  
44:52everybody can use. It's wasteful and it's not  a good idea. And another reason for having open  
45:00source models is that nobody, no entity, however powerful it thinks it is, has a monopoly
45:09on good ideas. And so if you want people who have good, new, innovative ideas to contribute,
45:15you need an open source platform. If you want  the academic world to contribute, you need  
45:19open source platforms. If you want the startup  world to be able to build customized products,  
45:24you need open source base models, because  they don't have the resources to build, to  
45:28train large models. And then there is the history  that shows that for foundational technology, for  
45:39infrastructure type technology, open source always  wins. It's true of the software infrastructure of  
45:50the internet. In the early 90s and mid 90s there  was a big battle between Sun Microsystems and  
45:55Microsoft to deliver the software infrastructure  of the internet - operating systems, web servers,  
46:05web browsers and various server side and client  side frameworks. They both lost. Nobody is talking  
46:12about them anymore. The entire world of the web is  using Linux and Apache and MySQL and JavaScript,  
46:24and even the basic core code for web browsers is  open source. So open source won by a huge margin.  
46:35Why? Because it's safer, it gathers more people to contribute all the features that are necessary,
46:42it's more reliable, vulnerabilities are fixed faster and it's customizable. So anybody
46:51can customize Linux to run on whatever  hardware they want. So open source wins.
46:57But it's the same thing.
47:00It's going to be the same thing. It's inevitable.  The people now who are climbing up like OpenAI,  
47:08their system is based on publications  from all of us and from open platforms  
47:17like PyTorch. ChatGPT is built using PyTorch.  PyTorch was produced originally by Meta. Now  
47:22it's owned by the Linux Foundation. It's open  source. They've contributed to it, by the way,  
47:29their LLM is based on transformer architectures  invented at Google. All the tricks to train,  
47:36all those things came out of various papers  from all kinds of different institutions,  
47:41including academia. All the fine-tuning techniques  are the same. So nobody works in a vacuum. The  
47:48thing is, nobody can keep their advance and their  advantage for very long if they are secretive.
47:57Yeah, except that with these models, because  they're so compute intensive and they cost so  
48:02much money to train, you need somebody  like Meta who's going to be willing to  
48:09build them and open source them. That's why,  when I was asking whether they'll continue,  
48:17obviously Meta will continue  building resource intensive models,  
48:24but the question is whether they'll  continue to open source them.
48:30I'm telling you the only reason why Meta could  stop open sourcing models is legal. So if  
48:38there is a law that outlaws open source AI  systems above a certain level of sophistication,  
48:46then of course we can't do it. If there are  laws that, in the US or across the world,  
48:55make it illegal to use public content to train AI  systems, then it's the end of AI for everybody,  
49:03not just for open source, or at least the end of  the type of AI that we are talking about today.  
49:09We might have new AI in the future that doesn't require as much data. And then there
49:15is liability. If the rule is that when someone does something bad with an AI system that was
49:28open sourced by Meta, Meta is liable, then Meta will have a big incentive not to release it,
49:35obviously. So the entire question about this is  around legal reasons and political decisions.
49:41But on the idea of open source winning, don't  you need more people or more companies like  
49:47Meta building the foundation models and open sourcing them? Or could
49:53an open source ecosystem win based on a single company building the models?
50:00No, I mean you need two or three, and  there are two or three, right. I mean,  
50:03there is Hugging Face. There is Mistral in France, who is also embracing open source LLMs;
50:11theirs is a very good LLM. It's a small one, but it's very good. There are academic efforts like LAION.
50:20They don't have all the resources they need, but  they collect the data that is used by everyone,  
50:24so everybody can contribute. One thing that  I think is really important to understand  
50:28also is that there is a future, which I described earlier, in which every one of us,
50:35every one of our interactions with the digital  world, would be mediated by an AI assistant,  
50:41and this is going to be true for everyone around  the world, right? Everyone who has any kind of  
50:46smart device. Eventually, it's going to be in our  augmented reality glasses, but for the time being,  
50:52in our smartphones. And so imagine that future  where you are, I don't know, from Indonesia or  
51:06Senegal or France and your entire digital diet  is done through the mediation of an AI system.  
51:19Your government is not going to be happy about it.  Your government is going to want the local culture  
51:24to be present in that system. It doesn't want that system to be closed source and controlled
51:30by a company on the west coast of the US. So  just for reasons of preserving the diversity  
51:40of culture across the world and not having our entire information diet biased by whatever
51:47it is that some company on the west coast of the US thinks, there are going to need to be open source
51:53platforms, and they're going to be predominant, at least outside the US, for that reason, including
52:03China. There is all this talk about, oh, what if China puts their hands on our open source code? I
52:08mean, China wants control over its own LLMs because they want their citizens to have access to certain types
52:15of information. So they're not going to use our LLMs, they're going to train theirs, which they
52:20already have. And nobody is particularly ahead  of anybody else by more than about a year.
52:28And China is pushing open source. I mean, they're  very pro open source within their ecosystems.
52:36Some of them. There is no unified opinion there,  but I mean it's the same in the West, right,  
52:44there are some governments that are too afraid of the risks, or are thinking about it,
52:51and some others that are all for open source  because they see this as the only way for  
52:56them to have any influence on the information,  the type of information and culture that would  
53:03be mediated by those systems. So it's going to  have to be like Wikipedia, right? Wikipedia is built  
53:14by millions of people who contribute to it from  all around the world, in all kinds of languages,  
53:20and it has a system for vetting the information.  The way AI systems of the future will be taught  
53:26and will be fine tuned will have to be the  same way. It will have to be crowd sourced,  
53:31because something that matters to a farmer  in Southern India is probably not going to  
53:39be taken into account by the fine tuning done  by some company on the West Coast of the US.
53:46AI might be the most important new computer  technology ever. It's storming every industry  
53:52and literally billions of dollars are being  invested, so buckle up. The problem is that  
53:57AI needs a lot of speed and processing power.  So how do you compete without cost spiraling  
54:04out of control? It's time to upgrade to  the next generation of the cloud Oracle  
54:10Cloud Infrastructure, or OCI. OCI is a single  platform for your infrastructure, database,  
54:18application development and AI needs. OCI has  four to eight times the bandwidth of other clouds,  
54:27offers one consistent price instead of  variable regional pricing. And, of course,  
54:32nobody does data better than Oracle. So now you  can train your AI models at twice the speed and  
54:39less than half the cost of other clouds. If  you want to do more and spend less, like Uber,  
54:478x8 and Databricks Mosaic, take a free test drive  of OCI at oracle.com/eyeonai. That's E-Y-E-O-N-A-I  
55:00all run together: oracle.com/eyeonai.    
55:06That's it for this episode. I want to thank Yann  for his time. If you want to read a transcript  
55:12of this conversation, you can find one on our  website eye-on.ai, that's eye-on.ai. And remember  
55:22the singularity may not be near, but AI is  changing your world, so best pay attention.