Yann LeCun on World Models, AI Threats and Open-Sourcing | Eye On AI #150
#AI Evolution#Oracle Cloud Infrastructure#Yann LeCun#Embodied Turing Test#World Model AI#Neuro-AI#AI Energy Consumption#Open Source AI#Augmented LLMs#AI Threat Debate#ConvNets
💫 Summary
Yann LeCun's views on open research in AI, the risks of AI, and training world models.
✦
Open-sourcing AI research is the best way to make progress if you believe in its positive impact on society and trust humanity, society, and democracy.
00:00 Even with a world model that can predict what happens next, a system must fine-tune itself to handle complex situations.
How AI research is organized depends on how afraid people are of AI's consequences.
AI is having a major impact across industries, and large investments are being made.
✦
When working with images and video, trying to predict pixels does not work well; learning from images and video is fundamentally different from learning from text.
06:57 Predicting pixels does not yield good representations, only blurry predictions.
Text is discrete, which makes predicting words easy.
The JEPA architecture can also be applied to language.
Models tend to work better when made larger.
✦
Yann LeCun argues that today's AI systems offer no significant help to bad actors, since they are trained on publicly available data.
13:54 Geoff and Yoshua are worried about language models with strong reasoning and tool-use capabilities.
Current AI systems do not provide meaningful help to malicious actors.
AI systems are trained on publicly available data and cannot really invent anything new.
✦
AI systems will be subservient to us and will have guardrails that guarantee their safety.
27:49 AI systems may define sub-goals of the goals we set for them, but they will remain subservient to us.
We will start with small AI systems and work our way up gradually.
Neuro-AI is the idea of drawing inspiration from neuroscience to build AI systems.
✦
Robots or models that keep learning from video and getting smarter will someday be possible.
34:50 We need models that learn about their environment from video.
Fine-tuning is needed in a complex world.
Continual learning is necessary.
✦
The direction of AI research will be determined by how afraid people are of AI's consequences.
41:49 People with a positive outlook believe open research is the best way to make progress.
However, some people are anxious about AI's consequences and are calling for regulation.
Restricting access to open source code is a major point of debate.
✦
If using public content to train AI systems becomes illegal, open source and other forms of AI development could become difficult.
48:46 AI development may be restricted by legal reasons and political decisions.
Companies and academic efforts such as Hugging Face, Mistral, and LAION are contributing to open source AI development.
In the future, everyone's interactions with the digital world are expected to be mediated by AI assistants.
00:00Even if you train a system to have a world model
that can predict what's going to happen next,
00:03the world is really complicated and there's
probably all kinds of situations that the system
00:07hasn't been trained on and needs to, you know,
fine-tune itself as it goes. The question of how
00:13we organize AI research going forward is
somewhat determined by how afraid people are of
00:18the consequences of AI. So if you have a rather
positive view of the impact of AI on society and
00:24you trust humanity and society and democracies
to use it in good ways, then the best way to make
00:29progress is to open research.
00:31AI might be the most important new computer
technology ever. It's storming every industry
00:37and literally billions of dollars are being
invested, so buckle up. The problem is that
00:43AI needs a lot of speed and processing power.
So how do you compete without cost spiraling
00:49out of control? It's time to upgrade to
the next generation of the cloud: Oracle
00:56Cloud Infrastructure, or OCI. OCI is a single
platform for your infrastructure, database,
01:03application development and AI needs. OCI has
four to eight times the bandwidth of other clouds,
01:12offers one consistent price instead of
variable regional pricing and, of course,
01:18nobody does data better than Oracle. So now you
can train your AI models at twice the speed and
01:25less than half the cost of other clouds. If you
want to do more and spend less, like Uber, 8x8,
01:33and Databricks Mosaic, take a free test drive of
OCI at oracle.com/eyeonai. That's E-Y-E-O-N-A-I
01:46all run together: oracle.com/eyeonai.
01:52Hi, I'm Craig Smith. This is Eye on AI. In
this episode, I speak again with Yann LeCun,
01:59one of the founders of deep learning and someone
who should need no introduction to followers of AI.
02:05Yann talks about his work on developing world
models, on why he does not believe AI research
02:13poses a threat to humanity and why he thinks open
source AI models are the future. In the course of
02:21the conversation we talk about a new model Gaia
1, developed by a company called Wayve.AI. I'll
02:29have an episode with Wayve's founder to further
explore that world model, which has produced
02:36some startling results. I hope you find the
conversation with Yann as enlightening as I did.
02:43I mean, first, the notion of a world model
is the idea that the system would get some
02:48idea of the state of the world and be able to
predict the sort of following states of the
02:54world resulting from just the natural evolution
of the world or resulting from an action that
02:58the agent might take. If you have an idea
of the state of the world and you imagine
03:04an action that you're going to take and you
can predict the resulting state of the world,
03:11then that means you can predict what's going to
happen as a consequence of a sequence of actions.
03:14That means you can plan a sequence of actions to
arrive at a particular goal. That's really what
03:20a world model is. At least that's the way
people have understood the word in other contexts,
03:28like in the context of optimal control
and robotics and things like that. That's
03:34what a world model is. Now there's several
levels of complexity of those world models,
03:38whether they model yourself, the agent, or whether
they model the external world, which is much more
03:45complicated. Training a world model basically
consists in just observing the world go by and
03:56then learning to predict what's going to happen
next, or observing the world taking an action and
04:01then observing the resulting effect, an action that
you take as an agent or an action that you see
04:08other agents taking. That establishes causality.
Essentially, you could think of this as a causal
04:16model. Those models don't need to predict all the
details about the world, they don't need to be
04:24generative, they don't need to predict exactly
every pixel in a video, for example, because
04:31what you need to be able to predict is enough
details, some sort of abstract representation,
04:37to allow you to plan. You're assembling something
out of wood and you're going to put two planks
04:49together and attach them with screws. It doesn't
matter the details of which type of screwdriver
04:56you're using or the size of the screw within some
limits and things like that. There are details
05:01that in the end don't matter as to what the end
result will be or the precise grain of the wood
05:08and things of that type. You need to have some
abstract level of representation within which you
05:14can make the prediction without having to predict
every detail. That's why those JEPA architectures
05:21I've been advocating are useful. Models like the
Gaia 1 model from Wayve actually make predictions
05:29in an abstract representation space. There's been
a lot of work in that area for years, also at
05:34FAIR (Facebook AI Research), but generally the abstract representations were pre-trained. So, the encoders that would take
05:41images from videos and then encode them into some
representation were trained in some other way. The
05:47progress we've made over the last six months in
self-supervised learning for images and video
05:53is that now we can train the entire system to make
those predictions simultaneously. We have systems
06:00now that can learn good representations of images.
The basic idea is very simple. You take an image,
06:10you run it through an encoder, then you
corrupt that image. You mask parts of it,
06:17for example, or you transform it in various ways.
You blur it, you change the colors, you change the
06:24framing a little bit and you run that corrupted
image through the same encoder or something
06:29very similar, and then you train the encoder to
predict the features of the complete image from
06:36the features of the corrupted one. You're not
trying to reconstruct the perfect image, you're
06:47just trying to predict the representation of it,
and this is different. This is not generative in
06:52the sense that it does not produce pixels, and
that's the secret to getting self-supervised learning to
06:57work in the context of images and video. You don't
want to be predicting pixels. It doesn't work. You
07:04can produce pixels as an afterthought, which
is what the Gaia system is doing by sticking a
07:08decoder on it and with some diffusion models that
will produce a nice image. But that's kind of a
07:13second step. If you train the system by predicting
pixels, you just don't get good representations,
07:19you don't get good predictions, you get
blurry predictions most of the time. So
07:24that's what makes learning from images and video
fundamentally different from learning from text,
07:31because in text you don't have that problem. It's
easy to predict words, even if you cannot make a
07:37perfect prediction, because language is discrete.
So language is simple compared to the real world.
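As a rough illustration of the training procedure described above, the sketch below (in PyTorch, with toy network sizes and a crude block-masking corruption chosen for illustration) encodes a clean image, encodes a corrupted view of it, and trains a predictor to match the clean features in representation space, with no pixel reconstruction. The slowly updated target encoder is an assumption borrowed from common practice, standing in for "the same encoder or something very similar".

```python
# Minimal sketch of JEPA-style self-supervised learning in representation space:
# predict the features of the full image from the features of a corrupted view.
# All module sizes and the corruption are illustrative assumptions, not the FAIR recipe.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

def small_encoder(dim=128):
    # Toy ConvNet encoder; a real system would use a much larger ViT or ResNet.
    return nn.Sequential(
        nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, dim),
    )

encoder = small_encoder()                  # online encoder (trained by gradient descent)
target_encoder = copy.deepcopy(encoder)    # target encoder (slow EMA copy, no gradients)
for p in target_encoder.parameters():
    p.requires_grad = False
predictor = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 128))
opt = torch.optim.Adam(list(encoder.parameters()) + list(predictor.parameters()), lr=1e-3)

def corrupt(x, mask_frac=0.4):
    # Crude corruption: zero out one random block of the image ("masking").
    b, c, h, w = x.shape
    mh, mw = int(h * mask_frac), int(w * mask_frac)
    top = torch.randint(0, h - mh + 1, (1,)).item()
    left = torch.randint(0, w - mw + 1, (1,)).item()
    x = x.clone()
    x[:, :, top:top + mh, left:left + mw] = 0.0
    return x

def training_step(images, ema=0.996):
    with torch.no_grad():
        target = target_encoder(images)            # features of the clean image
    pred = predictor(encoder(corrupt(images)))     # predicted features from the corrupted view
    loss = F.mse_loss(pred, target)                # loss in representation space, not pixel space
    opt.zero_grad(); loss.backward(); opt.step()
    # Slowly move the target encoder toward the online encoder (EMA update).
    with torch.no_grad():
        for pt, po in zip(target_encoder.parameters(), encoder.parameters()):
            pt.mul_(ema).add_(po, alpha=1 - ema)
    return loss.item()

loss = training_step(torch.randn(8, 3, 64, 64))  # dummy batch of images
```

The point of the sketch is only the shape of the objective: the network is never asked to reproduce pixels, only to predict the representation of the uncorrupted input.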
07:43And there's a lot written right
now about the energy required,
07:52the computational resources, the
GPUs required to train language
07:59models. Is it less in training a world
model, like using the I-JEPA architecture?
08:07Well, it's hard to tell because there
is no equivalent training procedure,
08:13self-supervised training procedure for video,
08:15for example, that does not use JEPA. The
ones that are generative don't really work.
08:22Yeah, yeah. Well, but this architecture could
also be applied to language, couldn't it?
08:30Oh yeah, absolutely yeah. So you could
very well use a JEPA architecture that
08:36makes predictions in representation
space and apply it to language.
08:41Yeah, definitely, and in that case
would it be less computationally
08:47intense than training a large
language model? It's possible.
08:54It's not entirely clear either. I mean, there
is some advantage, regardless of what technique
09:00you're using, to making those models really big.
They just seem to work better if you make them
09:05big. So if you make them bigger, right. So scaling
is useful. Contrary to some claims, I do not
09:15believe that scaling is sufficient. So, in other
words, we're not going to get anywhere close to
09:20human level AI, in fact, not even animal level
AI, by simply scaling up language models, even
09:32multimodal language models that we apply to
video. We're going to have to find new concepts,
09:36new architectures and I've written a vision
paper about this a while back of a different
09:44type of architecture that would be necessary for
this. So scaling is necessary, but not sufficient,
09:52and we're missing some basic ingredients to get to
human level AI. We're fooled by the fact that LLMs
10:01are fluent and so we think that they have human
level intelligence because they can manipulate
10:06language, but that's false and in fact, there's a
very good symptom for this, which is that we have
10:17systems that can pass the bar exam by answering
questions from text, basically regurgitating
10:25what they've learned more or less by rote, but we
don't have completely autonomous level five self
10:33driving cars, or at least no system that can learn
to do this in about 20 hours of practice just like
10:40any 17-year-old, and we certainly don't have any
domestic robot that can clear up the dinner table
10:48and fill up the dishwasher, a task that any
10-year-old can learn in one shot. So clearly
10:54we're missing something big, and that something
is an ability to learn how the world works, and
10:59the world is much more complicated than language,
and also an ability to plan and reason. Basically
11:06having a mental world model of what goes on
that allows us to plan and predict consequences
11:12of actions. That's what we're missing. And it's
going to take a while before we figure this out.
11:18You were on another paper that talked
about augmented language models and the
11:30embodied Turing test. Was that the same
paper? The embodied Turing test? Can you
11:36talk about that? First of all, what is
the embodied Turing test? I didn't
11:41quite understand that. Well,
okay, it's a different concept,
11:48but it's basically based on
the Moravec paradox,
11:56right? So Moravec many years ago noticed
that things that appeared difficult for
12:03humans turned out to sometimes be very easy
for computers to do, like playing chess,
12:09much better than humans, or, I don't
know, computing integrals or whatever,
12:13certainly doing arithmetic. But then there are
things that we take for granted as humans that we
12:19don't even consider intelligent tasks, that we
are incapable of reproducing with computers. And
12:25so that's where the embodied Turing test comes
in. Like you know, observe what a cat can do,
12:30or how fast a cat can learn new tricks.
Or you know how a cat can plan to jump on,
12:39you know, a bunch of different furniture to get
to the top of wherever it wants to go. That's an
12:45amazing feat that we can't reproduce with robots
today. So that's kind of the embodied Turing test,
12:52if you want, like you know, can you make a
robot that can have behaviors that are
13:00indistinguishable from those of animals, first of
all, and can acquire new ones with the
same efficiency as animals? Then the augmented LLM
paper is different. It's about how do you sort of
13:15minimally change large language models so that
they can use tools, so they can to some extent
13:22plan actions? Like you know, you need to compute
the product of two numbers, right, you just call
13:27a calculator and you know you're going to get
the product of those two numbers. And LLMs are
13:31notoriously bad at arithmetic, so they need to do
this kind of stuff or do a search, you know, using
13:37a search engine or database look up, or something
like that. So there's a lot of work on this right
13:42now and it's somewhat incremental, like you know.
How can you sort of minimally change LLMs and
13:46take advantage of their current capabilities but
still augment them with the ability to use tools?
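A minimal sketch of the tool-use loop described here: the model emits a tool call instead of guessing at arithmetic, the host runs the tool, and the result is fed back into the context. The `llm()` function and the `CALL: tool(args)` convention are hypothetical placeholders, not any real API.

```python
# Toy "augmented LLM" loop: route arithmetic to a calculator tool instead of the model.
import re

def calculator(expression: str) -> str:
    # Deliberately tiny: only digits and basic operators are allowed.
    if not re.fullmatch(r"[0-9+\-*/(). ]+", expression):
        return "error: unsupported expression"
    return str(eval(expression))  # acceptable only in this illustrative sandbox

TOOLS = {"calculator": calculator}

def llm(prompt: str) -> str:
    # Stand-in for a real language model: it "decides" to call a tool for the
    # multiplication question and answers directly once a RESULT is in context.
    if "347 * 29" in prompt and "RESULT" not in prompt:
        return "CALL: calculator(347 * 29)"
    return "FINAL: " + prompt.split("RESULT:")[-1].strip()

def run(prompt: str, max_steps: int = 5) -> str:
    context = prompt
    for _ in range(max_steps):
        out = llm(context)
        match = re.match(r"CALL: (\w+)\((.*)\)", out)
        if match is None:
            return out.removeprefix("FINAL: ")
        tool, args = match.group(1), match.group(2)
        result = TOOLS[tool](args)            # execute the tool on the model's behalf
        context += f"\nRESULT: {result}"      # hand the result back to the model
    return "error: too many tool calls"

print(run("What is 347 * 29?"))  # -> 10063
```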
13:54Yeah, and I don't want to get too much into
the threat debate, but you know you're on
14:02one side. Your colleagues Geoff and Yoshua are
on the other. I recently saw a picture of the
14:09three of you. I think you put that up on social
media saying how you know you can disagree but
14:17still be friends. This idea of augmenting
language models with stronger reasoning
14:27capabilities and the ability and agency, the
ability to use tools, is precisely what Geoff
14:34and Yoshua are worried about. Can you just
get into why are you not worried about that?
14:43Okay, so first of all, what you're describing
14:49is not necessarily what they are afraid
of. They are alerting people
14:57and various governments and others about various
dangers that they perceive. Okay, so one danger,
15:03one set of dangers are relatively short term.
There are things like, you know, bad people will
15:09use technology for bad things. What can bad people
use powerful AI systems for? And one concern
15:16that you know governments have been worried about
and intelligence agencies, counterintelligence,
15:24and stuff like that, is, you know, could badly
intentioned organizations or countries use LLMs
15:31to help them, I don't know, design pathogens
or chemical weapons or other things, or cyb, or
15:38cyber attacks? You know things like that. Right
Now, those problems are not new. Those problems
15:43have been with us for a long time and the question
is what incremental help would AI systems bring to
15:51the table? So my opinion is that, as of today, AI
systems are not sophisticated enough to provide
16:00any significant help for such badly intentioned
people, because those systems are trained with
16:07public data that is publicly available on the
internet and they can't really invent anything.
16:11They're going to regurgitate with a little bit of
interpolation if you want, but they cannot produce
16:19anything that you can't get from a search engine
in a few minutes, so that actually that claim
16:27is being tested at the moment. There are people
who are actually kind of trying to figure it out,
16:31like, is it the case that you can actually do
something more
16:36dangerous with the sort of current AI technology
than you can with a search engine? Results
16:42are not out yet, but my hunch is that you know
it's not going to enable a lot of people to do
16:49significantly bad things. Then there is the issue
16:49of things like code generation for cyber
16:55attacks and things like this, and those problems
have been with us for years. And the interesting
17:00thing that most people should know, also
for disinformation or attempts to corrupt the
electoral process and things like this, what's
very important for everyone to know is that the
17:12best countermeasures that we have against all of
those attacks currently use AI massively. Okay,
17:19so AI is used as a defense mechanism against
those attacks. It's not actually used to do the
17:26attacks yet, and so now it becomes the question
17:26of, you know, who has the better system:
is the AI used by
17:33the countermeasures significantly better
than the AI used by the attackers, so that, you know,
17:39the problem is satisfactorily mitigated? And
that's where we are. Now, the good news is that
17:50there are many more good guys than bad guys at
the moment. They're usually much more competent,
17:56they're usually much more sophisticated, they're
usually much better funded and they have a strong
18:02incentive to take down the attackers. So it's a
game of cat and mouse, just like every
18:10security problem that's ever existed. There's nothing new
there. Okay, nothing quite so new.
18:16Yeah, but okay, but then there is the question
of existential risk, right, and this is something
18:25that both Geoff and Yoshua have been thinking of
fairly recently. So for Geoff, it's only sort of
18:32just before last summer that he started
thinking about this, because before, he
18:38was convinced that the kind of algorithms that
we had were significantly inferior to the kind of
18:43learning algorithm that the brain used, and the
epiphany he had was that, in fact, no, because
18:51looking at the capabilities of large language
models that can do pretty amazing things with a
18:56relatively small number of neurons and synapses,
he said maybe they're more efficient than the
19:00brain and maybe the learning algorithm that we use,
back propagation, is actually better than whatever
19:04it is that the brain uses. So he started thinking
about, like you know what are the consequences,
19:09and but that's very recent and in my opinion he
hasn't thought about this enough. Yoshua went
19:17through a similar epiphany last winter where he started
thinking about the long-term consequences and came
19:27to the conclusion also that there was a potential
danger. They're both convinced that AI has
19:33enormous potential benefits. They're just worried
about the dangers. And they're both worried
19:38about the dangers because they have some doubts
about the ability of our institutions to do the
19:46best with technology, whether they are political,
economic, geopolitical, financial, or
19:57industrial institutions, to do the right thing, to be motivated
by the right thing. So if you trust the system,
20:08if you trust humanity and democracy, you might be
entitled to believe that society is going to make
20:21the best use of future technology. If you don't
believe in the solidity of those institutions,
20:28then you might be scared. I think I'm more
confident in humanity and democracy than they are,
and in our current systems than they are. I've
been thinking about this problem for much longer,
20:40actually, since at least 2014. So when I started
FAIR at Facebook at the time, it became pretty
20:49clear pretty early on that deploying AI systems
was going to have big consequences on people and
20:57society, and we were confronted with this very early,
and so I started thinking about those problems
21:02very early on. Things like countermeasures
against bias in AI systems, systematic bias,
21:10countermeasures against attacks, or detection of
hate speech in every language these are things
21:18that people at FAIR (Facebook AI Research) worked on and then were eventually deployed. Just to give you
21:23an example, the proportion of hate speech that
was taken down automatically by AI systems five
21:28years ago, in 2017, was about 20% to 25%. Last
year it was 95%, and the difference is entirely
21:37due to progress in natural language understanding.
Entirely due to transformers that are pretrained,
21:43self-supervised and can essentially detect
hate speech in any language. Not perfectly
21:48Nothing is perfect, it's never perfect. But AI is
just massively there and that's the solution. So
21:54I started thinking about those issues, including
existential risk, very early on, in fact, in 2015,
22:01early 2016, actually, I organized a conference
hosted at NYU on the future of AI where a lot
22:07of those questions were discussed. I invited
people like Nick Bostrom and Eric Schmidt and
22:16Mike Schroepfer, who was the CTO of Facebook
at the time, Demis Hassabis, a lot of people,
22:23both from the academic and AI research side and
from the industry side, and there were two days,
22:29a public day and kind of a more private day.
What came out of this is the creation of an
22:33institution called the Partnership on AI. This is
a discussion I had with Demis Hassabis, which was:
22:41would it be useful to have a forum where we can
discuss, before they happen, sort of bad things
22:46that could happen as a consequence of deploying
AI? Pretty soon, we brought on board Eric Horvitz
22:54and a bunch of other people. We co-founded
this thing called the Partnership on AI, which
22:58basically has been funding studies about AI ethics
and consequences of AI and publishing guidelines
23:09about how you do it right to minimize harm. So
this is not a new thing for me. I've been thinking
23:14about this for 10 years essentially, whereas
for Yoshua and Geoff it's much more recent.
23:20Yeah, but nonetheless, this augmented AI
or augmented language models that have
23:29stronger reasoning and agency raises the threat
23:37to a higher level, regardless of whether or not it
can be countered.
23:42Right, okay. So I guess the question there becomes
what is the blueprint of future AI systems that
23:50will be capable of reasoning and planning, will
understand how the world works, will be able to
23:58use tools and have agency and things like
that? Right? And I tell you they will not
24:04be autoregressive LLMs. So the problems that we
see at the moment of autoregressive LLMs are the
24:11fact that they hallucinate, they sometimes say
really stupid things, they don't really have
24:17a good understanding of the world. People claim
that they have some simple world model, but it's
24:22very implicit and it's really not good at all.
For example, you can tell an LLM that A is the
24:29same as B and then you ask if B is the same as A
and it will say I don't know or no, right? I mean,
24:36those things don't really understand logic or
anything like that, right? So the types of systems
24:44that we're talking about that might approach animal level intelligence, let alone human level intelligence,
24:52have not been designed. They don't exist, and so
discussing their danger and their potential harm
25:00is a bit like discussing the sex of angels at
the moment, or, to be a little more accurate,
25:08perhaps it would be kind of like discussing
how we're going to make transatlantic flight at
25:14near the speed of sound safe when, in 1925, we haven't yet
invented the turbojet. We can speculate,
25:24but how did we make a turbojet safe? It required
decades of really careful engineering to make
25:33them incredibly reliable and now we can run
like halfway around the world with a two-engine
25:43turbojet aircraft. I mean, that's an incredible
feat. And it's not like people were discussing
25:51sort of philosophical questions about how you
make turbojets safe. It's just really careful
25:55and complicated engineering that none of us
would understand. So you know, how can we ask the
26:07AI community now to explain how AI systems are
going to be safe? We haven't invented them yet,
26:12yeah, okay. That said, I have some idea about
how we can design them so that they have these
26:19capabilities and, as a consequence, how they will
be safe. I call this objective-driven AI, so what
26:27that means is essentially systems that produce
their answer by planning their answer so as to
26:36satisfy an objective or a set of objectives. So
this is very different from current LLMs. Current
26:42LLMs just produce one word after the other, or
26:42one token, which is a subword unit. It doesn't
26:47matter. They don't really think and plan ahead.
As we said before, they just produce one word
26:52after the other. That's not controllable. The only
thing we can do is check if what they've
26:59produced satisfies some
criterion or set of criteria, and then not
27:05produce an answer or produce a non-answer if the
answer that was produced isn't appropriate. But we
27:13can't really force them to produce an answer that
satisfies a set of objectives. So objective-driven
27:21AI is the opposite. The only thing that the
system can produce are answers that satisfy
27:29a certain number of objectives. So what would an
objective be? Did you answer the question? Another
27:35objective could be: is your answer understandable
by a 13-year-old, because you're talking to a
27:4013-year-old? Another would be: is this, I don't know,
terrorist propaganda or something? You can have a
27:49number of criteria like these, guardrails that
would guarantee that the answer that's produced
27:55satisfies certain criteria, whatever they are.
Same for a robot, you could guarantee that the
28:00sequence of actions that is produced will not
hurt anyone. Like you can have very low level
28:06guardrails of this type that say okay, you have
humans nearby and you're cooking, so you have a
28:12big knife in your hand, don't flail your arms,
okay, that would be a very simple guardrail to
28:17impose, and you can imagine having a whole bunch
of guardrails like this that will guarantee that
28:22the behavior of those systems would be safe and
that their primary goal would be to be basically
28:30subservient to us. So I do not believe that we'll
28:30have AI systems that will not be
subservient to us, that will define their own goals
28:39(they will define their own sub-goals, but those
sub-goals would be sub-goals of goals that we set
28:44for them), and that will not have all kinds of guardrails
that will guarantee their safety. It's not like
28:50we're going to invent
a system and make a gigantic one that we know will
28:54have human-level AI and just turn it on and then,
from the next minute, it's going to take over the
world. That's completely preposterous. What we're
29:04going to do is try with small ones, maybe as
smart as a mouse or something,
29:09maybe a cat, maybe a dog, and work our way up
and then put some more guardrails, basically like
29:16we've engineered more and more powerful and more
reliable turbojets. It's an engineering problem.
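A minimal sketch of the objective-driven idea described above, under the assumption that objectives can be written as simple scoring functions: candidates that violate any hard guardrail are never emitted, and the rest are ranked by the soft objectives. In the proposal above, the objectives would be costs minimized by planning against a world model, not a post-hoc filter; the objectives, their thresholds, and the candidate answers here are all illustrative.

```python
# Toy "objective-driven" output selection: emit only answers that satisfy guardrails,
# then pick the candidate that best satisfies the remaining (soft) objectives.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Objective:
    name: str
    score: Callable[[str], float]   # higher is better
    hard: bool = False              # hard objectives are guardrails: score must be > 0

def answers_question(candidate: str) -> float:
    return 1.0 if len(candidate.split()) > 3 else -1.0

def readable_by_13_year_old(candidate: str) -> float:
    words = candidate.split()
    avg_len = sum(len(w) for w in words) / max(len(words), 1)
    return 1.0 if avg_len < 8 else -1.0          # crude readability proxy

def not_harmful(candidate: str) -> float:
    return -1.0 if "how to build a weapon" in candidate.lower() else 1.0

OBJECTIVES: List[Objective] = [
    Objective("not_harmful", not_harmful, hard=True),        # guardrail
    Objective("answers_question", answers_question),
    Objective("readable", readable_by_13_year_old),
]

def select(candidates: List[str]) -> str:
    best, best_score = None, float("-inf")
    for c in candidates:
        if any(o.hard and o.score(c) <= 0 for o in OBJECTIVES):
            continue                               # guardrail violated: never emit
        total = sum(o.score(c) for o in OBJECTIVES if not o.hard)
        if total > best_score:
            best, best_score = c, total
    return best if best is not None else "I can't help with that."

candidates = [
    "Here is how to build a weapon step by step.",
    "Photosynthesis lets plants turn sunlight into chemical energy.",
]
print(select(candidates))
```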
29:22Yeah, yeah, you were also on a paper.
Maybe this is the one that talked about
29:29the embodied Turing test on neuro-AI.
Can you explain what neuro-AI is?
29:40Okay. Well, it's the idea that we should get
some inspiration from neuroscience to build
29:48AI systems and that there is something to be
learned from neuroscience and from cognitive
29:57science to drive the design of AI systems.
Some inspiration, something to be learned,
30:05as well as the other way around. What's
interesting right now is that the best models
30:09that we have of how, for example, the visual
cortex works are convolutional neural networks,
30:16which are also the models that we use to recognize
images, primarily in artificial systems. There is
30:24information being exchanged both ways. One way
to make progress in AI is to ignore nature and
30:35just try to solve problems in an engineering
fashion, if you want. I found interaction with
30:45neuroscience always thought-provoking. You
don't want to be copying nature too closely,
30:52because there are details in nature that are
irrelevant and there are principles on which
30:59natural intelligence is based that we haven't
discovered. But there is some inspiration to have,
31:04certainly convolutional nets were
inspired by the architecture of the visual
31:08cortex. The whole idea of neural nets and deep
learning came out of the idea that intelligence
31:14can emerge from a large collection of simple
elements that are connected with each other and
31:19change the nature of their interactions. That's
the whole idea. Inspiration from neuroscience
31:27has been extremely beneficial so far, and the
idea of neuro-AI is that you should keep going.
31:34You don't want to go too far. Going too far, for
example, is trying to reproduce some aspect of the
31:41functioning of neurons with electronics. I'm not
sure that's a good idea. I'm skeptical about this,
31:49for example. So your research right
now, are you, your main focus is
31:57on furthering the JEPA architecture into
other modalities, or where are you headed?
32:06Yeah, so, I mean, the long term goal is, you know,
to get machines to be as intelligent and learn
32:14as efficiently as animals and humans. Okay, and
the reason for this is that we need this because
32:19we need to amplify human intelligence, and so
intelligence is the most needed commodity that
32:25we want in the world, right? And so we could,
you know, possibly bring a new renaissance to
32:32humanity if we could amplify human intelligence
using machines, which we are already doing with
32:37computers, right, I mean, that's pretty much
what they've been designed to do. But even more,
32:42you know, imagine a future where every one of us
has an intelligent assistant with us at all times.
32:53They can be smarter than us. We shouldn't feel
threatened by that. We should feel like we are,
32:59like, you know, a director of a big lab or a CEO
of a company that has a staff working for them of
33:06people who are smarter than themselves. I mean,
we're used to this already. I'm used to this,
33:10certainly working with people who are smarter
than me. So we shouldn't feel threatened by this,
33:15but it's going to empower a lot of us, right,
and humanity as a whole. So I think that's a
33:23good thing. That's the overall practical goal. If
you want right. Then there's a scientific question
33:28that's behind this, which is really what is
intelligence and how do you build it? And
33:33then, you know, how can a system learn
the way animals and humans seem to be learning
33:38so efficiently? And the next thing is, how do
we learn how the world works? By observation,
33:45by watching the world go by, through vision
and all the other senses. And animals can do
33:52this without language, right? So it has nothing
to do with language. It has to do with learning
33:57from sensory percepts and learning mostly
without acting, because any action you take
34:03can kill you. So it's better to be able to learn
as much as you can without actually acting at all,
34:08just observing, which is what babies do in the
first few months of life. They can hardly do
34:13anything, right? So they mostly observe and
learn how the world works by observation.
34:18So what kind of learning takes place there?
So that's obviously kind of self-supervised,
34:23right, it's learning by prediction. That's an old
idea from cognitive science, and the thing is,
34:30you know, we can learn to predict videos.
But then we noticed that predicting videos,
34:34predicting pixels in video, is so infinitely
complicated that it doesn't work. And
34:39so then came this idea of JEPA right. Learn
representations so that you can make predictions
34:44in representation space, and that turned out to
work really well for learning image features,
34:50and now we're working on getting this to work for
video and eventually we'll be able to use this to
34:56learn world models where you show a piece of video
and then you say I'm going to take this action,
35:03predict what's going to happen next in the
world and you know, which is a bit what the
35:10Gaia system from Wayve is doing at a high level. But
we need this at various levels of abstraction so
35:16that we can build, you know, systems that
are more general than autonomous driving.
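A minimal sketch of planning with a world model in representation space, as described above: a predictor maps a state representation and an action to the next state representation, and a simple random-shooting search picks the action sequence whose predicted end state lands closest to a goal representation. The dimensions, the untrained toy model, and the search strategy are illustrative assumptions only.

```python
# Toy world-model planner: imagine the consequences of candidate action sequences
# in representation space and keep the first action of the best sequence.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, HORIZON, N_CANDIDATES = 16, 4, 5, 256

world_model = nn.Sequential(           # predicts the next state representation
    nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(), nn.Linear(64, STATE_DIM),
)

def rollout(state: torch.Tensor, actions: torch.Tensor) -> torch.Tensor:
    # state: (N, STATE_DIM); actions: (N, HORIZON, ACTION_DIM). Returns predicted final states.
    for t in range(actions.shape[1]):
        state = world_model(torch.cat([state, actions[:, t]], dim=-1))
    return state

def plan(current_state: torch.Tensor, goal_state: torch.Tensor) -> torch.Tensor:
    # Random-shooting search: sample candidate action sequences, imagine their
    # consequences with the world model, and return the first action of the best one.
    with torch.no_grad():
        candidates = torch.randn(N_CANDIDATES, HORIZON, ACTION_DIM)
        start = current_state.expand(N_CANDIDATES, -1)
        final = rollout(start, candidates)
        cost = ((final - goal_state) ** 2).sum(dim=-1)   # distance to goal in representation space
        best = cost.argmin()
    return candidates[best, 0]

current = torch.randn(1, STATE_DIM)
goal = torch.randn(1, STATE_DIM)
print("first action of best plan:", plan(current, goal))
```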
35:23Okay, that's the yeah, and it's my
fault so I won't go over the hour,
35:33but is it conceivable that someday
there will be a model,
35:44maybe embodied in a robot, that is ingesting
video from its environment and learning,
35:54just continuously learning and
getting smarter, and smarter, and smarter?
36:01Yeah, I mean, that's kind of a bit of a
necessity, the reason being that you know,
36:09even if you train a system to have a world model
that can predict what's going to happen next.
36:13The world is really complicated and there's
probably all kinds of situations that,
36:17you know, the system hasn't been
trained on and needs to, you know,
36:21fine tune itself as it goes. So you know, animals
and humans do this early in life by playing. So
36:32play is a way of learning your world model in
situations that basically won't hurt you.
36:42But then during life, of course, you
know, when we learn to drive, there are all
36:46kinds of mistakes that we make initially,
that we don't make after having some experience,
36:52and that's because we're fine-tuning our world
model to some extent. Yeah, learning a new task,
36:58we're basically just learning a new version of
our world model, right? So, yeah, I mean, this
37:04type of continual learning is going
to have to be present, but the overall power and
37:10intelligence of the system will be limited by, you
37:10know, how much compute and how big the neural nets it is
37:15using and various other constraints. You
know, computational constraints basically.
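A minimal sketch of the continual fine-tuning idea above, assuming the world model is a small network updated from the transitions the agent observes while deployed: each new (state, action, next state) observation goes into a short replay buffer and the model takes one small gradient step on its own prediction error. Buffer size, learning rate, and the synthetic data are illustrative.

```python
# Toy online adaptation: keep adjusting a world model from its own prediction error.
import collections
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, ACTION_DIM = 16, 4
world_model = nn.Sequential(
    nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(), nn.Linear(64, STATE_DIM),
)
opt = torch.optim.Adam(world_model.parameters(), lr=1e-4)
buffer = collections.deque(maxlen=10_000)      # recent experience only

def observe_and_adapt(state, action, next_state, batch_size=32):
    # Store the new transition, then take one small gradient step on a random batch.
    buffer.append((state, action, next_state))
    batch = random.sample(list(buffer), min(batch_size, len(buffer)))
    s, a, s_next = (torch.stack(x) for x in zip(*batch))
    pred = world_model(torch.cat([s, a], dim=-1))
    loss = F.mse_loss(pred, s_next)            # online prediction error
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Simulated deployment loop: the "environment" here is just random tensors.
for step in range(100):
    s = torch.randn(STATE_DIM)
    a = torch.randn(ACTION_DIM)
    s_next = torch.randn(STATE_DIM)
    loss = observe_and_adapt(s, a, s_next)
```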
37:20You know, you're still young and...
I'm not sure about that. Well,
37:28you're younger than Geoff. Let me put it that way.
37:30I'm younger than Geoff, I'm
older than Yoshua, yeah.
37:35But this, the progress you've made on world
models is fairly rapid from my point of view
37:42watching it. Are you hopeful that within your
career you'll have embodied robots that are
37:55building world models through their interaction
in reality and then being able to? Well, I guess
38:01the other question on world models: do you then
combine it with a language model to do reasoning,
38:11or is the world model able to do reasoning on
its own? But are you hopeful that in your career
38:17you'll get to the point where you'll have
this continuous learning in a world model?
38:22Yeah, I sure hope so. I might have another, you
know, 10 useful years or something like this in
38:28research before my brain, you know, turns into
béchamel sauce, or something like that.
38:36You know 15 years, if I'm lucky, so, or perhaps
less, but yeah, I hope that there's going to be
38:44breakthroughs in that direction during that
time. Now, whether that will result in the
38:50kind of artifact that you're describing you know
robots that can... like, you know, domestic robots, for
38:57example, or self-driving cars that can learn fairly
quickly by themselves, I don't know, because there
39:04might be all kinds of obstacles that we have not
envisaged that may appear on the way. No, it's a
39:12constant in the history of AI that you have some
new idea and a breakthrough and you think that's
39:19going to solve all the world's problems, and then
you're going to hit a limitation and you have
39:25to go beyond that limitation. So it's like you
know you're climbing a mountain. You find a way
39:30to climb the mountain that you're seeing and you
know that once you get to the top you will have
39:36the problem solved, because now it's, you know,
a gentle slope down and once you get to the top,
39:42you realize that there is another mountain
behind it that you hadn't seen. So that's been
39:48the history of AI right, where people have come up
with sort of new concepts, new ideas, new ways to
approach AI, reasoning, perception, whatever, and
40:04then realized that their idea basically was very
limited. And so, you know, inevitably we're
trying to figure out what's the next revolution in
40:17AI. That's what I'm trying to figure out, and so
you know, learning how the world works from video,
40:22having systems that have world models that allow
systems to reason and plan. And there's something
40:29I want to be very clear about, which is an answer
to your question, which is that you can have
40:37systems that reason and plan without manipulating
language. Animals are capable of amazing feats of
40:44planning and also, to some extent, reasoning. They
don't have language, at least most of them don't,
40:52and many of them don't have culture because
they are mostly solitary animals. So you know,
41:01it's only some animals that have some level of
culture. So the idea that a system can plan and
41:10reason is not connected with the idea that you
can manipulate language. Those are two different
41:16things. It needs to be able to manipulate abstract
notions, but those notions do not necessarily
41:23correspond to linguistic entities like words or
things like that. We can have mental images. If
41:29you want to think like you do, ask a physicist or
mathematician you know how their reason is very
41:35much in terms of sort of mental models, nothing
to do with language Then you can turn things into,
41:40into language. But that's a different story.
That's the second, second step. So you know,
41:49we're going to have to figure out how to do this.
Reasoning, hierarchical planning in machines
41:55reproduce this first and then, of course, you
know, sticking language on top of it will help,
42:01like it will make those systems smarter and be
able you know, it will allow us to communicate
42:05with them and teach them things and they're going
to be able to teach us things and stuff like that.
42:10But this is a different question really,
the question of how we organize AI research
42:16going forward, which is somewhat determined by how
afraid people are of the consequences of AI. So if
42:21you have a rather positive view of the impact of
AI on society and you trust humanity and society
42:28and democracies to use it in good ways, then the
best way to make progress is to open research and
42:37for the people who are afraid of the consequences,
whether they are societal or geopolitical,
42:44they're putting pressure on governments around
the world to regulate AI in ways that basically
42:49limit access, particularly of open source code
and things like that, and it's a big debate at
42:57the moment. I'm very much on the side, so is
Meta, very much on the side of open research.
43:03Yeah, actually that was something I was going
to ask you and now that you brought it up,
43:09because I've been talking to people about this and
there is a view that, aside from the risks of open
43:18source (again, Geoff Hinton saying, would you open
source thermonuclear weapons?), aside from that,
43:28the question is whether open source can marshal
the resources to compete with proprietary models,
43:40because of the tremendous resources required
when you're scaling these models. And there's
43:49a question as to whether or not Meta will
continue to open source future versions
43:55of Llama, or not just continue to open source,
but whether it'll continue to invest the
44:02resources needed to push the open source
models. So what do you think about that?
44:11Okay, there's a lot to say about this, Okay. So
first thing is there's no question that Meta will
44:16continue to invest the resources to build better
and better AI systems because it needs it for its
44:21own products. So the resources will be invested.
Now the next question is will we continue to open
44:30source the base models? And the answer is probably
yes, because that creates an ecosystem on top
44:37of which an entire industry can be built, and
there is no point having 50 different companies
44:44building proprietary, closed systems when you
can have one good open source base model that
44:52everybody can use. It's wasteful and it's not
a good idea. And another reason for having open
45:00source models is that nobody, no entity, as
powerful as it thinks it is, has a monopoly
45:09on good ideas. And so if you want people who can
have good, new, innovative ideas to contribute,
45:15you need an open source platform. If you want
the academic world to contribute, you need
45:19open source platforms. If you want the startup
world to be able to build customized products,
45:24you need open source base models, because
they don't have the resources to build, to
45:28train large models. And then there is the history
that shows that for foundational technology, for
45:39infrastructure type technology, open source always
wins. It's true of the software infrastructure of
45:50the internet. In the early 90s and mid 90s there
was a big battle between Sun Microsystems and
45:55Microsoft to deliver the software infrastructure
of the internet - operating systems, web servers,
46:05web browsers and various server side and client
side frameworks. They both lost. Nobody is talking
46:12about them anymore. The entire world of the web is
using Linux and Apache and MySQL and JavaScript,
46:24and even the basic core code for web browsers is
open source. So open source won by a huge margin.
46:35Why? Because it's safer, it gathers more people
to contribute all the features that are necessary,
46:42it's more reliable, vulnerabilities are fixed
faster and it's customizable. So anybody
46:51can customize Linux to run on whatever
hardware they want. So open source wins.
46:57But it's the same thing.
47:00It's going to be the same thing. It's inevitable.
The people now who are climbing up like OpenAI,
47:08their system is based on publications
from all of us and from open platforms
47:17like PyTorch. ChatGPT is built using PyTorch.
PyTorch was produced originally by Meta. Now
47:22it's owned by the Linux Foundation. It's open
source. They've contributed to it, by the way,
47:29their LLM is based on transformer architectures
invented at Google. All the tricks to train,
47:36all those things came out of various papers
from all kinds of different institutions,
47:41including academia. All the fine-tuning techniques
are the same. So nobody works in a vacuum. The
47:48thing is, nobody can keep their advance and their
advantage for very long if they are secretive.
47:57Yeah, except that with these models, because
they're so compute intensive and they cost so
48:02much money to train, you need somebody
like Meta who's going to be willing to
48:09build them and open source them. That's why,
when I was asking whether they'll continue,
48:17obviously Meta will continue
building resource intensive models,
48:24but the question is whether they'll
continue to open source them.
48:30I'm telling you the only reason why Meta could
stop open sourcing models is legal. So if
48:38there is a law that outlaws open source AI
systems above a certain level of sophistication,
48:46then of course we can't do it. If there are
laws that, in the US or across the world,
48:55make it illegal to use public content to train AI
systems, then it's the end of AI for everybody,
49:03not just for open source, or at least the end of
the type of AI that we are talking about today.
49:09We might have new AI in the future, but that
doesn't require as much data. And then there
49:15is liability. If you believe that someone is
doing something bad with an AI system that was
49:28open sourced by Meta, then Meta is liable. Then
Meta will have a big incentive not to release it,
49:35obviously. So the entire question about this is
around legal reasons and political decisions.
49:41But on the idea of open source winning, don't
you need more people or more companies like
49:47Meta building the foundation models
and open sourcing them? Or could
49:53an open source ecosystem win based on
a single company building the models?
50:00No, I mean you need two or three, and
there are two or three, right. I mean,
50:03there is Hugging Face. There is Mistral
in France, which is also embracing open source LLMs.
50:11Theirs is a very good LLM. It's a small one, but it's
very good. There are academic efforts like LAION.
50:20They don't have all the resources they need, but
they collect the data that is used by everyone,
50:24so everybody can contribute. One thing that
I think is really important to understand
50:28also is that there is a future, which I
described earlier, in which every one of us,
50:35every one of our interactions with the digital
world, would be mediated by an AI assistant,
50:41and this is going to be true for everyone around
the world, right? Everyone who has any kind of
50:46smart device. Eventually, it's going to be in our
augmented reality glasses, but for the time being,
50:52in our smartphones. And so imagine that future
where you are, I don't know, from Indonesia or
51:06Senegal or France and your entire digital diet
is done through the mediation of an AI system.
51:19Your government is not going to be happy about it.
Your government is going to want the local culture
51:24to be present in that system. It doesn't want
51:24that system to be closed source and controlled
51:30by a company on the west coast of the US. So
just for reasons of preserving the diversity
51:40of culture across the world and not having our
entire information diet being biased by whatever
51:47it is that some company on the west coast of the
US thinks, there are going to need to be open source
51:53platforms, and they're going to be predominant,
at least outside the US, for that reason, including in
52:03China. There are all those talks about, oh, what if
China puts their hands on our open source code? I
52:08mean, China wants control over its own LLMs because
they don't want their citizens to have access to certain types
52:15of information. So they're not going to use our
LLMs, they're going to train theirs, which they
52:20already have. And nobody is particularly ahead
of anybody else by more than about a year.
52:28And China is pushing open source. I mean, they're
very pro open source within their ecosystems.
52:36Some of them. There is no unified opinion there,
but I mean it's the same in the West, right,
52:44there are some governments that are too afraid
52:51of the risks, or are thinking about it,
52:51and some others that are all for open source
because they see this as the only way for
52:56them to have any influence on the information,
the type of information and culture that would
53:03be mediated by those systems. So it's going to
have to be like Wikipedia, right? Wikipedia is built
53:14by millions of people who contribute to it from
all around the world, in all kinds of languages,
53:20and it has a system for vetting the information.
The way AI systems of the future will be taught
53:26and will be fine tuned will have to be the
same way. It will have to be crowd sourced,
53:31because something that matters to a farmer
in Southern India is probably not going to
53:39be taken into account by the fine tuning done
by some company on the West Coast of the US.
53:46AI might be the most important new computer
technology ever. It's storming every industry
53:52and literally billions of dollars are being
invested, so buckle up. The problem is that
53:57AI needs a lot of speed and processing power.
So how do you compete without cost spiraling
54:04out of control? It's time to upgrade to
the next generation of the cloud: Oracle
54:10Cloud Infrastructure, or OCI. OCI is a single
platform for your infrastructure, database,
54:18application development and AI needs. OCI has
four to eight times the bandwidth of other clouds,
54:27offers one consistent price instead of
variable regional pricing. And, of course,
54:32nobody does data better than Oracle. So now you
can train your AI models at twice the speed and
54:39less than half the cost of other clouds. If
you want to do more and spend less, like Uber,
54:478x8 and Databricks Mosaic, take a free test drive
of OCI at oracle.com/eyeonai. That's E-Y-E-O-N-A-I
55:00all run together: oracle.com/eyeonai.
55:06That's it for this episode. I want to thank Yann
for his time. If you want to read a transcript
55:12of this conversation, you can find one on our
website eye-on.ai, that's eye-on.ai. And remember
55:22the singularity may not be near, but AI is
changing your world, so best pay attention.