13 results for “scaling challenges”
...scaling. And it's really tempting to say that we can simply solve all of our problems by throwing enough money at it. And indeed, I would say the effectiveness of scaling can't be denied. Increasing the model size, the number of training examples,
...scaling loss and the validation loss actually translate into quality improvements for text to image synthesis. We saw similar results also in other modalities like video synthesis. And overall, it makes us confident that further sc
...scaling law for it, which is effectively for how much compute you put in, the architecture will get to different levels of performance at test tasks. And mixture of experts is one of the ones at training time, even if you don't consider the inference
But by itself, that's not enough, because invariably, that one model running on a set of hardware is gonna get too much traffic that it cannot handle. And at that point, you need to horizontally scale it. And that's not an ML problem. That's no
...scaling in context length. So this can mean just having more text inputs for your models, but it can also mean things like taking a lot of visual token inputs, image inputs to your models, or generating lots of outputs. And one thing that's been
to get the models to think for that long or scale up test time compute. As you scale up test time compute, you're spending more on test time compute, which means that there's a limit to how much you could spend. That's one potential ceiling. Now obvi
...the next scaling paradigm. All analogies are imperfect. What is one way in which thinking fast and slow or system one, system two kinda doesn't transfer to how we actually scale these things?
...challenges from scale. Yeah. For these kinds of large models, we definitely need to shard, you know, distribute the model across multiple GPUs, multiple nodes. Right? And then, yeah, there's definitely a problem of how to c
How quickly do you go from a single replica of that model to five to 10 to 100? And so that's the second pillar that is necessary for running these mission critical inference workloads. And what does it take to do that? It takes
...we're seeing scaling not only during training time, but also during test time. So this is the iconic image from the OpenAI o1 release. Not only are we starting to scale train time compute, but we're also starting to scale te
...depth, we actually unlock the ability to scale along batch size as well. So this is... Ah. One of... yeah. So, okay. So I guess collinear, like, yeah. Right. So, okay. I guess for context, in traditional RL, like value-based RL, scaling batch
scaling model size and maybe doing a little bit more pretraining. And, you know, especially at the time, it really was about model size. And just sort of doing more uniform scaling of that nature is just going to solve all of your problems. Yeah. And
...if scaling is actually giving you a better model, like, is it going to be financially worth it? And I think it'll kind of slowly push it out as AI solves more compelling tasks. So like the likes of Claude Opus 4.5, making Claude Code just work for