20 results for “model distillation”
“China copies American AI breakthroughs but can't make them 10x better”
...distillation where you basically train the next model on the answers of the previous model, and I think for sure China is doing some of that, and there's a lens on that t
...model. Yeah. And even if that's the case, what they did is still amazing, by the way, what DeepSeek did efficiency-wise. Distillation is standard practice in not if you're at a closed lab where you care about terms of service and IP closely, you disti
...distillation. So smaller models become better than bigger models purely because of the quality of the data that's inputted through them? One theory is that the smaller model is better at generating output that you would want it to generate, essential
...models and so many different providers, open source is a very viable possible route, and distillation is looked at in a shady way. Is distillation really wrong if it ultimately propels spaces forward? Well, even, like, let's take within the labs
...models that are able to do similar things at a lower cost.
...Hopefully, we'll see more sort of condensed optimized distilled models that are able to do similar things at a lower cost. Okay. So there was a lot of news last week, so this got kind of lost. But I heard there was a big update to ChatGPT's advanced
...released in a while, certainly around code, so that's gonna show up. And so I just feel like the players, the money behind the players, the fact that these models distill, this will end up in an oligopoly. To what extent do you think the large model p
...with different flavors, and there's gonna be a lot of new flavor models that will come out. You know, Mira and Ilya are out there creating models. I mean, you got these very legit teams that were some of the pioneers. We're just starting up models fo
...open source models, take Llama, for example, like, they've been able to do that from their own research and perspective and data ingestion and training. And so I guess I would say distillation does not feel essential in order to unlock those thin
...model. It's a great model. And if you actually look at it, you know, on the price performance, I would say in many use cases, it's the one that I actually use as my standard model, it's better than Anthropic for some use cases if you actually, you kn
...models distill, like, this will end up in an oligopoly. But I mean, I don't know. That's just my guess. To what extent do you think the large model providers in ten years' time have already been created, or are they yet to be founded? I think that
...model distillation to make these 4o fine-tunes really good. But, yeah, the main thing is, like, can it remain factual? Can it answer questions based on what it retrieves? And get it cited accurately? And that's what this fine-tuned model really
...and it's still, oh, it's 10 billion parameters, 20 billion parameters, whatever. On the inference side, it's similar learnings happening simultaneously. We don't need to do 100 steps of diffusion for inference, like 100 denoising steps to
...models and make them an order of magnitude more efficient in twelve months, then it could make sense to run really aggressive margins on serving models. It really comes down to the stickiness and whether those subsidies today are driving large LTVs t
...can distill models and have them work with a few steps of diffusion now. I think we're definitely the most inefficient we'll ever be, and it's only gonna get more and more efficient. It could be a factor of at least an order of magnitude, like, 10x
...model distillation, or do you think that comes from just better orchestration of tools? Where do you think that's the speed? I think, like, the low hanging fruit is just, like, plain old deterministic, like, DevOps-y type stuff. Okay. You know, like,
...models different than external because the external models have been getting a lot smaller,
...distillation is that sometimes the smaller models become better than the bigger model through distillation.
...seen distillation of foundation models. We're going to start to see distillation of business models, businesses. And so I would expect these really successful businesses to get copied ridiculously quickly. So I think this idea that you're gonna have
...model with supervised learning, supervised fine-tuning on it. But then how would you even detect that this is a distillation attack versus just an evaluation? Because right now, I'm actually running, I mean, I'm distilling myself for chapter eight o