14 results for “cognitive performance”
So I don't think it sounds too far-fetched to me. Yeah. I mean, I think the thing that came up earlier, also the, like, intelligence-per-cost thing, you know, the real world is, like, an interesting litmus test because at the en
...of domain performance, outside of just pure code generation.
you know, the measurements to really define that clearly. But I think it's pretty clear. You know, people tried chain of thought with GPT, like, really small models, and they saw that it just didn't really do anything. Then you go to bigger models, and
...performance in other areas. Right? So that's the hard part because you you can of course, you could put more coding data in or you could put more,
that is what we're doing here. Yeah. And you could argue that actually this is not that different from, like, I guess, the, the system one, system two paradigm because, you know, if you ask, like, a pigeon to think really hard about playing chess, yo
...of cognitive tasks that, you know, we know that humans can do, and maybe also make the system available to a few hundred of the world's top experts, the Terence Taos of each subject area, and see if they can find you know, give them give them a month
...out of domain performance, outside of just pure code generation. And coding and math is also interesting because sometimes when you ask complex questions, Alex, the sub steps involve being able to compute stuff or calculate stuff and pass the results
especially in climate, especially in, kinda like, agriculture, food security, you can't think of this as, you know, like shots on goal and this and that. You've gotta kind of say, hey, we can get better at this. Reasoning is the biggest paradigm shif
And so we're trying to, like, iteratively kind of, you know, deploy these things and, like, try them out and figure out, like, where are they reliable, you know, and where are they not. Because yeah. Like, if you did just let the model control your c
...performance is usually how enterprises think about it when they run their evaluations themselves. So that's why I wouldn't put too much money on the benchmarks. It's still useful. Certain of them are. The lower you are from the fa
on the, that's only exclusive to Claude Cowork. We have some tricks for this sort of, like, change week over week. We eval Cowork maybe against different use cases than we would eval Claude Code, right? If you think about it this way. Okay. So like cloc
“TikTok scrolling destroys your memory worse than random guessing”
...performance is barely better than random guessing. So, apparently, if you What does that mean? So, basically, if you opened TikTok and scrolled through a lot of, sort of, clips Uh-huh. After that,
The thing dude, we were in between. Like, the lawyers all pulled an all nighter as well going and getting this because it was, like, yeah. I mean, we we need to get this ready to go, but, you know, and there's just all the various little things of th
So I think they're also, yeah, downplaying whether or not it's reasoning or not. I think they're trying to merge everything together. And it's not I I mean, I didn't realize that, but extended thinking could not use tools before the way they worded i