Francois Chaubard
AI researcher and podcast guest discussing recursive model architectures.
“A 7 million parameter model can solve what a 100 billion parameter model trained on the entire internet can't solve, and the 7 million parameter model wins.”
“The right answer is to take the amazingness here and take the amazingness here, which is probably already in Gemini or some of these; it might be, at least in some part.”
“This guy, Konstantin at François Chollet's company, Ndea, actually did. And it's this amazing breakdown that he posted on YouTube. The main takeaway is that the outer refinement loop is the main reason why these things work so well.”
“She figures out that you actually can backprop all the way through the deep recursion, which actually improves performance much, much more. She makes the model three, four times smaller. Because it has that recursion, it actually outperforms.”
“This was only a 27 million parameter model that was only trained on ARC Prize. There's literally a thousand tasks... This starts from literally tabula rasa weights. And it can outperform; o3 gets zero. Literally zero. And this got something like 70% on ARC Prize 1.”
“We were very much in the belief that this was required to get to AGI; peak RNN use was probably until 2016, with Alex Graves' NeurIPS keynote, which is just fantastic, and all his adaptive compute time work.”
“Demis had this whole thing about how the ultimate test is the Einstein test. Like, go back to 1911 and then have it rebuild all the physics up until now.”
“There's this researcher named Melanie Mitchell who writes this book talking about this very phenomenon, which is that it is sufficient, not necessary, to go bigger and get better performance, and it is sufficient, not necessary, to add more recursion.”
“Going deeper actually didn't help. And actually, on some tests, it was just the feedforward net that works just as well as a transformer there; on Sudoku, the MLP actually outperformed the attention.”
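The "outer refinement loop" and "deep recursion" mentioned in the quotes above can be pictured with a minimal sketch. This is not the implementation of any model discussed in the episode: the names `make_shared_block`, `step`, `refine`, and `param_count`, the tanh update rule, and all sizes are hypothetical, chosen only to show one small weight-tied block being reused many times. Only the forward pass is shown; training (including backprop through the recursion) is left out.

```python
import math
import random


def make_shared_block(dim, seed=0):
    """Create one small weight matrix that every recursion step reuses (hypothetical setup)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(2 * dim)
    # Shape: dim rows, each reading the concatenation [input; latent state] of length 2 * dim.
    return [[rng.gauss(0.0, scale) for _ in range(2 * dim)] for _ in range(dim)]


def step(weights, x, z):
    """One recursion step: new latent state = tanh(W @ [x; z])."""
    inp = x + z  # list concatenation of the fixed input and the current state
    return [math.tanh(sum(w * v for w, v in zip(row, inp))) for row in weights]


def refine(weights, x, inner_steps=4, outer_steps=3):
    """Run the same block repeatedly; each outer pass yields a refined candidate state."""
    z = [0.0] * len(weights)
    trace = []
    for _ in range(outer_steps):      # outer refinement loop
        for _ in range(inner_steps):  # inner (deep) recursion with shared weights
            z = step(weights, x, z)
        trace.append(list(z))         # snapshot of the candidate after this pass
    return z, trace


def param_count(weights):
    """Number of scalar parameters in the shared block."""
    return sum(len(row) for row in weights)


weights = make_shared_block(dim=4)
final_state, trace = refine(weights, [1.0, 0.0, -1.0, 0.5], inner_steps=4, outer_steps=3)
print("parameters:", param_count(weights))
print("block applications:", 4 * 3)
print("candidates produced:", len(trace))
```

The point of the sketch is that the number of block applications grows with the loop counts while the parameter count stays fixed, which is the sense in which a small recursive model can trade size for repeated computation.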
AI-extracted from podcast / newsletter / paper summaries. May contain errors.