OpenAI President Greg Brockman: Doubling Down on Text Models, The Superapp Plan, Codex's Potential

OpenAI is shifting strategies yet again. Here's the logic behind the latest moves and what they mean for the company's direction.

OpenAI is in the midst of yet another strategic shift. The company has abandoned video generation, it's building a 'superapp' that combines coding, chat, and browsing, and it's zeroing in on a use case where AI uses your computer to make you more effective at work and assists you everywhere else. Underlying the shift, OpenAI President Greg Brockman told me, is a belief that the company's core large language models are the right architecture to bet on, and that this branch of the AI tech tree is growing increasingly capable and reliable.

"There's been this debate about how far text models can go," Brockman said. "I think we have definitively answered that question — it is going to go to AGI. We have line of sight to much better models coming this year."

In an extended Big Technology Podcast interview, Brockman offered revealing comments about the company's research direction, how far it can push its Codex coding assistant, and the logic of supporting all of this with mountains of compute. You can read the full Q&A below, edited lightly for length and clarity, or listen on Apple Podcasts, Spotify, or your podcast app of choice.

Alex Kantrowitz: OpenAI has shut down video generation and is preparing a 'superapp' that will combine business and coding use cases. From the outside, it looks like you were winning in consumer but decided to shift toward business. Why do that?

Greg Brockman: We've been in a world where we're developing this technology, deep learning, to really see: can it have the positive impact that we have always pictured? And we've separately had an effort to actually try to deploy this technology, whether that's to help sustain the business, to start getting some practice with real-world impact, those kinds of things.
We're at a moment now where we've really seen this technology is going to work, and we're moving out of testing on benchmarks and these almost cerebral demonstrations of capability to it actually being the case that, for us to develop it further, we need to see it in the real world and get feedback from how people are using it in knowledge work, in various applications. And so the way I think about it is that this is a bigger strategic shift because of the phase of the technology. It's not so much that we're saying we're moving from consumer to B2B. What we're saying is: what are the most important applications that we can focus on, because we can't focus on everything? What are the things that we can bring to life that will actually synergize together as we build them, and that will deliver meaningful impact and help elevate everyone?

When we look at the list, there's consumer, which you can think of as many things, but there's a personal assistant — something that knows you, that's aligned with your goals, that's going to help you achieve whatever it is that you want in your life. There's also creative expression and entertainment and many other applications. On the business side, maybe if you zoom out, it looks more like: you have a hard task, can AI go do it? Does it have all the context to do all these things?

For us, it's very clear that the stack rank includes two things at the top. One is the personal assistant, the other is the AI that can go and solve hard problems for you. And when we look at the compute we have, we are not even going to have enough compute to fund those two things. And then once we start adding in many other applications, many other things that AI is going to be very useful for and is going to help people with, we just can't possibly get to all of them.
So this is a recognition of the maturation of the technology and the incredible impact it's going to have very quickly, and our need to prioritize and to actually pick the set of applications that we want to really bring to the world.

I've heard you compare OpenAI's various bets to Disney's, where you have a company with a core advantage that it farms out in different ways. Disney has Mickey Mouse, and it can do movies, theme parks, Disney+. At OpenAI, you have the model, and you can do video generation, a personal assistant, enterprise work. Is that no longer possible?

In some ways, that story is even more true than it's been. But the thing that's important to realize is that, technologically, the Sora models — which are incredible models, by the way — are a different branch of the tech tree than the core reasoning GPT series. They're just built in a very different way. And to some extent, we're really saying that pursuing both branches for these applications is very hard for us to do. We are actually continuing the Sora research program in the context of robotics, which I think is very clearly going to be a transformative application, and which is still a little bit in the research phase.

And so it's a recognition that for this moment, we really need to put the primary focus on developing the GPT series. And that doesn't just mean text. It doesn't just mean cerebral things. For example, bidirectional communication, having a great speech-to-speech interface — that is something that is also going to make this technology very usable and very useful. But it's not a different branch of the tech tree. It's all one model, and we just tweak it in slightly different ways. If you branch too far and you have two different artifacts, it is very hard to sustain in a world where there is limited compute, and the reason there's limited compute is because there's so much demand. There's so much that people want to do with every single model that we create.

Betting on Text vs. World Models

Why is your bet on the GPT model tree, when you had been seeing real progress with Sora?

The problem in this field is too much opportunity. The thing that we observed very early on at OpenAI is that everything we could imagine works now. There's different levels of friction associated with it, different amounts of engineering effort, different compute requirements, all those things, but with every single different idea, as long as it's mathematically sound, you actually can start getting some pretty good results. That shows you the power of the underlying technology of deep learning: the ability to really take any sort of problem and get to the meat of it, to have an AI that really understands the underlying rules that generated the data. So it's not about the data itself, it's about understanding the underlying process and then being able to apply it in new contexts. You can do that in world models. You can do that in scientific discovery. You can do that in coding.

And I think that where we are, as we think about the rollout of this technology, is that there's been this debate of how far the text models will go, how far text intelligence can go. Can you have a real conception of how the world operates? And I think that we have definitively answered that question — it is going to go to AGI. We see line of sight. At this point we have line of sight to much better models that are coming this year, and the amount of pain within OpenAI in deciding how to allocate compute goes up, not down, over time. And so I think that maybe the core of it is that it's about sequencing and timing, and that in this moment, the kinds of applications that we've always dreamed of are starting to come into reach.
For example, solving unsolved physics problems — we had this result recently where a physicist had been working on a problem for some time. He gave it to our model, and 12 hours later we had a solution. And he said this was the first time he'd seen a model where he felt like it was thinking, that it felt like this was a problem that maybe humanity would never solve, and our AI solved it. When you see something like that, you have to double down, you have to triple down, because we can really unlock all of this potential for humanity. And so for me, it's not about the relative importance of these things. It's more about OpenAI's mission of delivering AGI to the world, our vision of how it can benefit everyone, and the fact that we have a tech tree where we see how to just push it, how to do the engineering, do the further science and research, to then have that come to fruition.

Does OpenAI potentially miss something by doubling down on the text model tree?

Two answers. One is: absolutely, yes. In this field, you do have to make choices. You have to make a bet, and that's actually where OpenAI started — we really said, what is the path to AGI that we believe in? And we really focused hard on that. The sum of random vectors is zero, but if you align your vectors, then you can go in a direction.

But the second point is that it's actually image generation that has been very, very popular within ChatGPT, and that's something we're continuing to invest in, continuing to prioritize. And the reason we're able to do that is because it's not actually on the world-model, diffusion-model branch of the tech tree — it's actually based on the GPT architecture. And so there, even though it's a different data distribution, the actual core technology, the core stack, is all one thing.
That is actually the pretty wild thing about what AGI is: sometimes these very different-looking applications — speech to speech, image generation, text, and text is, by the way, itself many facets, like science and coding and personal wellness information, those kinds of things — all of that you can do in one tech tree.