Notes on innovation, and the chat we didn't see coming

There is a kind of wrong that is useful in this work, and a kind that is not. The kind that costs you a quarter is not. The kind that changes the shape of how you look at the next thing is. One of those landed on me at the end of November, 2022.

For several months that year, a small team of engineers and I had been working inside BERT and GPT-2. We read the papers as they dropped. We argued about tokenization and embedding dimensions and which fine-tuning regimes would generalize. We built. We were sure we were thinking out of the box. We were, all of us, convinced that we were the creative people in the room. We were not.

When you operate inside a research-adjacent product team for long enough, you begin to mistake fluency for vision. The conversation gets dense. You start speaking in a half-language of attention heads and prompt patterns and ablations, and you take the ability to keep up as evidence that you are seeing further than the people not in the room. You are not. You are seeing further into a particular kind of question. Whether that question is the question that matters is a separate problem entirely.

November 30, 2022

On November 30, OpenAI did the thing we had not done. They didn't release a more powerful model. They didn't deepen the architecture. They didn't publish a paper. They took the model they had and wrapped it in a chat interface and put it in front of the public: anyone, no application, no API key, no engineering required. The model behind it was not the most powerful thing on their bench. The decision was a product decision, not a research decision. And it broke the field open.

The week after, I watched the public take a model that today's standards would call quaint and do things I had not imagined. They had no priors. They had not read the papers. They poked at a chat box and told their friends. The work product was extraordinary. Cool things. Strange things. Useful things. Frivolous things. They built workflows around it. They argued with it. They asked it to write in styles it had not been trained to mimic, and it tried. None of this required the priors I had spent months accumulating. In some cases, those priors had been the thing keeping me from seeing the use.

We had been thinking about the model. The world was thinking about the surface.

What I had wrong

What I had wrong wasn't the technology. The technology I understood. What I had wrong was that I had been treating capability as the constraint. The constraint was access. The constraint was whether a person who had never read a paper could put a question in front of the model and get an answer back without any of the apparatus we had been building.

I had also been wrong about what creativity looks like in this work. I was working alongside competent, intelligent people, and we considered ourselves creative. Most of what we were doing was projecting the next version of what we had already built. The public wasn't projecting. The public didn't know what was supposed to be hard, so they tried things the people inside the lab would have ruled out as not interesting, or not feasible, or not worth the cycles. Some of those tries were extraordinary. A non-trivial number of them shaped the actual product roadmap of the next two years.

What I keep

I have been in technology long enough to be skeptical of anything that calls itself a paradigm shift. I am cautious about the phrase here too. But the experience of November 2022 changed something specific about how I evaluate AI work, and I want to write it down so I keep it.

The first thing: domain expertise is a moat, and it is also a blinder. I had used the word innovation with some confidence in 2022. After November, I used it more carefully. The people most steeped in a technology are often the worst people to imagine what the technology is for. Capability is one variable. Distribution and surface and permission to play are others. If you are working on capability alone and you have stopped asking about the others, you should expect to be surprised by someone who is not.

The second thing: when a tool escapes the people who built it, watch what happens carefully. The use cases that emerge from a public who was not briefed are the closest thing to a real demand signal you will ever get. If they look unserious to you, that is information about you.

The most generative thing about ChatGPT was that it didn't ask anyone for permission.

I loved every minute of being shown how wrong I had been. That is, in my experience, the rare and useful kind of wrong. Not the kind that costs you a quarter. The kind that gives you back a sense of how much you don't know about how the work is going to be used. I have not really stopped looking for moments like it since.

// DG