The following is a rough transcript which has not been revised by High Signal or the guest. Please check with us before using any quotations from this transcript. Thank you.

===

nick: [00:00:00] What we are seeing in real time is a shift from what I would term co-driving into agentic, even for very challenging, sophisticated software engineering work inside of Google, which has an enormous repo with very sophisticated dependencies. You are starting to see a model where you can treat it as an agent. And for me, the distinction with an agent is: I will assign work to this thing and I'm really not going to watch it, which is a very different paradigm than co-driving, because when the agent has done its work, you're paying this cost to even absorb or download the state of where the agent finished, to even understand what it tried to do and then how to correct what it did.

hugo: That was Nicholas Moy, former head of research at Windsurf and now at Google DeepMind, on the shift from AI co-driving to a truly agentic era of software development. Such a take may be somewhat obvious to people on the bleeding edge of using and/or building AI-assisted coding systems, but the world has bifurcated. Many are still maxing out on AI code [00:01:00] completion and not realizing that tools such as Google Antigravity can even abstract away your coding environment. Your AI agent can go away and build software for tens of minutes, and your job becomes focusing on software systems and product, in addition to reviewing AI pull requests. May you live in interesting times, indeed. In this episode, Nick, Duncan, and I explore the engineering discipline required to build at the frontier of AI-powered coding. We discuss Windsurf's journey and the strategic necessity of disrupting yourself to avoid the innovator's dilemma. Nick shares his framework for building resilient software by focusing on what he calls information invariants: the core data requirements that remain constant even as underlying models improve exponentially. We dive into the technical challenges of the agentic paradigm, including the double penalty of agentic failure, and why the primary developer interface is shifting from the IDE to review-centric [00:02:00] workflows. We also cover the unique value of capturing fuzzy human data and the strategy of swagging future capabilities to stay ahead of the next frontier model checkpoint. This was a really fun conversation about how the most sophisticated teams are navigating the transition from real-time assistance to autonomous agents. If you enjoy these conversations, please leave us a review, give us five stars, subscribe to the newsletter, and share it with your friends. Links are in the show notes. I'm Hugo Bowne-Anderson, and welcome to High Signal. Let's jump in.

Hey there, Nick, and welcome to the show.

nick: Thanks for having me.

hugo: So exciting to have you here. There are so many things you've been working on and thinking about that I'm very excited about, and that our audience will be super interested in, not least of all what it was like to build Windsurf and now everything you're doing at Google DeepMind. So I'm just wondering if you could walk us through the journey of building Windsurf, from what the initial vision was to how you had to pivot and evolve as frontier model [00:03:00] development changed dramatically this year.

nick: Yeah, it's really just been a rollercoaster over the last three years.
I'm happy to start from the beginning. I'm gonna take you guys back, imagine a time before time, when it was GPT-3.5. The state of the art for AI was just autocomplete. It was tab, finish your sentence, and that already felt like magic. That was the setting in which I joined as a pretty early engineer at Windsurf, which at the time was called Codeium. Almost as soon as I joined, we prototyped, we were trying out, this notion that would later become very popular of an agentic coding agent that was gonna take multiple steps on your behalf and make really big changes. But at that time, as you can imagine, it just didn't feel right: compounding errors, it would always veer off in the wrong direction or stop too early or whatever. And I remember we would periodically go back and try again with GPT-4, the original GPT-4, not 4o, not any of the other checkpoints, and it still wasn't right. So basically the decision was: we should try to ship something and have it be [00:04:00] useful, not just wait for a future in which this dream we had was gonna happen. So we focused on the autocomplete product and tried to make autocomplete really good. We were trying to sell against, to be honest, a very strong incumbent, which was Copilot at that time, and really emphasized features we could control other than the model: self-hosting on-prem, huge investment in things like dedicated context, context retrieval, and context engineering. The whole time we were always dreaming of: when can we go back to that vision for the future? This is maybe not unique, a lot of people were probably working on it at the same time, but I think we were probably one of the first to ultimately ship an agentic coding product and feature, and that was Windsurf. That was around, I think, late October, early November of last year, 2024. And it was crazy successful, beyond our expectations. I remember when we launched, I went away for the weekend with my wife to New Mexico and I was trying to turn off notifications, but I was on the email alert list for Anthropic [00:05:00] for our billing, and we would get a bill for every thousand dollars we spent. I could literally see an exponential curve in my email inbox, because I would get pings more and more frequently through the course of a day. That launch went really well, and the immediate question after was: what else can we do? The big thing we set out to do next was: okay, we have this unique advantage relative to a model company, which is real usage data. Can we demonstrate the data flywheel? Can we do a proof of concept that there's really something valuable you can get out of real users interacting with your model in a real problem setting? So we set out to train a model of our own. I think we were, again, one of the first agent-lab companies to release a frontier-level model; that was SWE-1, and that was in May. A number of other agent-lab companies have since followed suit. But it was Claude 3.5 level at a time when I think Claude 3.7 was state of the art. So that was basically May of last year.
And then as of last summer, a lot of the core research and development team joined Google DeepMind, and [00:06:00] that's where we are now. That's where I am now, working on making Gemini feel better for software engineers' use cases.

duncan: That's so cool to hear, Nick. Building a startup in this space today means your technology can be leapfrogged by competitors almost overnight, and I think this creates an almost innovator's dilemma where you have to be constantly reinventing yourself, as you've called out. Was there a specific moment or a particular model release where you realized, wow, this needs to change right now?

nick: Yes, I think so. I actually remember that day very vividly. As I said, the whole time we were building the autocomplete business out, trying to sell it and get adoption for that business, we were always dreaming of: what's next? Because maybe one key way to avoid the innovator's dilemma is you don't look at yourself as the thing that is making you a lot of money today. If you're always perceiving yourself as: we [00:07:00] are not the tab-autocomplete company, we are the AI-powered software generation company, you're always impatient for the next thing. So we always had, on the back burner, this agentic coding framework prototype that was kind of cooking, and we were testing a lot of stuff. And I remember the day we were finally like, we've been hearing some good things about Claude, let's just plug that into our harness, changing nothing else. Literally just being able to sample from the Claude API, it was like, whoa, this thing is ready. And I think we shipped Windsurf a month later. That's the kind of moment we were waiting for. So in a way, it's almost like preparing to disrupt yourself, and then finally, when the opportunity arrives, you're ready to go.

hugo: This is actually so important. In a conversation we had with Lance Martin from LangChain about this, he really made sure people went away understanding it too: we're at a stage where, if you have an idea and your product doesn't seem to work, there may be a model release tomorrow which allows it to work. Sure, [00:08:00] technological change has always made new ideas possible, but historically it hasn't been at this cadence or tied to the release of new models.

nick: Yeah, absolutely. And in fact, it's only been a year, but I feel the pace of new model releases that are meaningfully superior in capability to the previous one has only accelerated, so you have even less time to wait. We did wait about a year for the models to catch up to good enough, but now, if you have a vision for the future that you can't realize right now, it's very possible that in two or three months a product feature that didn't make any sense is suddenly gonna make a lot of sense.

hugo: I know Duncan has a very interesting question for developers around engineering and product challenges, but before that I do want to ask: what does this mean for even the concept of a defensible moat? And how did you think about that at Windsurf? It seems like moats are increasingly challenging.

nick: Yeah, I think it's not easy, and what I would say is:
I used to work in finance, and I actually started to think about this a lot: there's a portfolio-management angle [00:09:00] to managing a startup. Like any portfolio, you decide for yourself what your risk-reward profile is gonna be, but then you've gotta balance the sets of investments you make: long term, short term, local product features, making yourself robust to the future. Different companies come out with different strategies for how to deal with this, and there were many companies in our vintage that went all the way towards: we're just gonna go to the finish line. Either we're gonna build something that only makes sense when the agents are capable of running for hours and hours by themselves, or we're gonna go straight to training a foundation model end to end, and we think that's gonna be a moat. What we took was probably a more cautious approach, which was: we're gonna try to build for today, because there are a lot of benefits to that. You get user-base adoption, some brand recognition, and probably most importantly a really strong intuition for what the frontier is, what your customers want from you, and what they're trying to do. That's surprisingly valuable and interesting. Many of the strongest AI companies [00:10:00] you see out there now, like Cursor or what have you, that's a lot of what they rely on: that, and staying on their toes. They're probably always a little bit ahead informationally of anyone behind them, never mind the fact that they can build reputation, build a recruiting pipeline, and a lot more things become easy when you have something in the market that you can point your investors, your customers, your potential hires, and your current hires towards. So that was the strategy we took, and I think it worked really well for us. That's probably my bias, but I would say, if I were a startup imagining a future: you can always try to dream up a technical moat for yourself, or some kind of long-term defensible moat, but it's almost never a bad idea to try to get something into market and get real users, especially if you're not forcing that feature out, if you think it's a good feature and it will meaningfully tell you something about where things are gonna go in the future. That's very valuable: to get into the market and get people to really start to recognize you. I don't know if that's a satisfying answer. It is very dicey out there for any kind of AI-powered application startup, [00:11:00] but I think there are plenty of businesses out there that seem able to use that lead to consistently maintain it and consistently build enterprise value over time, and time will tell what happens there.

duncan: Do you think, Nick, that the value of building in public has changed in the AI era? Along those lines, that you need to be constantly talking about what you're doing and being really transparent about it so that you can create that flywheel?

nick: Yeah, I think so.
One idea I was thinking about is this notion that with AI it's so positive-sum. It's very weird to operate in a substrate where, whatever market opportunity is in front of you today, you can almost underwrite that it'll be at least two to four x bigger next year just by virtue of these things becoming more capable. So no one idea is so precious that you have to be so protective and private about it, because there's another round to play. And in that environment, there are asymmetric returns to moving fast: ship something early, see if it resonates, see if it actually works. [00:12:00] That's leading towards a lot of teams shipping things in public. And it's so much easier to get to "there is an aha here." It may be more of a niche, since a lot of the common use cases get snapped up earlier and earlier, but it's way easier to ship something that is materially useful for some people now, and in some ways that's a really helpful way to get the flywheel moving.

hugo: I think Anthropic is a wonderful example of this in terms of building in public and dogfooding while also helping the community so much. And I do think, in the limit, where it's not clear besides data what a moat can be and switching costs are very low for all of us, actual community building and goodwill and helping people has returned to being something incredibly important.

nick: Yeah, that too. Reputationally, I think there's that HR report showing Anthropic by far has the highest retention rate of their people. When you really stand for something, you really represent a value system and it's very clear, maybe you turn some set of people off, I don't [00:13:00] know, but for the people you resonate with, that's another way to create loyalty. And we haven't talked about the importance of talent, but attracting talent in this space is insanely difficult, and maybe that's another way you create a moat: in your ability to have this group of people who are real, I think Brian Chesky used that phrase, missionaries, not mercenaries, who are real missionaries for what you're trying to build and really believe in the mission, the culture, or the values of the company. I think that is manifestly true and very consistent in the behavior of some companies like Anthropic, maybe among others: they are what they say they are, and I think that does pay them dividends over time.

duncan: That's super cool. Let's shift gears a little bit. I'm curious to talk more about the product and engineering side, and in particular how you thought about the most important product and engineering challenges you had to solve when building an AI-driven assistant, and maybe what's something you learned there that would surprise people?

nick: Yeah, I don't know about surprise, and [00:14:00] there are so many lessons, but maybe the one that's the most new to me is this challenge of writing software that's still useful in six months when the product might be completely different. How do you avoid writing a feature and then throwing it out in three months,
because your AI might be way more capable than it was before? One thing I'm really proud of with Windsurf is that over the last 18 months, a lot of our core agent framework has basically stayed the same. And this goes back to way before "agentic" was even a word used in product development; I don't even think we had that word in our vocabulary, we were using another word for it. But the data representations we had, the way we orchestrated the loop, the way we mapped it to functionality, and I don't even think we had tool calling in the API at that time, and in addition the data flywheel: the eval infrastructure, the training infrastructure, the data processing infrastructure: all of that has basically stayed the same. One thing I think is interesting is that, without realizing it, we were multi-step-RL compatible about 18 months ago, [00:15:00] when that was only something being explored inside the frontier AI labs in a very experimental way. So how do you do that? It's probably a little bit of luck, maybe a little bit of insight. I think you try to keep a very close eye on: this is where I'm trying to go, and here are the invariants for where I'm trying to go. So what are the invariants of the problem you're trying to solve? The way we used to think about this is: what's the work we're ultimately trying to enable the AI to achieve if we project 10, 15, 20 years into the future, and what will always be true about having to do that work? For example, common invariants would be information invariants: there's fundamentally no way to accomplish this task unless you have this source of information. Or output invariants: ultimately you need to produce work, and that work is always going to take this form in this industry. So you have some understanding that these are things you can invest in over a three-year [00:16:00] horizon, and almost anything else you've gotta treat as if you might have to throw it all away. That's a delicate balance, and knowing the difference is really hard. In retrospect we made the wrong call on a lot of things, but in aggregate, can you consistently string together good decision-making around what is going to be there in two years and what you should wait and see on before investing? That turns out to be really challenging, and really rewarding if you can get it right.

hugo: Super interesting. I'm really interested in the spread, or the variance, of how developers use AI. A couple of years ago a lot of us were copying and pasting from our IDE to ChatGPT or whatever it may be, sometimes taking screenshots, and of course one of the jokes was that it was very good at a lot of APIs but wasn't particularly good at its own API, and I still know people who do that. Then we have auto-completion, and now of course we have agents embedded in our IDEs where they're writing code. We also have agents embedded [00:17:00] in CI/CD, and the rise of ambient and background agents. So there are so many levels to how we can use these amazing things.
I'm just wondering how you've seen user behavior evolve from vibe coding to what, for lack of a better term, we'd now call agentic software engineering.

nick: Yeah, so I do think what we are seeing in real time is a shift from what I would term co-driving into agentic. Co-driving means I am still sitting there watching more or less the files that you open, the thoughts going through your head, the edits you're making, and in real time I might have to do a course correction. Or it's slightly less than real time: maybe every three minutes I need to course-correct you or re-prompt you or something like that. That's kind of the state we're in right now. Obviously, if you compare that to two years ago, it feels like night and day, if before it was literally: I am driving, and every once in a while you will autocomplete, you'll nudge me back in the right direction. But I think it's pretty clear now that even for [00:18:00] very challenging, sophisticated software engineering work inside of Google, which has an enormous repo with very sophisticated dependencies, you are starting to see a model where you can actually treat it as an agent. And for me, the distinction with an agent is: I will assign work to this thing and I'm really not going to watch it, which is a very different paradigm than co-driving, because when the agent has done its work, you're paying this cost to even absorb or download the state of where the agent finished. So it requires a lot more trust to allocate work to the agent, because if the agent does something wrong, you basically pay a double penalty: you have to understand what it even tried to do and then how to correct what it did, and it's almost not worth the effort. The models are just about reaching the intelligence where that threshold is crossed, where this is really going to become the norm, where you can trust them to get it roughly right more often than not. And more importantly, you can also trust them to explain their work, walk through their work, and basically lower the cost for you of understanding what they did. In the [00:19:00] same way that a pretty good engineer will write a nice, helpful PR description with pointers and screenshots documenting what they did, with a lot of the products now you're seeing that kind of self-explanation, and that helps a lot with the agentic mode. So that barrier is really coming down. The models are getting smarter. You're really gonna start to see people treat these as things they fire off and check back in on maybe an hour later, or, in the not-too-distant future, days later.

hugo: They're getting so much smarter, and I'm very happy to stand corrected. I was less bullish on the models' abilities to do unguided or less-supervised agent work early this year, but particularly in the last six months, with models such as Gemini 3, and congrats on Flash, by the way, even the ability of the newer models to do a series of tool calls correctly and plan those tool calls is something I'm incredibly impressed and surprised by. I'm not quite asking [00:20:00] you to predict the future.
Having said that, we do work in machine learning. A possible trajectory here is that soon we don't even read a lot of the code, right? We look at pull requests and maybe a couple of dashboards and agent-written descriptions. Of course, with very high-risk stuff you want to make sure you have human eyes on it, but I'm just wondering your thoughts on what the future evolution could be.

nick: Yeah, I think the pull request is a nice analogy for this, and, maybe for better or worse, many engineers may be dismayed at this, I think it will increasingly be the primary surface at which you interact with and supervise agentic work for the foreseeable future, because there's a pretty wide spectrum in that space. If I were reviewing a very experienced engineer, I might not even look that closely at the code. Maybe right now the agents are a little bit less than a very experienced engineer, so I would look more closely, but they're already in that ballpark, on that spectrum.

hugo: Yeah, and you wouldn't look at it in an editor either, right? [00:21:00] I think we're still in an agent-plus-editor paradigm, and that may shift away a lot.

nick: You are absolutely right. I think that's a big part of why Antigravity, the product we just shipped, puts front and center this agent manager mode where you don't see an editor at all. You only see the artifacts that the model chooses to surface to you, to highlight or summarize the work it's done, and importantly, the steps it has taken to verify that work: here's a screen recording of my testing of the UI, or here's a unit test with a pointer to some logs showing that I ran the unit test and confirmed this thing works. I think that will increasingly be the surface where most of the review happens, and you only double-click when you have some suspicion that something is going wrong.

duncan: We talked a little earlier about the data flywheel and the need to gather more customer feedback, but I'd be interested to tease that out a little more from a strategic perspective: not only how it enables you to go faster today, but how you might think about the strategic need to [00:22:00] build data flywheels into any given AI product, so that you can develop faster, learn faster, or maybe train your own models or fine-tune models. Curious how you've seen that play out and how you think it may play out in the future.

nick: Yeah, I think the calculus around this is evolving in real time. At the time we started, there were no startups really credibly able to produce training data with the realism and kind of environment for the tasks we were anticipating. And in that world it just seemed so clear: if nothing else, we could produce massive volume and then license it or commercialize it, which, we never did that, but a lot of these IDE companies have basically done. I think that is changing, and for any kind of AI product company, you have to be very crisp about what you think is really special about your data
that a very motivated startup that's just come out of Y Combinator is gonna have a hard time replicating in a synthetic RL environment [00:23:00] that can just be sold to these labs and ultimately absorbed into these frontier models, at which point your advantage is basically gone. And there very likely is something; there's probably something very valuable. An example I think is very durable is something like Tab, where knowing when to show or not show a suggestion is so subtle. It's this very human-feedback kind of thing, super hard to generate synthetically, I think, and real usage data is such a nice way to capture this very soft, fuzzy distribution. And that probably goes even more extreme: this is a prediction, but in the future you'll see more model personalization in general, like per-user model personalization, and there I think it's basically impossible without real usage, right? You do need that data flywheel, and that's another way that application companies going direct to their users are in a prime position to do that for their customers in a way that a lab company cannot provide that value directly. But anyway, the long and short of it is: I think it's possible, in fact almost certain, that for any product company there is [00:24:00] something real to your data. But you have to be careful about what it is and not oversell or over-promise what you are differentially able to provide versus synthetic data. Because remember, synthetic data can be produced at scale, it's very clean, and it does exactly what you say it does. That has to be offset in some meaningful, strategic way by the kind of data that you collect, I think.

hugo: So, something we've been talking around is cost and economics: the economics of building on top of frontier models. Of course, per token it's incredibly affordable, and the ability of anyone to not have to train a model now, to just ping an API and build a prototype, is so wild and wonderful. Having said that, as you're scaling a product, costs can blow out incredibly, and with a handful of providers, I'm just wondering how founders and builders should think about build versus buy when it comes to the core AI, and how to think about cost.

nick: [00:25:00] Yeah, this is a really important question, and it's pretty hard, because if you stack up all of the different companies providing value in this value chain, the hardware layer, the cloud provider, then the model company layered on top of the cloud provider, and then you, and every company is trying to build in a margin, you're gonna be structurally at a disadvantage cost-wise relative to another company that's able to go vertically integrated. And I think you're seeing some of this with Claude Code competing with Cursor, and it's not easy. That's why Cursor, for example, has tried to train their own model, and it's the same reason why at Windsurf we were really focused on: can we train our own model that is actually respectable, quite good at the task, that we can provide to our users for maximum value and daily-driver kind of usage,
in a way that they still feel they can get value out of our product, and we're not just giving away massive API credits, or, on the flip side, losing billions of dollars in API costs net of the subscription revenue that you can generate. So I think, [00:26:00] especially if you're in a position where you have a large volume of usage, you have a user base that likes using your product, and there is a path, and this is not a trivial thing, but there is a path for you to develop the expertise to train your own model, it always makes sense to be able to do that. At the same time, I'm increasingly of the view that it's gonna be tough to believe that just being smaller or faster or cheaper is really gonna satisfy your users the next time a brand-new model comes out that's twice as smart as the last one. So ultimately I do think these companies are gonna be in a position where they always have to build: they're gonna be trying to build their own models, and at the same time they're gonna have to build capabilities as if they always have state-of-the-art capabilities, because otherwise you run the risk of missing the boat on the next frontier. Even if you can't serve that with your own models at the right cost, I think you're gonna have to do it, because the state-of-the-art models are moving forward so quickly, and I think it's honestly only become harder to match that performance with a custom fine-tuned model, unless it's a very niche, very narrow kind [00:27:00] of use case.

hugo: Yeah, that makes a lot of sense. Having said that, of course, performance-to-cost ratios are getting pretty interesting these days. I'll link to a post by Tomasz from Theory Ventures, who we had on the podcast recently; he had a wonderful post a couple of days ago about the cost and performance of Gemini 3 Flash in particular, and the cost-performance ratio is mind-blowing: how cheap it is with respect to what you get from it, compared to any other model out there at the moment.

nick: Yeah, it's an insanely good model, and not only that, there's this mantra that actually sometimes fast is really good. It's just a pleasant model to use, especially if you're trying to do stuff that doesn't require four hours of thinking or whatever, almost the classic, original kind of coding task. I love to use Flash for that. But I'm just thinking ahead: imagine a world in which you are used to using Claude Code with Opus and it's able to spin up a Kubernetes deployment for you, do the debugging, analyze the logs, figure out what's wrong with it, [00:28:00] fix it, and redeploy it. It's pretty hard to compare that feeling to the feeling of: oh, I could do all of those steps myself, and the AI is helping me do all those steps twice as fast. That's the delta. I'm not sure exactly how this will play out in the future and whether that gap will always feel like it's there between true state of the art and economical price-performance, but I do feel that delta in model size and commensurate model cost is something users, ten times out of ten, vote with their feet on; they tend to prefer the frontier if they can get it.

hugo: That makes a lot of sense. But I am interested: a lot of us, in our workflows, and I'm sure both of you as well, will choose models based on what we're doing.
When I'm building something, to plan I'll use a stronger model with reasoning, that type of stuff, and then when I go into execution mode and just want to write some code, I'll go to a smaller, faster model. Of course, there are some interesting agentic systems, like Amp for example, where they don't even tell you what model they're using now; they'll just do routing. [00:29:00] And GPT-5 famously started routing as well, which broke nearly everything I was doing at the time. But I suppose my question for you is: are we seeing, or going to see, products where we actually use a variety of different models based on what our needs are at that point in time, or are we more thinking one model to just do everything with ease?

nick: Oh, almost certainly, if not already. I don't think this is revealing anything secret, but inside Antigravity your main model is probably going to be Gemini Pro, while the browser-use model, just because it's trying to do something simpler and you want really fast multimodal understanding and interaction with the browser, is probably going to be something like Flash. That's just a good cost-benefit trade-off between the speed at which you want it to feel like it's using the browser and the intelligence it needs to do the task. So I certainly think that's already in products today, and I certainly think it will become even more the case now that you have a really nice Pareto frontier, with a pretty nice model at almost every point along that curve, and [00:30:00] a real, justified use case for a lot of these different model sizes. I think that's definitely true.

hugo: Yeah. Also, this is something we've talked around, and something Duncan and I have discussed a lot on the podcast and otherwise, but it's really challenging how we all have to rip out our agent harnesses and all of these things as new models come out. What does this mean for how we produce and develop software these days? It seems like a total paradigm shift, to be honest.

nick: And by ripping out your agent harness, you mean because this model is doing something completely different, or what do you mean?

hugo: Yeah, it's all the tool calls, the way I specified them, the way I tested things. None of it really matters anymore, and it's totally bloated, so you're ripping them out, having fewer tools, whatever it may be.

duncan: And I think oftentimes the harness may make things worse with the new model, when there are better capabilities, right? So you're actually handicapping yourself, and if you're not aggressive enough about ripping it out, you may not be able to see the benefit from the new model, and then you're [00:31:00] even worse off than you would've been in a freer world.

nick: Yeah, that's absolutely right. This has happened to us so many times, as you can probably imagine, where we come to this slow realization that all this stuff we have invested so much in is actually making the model do worse at its job. We should just unleash it, unfetter it, and let it flex its wings. So I absolutely believe that's true. At the same time, once you get your head around it, I think it's actually very optimistic, because what it allows you to do is dream a little bigger. You're always trying to say to yourself: what's still gonna be relevant in six months,
or what's gonna be relevant in six months that isn't relevant today? So, going back to how we started off at the beginning, if you're preparing for this model to be 50% more capable, 100% more capable in a few months, there's a sort of bitter-lesson approach to developing software that, when you flip it around, is actually pretty exciting. It's pretty cool, because you can just swag a lot of stuff. And the real challenge is: can you dream big enough? Can you dream big [00:32:00] enough in a realistic way that anticipates a little bit where the puck is going? Once that happens, it feels like so much more magic, and it turns out you can swag a lot of stuff and still achieve really amazing functionality. That's something I've really started to try to encourage on our research team, even in our research bets: the scale of the bet you can make and the scale of project you can achieve in a month is just bigger than it was a year ago. Again, it's scary, but at the same time, if you are primarily excited by or motivated by the scope of the impact you can have, which I think is true for a lot of people in our industry, this is just an exciting time. These things are really there to allow you to achieve your wildest dreams and then teach you to dream even wilder next time.

duncan: Such an exciting time to be building, and so fun, right? You can just try so much, so quickly. Many people have thought about this year as the year of the agent, but it still feels like we're pretty early in that, and maybe it's more the decade of the agent, [00:33:00] as others have recently called out. How do you think about the key differences between this copilot or co-driver era, as you called it earlier, and the truer agent era that should be coming?

nick: Yeah, I do think there's a real shift, and going back to the earlier part of our conversation, there's a binary change in mindset when you are no longer there watching more or less what the model is doing, when you're really going away, completely removing what the model is doing from your context, and you have to come back, settle back in, and realize what this thing was doing. It's like the difference between chat and email. You're assigning a project to a direct report and expecting them to come back maybe in a day or a week or whatever. And I think the reason "decade of the agent" is correct is that once you pass over that threshold, the ceiling is basically unlimited. At that point you're already getting used to the fact that it's gonna come back to you and it's gonna take you a little bit of time to [00:34:00] understand and grok what it's done and decide what it needs to do next, and whether that is a day or a week or a month, in many ways it's still substantively the same. So we will have to take some time to get used to this new paradigm. New tools will have to be developed, and I think new evolutions of all the existing products and services we're familiar with, or maybe some brand-new products and services, will come along that are more agent-native.
And then from there, you'll see larger and larger horizons of work, both longer in depth and with more fan-out in scope, up to the point where maybe the analogy is: you're a manager managing two or three direct reports, and now you have many layers underneath you, and that's scaling. And I have faith that as humans we will be able to continue to scale our ambitions and our capacity to manage and get value out of those increasing layers of complexity and capability of agents for at least quite a while into the future.

duncan: As someone actively [00:35:00] working at the bleeding edge, I'm curious: how do you keep your own ambitions of what is accomplishable today with these tools up to date? How do you make sure you are delegating aggressively enough to those two or three reports, or starting to build your own second-line team? Just curious if you have a take on that, Nick.

nick: That's a great question. I ask myself that: am I taking advantage of these things enough, even today? And it's actually gotten to the point where part of my decision on the projects that I, or my team, take on at work is literally: I can use this as a time to test out an idea I have about how I can use these agents in a different way. Partly it's just the information to be gained from trying to implement something, in the same way that earlier in my career I would take on an infrastructure project just so I could also learn a little more about how Kubernetes works or something like that. That's part of the value of taking on that project. And I think you're actively building [00:36:00] in part of your value: not just the work I'm gonna deliver, but the lessons I'm gonna learn in implementing it. And taking that into not just your hacking projects but your day job, companies should be empowering their employees to use these tools as much as possible in their day jobs. That's just gonna have to be part of your workflow. And certainly for us, given that we are developing this thing, it's even more the case that it's part of our job description to proactively push the limits of what these things are capable of.

hugo: Totally. And as Duncan said, I think it's obvious from this conversation and a lot of the conversations all of us have, but we rarely state explicitly how much fun this space is at the moment and how exciting it is. And with the velocity, dare I say acceleration, of developments, it's also full-on, right? When I get up in the morning and I'm deciding what to do to build my business, all of these things, as for all of us, sometimes the decision paralysis is massive. So I'm just wondering, in such a fun, exciting, fast-moving space, [00:37:00] how would you encourage people to think about what to focus on and when to experiment?

nick: Maybe I'll answer with the famous analogy from Steve Jobs, which is: I'm building a bicycle for the mind. What that implies is that you choose your own adventure, and you don't have to think: okay, how am I going to feed back into this self-improvement loop and build this or that thing?
I think what we are trying to do is basically build the most capable thing that allows you to stretch your imagination as far as you want it to go, or don't want it to go. I've always been amazed and impressed by the people who just follow whims and interests, and by how important it actually is, as a civilization, that we have that wide diversity of bizarre takes and weird views. I would never think to use the tool in some of the ways that people use it, and I think if you're too pressured into being like, I need to do something productive, I need to make the next spreadsheet app or whatever, as opposed to just following where your unique mind and your unique experiences take you... I've [00:38:00] tried to learn to follow my own impulses, because I cannot help what I am, I cannot help what I'm interested in. And when I do that, the bicycle takes me farther in that direction, allows me to do more and explore more in that direction. And I try not to worry too much about whether that's going to be something that's gonna be worth $10 billion or $100 billion next year.

hugo: I love that, and I have two responses. I'm gonna give the slightly spicy one first; it's not that spicy. Firstly, I totally agree. Secondly, I can't imagine you felt that every day at Windsurf, right? There were economic pressures; I'm sure you got up some days and thought, we need to do particular things today, right?

nick: Yes, I think that's always gonna be true. There will always be: we gotta do this, we gotta implement this thing. But actually, those tend to be the kind of things the copilot is for. It's just like, okay, I gotta whack this thing out or whatever, and that's exactly what it's freeing you up to do. And I will say, maybe this is the thing: if you are in a startup and it is a matter of financial life or death for you and you [00:39:00] do feel the pressure to deliver something, I still go back to this being an amazing industry to be in, because the best way to succeed is to really build something that is aligned with value. And I don't think that's always true elsewhere. You can look at other industries where the only reason a business is valuable and exists is regulatory capture, or because it happens to be able to generate this patent, or some small dislocation in the market. But with AI, if you're in the business of developing useful applications out of AI, the market will sniff those things out very quickly, and the only way you can really generate value here is if you do something that's actually useful and means something to somebody, where somebody can get value out of it. That's maybe challenging, but it's also rewarding, because it frees you to just focus on: how can I do useful things, or how can I imagine what the world will find useful next year? At least for me, that's very easy to get motivated by. And, like I mentioned at the beginning, the practical reality was we couldn't ship a product around this agentic coding, but all the time we could be dreaming about: how do I make that feel a little bit more like the [00:40:00] reality? And that was always, I think, pleasurable, because that's what I wanted to see when I joined.

hugo: Amazing.
And that, as I said, was my first response, my slightly more lizard-brain, finite-game mind. My infinite-game mind's response is: I love that you referred to Steve Jobs's analogy of a bicycle for the mind, and I'm wondering if we can take it even a step further. We are building bicycles for the mind, but at the same time, we're building machines that build bicycles for the mind as well. There's this double or triple game where we're not only building thought partners, but we're using them to iterate on themselves and on more products in real time, and that's amazing.

nick: Yeah, it's almost hard to keep track of. I don't know if I can fully imagine what it's gonna look like even a year from now, keeping track of all these things and all the ways this is feeding back into itself and helping itself. But it's for sure the case. I think many companies will tell you this has accelerated the pace at which they're able to improve and pursue research ideas, and [00:41:00] that includes these AI labs and these agent labs, and I think that trend will probably only accelerate.

hugo: Awesome. I think that's it from us. Duncan, are there any final thoughts or questions? This was so much fun, Nick.

nick: Yeah, thank you for this. I really enjoyed it, guys. I really appreciate the invite to come on the show and talk to you.

hugo: Such an absolute pleasure. Awesome. Thanks so much for listening to High Signal, brought to you by Delphina. If you enjoyed this episode, don't forget to sign up for our newsletter, follow us on YouTube, and share the podcast with your friends and colleagues. Like and subscribe on YouTube, and give us five stars and a review on iTunes and Spotify; this will help us bring you more of the conversations you love. All the links are in the show notes. We'll catch you next time.