julia-dalton-launchpod === [00:00:00] All right, Julia, welcome to the show. Good to have you here. Thank you. I'm happy to be here. I'm looking forward to this one. You're leading product at Capacity, but one thing I was told to make everyone aware of, in case people navigate away from this episode too early: you're actually a fairly intimidating person when it comes to, uh, some of your capabilities. You were, at one point, a nationally ranked powerlifter and strongman competitor. I used to compete in strongman competitions and competed nationally for several years. And, you know, as age gets to you, it's a little hard on the body. So I still actively train, but the competition bug has waned a little bit in the last couple of years. Listen to that, folks. Don't abandon the show too early, or there are real consequences this go-around. So you're at Capacity, you're running product. How did you end up running product at a company with so many people you've worked with previously on the leadership team? [00:01:00] I've worked in startups throughout almost my entire career and seen them through various exit strategies. The earlier startups I worked with were a formative experience for a lot of us. We were in our early twenties, very eager, very much in that hustle culture, but also in those formative years where you're navigating adult life, you're making friendships, and you're kind of doing so over work. And so there was a significant number of us who worked at a company called OneSpace — which started out as CrowdSource — who really enjoyed working together. And we've kept in touch as we parted ways and each went into different startups. There are five of us here now, including myself. I'm just picturing a group of twenty-somethings sitting around building a crowdsourcing company, casually lifting cars. That's probably not an accurate representation. No, uh, no. I mean, at OneSpace we took the work hard, play hard thing very seriously. So, no, it actually isn't that far off to assume. [00:02:00] The other thing that was really neat that I wanted to talk about — and OneSpace actually brings it up perfectly, because it started there — is that everyone right now is talking about managing agents: how do you prompt better, how do you build context, how do you delineate tasks and build agent swarms, all these kinds of things. But at most, aside from maybe a couple of people at some of the research hubs, people have a couple of years of experience with this. At OneSpace, the entire product was based around this crowdsourcing of work. So I kind of wanted to dig into this idea of the human API layer and how it's turned into what is probably a better understanding of agent behavior and how to manage agents. Now, it is kind of an interesting thing when you realize how relevant and applicable your experience can be, even in a completely new wave of technology. For me, there's almost a direct parallel. So OneSpace was rebranded from CrowdSource. Fundamentally, when you think about how that company began, it was CrowdSource.com. Our specialty was working with brands to help [00:03:00] them do what was called microtasking back in the day. Mm-hmm. And let's use the example of retailers — let's use that industry as an example. We worked with a lot of retailers who not only had their own website but were trying to deploy their products to Amazon or a Walmart or something like that. And they have large product catalogs.
They need to get ranked. They need visibility. They need good-quality product content. And one might think: agency model, right? You just go and take somebody and say, okay, hey, you're responsible for all of this. You're the copywriter; you write all of the copy. But distributed work was way faster than going through that typical agency route. And what we ended up finding out is that, to effectively do microtasking at scale — you might simply think, okay, somebody does the product images, then somebody writes all of the content. But there's a great deal of nuance and specialization that makes the quality bar go higher and higher. And so what [00:04:00] we ended up finding was: we need somebody who focuses on and understands very innately how to optimize product titles, right? Mm-hmm. Then we need somebody who understands the purpose and point of those product specification bullets and what needs to be included in there. And then we need somebody who really specializes in the product description content, and then obviously somebody who designs the images or picks the feature images. And so on and so forth. So you can imagine you start to break those things apart, and immediately you might be overwhelmed: okay, how do I organize all of these tasks? Because some might feed into the others. Mm-hmm. And so what we ended up building — we called it workflow chains. We built a workflow platform, and we called them workflow chains because they were actually workflows that were then also chained together. So even thinking about writing good product content, there's actually a writing step and an editing step [00:05:00] in order to do that. Mm-hmm. So you think about the output being something like: we want good-quality product content. Well, within that mini workflow we've got a writing step and an editing step, and both of those are specialized. And we had to do this at scale — we're talking, at any given time, thousands of freelancers that we've trained and qualified to do this work. And so there are a couple of very key parallels. One: instructions matter. Your ability to define what the task is and what the dos and don'ts are for your particular task at hand was absolutely critical to getting good results back. That is a hard lesson learned. That is why I am laughing — because you cannot leave it up to chance, especially when you are distributing it out to thousands of people. Do you have an example from that? I mean, if you think about simply the cost and the lessons learned of iterating, right? You send out a [00:06:00] batch of 500 product descriptions and your instructions are off, or they have an error, or you weren't clear enough. You've paid for 500 product descriptions, but now you've gotten them back and you can't use them, so you have to redo them. And so — again, another parallel — we actually had people who helped us optimize our tasks and work out those kinks, which is very similar to how you simulate agents today. Yeah. And you do things like — how do you train an agent? You had orchestration agents before the work agents, and it does make sense, right? Like if you got back work that met your criteria — it just didn't actually meet the real criteria, but what you wrote was the criteria. How do you look at a task like that, or a bigger, complex, multi-step flow? And what was the process to kind of break it down into that?
You had a good word for it — a task chain, or what was it? Yeah, workflow chain. Yeah, it's a great question. It's the same kind of approach one takes for designing conversation flows for [00:07:00] agents, where there is sort of an upfront design phase, right? And I don't mean graphic design. I mean you're sitting down and trying to understand the problem you're trying to solve. Now, because we specialized in certain industries, we certainly had templates, right? Once we had gotten a really proven workflow chain for product content for Amazon — because, by the way, there are nuanced differences between deploying to Amazon and their requirements versus Walmart versus insert any other big retailer that exists — so there's that upfront design, and then obviously being able to reuse what works and having the ability to deploy that quickly for new clients. Part of that is very similar to agent orchestration today. There were routing rules in between those workflow chains, which would say something to the effect of: if the editing step here is done, route it to the product [00:08:00] specification bullet phase. If the images are done and the description is done, now we can go ahead and release the tasks for those bullet writers, because now they have the images that are focused on some of the key benefits and they've got the product content — and then usually the titles would be last, right, because you would want to accrue all of that information. Mm-hmm. LLMs didn't exist then, so we couldn't use LLMs to surmise anything. It was rigid conditional rules, which actually are still applicable in some ways for multi-agent architectures as well. But that logic of what goes to which step when is just as important to the end product as the task instructions are, because incomplete information can also lead to really bad results. It's interesting, 'cause when I've worked with companies and people where the feedback is, oh, AI doesn't really do anything interesting, I gave it this and got bad results out — typically if you [00:09:00] look at it, it's an ill-thought-out task, or ill-defined outcomes, or you weren't specific enough. I guess to tie it back to Capacity: what you guys are doing is all about AI agents at the communication level, and there are lots of ways that can go south even with seemingly clear direction. Have you found that this kind of understanding you're coming in with has helped as you and the product org have really started to push this stuff forward on the agent side? Yes. I mean, the parallels are surprising. I wish I could go back and tell Julia 15 years ago: this is really, really important; you should really appreciate what it is that you're doing right now, 'cause it's gonna serve you very well in the future. There are a couple of things, right? And this is not just specific to agents; this is in every realm of human existence. What seems clear to you and what you've communicated is oftentimes very unclear, or not as clear as you thought, to the audience or to the recipients. And it is even [00:10:00] more critical when you're asking said agent or said person to execute a task where you're expecting a certain level of quality or a certain level of deliverable. And so it's not just: did I, from my own lens, come up with instructions or a prompt or something that I feel is very clear?
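To make the workflow-chain routing described above concrete, here is a minimal Python sketch of that kind of rigid conditional rule: a step is released only once the steps feeding it are complete. The step names and the ProductTask structure are hypothetical illustrations, not OneSpace's actual platform.

# Minimal sketch of "workflow chain" routing rules (hypothetical step names,
# not OneSpace's actual system): release a step only when its inputs are done.
from dataclasses import dataclass, field

@dataclass
class ProductTask:
    sku: str
    completed: set = field(default_factory=set)  # microtask steps finished so far

# Each routing rule: (step to release, steps that must already be complete).
ROUTING_RULES = [
    ("edit_description", {"write_description"}),
    ("write_bullets",    {"edit_description", "select_images"}),  # bullets need copy + images
    ("write_title",      {"write_bullets", "edit_description", "select_images"}),  # title last
]

def next_steps(task: ProductTask) -> list[str]:
    """Return the steps that can be released to workers right now."""
    ready = []
    for step, prerequisites in ROUTING_RULES:
        if step not in task.completed and prerequisites <= task.completed:
            ready.append(step)
    return ready

task = ProductTask(sku="B00EXAMPLE", completed={"write_description", "select_images"})
print(next_steps(task))  # -> ['edit_description']

The same "don't start a step until its inputs exist" check is the piece that carries over to multi-agent orchestration, per the parallel drawn in the conversation.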
It's the validation against other people's perspectives, or other agents' perspectives, that validates and hardens that truth. So it's not enough to just sit there and stew by yourself and work on the prompt — it's your ability to get feedback mechanisms. Which is why I said that back at OneSpace we actually started to have a sort of separate workflow that helped us validate our task UIs and our instructions. Getting that input — that was the exercise. Mm-hmm. It wasn't to execute the task. It was: if you were going to execute this task, your primary task, though, is to give us feedback on the instructions. If there's one sort of takeaway, it's that same concept of [00:11:00] validating. I mean, we'll get into the data later, I think — your data needs to be right — but aside from everything else, your instructions being clear, and validating that clarity from other perspectives, is just paramount, because you're not the one doing the task; you're just architecting it. Right. It seems like there are two areas where these kinds of workflows can fall down. One is just unclear instruction, or it being unclear what great looks like. And the other is messy data, or inaccessible data, or data that somehow impinges upon the successful completion of the outcome you're looking for. If you can nail those two things in a really, really complete way, that seems like a huge amount of the battle that we can control. I don't know about you — we're not making our own models. I mean, we do a little bit on the voice side, but generally speaking, large language models are not our core. There are large companies that specialize in that. All right, so it kind of came up already, but I want to go back and dig into it just a little bit more. At [00:12:00] OneSpace you had this concept where you'd actually have a version of the workers go through and work through the instructions, and now there's the corollary of people using agents to refine their prompts before they actually give the prompts to real agents. How do you break those down? What's the process you've found now, where you're using that to recursively build that initial prompt to be better? Yeah, I mean, it goes back to the specialization of understanding the task, right? So at OneSpace we had people where we said: hey, this is your task. You know what task is being asked, but your job is to actually evaluate the instructions. You can build agents in the same way, right? So you can say: hey, agent, your task is not to follow these prompt instructions — here is the prompt we're trying to work through; your job is to actually go through this sort of simulated conversation, a simulated exercise, of trying to execute those [00:13:00] objectives. So: you're a friendly support agent, you have the ability to access these tools, but your primary responsibility is to answer questions about the company using your website as a resource. Let's just say that that's your initial prompt for whatever your chat agent is. It's a largely insufficient prompt, but I'm just spitballing here. Putting a chatbot on your website with very loose instructions can be problematic, right? Yeah, to say the least. Yeah. And so what you want to be able to do is allow agents to talk to agents — where you have the agent who is the evaluator, who is able to talk to the agent that you're trying to deploy, and run simulations and run evaluations.
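As a rough illustration of that evaluator-agent setup — one agent playing a skeptical user against the agent you plan to deploy, then a grading pass against the deployed agent's own instructions — here is a hedged Python sketch. The call_llm helper, the prompts, and the rubric are stand-in assumptions, not Capacity's actual evaluation tooling.

# Hypothetical sketch of agent-evaluates-agent testing; call_llm is a stand-in
# you would wire to your chat-completion provider of choice.
def call_llm(system_prompt: str, messages: list[dict]) -> str:
    # Placeholder so the sketch runs; replace with a real model call.
    return "(model reply)"

CANDIDATE_PROMPT = ("You are a friendly support agent. Answer questions about the "
                    "company using the website as a resource.")
EVALUATOR_PROMPT = ("You are a skeptical customer. Ask hard, realistic questions and "
                    "try to pull the support agent off-task.")

def run_simulation(turns: int = 5) -> list[dict]:
    """Have the evaluator agent play the customer against the candidate agent."""
    transcript = []
    user_msg = "Hi, I have a question about your product."
    for _ in range(turns):
        transcript.append({"role": "user", "content": user_msg})
        reply = call_llm(CANDIDATE_PROMPT, transcript)
        transcript.append({"role": "assistant", "content": reply})
        # The evaluator reads the transcript and decides what the "customer" says next.
        user_msg = call_llm(EVALUATOR_PROMPT, transcript)
    return transcript

def score_transcript(transcript: list[dict]) -> str:
    """Ask a third, monitoring agent to grade the conversation against the instructions."""
    rubric = ("Given the agent's instructions and the conversation, note where the agent "
              "hit its objective, where it drifted, and what prompt refinements to make. "
              "Instructions: " + CANDIDATE_PROMPT)
    return call_llm(rubric, transcript)

print(score_transcript(run_simulation(turns=2)))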
And then we also add humans to this. So we have clients and our own internal team going through and doing test runs and testing it. And then we have logging and ways to capture what's happening and evaluate how agents have performed against their [00:14:00] prompt instructions, and we're able to score them and identify where there might need to be refinement. So we've got multiple layers. We're still adding people, because we want the clients to feel comfortable, and our internal team has its own expertise. But you can also have agents speak to each other — which is always funny to me, because when I've been testing things myself I'm actually having conversations with myself, trying to test both sides of the conversation. It's much more effective when you have agents doing it, and then having that data collection layer that's actually sitting there monitoring the conversations and understanding what's happening. Its primary purpose is to tell you: hey, this agent's prompt and instruction set said this; in the conversation, these things happened; it either went really well and hit its objective, or it didn't in these cases. And then obviously you can aggregate that, and the team can use that to make adjustments. So really, it's a multifaceted approach to how you can validate it. [00:15:00] But it's so important, because even something as simple as a chatbot on your website can be somewhat catastrophic if you do not have good instructions. Even within my own team, I've run into people still — whether we're trying to create an image or text or almost anything — where it's, oh, I didn't get great results from this, I'm not really happy with how it came out. And you walk through the process: what did you do? Well, I told it what I wanted it to do. Like, well, there's your problem. You want to make an image, you go to this platform and tell it what you want. Give it examples. Tell it to spit out a prompt for this other platform, but also, before it does that, say: ask all the questions that you need and push back on it. And once you've gone through that whole conversation, then you can go put it over there. Like, I've been playing around with ChatGPT's new image creation — Mm-hmm — 'cause that thing's dope, by the way. Mm-hmm. That thing's incredible all of a sudden. And you get this really, really refined, super predictive, structured image, but it doesn't quite have exactly what you want. So you take the result [00:16:00] back into Claude and go: I got this; here's what I want different. Actually, the back and forth is really what gets it done way, way faster and gets just exponentially better output. But to the point — it's funny, I don't try to read a lot of academic papers out of the frontier labs, just 'cause I'm not qualified; most of that goes right over my head. But there was an interesting one that came out talking about one of the easiest ways to get better output from any kind of prompt you're putting in: just take the results, give it the prompt again, and go, is this really right? Think about it — basically recursive prompting. I mean, our team is heavily instrumented with AI internally, right, in terms of our day-to-day. Yeah. And if you're using Claude or anything like that, there's the check-your-work pattern: hey, Claude, you came up with this plan.
Can you go back and check your work? Can you go back and validate — is this the right approach? You know, sort of starting out as a skeptic, almost. That adversarial [00:17:00] sort of testing is really important, because once you lose the trust of the AI experience that you're trying to present, it's really, really difficult to climb back up that hill. Yeah. Well, it's the same as almost any tool in general. People are very quick, if it's something new, to be like, nah, it doesn't really work, it's not as good. There are usually enough people like that. Mm-hmm. But almost every process I build in Claude now has some level of recursion to it. Mm-hmm. Like: once you finish, go back and compare it to what we said was important. Yep. Does it do that? Or it's two or three skills, or: do this five times, then go through and pick the one that matches best — and why do the other ones not do that? So prompting and instructions are clearly really important. But the other end, which we alluded to quickly, is how you actually have data structured in a way that works — you can tell agents what to do in excruciating detail, but if they don't have access to all the context and input that they need to do it, it [00:18:00] doesn't really matter. So I know that this is an issue that you've looked at closely and worked heavily on, especially at Capacity as of late. How do you set that up for success, and what does it actually mean to have the data structured well? So I'll use two parallels. One is customer-facing agents, which is the core competency of what Capacity delivers to the market. We call it our knowledge orchestration layer — that's what's important. Anybody can hook API endpoints up to an LLM and call it an agent, and that's great for you, but that's not where the real power lives. It's powerful and it can be incredibly useful, but it can also be incredibly dangerous. And so one of the core foundational parts of our product is our ability to go out and get that context layer, whether it lives in Google Drive or SharePoint or your own company's proprietary repository, whether it's in GitHub or any of these other knowledge sources where people typically store their [00:19:00] knowledge. Capacity has tooling and products that are really, really good at going out and getting that information, so you don't have to move your context layer from where it is — we're really good at going out and accessing it from where it is. Then internally, right, when you're instrumenting your team or you're trying to help product management, we face our own challenges. I call it the fire hose of inputs: you get so many requests for the product throughout the system, and how do you solve that challenge? And so in both of those cases, whether your data lives in SharePoint or in Salesforce or in your database for your product usage data — AI only amplifies the data. So if your data is wrong, it's going to amplify its wrongness in a major way. If it's good, it's going to deliver and amplify those good results. And so you could have the best instructions on the planet, the best prompt, but if your data is wrong, you're going to get [00:20:00] really, really terrible results. You know, we talked a bit about how important it is to sit down and architect the conversation flow and the designs and the instructions.
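Before moving on to the data side, a minimal sketch of that "access the knowledge where it lives" pattern: a common connector interface over several sources, so an agent can assemble context without the documents being migrated. The connector classes and the search method are illustrative assumptions, not Capacity's knowledge orchestration layer.

# Illustrative pattern only: a common interface over knowledge sources so an agent
# can pull context from wherever it already lives. The connector classes are stubs;
# real Drive/SharePoint/GitHub API calls would go where the comments indicate.
from typing import Protocol

class KnowledgeSource(Protocol):
    name: str
    def search(self, query: str, limit: int = 5) -> list[str]: ...

class GoogleDriveSource:
    name = "google_drive"
    def search(self, query: str, limit: int = 5) -> list[str]:
        return []  # query the Drive API here and return matching snippets

class SharePointSource:
    name = "sharepoint"
    def search(self, query: str, limit: int = 5) -> list[str]:
        return []  # query the Graph/SharePoint API here

def gather_context(query: str, sources: list[KnowledgeSource]) -> str:
    """Assemble a context block for the agent, tagging each snippet with its source."""
    chunks = []
    for src in sources:
        for snippet in src.search(query):
            chunks.append(f"[{src.name}] {snippet}")
    return "\n".join(chunks)

context = gather_context("refund policy", [GoogleDriveSource(), SharePointSource()])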
But you should start way earlier on perfecting — doing a manicure on — your data set. Mm-hmm. So that data work, and validating the data that you're getting and using — we didn't talk about it earlier, but that usually has to start a little bit sooner. You can build an agent very, very quickly; it takes longer to make sure the data is sound. You mentioned the fire hose of product requests. I would assume Capacity is no different from any other company, where the number of people talking to product and asking product for things is exponentially higher than the number of product people you have. It's like a 10% ratio, usually. Yeah. Um, my product team is vastly smaller than the teams who submit the requests. Yeah. I mean, if you look at everyone who's submitting, it might not even be 10% — it might be smaller. Mm-hmm. I don't know. Here it's way smaller than that. Mm-hmm. I love that you came up with this kind of PRP [00:21:00] project that you built yourself. Can we talk about that? Because I thought this was super awesome. Yeah. I mean, I built the app myself, but it was a heavy, heavy collaboration with our customer success team. We have some really strong CS team leads. Yeah, and all of it, though, was with this mutual goal of: how is the CS team able to submit feature requests and bugs on behalf of clients — and also, by their very nature, they're some of our biggest users — in a way that is structured, that they're able to triage, right? So if you think about the problem, even within the customer success organization, one individual customer support person might think that this is a high-priority item, so when they log it, they log it as high priority. But in reality, we have 50 other things that are of substantially larger impact. And so one problem we tried to solve is that product can't be the triager of that internal prioritization within CS. Mm-hmm. And so we built a triage [00:22:00] step within our own CS team, where our own CS leaders evaluate those requests coming in. But the end goal really was: we have a significant number of requests coming in from our customer success team — we rolled it out to our revenue team last quarter — and there's a lot of information we need to be able to auto-triage it and auto-classify it, so we built structured intake forms. What I built over the weekend was the UI layer, right, the thing that presented all of it. But there was definitely work that went into, okay, we need structured inputs. Clients need to be in Salesforce in this particular way; they need to be in Vitally, and the data that feeds into Vitally needs to be structured this way. And I validated a lot — the number of times I looked at things and said, is this matching this? We needed our product usage data in there. But once we had the data pieces in there, in the right inputs in Jira, which is the tool we use to log those tickets, then the problem I was trying to solve was: okay, [00:23:00] we have this fire hose of feedback, and now it's feedback where we have validated signals coming from different platforms that we can layer onto the initial requests. So you can tag which client or clients are impacted, and that gives us the ability to go pull additional information. Now I have a system that can go through, take those requests, and auto-classify and auto-triage them — an algorithm that puts the ARR impact and the retention impact on it and sort of weights it.
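As a back-of-the-napkin sketch of that kind of auto-triage weighting — scoring each structured request by ARR impact, retention risk, and how many clients are affected, then sorting — here is a hypothetical Python version. The field names, weights, and normalization caps are made up for illustration; they are not the actual algorithm described here.

# Hypothetical scoring for auto-triaging structured requests; field names, weights,
# and normalization caps are illustrative, not the real algorithm.
from dataclasses import dataclass

@dataclass
class Request:
    title: str
    arr_impacted: float      # annual recurring revenue tied to the tagged clients ($)
    retention_risk: float    # 0..1, e.g. derived from the CS platform's health signals
    clients_affected: int

WEIGHTS = {"arr": 0.5, "retention": 0.3, "breadth": 0.2}

def priority_score(r: Request, max_arr: float = 1_000_000, max_clients: int = 50) -> float:
    """Weighted blend of normalized ARR impact, retention risk, and breadth of impact."""
    arr_norm = min(r.arr_impacted / max_arr, 1.0)
    breadth_norm = min(r.clients_affected / max_clients, 1.0)
    return (WEIGHTS["arr"] * arr_norm
            + WEIGHTS["retention"] * r.retention_risk
            + WEIGHTS["breadth"] * breadth_norm)

requests = [
    Request("Bulk export fails for large catalogs", 400_000, 0.8, 12),
    Request("Dark mode", 20_000, 0.1, 3),
]
for r in sorted(requests, key=priority_score, reverse=True):
    print(f"{priority_score(r):.2f}  {r.title}")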
And so it gives my product team the ability to focus: instead of a fire hose, now we've got a slow, steady stream of really important things for us to go triage and take through and think through and collaborate on. And then those mid-to-low things can get put into a workflow that feeds our AI-driven development workflows to go handle some of those things, because they don't necessarily need the same level of care. But what we also found is that by tying all that information together, [00:24:00] now we can understand the impact of what we're delivering. Mm-hmm. So it doesn't just help on the prioritization; it helps with understanding the ROI of the deliverable. And the same mechanisms that we use to prioritize what surfaces to the top also help us — from product and even down into the engineering realm — communicate to them the impact that they're making by releasing some of these features or fixing these bugs. Yeah, it's easy at some level to just collect all the feedback. Mm-hmm. You have a big backlog — what do you do with that backlog? Right. Yeah. And that's been the history of product forever, and why you started to see all sorts of frameworks coming out for how you apply some kind of logic to this. But there's always going to be a million things to work on and only like 10 things you could do at once. Yes. And what I love about this is the ability to answer: what do we actually need to work on? How do we do that in a smart, structured way, and take in all these different people who all think their thing is number one? Mm-hmm. Because if I'm a customer success person and I see one of my big customers having any complaint at all, that's my number one priority, 'cause I don't want that person to churn. [00:25:00] I'm probably gonna overvalue the chance that they churn — but also, that's money in my pocket. But like you said, there are probably 10 priorities that can play into that, and how do we weigh all that against each other and then just see everything? There's also the signal of: you have one client that has five or ten feature requests — five or ten things that are really impacting their retention — versus one client who has maybe one issue. Those are all signals that you're able to evaluate and say, okay, this client has 50 things that they've logged, things they've essentially said they're unhappy with, and that weights their retention risk. You've got more structured input that you can use, as opposed to things getting stuck in anecdotal calls or feelings or gut checks. You've got hard data that you can look at now. Yeah. The interesting thing we found — a capability we've gained in our Galileo AI layer — is that as we're able to watch the sessions and process and understand what's important, one thing [00:26:00] was seeing customer feedback from support tickets and stuff like that, seeing someone complain, but being able to take that and match it to the actual session where they complained or hit an issue. And did that same thing affect a hundred or a thousand other people? Yep. And did it have the same negative impact, or did a thousand other people see that issue and not care? Like, how big of an issue is this — maybe beyond what we can even diagnose from just this base level of data? I love it, though,
because it's taking these tools and actually doing something beyond, oh cool, we wrote a product doc or some kind of specifications doc — you're systematically moving the entire product, engineering, and build side of the company forward faster, in a more meaningful, impactful way, by using the capabilities these things are great at — mm-hmm — parsing data, categorizing, pulling in context, while still using the instruction layer and, like you said, the places where the humans are going to be really important. You guys are doing really cool stuff over there, I gotta say, Julia. Yeah. Uh, that's exciting. [00:27:00] I'm enjoying it. I feel like I learned a lot about this. This has been a great lesson on how to think through agent instructions, architecture, and job chaining — all the stuff that, if you're building agents doing complex things, you need to think about. Mm-hmm. And if you're not, you're gonna have a very disappointing product. Yeah. So thank you so much for coming. This was really, really interesting — to learn all this and see people actually putting structured thought into it. I'd love to catch up down the road and see how things continue to go, and I think we're only gonna go deeper down this road. Yes. So hopefully we can stay in touch. Absolutely. I've enjoyed it. Thank you for having me on. Thanks for coming on, Julia. It was great to see you. Have a good rest of your day. Thank you.