The following is a rough transcript which has not been revised by High Signal or the guest. Please check with us before using any quotations from this transcript. Thank you. === [00:00:00] You do see this explosion in the rate of experiments, and I think one thing we encountered internally within LinkedIn is, as people are able to develop features so much more quickly, they want to be able to run experiments that much more quickly on our internal AB testing platform. And the gaps in our internal experiment platform have really become a bottleneck. So actually, I just switched to an internal role rebuilding the AB testing platform here at LinkedIn. That was Dawn Woodard, distinguished engineer at LinkedIn, on how the sheer speed of AI-driven feature development is turning traditional AB testing platforms into a primary bottleneck for innovation. And this isn't a one-off: as we're discovering, AI adoption is putting a huge strain on traditional infrastructure, to say the least. In this episode of High Signal, Duncan and I have the pleasure of being joined by a powerhouse panel of data leaders: Dawn Woodard, Andreas Bki from LATAM Airlines, and Jeremy Hermann, [00:01:00] CEO and co-founder of Delphina. We navigate the shifts hitting the worlds of data science, analytics, and BI as the industry adopts conversational querying, AI-assisted data science and software development, and AI more generally at scale. We dive into the sobering reality of the source-of-truth problem, discuss why semantic ambiguity kills AI utility, and why the role of the analyst is shifting from data wrangler to verifier of technical outputs. The group also explores why traditional fundamentals like strict data cataloging and upstream validation are actually more critical now than they were in the pre-AI era.
We also dig into the core limitations of LLMs, questioning whether next-token predictors can ever achieve true causal inference, or if they're simply destined to imitate the sophisticated workflows of the humans who master them. If you enjoy these conversations, please leave us a review, give us five stars, subscribe to the newsletter, and share it with your friends. Links are in the show notes. [00:02:00] I'm Hugo Bowne-Anderson, and welcome to High Signal. I'm so excited to jump into what's happening in AI and how it impacts the work in analytics and data we've all been doing, and interested in, since before the generative AI era hit. And I'd actually love to jump in to talk about the impact we're seeing AI have on analytics. And Dawn, I'd really be interested in your thoughts on this. You're a distinguished engineer at LinkedIn. I'm not telling you what you do; I'm doing this for the benefit of others. You were at Uber for over seven years as senior director of applied science and data science for platforms, among other things. Not only that, you've been an associate professor with tenure at Cornell. So you've seen such a wide array of things, and I wonder, with your experience in data and what you're doing now, what have you seen happening to traditional analytics and BI tools and data warehouses in the AI era? How are things changing? Well, we're obviously seeing this [00:03:00] evolution from traditional tooling to tools that allow for natural-language, conversational querying of data sources. So obviously this is a big shift in the industry. On one hand, it's really easy for people to create these sorts of tools. For example, I vibe coded an app over a couple of days that does open-ended analysis of publicly available data. You can ask a question like, what's the pattern over the course of the week for taxi trips in New York City? And it'll give you some nice little plots and give you the code and so on.
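An open-ended query like Dawn's taxi example typically compiles down to a simple aggregation. A minimal sketch of what such a tool might generate under the hood, using pandas on a hypothetical trips table (the column name and data are invented for illustration; a real tool would query the public NYC TLC data):

```python
import pandas as pd

# Hypothetical trips table standing in for the real NYC taxi data.
trips = pd.DataFrame({
    "pickup_datetime": pd.to_datetime([
        "2024-03-04 08:15", "2024-03-04 18:40",  # a Monday
        "2024-03-05 09:00",                      # a Tuesday
        "2024-03-09 23:10", "2024-03-09 01:30",  # a Saturday
    ])
})

# "What's the pattern over the course of the week?" -> trips per weekday
by_weekday = (
    trips.assign(weekday=trips["pickup_datetime"].dt.day_name())
         .groupby("weekday")["pickup_datetime"]
         .count()
)
print(by_weekday.to_dict())  # {'Monday': 2, 'Saturday': 2, 'Tuesday': 1}
```

Generating this kind of code is the easy part; as the conversation notes next, pointing it at the *right* table is the hard part.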
So it's incredibly easy to build a tool like this. But then there's a question about what it takes to make it actually be correct, which is a different level of nuance. So, for example, this challenge of identifying the source-of-truth data for a particular query becomes a really important one. At [00:04:00] Uber, for example, we initially had some four different definitions of member sessions or user sessions. Okay, so which of those four? And they're all named things like user sessions, rider sessions, and so on. So from an AI perspective, it's difficult to distinguish what's the correct source-of-truth data for a concept like user sessions. And so it becomes more important to have those sorts of annotations for data internally, as one consequence. I'd be curious what the other folks on the panel think: what are the other guardrails that you need in order to make sure that these tools are correct? Yeah, I'd love to hear how you're thinking about this, Andreas, particularly as you've been scaling and bringing a data function into LATAM, which has been massive for so long as well. Yeah, so I think that we have the challenge of taking a hundred-year-old company that is massive, as you say, it's the largest airline network in Latin America [00:05:00] by a considerable extent, and trying to tackle the same issues that Dawn proposes. One of the things that we've realized is that there are some things that change over, I don't know, the next six months, right? AI is reasonably good today, at a six; in six more months, it will be good, at an eight, definitely. Maybe not in the exact shape, but to some extent. But there are some things that are required regardless, right? So we focused our efforts on those: a really high-quality catalog, being very strict with the quality of documentation.
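Dawn's source-of-truth annotations and Andreas's strict catalog can be made concrete: mark exactly one table per concept as certified, and fail loudly when the annotation is ambiguous, so an AI querying the catalog can never silently pick the wrong sessions table. A minimal sketch; the table names and catalog shape are invented for illustration:

```python
# Hypothetical catalog: several tables claim the "user_sessions" concept,
# but only one is annotated as the certified source of truth.
catalog = [
    {"table": "rider_sessions_v2", "concept": "user_sessions", "certified": False},
    {"table": "member_sessions",   "concept": "user_sessions", "certified": True},
    {"table": "app_sessions_raw",  "concept": "user_sessions", "certified": False},
]

def source_of_truth(concept: str) -> str:
    """Return the single certified table for a concept, or raise if ambiguous."""
    certified = [e["table"] for e in catalog
                 if e["concept"] == concept and e["certified"]]
    if len(certified) != 1:
        raise ValueError(f"{concept}: expected exactly 1 certified table, "
                         f"found {len(certified)}")
    return certified[0]

print(source_of_truth("user_sessions"))  # member_sessions
```

The point of the hard failure is that ambiguity surfaces at catalog time, as a data-governance bug, rather than inside an AI-generated query.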
I think that now more than ever, big engineering teams miss not being, kind of, lint-savvy or organized with their code. Organized code works really well with AI. The same thing goes for data. And the other thing that we realized is that verifiable outputs are key. So what you end up doing is you prefer to delegate to AI anything that can be easily [00:06:00] verified first, right? So for instance, some of the processes that we've automated are legal processes, where you can actually have some output that's easily verifiable. Of course, code is one of those, but analytics as well. We've been working on generating high-quality analysis for hypothesis development. And you can easily verify if it's actually a sound chain of thought and whether there is information behind the claims, and then you just put that into the funnel with a bunch of other hypotheses that you can test. So there is a very broad space of these kinds of processes to pick from, and that definitely helps, but there is a big gap in making sure that something is a good output when things are maybe a bit more blurry. Absolutely. And I think testing and verifiability, to your point and how you've extended on it, Andreas, are so critical. Jeremy, I'd love your thoughts on this, particularly as you built and then were head of the machine learning platform, Michelangelo, at Uber for four years, and [00:07:00] now you are building Delphina. So I'm wondering, from your perspective, what type of shift is happening? Yeah, I mean, AI is obviously revolutionizing software engineering, and I expect the same to happen, kind of top to bottom, with analytics and data science.
You know, right now one of the things we wrestle with a lot is how to give feedback and kind of back pressure to the LLMs and make sure they stay on track. In the coding world, you have linters and things to check the code and give it feedback. And in the data world, we've been finding various forms of evals where you have trusted numbers that you can measure the agent against offline, but that also serve as reference points for live analysis that it can double-check and gut-check its answers against, to make sure that it's staying on track. Without a doubt. And the other thing I'm interested in: we've been talking about things that were already possible, for the most part, and with the tools we have now, increasingly we can do them a lot [00:08:00] more quickly, such as the app Dawn mentioned she was able to vibe code in a couple of days. But it also seems like we're able to do things, and dream of things and execute on them, that we were never even able to do before. So I'm wondering if any of you want to chat about how it's expanding what is possible, also? Yeah, I'll jump in. One of the things that we've seen is that we can probably rearrange how teams work. To be more detailed on this: in data engineering, if you think about the role, in the running or the building, you have different blocks where you produce, right? And it's probably very difficult for AI to grab all of those, but in some places, if you shift the process around, like if you add strong validation first, then downstream data quality is easy to maintain. And those things weren't really doable before. You would probably get a pipeline and a set of [00:09:00] composed data, right? Not raw data. And at some point you would validate the quality of that information.
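Jeremy's "trusted numbers" evals and this validate-first idea share a shape: compare outputs against vetted reference values before anything downstream trusts them. A minimal sketch of such an eval harness; the metric names, values, and tolerance are all hypothetical:

```python
# Trusted reference numbers, e.g. vetted metrics from the warehouse.
TRUSTED = {
    "weekly_active_users": 1_200_000,
    "avg_session_minutes": 14.2,
}

def grade(agent_answers: dict, rel_tol: float = 0.02) -> dict:
    """Pass an agent's answer if it's within rel_tol of the trusted value."""
    results = {}
    for metric, truth in TRUSTED.items():
        got = agent_answers.get(metric)
        ok = got is not None and abs(got - truth) <= rel_tol * abs(truth)
        results[metric] = ok
    return results

# A hypothetical agent run: one answer on track, one drifting badly.
report = grade({"weekly_active_users": 1_190_000, "avg_session_minutes": 19.0})
print(report)  # {'weekly_active_users': True, 'avg_session_minutes': False}
```

Offline, the pass rate over a suite like this measures the agent; online, the same trusted values give a live analysis something to gut-check against.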
Now you can probably turn things around a bit and make those things much, much easier. So we've improved the quality and the speed of our data engineering teams considerably. We've actually shrunk our data engineering teams by 20%, and of course moved those people to places where we needed speeding up. The same thing we've seen with the analytical capabilities. I remember I heard this at Uber at some point: a good data person, or anything like that, has three qualities. One is data wrangling, the other is some level of stats, and the other is communication and business acumen, or let's say storytelling with data, right? Mm-hmm. All of those have seen a step-change increase. It's difficult to understand to what level different roles are able to do independent [00:10:00] things with data, but we've seen an impressive advance in the quality of open-ended question analysis and the capability of those teams to actually verify things. If the data quality is really good, you usually get really close ballpark figures. I mean, it's astounding. I've done the research myself and contrasted it with some things that the AI brings over, and it's astounding, right? And that wasn't doable before. It would take a long time for you to output or produce something like that. I'm actually concerned that the bottleneck is now further down the line: we are not going to be able to decide properly what to do with all the things that we figure out. I'm actually curious, as you say that, Andreas. There's this opportunity, obviously, to really democratize data analysis, not only for the data functions, but for broader cross-functional stakeholders and leaders and executives.
And there's this kind of interesting thing now where [00:11:00] it used to be you would go ask a data analyst a question, and they would help you formulate that question as something you could actually properly answer with the data. Now we see leaders having access to gen AI and being able to ask questions themselves, and potentially they have to start actually learning how to ask the right questions and become much more data literate. So I'm curious for thoughts on how we're seeing the data literacy of non-technical stakeholders change with the advent of AI, whether that's expediting that literacy, and whether they're ready for it. Dawn, over to you. That's a great question. I would say there's a lot of eagerness and people jumping in, for sure. I think there's still a little ways to go in terms of people knowing whether what they're getting is right. So that comes back to one of the points that I talked about before. There's a lot more of people generating [00:12:00] ideas, and then getting some data about those ideas and sending it my way and saying, hey, it looks like, you know... and then they jump to a conclusion, and sometimes I have to go back with them and do a bit of iteration on: okay, but this is a biased analysis because of X, Y, and Z. Let's take a closer look. So I think it leads to a lot of eagerness and more engagement, for sure. And then when it comes to basic queries, at this point I'm also able to push people to self-serve around a lot of that stuff more than we could before. Nice. Yep. So, connecting with Dawn: we've seen the same thing at LATAM, and I surely hope there's more of this. We have KPIs. We are a traditional company, so we set KPIs for anything. So we have this team maturity index in terms of data, right?
It measures adoption and how much leverage they are getting from AI on specific tools that actually perform an analysis, not just "help me review this email," right? [00:13:00] And we see progress, let's say, in a group of people. So there is a long tail, and there is a group of people that's probably lagging behind. The thing is that the technology is improving so fast that even those people actually progress over time. But if I had to get my wish, my one single wish: I think that at some point, what data teams or more technical teams should be able to provide to non-technical teams is the tools to narrow down their research. So let me give you a very concrete example. Every single week, you need to review thousands of signals that the planes emit. When something is out of bounds, it doesn't mean that it's dangerous, right? But you'd rather look at that. If you think about giving assistance to each single person that reviews this, then you get productivity increases, right? Good ones, but it's not noticeable. If you provide the AI with the capabilities for [00:14:00] a broader filtering, and you are really thorough with that, and then you get a narrower space for searching for specific things, then the productivity is tenfold. There's an order of magnitude right away. And that's something that was really difficult to achieve before. And I think that AI has opened that space, but we still need to understand how to scale that up, because we have some interesting cases where it happens; we need to make sure that we understand how to scale it. I love all of this, and in particular the way that AI can help with data literacy across technical and non-technical stakeholders. I think there's an interesting gotcha in that AI can help with these things, yet there's an intersection with the need for AI literacy as well.
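Andreas's aircraft-signal example above is essentially search-space narrowing: a cheap, thorough first pass flags only out-of-bound readings so reviewers start from a far smaller set. A toy sketch of that filtering step; the signal names, values, and bounds are made up:

```python
# Hypothetical weekly signal review: thousands of readings, each with
# engineering bounds. A first rule/AI pass narrows the search space so
# reviewers only look at the out-of-bound subset.
signals = [
    {"id": "ENG1_EGT",  "value": 612.0, "low": 300.0, "high": 600.0},
    {"id": "CABIN_PSI", "value": 11.2,  "low": 10.5,  "high": 12.0},
    {"id": "HYD_TEMP",  "value": 95.0,  "low": 20.0,  "high": 90.0},
]

def needs_review(sig: dict) -> bool:
    # Out of bounds doesn't mean dangerous; it means "look at this first".
    return not (sig["low"] <= sig["value"] <= sig["high"])

flagged = [s["id"] for s in signals if needs_review(s)]
print(flagged)  # ['ENG1_EGT', 'HYD_TEMP']
```

The per-reviewer assistant gives a linear speedup; shrinking the candidate set up front is where the order-of-magnitude gain Andreas describes comes from.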
So not only is it expanding data literacy, but people need to know the capabilities and gotchas of AI. So I wonder, Jeremy, particularly with respect to what you are working on at Delphina, how you think about making sure that people know what AI [00:15:00] is capable of and what you shouldn't be doing with it at the moment. Yeah, well, it's funny: a lot of people show up and don't even know what to ask in the beginning. And so there is some effort to even just have suggested questions, a lot of the same things as in ChatGPT, to help people break the ice and get going. We actually generally find that once they engage and start asking questions, then the snowball starts rolling and they get excited and are able to explore what's possible. And that kind of feeds on itself. On the Delphina side, we've initially limited what it can do, to constrain it to a trustworthy space of answers that we believe are good, but then we're quickly expanding that as the models improve and as our technology improves. And so, within the space of Delphina, the idea is that you can ask questions and trust the answers. But I do think people are also understanding that AI, like people, can sometimes make mistakes, and how to be cognizant of that. And on the tool side too, [00:16:00] not only showing the final answer but also trying to show the detailed steps that got there, both at the code level and at a narrative level, helps folks make sure that what the AI is doing is actually what they intended. Totally. And I love that you mentioned some of the things that start to change as the models get better. We've had several episodes recently, one with Nick Moy from DeepMind, who actually built the first multi-step agent at Windsurf.
And their story is fascinating, particularly with respect to how the product they wanted to build, they just couldn't, and then a new model came out and they could build it that day. And I'm wondering, Dawn and/or Andreas: with the new model capabilities, particularly late last year with Opus 4.5 and Gemini 3, and perhaps, you know, what's coming soon, how do you think about the democratization of AI, particularly with non-technical people, when models are getting better so quickly? It's definitely hugely empowering when it comes to [00:17:00] building, right? And so we used it a lot internally for a recent hack week, for example, and there were a ton of AI-powered features that people were building. But at the same time, I think we're still pushing on those tools being able to architect things well. So if you need a very simple app, a very simple feature, it can build that. But the more that you iterate with a coding agent, if you use something like Replit for example, it tends to go down into these rabbit holes from a design perspective and do strange things that end up compounding and leading to very weird consequences and buggy experiences and so on. So it's incredibly powerful, and I think the capabilities are so great that it makes you want more. And the more that I want is that ability [00:18:00] to architect something and for it to actually do a good job. And it's just not quite there yet. So I think that's where I'm trying to push it, right? I want better code reviews at this stage. And maybe it's just that I'm not using the right set of tools. I think we're all just trying to pick and choose the right tools for drafting the PRs, the right tools for reviewing the PRs, the right tools for planning the sequence of code changes that we need to have done.
But I feel like I really have to keep an eye on things along the way as it's executing, to make sure it doesn't do really stupid things. And I ask, oh, why didn't you just design it like that? And it says, oh yeah, you're right, that's how I should have done it. I totally agree, and I do think the code review question is still very challenging. What I've found, and I know other people have, is adversarial code reviews: getting different models to review each other's code. It seems like models are more lenient on their own code than on [00:19:00] the code of other models. And I think Codex is one that people have found generally very good for code review on Claude output. Andreas, to that point, I'm interested in your thoughts. I mean, Dawn mentioned something really fundamental and important here: that it will just do all of this stuff, including stuff that it said it won't do in the previous message. So I'm wondering your thoughts on these tools and how you think about them at LATAM, when they'll do things that they say they won't, and helping your team and the broader organization use them usefully, knowing this. So I think there are different levels of users; I think we'd be talking about the more state-of-the-art users here inside of LATAM, which is a big bunch of people. But on that kind of tooling, I think there was a noticeable change in the approach when Antigravity came out. They tried to kind of flip the table: now the agent's going to be first, right? And you can view the code, but the agent comes first. And they did it kind of well; it was a good idea. I think [00:20:00] that Claude Code happened to also release a model that's really good at this, so they took the lead, and now Codex made the same move. I think that yesterday or today they released Codex for desktop, I don't remember, but it's actually the same idea. So it's mainly an agent, and then the code, right?
I think that opened the door to a ton of stuff. And yes, we see a lot of garbage produced, but think about what you can now aspire to. The moonshot, of course, you're probably going to miss. But if you aim, say, two blocks away, something that would have taken you two weeks takes you a day. And we have a ton of those examples that add a really high volume of value. I think that the important part here is to make sure that those pieces connect really quickly to value, to company value, right? And that requires a different way of thinking about what this technology enables. You need to make sure that business opportunities bubble up the technical differences [00:21:00] between what we have and what we should have, and then use AI to bridge that gap. We have examples where we needed to understand how different sets of information would correlate, right? There are a lot of regulations in the middle, so it's really tricky, but technically you can achieve that through things like private set intersections. So you need to build that really complex system. We have people that understand that producing a system that will scale would be really expensive, and AI just tackled this really quickly, in a day or two. And you have the experts who can review it. And it's full of examples like that one. So, definitely, I agree: if you aim for the moonshot, you're going to miss. It's going to be an interesting ride, but probably not going to produce anything valuable. But there are so many opportunities that seemed impossible, or prohibitively expensive to close, two years ago that now are within reach; you should focus on those for sure.
[00:22:00] Shifting gears a little bit: I think everyone here is passionate about experimentation and measurement, and potentially gen AI could really revolutionize experimentation, both in helping to create traditional experiments, and even in identifying hypotheses or exploring the data to propose new experiments on its own, and furthermore creating even more variants of experiments, if the gen AI can, as a simple example, create marketing copy that's actually quite compelling. Curious, Dawn, you've been super close to the experimentation space for a long time, if you can talk a little bit about how you're seeing that playing field evolve and where we might go. You do see this explosion in the rate of experiments, and I think one thing we encountered internally within LinkedIn is, as people are able to develop features so much more quickly, they want to be able to run experiments that much more quickly on our internal AB testing platform. And we [00:23:00] just started hitting some limits, not so much around whether the platform can scale, it can absolutely scale, but the bottlenecks we had before around hitting a lot of statistical bias issues using the internal platform: that's become prohibitive for us and is really starting to block our development. So it put the bottleneck in a different place. Being able to develop the features and develop new model variants and so on is happening a lot faster, and because of that, the gaps in our internal experiment platform have really become a bottleneck. So actually, I just switched to an internal role rebuilding the AB testing platform here at LinkedIn. It's a fun journey, because it's not the first time I've done this. And it's interesting, because building an AB testing platform is subtle.
It's statistically interesting, and it's just surprising that in the [00:24:00] world of gen AI, that's what I needed to go back and do, right? But it just turned out that was the bottleneck. Because I think what ends up happening is, with all this creativity, you still need to have some sort of measurement of what's better, right? So what's better? And you can do lots of AI evals. And in terms of how we measure offline, that shifted. If you have traditional recommender systems, for example, you might have AUC metrics or other kinds of offline quality metrics for the ranking system. But in the generative AI world, we have these more conversational interfaces, and it's much more qualitative. And so it switches to more of an AI-eval-based evaluation of this conversational experience. So the world of offline evaluation changes a lot as the types of models change, but this need for an AB testing platform doesn't seem to have changed at all. And in fact it became the bottleneck, [00:25:00] which is the interesting dynamic. That's fascinating. Which kind of makes sense, actually, because you'd think the AI is good at generating lots of options, and you need an evaluator to pick the best ones. And so it makes a lot of sense to invest in the platform to then pick the better answers. Yeah. And so are you seeing more arms in each experiment, or is it actually just the sheer count of experiments that's grown, or is it both of those things? It's both of those things. And this gets into nuances of how our experiment platform is built internally, but we made it difficult to, for example, tear down an experiment and replace it with a new experiment with the same configuration.
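As an aside on the platform mechanics Dawn is describing: assignment on AB testing platforms is commonly a deterministic, salted hash of the user and experiment, and "re-randomizing" means bumping the salt rather than mutating a long-lived experiment in place. A minimal sketch, with all names hypothetical:

```python
import hashlib

def assign(user_id: str, experiment: str, salt: str, arms: list) -> str:
    """Deterministically bucket a user into an arm via a salted hash."""
    key = f"{experiment}:{salt}:{user_id}".encode()
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % len(arms)
    return arms[bucket]

arms = ["control", "variant_a", "variant_b"]

# Same salt -> stable assignment for the life of the experiment.
a1 = assign("user_42", "ranker_test", salt="v1", arms=arms)
assert a1 == assign("user_42", "ranker_test", salt="v1", arms=arms)

# Tacking a new arm onto a long-lived experiment shuffles existing users'
# buckets without a clean restart; bumping the salt instead re-randomizes
# everyone explicitly, which is the statistically honest way to do it.
a2 = assign("user_42", "ranker_test", salt="v2", arms=arms + ["variant_c"])
```

This is why a platform that makes tearing down and re-creating an experiment painful pushes people toward exactly the long-lived, never-re-randomized experiments the conversation calls chaotic.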
And so, because of that, people have these long-lived experiments, and they're just adding more and more arms to these experiments, and they're not re-randomizing experiments, and it's just very chaotic in terms of evaluation. There's a real understanding internally that this is a problem; I'm not saying people don't understand that these [00:26:00] practices are an issue. But because of the way we architected our AB testing platform, it just turns out to be very, very painful on the ground. And I think our AI engineers and product engineers and more are spending 15% of their time babysitting experiments, and none of this is necessary. I mean, an experiment can be just as easy as pressing a button, doing a little configuration, letting it run for a couple of days, and then running an analysis. This does not have to be hard. But there's an unfortunate situation where large tech companies typically develop their own AB testing platforms internally, and everyone that I've encountered has issues with statistical problems that occurred because we missed some things when we were building the platform. And at some point you have to go back and fix those things. And so this is the second time I've ended up doing this, which is kind of funny. Something I'm hearing is that it's such early days, and yet you've been doing such forward-thinking work at LinkedIn. I [00:27:00] wonder, and not necessarily with respect to what you're up to at LinkedIn, but your thoughts on the near-term, medium-term future of automating most of these types of things: idea generation all the way through to evaluation, having some sort of, I suppose, evaluator-optimizer loop with a human there to, of course, help with guardrails and that type of stuff. How much of this will be automatable in the next year, do you think? So, we're not there yet.
And again, I'm always the person who just wants more, and we're not there. I feel AI is an amazing thought partner. I can brainstorm with AI. I can give AI very specific instructions about what I want it to implement, and it'll do a halfway decent job. And as long as I keep an eye on things along the way, it'll provide something useful. But do I think that we're at the point where we can automate the full cycle? I just think that there's a lot of human judgment that still ends up going into the process. And maybe we could just solve that with [00:28:00] pure brute force. I mean, maybe if we just let the AI run enough distinct experiments, then it'll get to a good enough outcome. But at this stage, and in the next six to twelve months, I still feel like that human judgment about what we should try next is still quite important. Agreed. But I think you could imagine a proactive agent that comes to you daily and asks you the questions, and downloads the things it needs, to do what it does. Of course, there's a big challenge with what actually happens in the context, how forgetful they are, and how full-on they are in executing on things that you never told them to do as well. So these challenges still remain, presumably due to a lot of the post-training as well. But I'm wondering, Andreas, how you think about experimentation more generally at LATAM at the moment, with respect to AI. Yeah, so I think that we are on the other corner of the street, right? I wish I had the problem that the bottleneck is the experiments. I see in my case, and it's probably the case for a lot of companies around the world that are [00:29:00] not in tech, that we have a long way to go until that happens. And if you look at the volume of customers and interactions we have, we are just not taking advantage of the volume of opportunities we have to actually test a hypothesis that we think might change or influence a behavior.
And that's a huge volume of data that we are just letting go, and these technologies will help us use it. We are implementing this as part of the product development life cycle, and we've seen improvement, but it's not fully automated yet. We want to automate most of it. But on the other hand, I think that some industries are exposed to the other side of the coin, and this one keeps me up at night, so I really hope to hear your thoughts on this. What we've seen is that introducing AI as a tool is also a tool for society, right? So the expectation of the customer is going to change. They will [00:30:00] likely expect some level of intelligent interaction with any kind of product or service at some point. So that begs the question: will experiments be designed for humans or for AIs, some portion at least? Will AIs rather personalize to their users', their humans', needs, I don't know if that's the correct way to say it, and just ask for your inventory? Like, let me decide. Do we experiment there, or do we just stay away? Is that an engineering problem, a behavior problem? So that keeps me up at night, because it might happen sooner rather than later. I'm already surprised every single month at how fast this is going. And yeah, that's probably going to catch a lot of companies offside, and they're not going to be able to figure out quickly what to do. Now it's time for a quick break. I'm here with Duncan Gilchrist from Delphina, and I just want to congratulate you on the launch, Duncan. For folks listening and/or watching, you've just opened up a [00:31:00] public sandbox. Can you tell us a bit about it? So, heading into March Madness, this week we made Delphina available as a free public sandbox. Anyone can use Delphina's data agent to create or analyze their basketball bracket.
We've uploaded over a decade of Division I play-by-play game data, and also Polymarket prediction market data, both updated every day. You can get pretty wild with Delphina, even doing Monte Carlo simulations and making forecasts with scikit-learn. Check it out at delphina.ai/ncaa. I also saw you just published a case study with Substack. That's awesome. What can you tell us about it? You know, Substack has been a great partner. One of our business theses is that there's a tremendous amount of latent demand for analytics within enterprises: most people at a company never even touch their data, not because they don't want to, but because the tools are just way too hard. At Substack, over a third of the company is now asking Delphina [00:32:00] data questions every week, and they estimate the number of questions business users are asking is up fivefold. It's really an exciting time for data. So, something we've been talking around, particularly with respect to online experimentation, is measurement and evaluation. I'm wondering how you all are thinking about, and how we all can be thinking about, evaluation more generally. For example, you can ask the question: how do organizations know if AI really works or is effective? And I'm wondering how the challenge varies. You mentioned verifiability earlier on, Andreas, and it's somewhat easy to measure on well-defined, simple problems, harder for longer-running workflows. So how do you think about evaluation of these things more generally? And I'm wondering, Dawn, if you'd like to kick it off. I mean, for evaluation of conversational experiences, I think there's a large literature at this point, so maybe I won't go deep [00:33:00] into that. I think there are a lot of different creative ways to evaluate quality in that space. One thing that I think is still really important is: okay, you have these complex systems that we built.
You're talking about a large enterprise, for example, like a LinkedIn or a Microsoft, right? We have all of these different components, and there are teams building each of these components. And I feel like you still do need some way to know whether each of those components is separately doing its job, in a way that when they're combined, they will provide the service that you want to provide, right? So you still need a definition of some kind of KPI, some kind of quality measure for a particular system, or output measure, right? Something that measures whether this particular component is playing its designated role within this larger ecosystem. And I haven't yet found a way to [00:34:00] automate that. I would love to hear from you guys if you have, but there's a bit of an art to doing this. And that aspect, being able to know whether the AI is giving us something meaningful, is super important. As we build these systems, is each one doing what we expect it to do? So we still seem to need this definition of measurement for each component system, offline and online. So, we are a very traditional company in the region, and KPIs are paramount, right? Everything needs to be measured and validated, or supported by some level of impact. It gets financed even if it's high risk, in that sense. I think that a lot of companies are going to be subject to this, and it's going to bias us towards measurable outcomes, right? So you will have AI embedded in workflows that have verifiable artifacts. We call them artifacts, I don't know if that's the [00:35:00] best way to call them, but you see that organizations, to some extent, define that they progress or that they generate value with technical outputs, be it code, contracts, analysis, et cetera, right?
Some of those are easily verifiable, and if you stitch those together in an end-to-end workflow, then it's monetizable. You can actually measure whether you improved, whether you're spending less, et cetera, and when you see big numbers, like times 10, times 20, then you actually know that you've progressed. When you start moving towards AI-embedded experiences facing customers, it becomes a bit trickier. The good thing is that if you want to modify experiences, at least on the positive side, you have experimentation, and that's inherently a verification, an empirical verification. So you have that, and if you're really good at it, then AI is a plus. The problem is on the downside, right? If you [00:36:00] don't want to inject lies, or any kind of hallucination, into the whole experience. In South America we have this saying, in Chile in particular, "goza," which is kind of a euphemism for when you're reversing your car and you don't have that beeping parking sensor. You know you need to keep going, but you need to be very careful with the speed. You can't go full speed; you go very, very slowly, and then you figure out how far you can actually deploy these things, right? Internal deployment is a really good tool. We deploy internally a lot of the things that we eventually want to use to interact with customers. We have an internal platform to deploy AI. We test 90%, 95% of the things we have deployed internally, and only 5% goes to the [00:37:00] customers. We have a voice bot that's completely deployed on this technology, with full observability, and we are very careful, right? We are an airline, so we are also very, very regulated.
So we need to be very careful, and that's why we progress really slowly when it comes to customer interactions. I think there's this really exciting and tricky part of AI where it's often hard to predict what is really accomplishable with it; it can do so many things that you just couldn't do 12 or 24 months ago. So there's this counterbalancing where you have to aggressively try out new things that might be possible now, knowing that the component parts are quite hard to verify, maybe in sequence, while also going back when something fails and figuring out how it failed and how to ladder up to the next thing you might try. And it's kind of counterintuitive, right? We all grew up in a life without this new [00:38:00] force that can do all of these magical things and that is changing every month or every couple of months. So on the one hand you want to be really careful, which makes a ton of sense. On the other hand, we've found even in our startup that you have to push yourself to imagine "what if" and then try that thing, because sometimes it actually just works out of the box. And if so, you need to know that now so you can take advantage of it. Can you give us an example? Yeah. For us it would be more sophisticated KPI deep dives: actually figuring out what drove an outlier last week in someone's KPIs. If you do have your data relatively well structured and you have the right context, you can let a long-running agent go work at that thing and figure out what actually happened. It sounds super useful. Let's try it. So Andreas, you mentioned, or hinted at, authentication and all the security concerns you have at LATAM.
So I'm wondering, from your perspective, what are the security implications of these new technologies, and [00:39:00] essentially what's missing in security now? What do we need to do as a broader discipline? Yeah, okay. So I'm responsible for this, let me be very careful with how I answer. I came to the right place. So one of the things that large companies holding a lot of valuable data try to achieve is some scheme of permissioning that's scalable, right? Hierarchical permissioning is, I think, the gold standard from my point of view. But it breaks when you put AI in the system and you don't let the AI inherit permissions, right? And that's a huge problem for a lot of the platforms. What I always tell vendors that come to LATAM is: no, we need to make sure that your service is headless, because I want to embed your service in my middleware. That's the only way I can scale it securely, because then it can represent me, Andreas, right? So if I ask the AI to do something, it will run into a roadblock if I don't have permission to do it, whereas if Hugo has permission, he can go [00:40:00] forward, right? And when you have tons and tons of data, we generate one terabyte a week, which compared to other companies is probably a small dataset, but it's huge, right? There's a big portion of that data that it's really important for us to keep secure and make sure is high quality. Not everybody can go into that; not every process can go in there. So yeah, I think that was one of the big investments we made. That was probably the reason why, we are huge partners with Google and their products are amazing, but we chose ChatGPT as our platform for more democratized use of AI, because it relied more on your middleware; they don't have a cloud service, right? So it was easier for us to integrate these constraints into how they operate, right?
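The permission-inheritance pattern Andreas describes, where the AI never gets its own super-identity and every access is checked in the middleware against the human caller's token, might look roughly like this. A toy sketch only: the scope scheme, names, and helpers are invented for illustration, not LATAM's actual middleware:

```python
from dataclasses import dataclass, field

@dataclass
class UserToken:
    """The human caller's identity, passed through to the agent."""
    user: str
    scopes: set = field(default_factory=set)  # e.g. {"finance/revenue"}

def allowed(token, resource):
    """Hierarchical check: a granted scope covers everything beneath it,
    so "finance" implies access to "finance/revenue/2024"."""
    parts = resource.split("/")
    ancestors = {"/".join(parts[:i + 1]) for i in range(len(parts))}
    return bool(token.scopes & ancestors)

def agent_query(token, resource):
    """The headless agent inherits the caller's permissions: every read
    is checked against the human's token, never an agent identity."""
    if not allowed(token, resource):
        raise PermissionError(f"{token.user} cannot read {resource}")
    return f"rows from {resource}"  # stand-in for the real data fetch

hugo = UserToken("hugo", {"finance"})
andreas = UserToken("andreas", {"ops/crew"})
result = agent_query(hugo, "finance/revenue/2024")  # ok: inherited from "finance"
```

Asking the same agent for `"finance/revenue"` with `andreas`'s token raises `PermissionError`, which is exactly the roadblock Andreas wants the AI to hit.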
Whereas Gemini is a bit more tricky. Now they're letting go of that, and I know they're allowing us better integration of middleware. Now, IDEs are probably going to be a huge place [00:41:00] where you can work on analytics and perform your work locally, so it's also pretty doable to pass the identity token through the whole process. Yeah. So that's key. If any company is thinking about scaling this and they're not a tech company: problem number one, make sure you have observability and traceability of information throughout the whole network. I was very careful. Let's talk data quality. I think data quality has always been a challenge in the old world, and all of a sudden, in the new world, not only do we have traditional structured data, but we also have potentially an explosion of new kinds of data, new types of data that can be analyzed. So I'm curious to explore, maybe Dawn, with you: how do you think about data quality in this new world where everything is data, multimodal is data, everything can be ingested by these kinds of models? Is data quality kind of the same set of problems, or is it different now? What [00:42:00] is that world today? So you still have these core canonical data sources, right? That part didn't change. But what happened was we incorporated all of these additional data sources that might have a little bit of signal each, but when you pull them all together, there's something powerful there. I just think that you can do some incredibly creative things with this, and with the types of user experiences you can provide when an analytics agent has open-ended access to source from, and attempt to create structure from, this unstructured data. So what we're seeing is that generative AI is probably the Cookie Monster of data readiness, right?
So we are going to, I don't know how quickly, but at some point we're going to go extremely far beyond our tabular data, [00:43:00] which we are super happy with and have built our systems around. We realized this, and one of the problems we're trying to tackle with Google is how we make both worlds coexist really well, because I have hundreds of data engineers who know how to work really well with tabular data, and data scientists too. I don't have a big talent pool to work on unstructured data. So how those two worlds coexist is very important, at least for LATAM, and I'm guessing for a ton of companies. And we figured out a way that's probably going to lean on that capability of extracting these little pieces of signal, like you said, Dawn. How does this object, let's say a conversation, look from this ontology, from this perspective, right? That gives us some bits of information, but the same piece of information, the same conversation, might hold other pieces of information from another perspective, right? And [00:44:00] that's the nice capability of LLMs: you point them at the data and they extract the correct pieces of signal, hopefully. That way we can connect both worlds in terms of quality. I think it's the same problem, but times a hundred. The documentation, having examples of the range the data has, how a KPI relates to it, the onboarding process, like we discussed at some point with you guys at Delphina, is super important for AI. So that's the same thing we do now, but to a much higher expectation. The other one is a new problem where we're still trying to figure out what to do. I will say, I think the opportunity to marry these types of data together seems enormous to us.
An example is being able to marry product usage data with customer call transcripts, and understanding both the words customers use and how they're literally using the product, and what [00:45:00] predicts good outcomes versus bad outcomes. That's the kind of thing that just wasn't possible a couple of years ago. Now you can literally see what bubbles up, and then use that to inform both how you work with that customer and also what you go build next. Totally. And I love, Andreas, that you mentioned the supreme importance of being able to document everything as well. AI can be very good at helping with documentation, as long as we're there to spot-check it and make sure, mm-hmm, that it's good. There's actually a joke or a meme going around that agent skills were developed in order to get people to actually write documentation for humans too, which I kind of like. So, we're going to have to wrap up soon, but I am interested in thinking through the future of data science and data roles in particular, while looking back as well. It was nearly 15 years ago that we were told data scientist would be the sexiest job of the 21st century. And I think we've seen data science be really impactful at one end, [00:46:00] of course, at places like tech companies, where we've seen data play fundamental roles. At the other end, a lot of places haven't seen that impact, though at places like LATAM, companies that have been around for a hundred years or so, we have seen serious data functions established. But I'm wondering, maybe Dawn, you can speak to this first: how do you see the role of data scientists, data analysts, and data engineers developing with the advent of AI? That's interesting. Well, we've certainly seen a big change over the last 10 years, for example, in the type of roles that used to be called data science.
And so we now have the AI engineer, who was previously a data scientist, or perhaps was previously a backend engineer and has migrated into the AI engineering space, right? Then you have the economist type of profile, which would have been, or was, in the data science category at Uber, for example, 10 years ago. [00:47:00] Those folks, just for clarity, are doing things like causal analysis to understand the impact of a new marketing campaign or other types of changes to our system that we may not be able to A/B test, or they're working on A/B testing technologies, as we already talked about. So you have those two really clear categories that came out of data science. And then you still have the category of what you might think of as product analysts: defining some of the KPIs, providing the data insights to inform strategy. So I see those three roles that all came out of what used to be called a data scientist. There are some other subspecialties, like operations research scientists. Logistics companies still have these folks, with more and more machine learning techniques being incorporated into that field. But that's what I've seen over the last [00:48:00] 10 years. I love that you're speaking to the specialization here, because it was even a few years after DJ Patil and Jeff Hammerbacher, and then Tom Davenport, popularized the term that people started saying the title data scientist might go the way of the webmaster, and then we'd have front end and back end and that type of thing. And I think that's something you're speaking to wonderfully with that example. I also love that you mentioned importing techniques from economics into data science, and of course at platforms, such as the ones many of you have worked on.
This is essential, but it's a hill I'm willing to die on: the more data scientists know about equilibrium, and about how, when you do an experiment, you're perturbing from a particular state, so it may not actually reflect what you think it reflects, the better off we'll all be as a discipline. And Andreas, I'm wondering what you're seeing, particularly in Latin America, with respect to the role of data scientists and data analysts more generally. Yeah, so I think that we are a bit late relative to [00:49:00] the US in these changes. The movement from predicting, as the role of the data scientist, to influencing is something that probably happened here in the last five years, whereas I remember vividly that in the US, probably around 2015, it was like, oh, the modern data scientist needs to do experiments. Now, I think the door is open to translate some of those capabilities to AI, which is a bit tricky, because I think that generative AI, in its current state, involves probably three strong forces. One, which is the most important: it's still a machine learning thing, right? It still needs some kind of evaluation, it still needs the focus of a data scientist. We need to make sure that we are influencing something; we need to measure whether it's doing its job or not; it requires some grounding. Those kinds of tools need to migrate to generative AI [00:50:00] as soon as possible. The systems are probably not there yet. I guess observability is probably very expensive. I've seen golden sets, big golden sets, probably a thousand rows, ten thousand rows, but I haven't seen a million rows in a golden set. It's probably impossible. But then you have the engineering part, because, and we discussed this at some point in another podcast, it looks like an engineering piece.
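A golden set of the size Andreas mentions is, at bottom, a labeled evaluation table. A minimal grader over one might look like this; the questions and answers below are invented for illustration, and real graders for open-ended answers are usually fuzzier than exact match:

```python
def grade_against_golden(outputs, golden, normalize=str.strip):
    """Compare model outputs against a golden set mapping question -> expected answer.

    Returns (accuracy, failures) so a reviewer can triage the misses.
    Exact match after normalization is the simplest possible grader.
    """
    failures = []
    for question, expected in golden.items():
        got = outputs.get(question, "")
        if normalize(got).lower() != normalize(expected).lower():
            failures.append((question, expected, got))
    accuracy = 1 - len(failures) / len(golden)
    return accuracy, failures

# Toy golden set; a production one would be thousands of rows.
golden = {
    "What hub has the most departures?": "Santiago",
    "Which day peaks for bookings?": "Monday",
}
outputs = {  # what the model under evaluation answered
    "What hub has the most departures?": " santiago ",
    "Which day peaks for bookings?": "Friday",
}
acc, fails = grade_against_golden(outputs, golden)
```

The expensive part Andreas is pointing at isn't this loop; it's producing and maintaining the labeled `golden` rows, which is why million-row golden sets are rare.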
So one of the things this opened the door for is engineers starting to do these hybrid things, where they can combine data into their products, embed it, and build better products overall; they can think more from a product manager mindset. But what they're probably lagging on is the concept of KPIs. And like you said, I think you said informing strategy, I don't think engineers are there yet in Latin America, thinking about the next step, why did this happen, et cetera. That's off the table, and we need to work on that. And the third one is that [00:51:00] now everybody can do a bit of everything, and "a bit" is a very important phrase here. And this brings me back to the verifiability of outcomes, right? What we've seen in LATAM, not what you want, but what we've seen, is that if you manage to align these automated workflows and their outputs, their technical outputs, with the people who can verify them quickly, then decision-making is really quick. And that's where you see value: really higher bandwidth, better decisions, better products. If people believe that because they can produce an app, that means they can scale it in a brownfield code base, that's a problem, right? It's not a problem when it's a greenfield application that you can deploy in a very safe environment; that's probably super useful. But you need to align the decision makers with the actual outcome of these automations, and that's difficult, because it requires [00:52:00] a kind of organizational change, which is really slow in big companies. We're probably going to have a nice discussion with the talent team about this podcast two weeks from now. But that's the problem I see: how do we align this?
People feel really interested and happy that they can write code, and that's great, but they are not able to verify high-quality code in a brownfield code base. They can't. Yeah. So those are the three things I see. One of the fun things we're seeing at Delphina: you might worry that a tool like Delphina could replace scientists at a company, because it can do a lot of the analysis they would otherwise do. We're actually seeing in some cases the opposite: now that you have this kind of platform that makes it easy to do trustworthy analysis, companies are actually able to hire more scientists and get way more done. The scientists can operate at a higher level, with less wrangling of data, less mucking around with data quality issues; with an AI power tool, you can actually keep more of them busy and [00:53:00] productive in really meaningful ways. That's really fun. I know you're the co-host here, Duncan, but I would honestly love to hear your thoughts on this as well. Yeah. Well, data science is such an interesting term because it's an amalgamation of all of these roles, as Dawn called out, and these specialties. In a lot of ways, I think the title might evolve over time, but I really believe that data has been taking over the world and will continue taking over the world for a very long time. And being data literate and taking advantage of your data is hard and requires judgment, and those kinds of abilities will continue to be very high leverage. So I'm not actually sure how the label of data scientist evolves, but I think the capabilities imbued in data scientists will continue to be super important.
And ever more important, as more and more of the world collects data, [00:54:00] organizes data, and then tries to leverage it and needs to verify it in some way, needs to be able to sense-check: does this experiment actually make sense? Does this optimization actually make sense? Does this forecast actually make sense? So I'm very much a believer in the future of data science, although the word may change. I love it, and I'm so glad you said "ever more relevant," because where my mind went was the pace of the acceleration of models and systems that can help us with this type of thing. Those skills, I think, will become increasingly important for data people and for everyone else as well. I think that's a fantastic note to end on, and a very optimistic note with respect to the importance of data. I'd just love to thank you all, Dawn, Andreas, Jeremy, and Duncan, the co-host with the most, not only for your time and expertise, but for coming and sharing all of your wisdom as well. Thanks for such a great chat. Can I jump in with a technical question that I've been thinking about a lot lately? [00:55:00] I'm just genuinely curious to hear how you all think about this. Generative AI is this set of tools that does an amazing job at next-token prediction, right? Based on an incredible corpus that captures a lot of information about the real world. But we still seem to need causality, and causal inference is this set of techniques traditionally applied by the economics or statistics fields. I'd love to hear how you think about this. I can ask an AI agent to run a causal analysis, I can absolutely do that, but the models themselves are not causal in nature. It's a generative model; it's basically predicting what people on social networks would think is a good answer to my question, right?
And so you can prompt the AI in a clever way, telling it to go [00:56:00] through the sequence of logic required to do a causal analysis, which you absolutely can do, and then validate at the end of the day. But you're just asking the AI to imitate a human workflow. The models themselves are not inherently causal. So I've been thinking a lot about whether there's a way to make these models inherently causal. What are your thoughts? I'll actually just build on that slightly by saying that not only are they not inherently causal, they're horrible probabilistically. Of course they're probabilistic next-token predictors, but they're horrible at reasoning under uncertainty, that's my real point. And I'm a Bayesian at heart, or I probably should say I'm probably a Bayesian at heart, or I'm very likely to be a Bayesian at heart. But I do think having systems that are far better at expressing uncertainty will help us with this causal question as well. And actually, I'm not sure these systems are equipped for this, or are even the right types of AIs to do it, but I'm interested in the group's thoughts more generally. Also, my one answer is [00:57:00] kind of what you said before, which is that the AI kind of imitates a human, and therefore the way to approach it is to arm it with tools for doing causal analysis and let it do that workflow. That's where my head would go. I would also say pre-training will be very important here, because humans are generally horrible at causal inference as well. You all aren't, of course, but even in data science and data analysis more generally, causal inference isn't state of the art. There's something interesting there, Hugo, about how the models don't really learn on the fly, and that's probably a major gap in current AI.
And as you said, humans out of the gate usually aren't good at causal inference unless they've learned about it, and it's actually kind of a hard thing to learn. It's a relatively difficult concept to get your head around. So I do buy that building a kind of super-genius causal inference AI is very difficult with current technology. However, I'll also say [00:58:00] that it's been stunning to me how general-purpose intelligence has just increased in AI over the last couple of years. Two years ago I would have said it's totally crazy; today it doesn't seem totally crazy that you could get a pretty good causal inference bot out of, you know, the latest LLMs, one that mimics human workflows for causal inference correctly. There are enough documented workflows for causal inference out there that it would work. I also saw a really interesting presentation at NIPS about building a transformer model for general-purpose causal inference, which I thought was very creative. It's a model that takes an arbitrary set of covariates, an arbitrary set of confounders, and an outcome variable, and attempts to predict the outcome variable based on this. And they just train it on large sets of simulated data with different causal relationships, which is an incredibly creative solution to this [00:59:00] problem. And there's one monolithic model for causal inference. That's wonderful. And where my mind went: if I want to build some sort of multimodal search system, for example, over papers that have charts in them, I'll use a particular model to do image-to-text on the charts; that's different from CLIP-based stuff. So maybe smaller models, or bespoke models, for this type of thing. Another thing is maybe post-training; I mentioned pre-training, but maybe post-training needs to change.
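The simulated-training-data idea behind that NIPS presentation can be illustrated at small scale: draw datasets from random causal models whose true effect is known, so any learner can be scored against ground truth. A toy sketch in plain NumPy, with ordinary regression adjustment standing in for the transformer:

```python
import numpy as np

def simulate_causal_dataset(n, rng):
    """Draw one dataset from a random linear causal model:
    confounder Z -> (X, Y), plus a direct effect X -> Y whose true
    coefficient is the label a causal-inference model would learn to predict."""
    effect = rng.uniform(-2, 2)        # ground-truth causal effect of X on Y
    confounding = rng.uniform(-2, 2)   # strength of Z's back-door path
    z = rng.normal(size=n)
    x = confounding * z + rng.normal(size=n)
    y = effect * x + confounding * z + rng.normal(size=n)
    return np.column_stack([x, y, z]), effect

rng = np.random.default_rng(42)
data, true_effect = simulate_causal_dataset(5000, rng)
x, y, z = data.T

# naive slope of Y on X alone is biased: it ignores the confounder Z
naive = np.polyfit(x, y, 1)[0]

# adjusting for Z (regress Y on [X, Z, 1]) recovers the true effect
coef, *_ = np.linalg.lstsq(np.column_stack([x, z, np.ones_like(x)]), y, rcond=None)
adjusted = coef[0]
```

In the approach described on the podcast, many such `(data, effect)` pairs would be generated and a transformer trained to map dataset to effect directly; here the adjusted regression plays that role for one dataset.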
Maybe we want less of the apparent people-pleasing from RLHF-style systems if we want them to do proper causal inference. One other place my mind went: I do use LLMs for probabilistic reasoning, but what I get them to do is write code in PyMC, essentially, and give me the output, and then I have templates that frame that output in natural language for me. So I wonder what types of [01:00:00] workflows and combinations we can build that use LLMs for the generative stuff, but for the actual meat that we really want, we use them to execute scripts and frameworks that we already know do it well. Yeah. So I have a question. I think I'm going to interpret the question a bit differently. In the end, what you really want is a system that knows what to say in order to influence you. That's what you want, really, not the analysis itself. So for instance, what I would want is to describe your past behavior in some way, and then ask the system: what should I tell Dawn to influence her to, I don't know, buy this nice family trip to The Bahamas? Right? That's what I want. And I think that with the available tools this is totally doable. You can actually make the language model kind of understand what triggers an [01:01:00] action in a human. I think that's doable. It's a deep learning model, so it should converge. You're trying to bypass the inference part, is that correct? Bypass the inference and go straight to the decision-making. Yeah. There's a decision being made off of the causal inference, and you want to do prediction of that decision instead. Exactly. Yeah. Fair enough. And I suppose that's how certain forms of online experimentation have worked for quite a while now, right? There's the famous essay, oh, what is it? We don't need science anymore because we have enough data.
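The pattern Hugo described a moment ago, let the LLM draft the model, have a trusted framework compute the numbers, then render them through a fixed natural-language template, can be sketched without PyMC itself. Here a conjugate Beta-Binomial update stands in for the executed model (an assumption for illustration; the real workflow would run the generated PyMC code):

```python
import numpy as np

def beta_binomial_posterior(successes, trials, a=1.0, b=1.0):
    """Exact posterior over a rate under a Beta(a, b) prior:
    the 'trusted framework' step, deterministic given the data."""
    return a + successes, b + trials - successes

def describe(a_post, b_post, interval=0.9):
    """The fixed template step: numbers in, natural language out."""
    rng = np.random.default_rng(0)
    draws = rng.beta(a_post, b_post, size=100_000)
    lo, hi = np.quantile(draws, [(1 - interval) / 2, 1 - (1 - interval) / 2])
    return (f"Posterior mean {a_post / (a_post + b_post):.3f}; "
            f"{interval:.0%} credible interval [{lo:.3f}, {hi:.3f}].")

# e.g. 42 conversions out of 500 trials
a_post, b_post = beta_binomial_posterior(successes=42, trials=500)
summary = describe(a_post, b_post)
```

The LLM's only job in this split is choosing the model and phrasing the question; the uncertainty quantification comes from machinery whose behavior is already verified.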
You put enough data through, and it tells you what to do. Andreas, you asked how you can get Dawn to buy a trip to The Bahamas; I'm now asking how I can get Andreas to give me a trip to The Bahamas. Here's the interesting thing: if you want that, I actually need variability in your data; otherwise I cannot distinguish whether you're actually somebody I should do this with. And that's an important [01:02:00] point, because I think that when companies try to understand their users better, they need this kind of variability, and if we're going to go fully granular, then what kinds of interactions are going to be available for you, right? Because if it's the ones-and-zeroes interactions we're usually used to gathering, I don't think that covers the ground. We need more variability, more differences, down to the granular level. But that's a whole new story for another podcast. We'll have to have you all back, and I kind of want to invite Dawn back as a host, given the wonderful question and generative discussion her question just prompted. I'll link to the essay I mentioned as well: it was Chris Anderson's "The End of Theory: The Data Deluge Makes the Scientific Method Obsolete." And I love that AI is surfacing these types of questions: what can we automate, and what do we need science and knowledge and causal inference for when it comes to making decisions? Because, to Andreas's point, we want to make decisions, and for some of them we want an understanding of why we made those decisions, based on the data and our models of the world. [01:03:00] Thank you all once again for such an interesting, thoughtful, and generative conversation. Thank you. Thank you. Good to see you. Awesome. Thank you all. Thanks so much for listening to High Signal, brought to you by Delphina.
If you enjoyed this episode, don't forget to sign up for our newsletter, follow us on YouTube, and share the podcast with your friends and colleagues. Like and subscribe on YouTube, and give us five stars and a review on iTunes and Spotify. This will help us bring you more of the conversations you love. All the links are in the show notes. We'll catch you next time.