The following is a rough transcript which has not been revised by Vanishing Gradients or Jason Liu. Please check with us before using any quotations from this transcript. Thank you.

===

hugo: [00:00:00] So let's jump in, man. To set the scene, maybe tell us a bit about your background and how you ended up doing what you're doing now.

jason: Cool. So I started at the University of Waterloo studying mathematical physics, which is very similar to theoretical physics. By my second year I'd already taken Andrew Ng's machine learning course, back in 2014, and at that point I realized: wow, the data that experiments generate is very interesting, but the new data is the data that people are generating on the internet, or that's being generated by sensors. Predicting those systems is now just as interesting as predicting and modeling physical systems. So I went into analyzing social networks, doing computational social science and epidemiology, and got into machine learning that way. Around 2017, 2018, most of my focus was on things like variational autoencoders and GANs, doing computer vision and ultimately recommendation systems at a company called Stitch [00:01:00] Fix. Yeah, I was really dismissive of language models for a very long time. I took some time off in 2020, and by 2022 I had eaten my words, apologized to my friends, and now I've gone into prompt engineering full time.

hugo: That's awesome. So I'm interested in your initial skepticism, and then what swayed you with respect to generative AI and LLMs.

jason: Yeah. When LLMs came out for me, that was mostly around GPT-2; this was maybe even just before BERT. For the most part it just felt like, okay, it can write me some rap lyrics, but really the money to be made in these systems was in the computer vision systems that do detection, right? The recommendation systems that do search and retrieval. The language models felt very much like toys. But when we saw something like ChatGPT come out, that was really the first moment where I said, oh wow, there seems to be not only knowledge but reasoning in these systems. When GPT-3 came out, it [00:02:00] was still only available via the API, and it was still a completion endpoint, so I really didn't see the magic of what was going on. But when you made it a chat interaction, that's when it really clicked for me.

hugo: So the chat interaction is one thing, but something we've talked about, and something you've been thinking about recently, is how chat isn't a way to deliver scalable value. You've talked recently about document generation and the path to decision making. So maybe you can tell us a bit about your take on what we need to do in this field.

jason: Yeah. Maybe last year there was a famous blog post that I've now forgotten, and one of the lines that really stuck out to me was that chat will never feel like driving, right? When you go into these chat experiences in these AI products, it always feels like you're being pulled away from the work you're doing to enter this chat mode, and then being pulled back into the experience. If you think about something like Cursor, Cursor works really well because you can just select code, give instructions, and the code is generated.
It stops being this [00:03:00] chat interaction. But then, when you augment your existing chat application with more complex features, you get into the world of RAG, retrieval augmented generation, where you can give the AI the ability to search documents and then use that to answer questions. And the comment I made yesterday was basically that even then, answering a question isn't that useful per se, right? A lot of the work that companies and organizations do is actually figuring out what the right questions are to ask. And once you know those questions, what you're doing isn't question answering; it just becomes report generation. Once you get these reports, they're very high leverage. The templates for these reports, the standard operating procedures, are things you can use to make really important decisions. And the value that derives from the decision making, the leverage against the decision making, is much higher than simply saving five or ten minutes to help me answer a question about something I [00:04:00] forgot.

hugo: Yeah, that makes a lot of sense. So we've talked briefly about RAG, and you mentioned you want to write more and more prompts. You seem pretty bullish on RAG, and I'm wondering why your focus is there. Currently we hear about prompt engineering and fine tuning and RAG, so maybe you can talk about the whole landscape of different techniques and why you're focusing on RAG.

jason: Yeah, the reason I focus on RAG is primarily because of my experience in recommendation systems. If you looked at the business model of Stitch Fix: a customer would send a request to Stitch Fix in plain text. We would have a system take these requests and turn them into search queries, execute the search queries to generate inventory, and show that inventory to a human stylist, who would then take this information to produce a set of recommendations and a message to send to the customer. This basically looks like RAG. Instead of the stylist note, you have an input query. Instead of [00:05:00] clothing inventory, you have text documents. Instead of a stylist, you have a language model. And instead of a recommendation, you get the response in free text. So what I realized was, okay, this is the exact infrastructure I had spent five years thinking about. It's a perfect transition to understanding language models from that perspective. I had done things like fine tuning vision models, and to me, language models don't feel that much different, right? You tune some hyperparameters, you care about the data. Prompt engineering is actually very interesting, primarily because it's about being a better delegator and being able to describe your requirements well. Prompt engineering for things like named entity recognition is very straightforward; there are metrics you can use to optimize the systems you're developing. But when it comes to things like report generation or generating a summary, it's actually very difficult to figure out what a good summary means. It might make a lot of sense to just say, as long as the message is shorter, that's fine.
If we summarize this [00:06:00] entire transcript and the summary is just "two people talking about AI," the summary is valid and the data was condensed, but it's not actually useful for anyone who's actually trying to save some time by reading it.

hugo: Totally. We've got a couple of comments and a good question. Tony G has written, "I like RAG because it makes money," quoting Jason, probably, which I like a lot. Matt Miller has a great question: have you seen any examples where LLMs can generate relatively sophisticated or valuable reports yet?

jason: I would say so. There are two examples right now that are pretty valuable. The first one is doing things like diligence calls. A buddy of mine runs a company called CTO Junior, and what they do is take hours and hours of expert calls that consultants do on behalf of private equity firms. So now you have maybe 20, 30 hours of transcripts, and what you need to do is turn that information into a report that the consultant can build and send to the private equity firm. [00:07:00] Once that report is sent over, the decisions you make as a private equity firm are around allocating millions and millions of dollars. So there's definitely a lot of value there, right? This is not something where you charge two dollars to generate a summary of a 30-minute conversation; this is where real money is made. And I think a lot of the applications effectively follow this format. Another example would be doing due diligence in data rooms. A family office might get offloaded 44 PDFs, but we know for every company what kind of questions we care about, what kind of questions we want to ask, and maybe specific questions we want to ask for specific industries. Again, you can define this report ahead of time, and then for every data room just compute the reports and the diligence you want, to make decisions on whether you want to meet the founders or have a conversation with them afterwards.

hugo: Yeah, cool. We've got another great question from Yongsen: it seems like most startups are doing RAG apps these days, will this pattern stay for two years? I'm going to change that to: will this pattern stay for one, two, and/or five years?

jason: Yeah. [00:08:00] The question is really whether RAG will stay versus long context, and the answer is yes. The reason is that we will always be making trade-offs between compute and latency, right? When you log onto amazon.com, Amazon has, relatively speaking, infinite compute, but they still won't compute every single product recommendation ahead of time. When you log into the site, they figure out which warehouses are most optimal and which products you're likely to see, and only compute scores for those. And that's because, in an e-commerce setting, 100 milliseconds could result in a 1 percent change in revenue. Making it faster will always be important for very high-stakes tasks. So even if we have a 10-million-token context model, we'll still want it to be faster in some ways, and RAG and compressing that context will be one of the techniques we can use to do that.

hugo: Yeah. Great.
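To make the "report generation, not question answering" idea concrete, here is a minimal sketch in Python, not code from any of the systems mentioned: it assumes hypothetical `search` and `llm_answer` helpers and a fixed question template per deal type, and just loops over the template to assemble a report.

```python
# Minimal sketch of templated report generation over a RAG backend.
# `search` and `llm_answer` are hypothetical stand-ins for whatever
# retrieval layer and LLM client you actually use.
from dataclasses import dataclass


@dataclass
class Section:
    question: str
    answer: str
    sources: list[str]


# The "template" is the pre-work: the questions you already know you care
# about for every data room, defined once per industry or deal type.
DILIGENCE_TEMPLATE = [
    "What does the company sell, and to whom?",
    "What are the main revenue drivers, and how concentrated are they?",
    "What outstanding legal or compliance risks are disclosed?",
]


def search(question: str, k: int = 5) -> list[str]:
    """Hypothetical retrieval call: return the top-k text chunks."""
    raise NotImplementedError


def llm_answer(question: str, chunks: list[str]) -> str:
    """Hypothetical LLM call: answer the question grounded in the chunks."""
    raise NotImplementedError


def build_report(template: list[str]) -> list[Section]:
    report = []
    for question in template:
        chunks = search(question)
        report.append(Section(question, llm_answer(question, chunks), chunks))
    return report
```

Because the questions are fixed ahead of time, the same template can be run over every new data room.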
hugo: When we talk about really creating value in your consulting work for [00:09:00] organizations, we've talked about how document generation can create a lot more value than just a chat interface, but it's all in the service of making decisions. I think a lot of the ways data work, not just AI or LLMs, can really impact all of us is in helping us make decisions or automating decisions, right? So with that framework of helping us make decisions, how do we go about using these systems to create documents? We've talked about templates, which is one thing, but maybe you can drill down a bit into all of these types of things.

jason: Yeah. Think about why you read, why you try to learn new things, and why businesses define processes. Ultimately it's the result of having spent a lot of effort to figure out the structured ways of thinking we can apply to make decisions better. There's such a thing as a SWOT analysis. There are values that a company has, and the job interviews we give help us grade a candidate on those values. Being able to do that work up front, and having those be [00:10:00] the documents we want to generate, gives us a lot of the pre-work in figuring out how we should be thinking about the prompts, right? If we just gave you fifteen transcripts of job interviews with one candidate, it's hard to figure out what kind of questions we might ask. But in general we know we care about things like leadership, we know we care about things like working well under pressure, and so on. If we just had the systems in place to extract those questions and those insights, we could get to a world where, after six or seven interviews, the language model builds up a report that the hiring team can align against before jumping into the debrief. And here we're not really comparing the value of the report to the time it takes to do these interviews. We should be comparing the value of the report to the salary of the person you're going to hire, right? If you think about recruiting, for example, recruiters ask for 12 to 15 percent of the first year's salary. If a language model could ask for 1 percent, that's still a multiple-hundred-dollar return on an API call that costs maybe two dollars of GPT-4 [00:11:00] credits.

hugo: Awesome. So before getting into the nuts and bolts of your consulting work, I do want to drill down a bit more into the similarities between RecSys and RAG. We think of generative AI, and of course RAG has generative capabilities, but when you look a bit deeper there are a lot of similarities with RecSys. So in what ways are they similar, and what differences are there?

jason: Yeah, the funniest thing I tell a lot of people is that embedding search was my intern project in 2016, because I was too technically unsavvy to figure out how open search worked, right? Things like embeddings have existed for a very long time. Full-text search has existed for a very long time. The only new piece of this system is actually just answering the question. In terms of evaluating these systems, the real metrics we still care about are just: of the documents that should have been retrieved, were they retrieved? And of the documents that were retrieved, how many of them should have been retrieved? What you get back down to is just precision and [00:12:00] recall.
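As a side note, the precision and recall described here reduce to a few lines over lists of document ids; this is a generic sketch, not code from any system discussed.

```python
# Generic retrieval metrics over lists of document ids.

def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Of the documents retrieved, how many should have been retrieved?"""
    if k <= 0:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / k


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Of the documents that should have been retrieved, how many were?"""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant)
```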
jason: And there are other ranking metrics too, because it's not only the language model that might see these results. Maybe when we have six or seven documents being presented to the user, we also want to rank them well, and then you get into metrics like NDCG or mean reciprocal rank. And again, what you find is that as you build a system that's a little more production ready, a system that's more realistic and less magical, you ultimately go back to the metrics we've been using forever in information retrieval. I think a lot of people nowadays are just reinventing the wheel.

hugo: Yeah, absolutely. So I want to get into your business now. Firstly, what types of companies are you able to deliver value to by building RAG systems, and how are they useful?

jason: Yeah, a very good example would be Limitless AI. Rewind was originally the product; Rewind AI was a system that was able to look at everything you've seen, said, or heard. And being able to do search against that meant that not only should we be doing [00:13:00] things with embeddings, but we should also be understanding time ranges or app utilization, right? Being able to build indices that can query those pieces of information was very valuable for the customer. On the report generation side, one of the things I'm exploring more for Rewind is identifying, for these kinds of meetings, which kinds of summarization are the most impactful. If we knew it was an interview for a podcast, maybe we want to structure things as question-answer pairs. If it was a product design meeting, maybe we really care about action items and the key decisions that were made and who made them. It's not always the case that you just want a general summary of a general conversation between two people. In my consulting, for example, there are times when I give recommendations and I would like those recommendations to be turned into memos. If I was doing a sales call, I would want it structured in a way that's aligned with how I do my sales process: I want to be able to identify pain points and objectives, metrics and [00:14:00] values. These are things we have to bake into the system, because we know a priori that they're going to be valuable, rather than just giving it a transcript and saying, summarize this for the attendees.

hugo: Correct me if I'm wrong, but when we chatted about this a while ago, there were also some decidedly non-tech, surprising industries where you've been able to deliver value. I think construction, and maybe sales as well. Is that right?

jason: Yep, exactly. But in the sales process it's the same thing, right? For certain calls, if we know they're in a certain part of the pipeline, the ways we want to extract information out of those calls will be different. For things like construction, some documents might have a lot of blueprints and diagrams, and we're going to process those very differently than, for example, requests for information or schedules. Our ability to do search against those things really relies on the fact that we, again, know what kinds of search queries and questions people want to answer. If you want to answer questions around deadlines, we need to have indices that can process deadlines.
[00:15:00] If you want to have questions around ownership, we need to make sure that metadata is available in the text chunks we would return to the language model.

hugo: Very cool. So we've got a couple of questions, and this one is super interesting. Vijay has asked, since we're talking about summarization among many other things: how does one even evaluate how well an LLM summarizes a piece of text? Is that kind of judgment even quantifiable?

jason: Yeah, I think in the literature people think of summarization as just: did the words get shorter, and was it abstractive or extractive? I think Eugene Yan has a lot of great writing on that. For the cases I have really focused on, it's a lot more nuanced, right? To me, summarization includes taking lectures and turning them into flashcards. It includes extracting action items out of transcripts. And you can build very specific evaluations: for every action item, was the speaker assigned correctly? Of all the important notes, did we actually extract all the [00:16:00] correct action items? We have very localized evaluations around these specific tasks, rather than evaluating summarization as a whole. And again, that speaks to the fact that we need to specialize once we become much more opinionated about what kind of outputs we want from these language models.

hugo: So I now want to know how you actually interact with these businesses. The way I framed the question was: what's your RAG consulting playbook? But Ruben Alvarez in the chat has asked it in what I think is probably an even better and more specific way: "Hugo, can you ask Jason about his process in building these systems, from understanding the business problem to interacting with the company to knowledge transfer?" So those are some parts of your playbook, but maybe you can walk us through the entire playbook.

jason: Because I'm working with companies that already have a RAG application in place, the playbook looks somewhat like this. One of the first things we want to do is just some kind of topic clustering and segmentation of the user queries. Even if you don't have any cosine distances, if you [00:17:00] don't have any feedback mechanisms like a thumbs up or a thumbs down, we still have all the questions. We may not know if the answers are correct, we may not know what should have been retrieved, but we still have the questions. So the first step is to run topic modeling against that. When you run these clustering models, you might be using embeddings, you might be using LDA or BERTopic; you end up exploring a bunch of different methodologies, but ultimately what you get down to is two things. One, you're going to identify the topics of the questions, and two, you're going to identify the capabilities of the question answering system. So a topic might be, in a sales context: oh, 30 percent of all the questions are around pricing, 20 percent of them are around privacy, and 2 percent of them are about the weather. Okay, maybe something weird is going on in how the system is being used. Then there are capabilities. Capabilities look like: okay, how are we doing time filtering? Is it all relative? Are we expected [00:18:00] to be able to slice up pricing by how big the business is? Are we being asked to compare prices across industries? Those parts will never be solved by embeddings themselves.
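Concretely, a "capability" here is something plain embedding search can't express, so a query-understanding step has to produce explicit filters or sub-queries. A rough illustration, with field names made up for the example:

```python
# Illustrative structured query for the "capabilities" described above:
# plain embedding search can't express a date range or a comparison, so a
# query-understanding step has to produce explicit filters or sub-queries.
# All field names here are made up for illustration.
from dataclasses import dataclass, field
from datetime import date


@dataclass
class SearchQuery:
    text: str                       # semantic part, for embedding or full-text search
    start_date: date | None = None  # "pricing questions from last quarter"
    end_date: date | None = None
    segment: str | None = None      # "pricing for small businesses"
    doc_types: list[str] = field(default_factory=list)


# A comparison question ("how does pricing differ across industries?") gets
# decomposed into several sub-queries whose results are merged afterwards.
queries = [
    SearchQuery(text="pricing", segment="healthcare"),
    SearchQuery(text="pricing", segment="retail"),
]
```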
jason: And you have to build secondary systems that can do these very explicit queries, right? If you have dates, you have to use a date range. If you are doing comparisons, you have to figure out how to split up the search query into multiple search queries. So then you get topics, and you get capabilities. Once you have that in place, you can have some part of the team figure out, for the topics we can review, do we feel like we have the data to actually answer those questions? And for the capabilities, do we have the search infrastructure in place to answer those questions? That's the work you do. Secondly, you also put in some effort to do more observability. For every question it's really valuable to [00:19:00] have, say, the cosine similarity or the Cohere re-ranker similarity, just as some proxy for whether the search results and the text chunks are relevant. We're still not even at the point where we know the answer is correct, but now we have a little bit more data. Then we build out some features around thumbs up and thumbs down that ask, okay, did we answer your question? It's actually comically important to make sure that the copy is right. I've seen a couple of companies just ask, "How did we do?" When that's the copy, people might thumbs-down because it's too slow, or thumbs-down because it's too verbose. It's really important to make sure the copy captures exactly what we're worried about, in this case user satisfaction, or whether they got the answer. Once we have that data in place, we can do that same clustering, but now we have a couple of different labels. We have the topic name, we have the frequency of that topic, we have the relevance of the text chunks we pull out, and we have some kind of user [00:20:00] satisfaction metric. So you basically have these three Boolean variables, which gives you, let's say, eight different options, and you can just go: okay, if the volume is high and the satisfaction is high, we're doing great. If the volume is high and the relevancy is low, we need to do something about that. If the volume is high and the relevancy is high, but the satisfaction is bad, okay, that's something to do with the LLM. What you get is this grid of options, and for each topic you can identify which segment of the problem space it lands in. If the cluster is low volume, low relevance, low satisfaction, maybe the answer isn't trying to fix it, but just educating the user and saying, we can't do this well, and maybe you shouldn't be able to ask these kinds of questions in the first place.
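That grid of options can be pictured as a lookup from three rough booleans to an action; the wording and the mapping below are made up for illustration, and the inputs would come from whatever proxies actually get logged.

```python
# Illustration of the triage grid: three rough booleans per question cluster
# (volume, retrieval relevance, user satisfaction) map to an action.
# Thresholds, wording, and the mapping itself are made up for illustration.

def triage(high_volume: bool, relevant: bool, satisfied: bool) -> str:
    if high_volume and relevant and satisfied:
        return "doing great -- leave it alone"
    if high_volume and not relevant:
        return "fix retrieval: better indices, metadata, or query understanding"
    if high_volume and relevant and not satisfied:
        return "fix generation: the prompt or synthesis step"
    if not high_volume and not relevant and not satisfied:
        return "educate users or scope this out rather than engineering a fix"
    return "monitor"


# The inputs come from whatever proxies you log per cluster: query counts
# for volume, mean cosine or re-ranker scores for relevance, and
# thumbs-up/down rates for satisfaction.
```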
hugo: What about when you go in and start an engagement with an organization, the human side of things? Is everyone already well equipped to work with you and iterate on [00:21:00] these things immediately, or do you have to change the way people think about the problems?

jason: Because we start with just "give me the CSV file of the questions your users have asked," if you don't have that in place, you're probably in big trouble. We don't really start from a place where we say, okay, give me your evaluation data set, because usually a company doesn't have one. Usually people come to me and say, hey Jason, we have a RAG problem, and I correct them: actually, you have a churn problem. That's cool. Address the churn. For the most part, if you start with just the list of questions, I think almost every company is able to meaningfully engage. It just becomes much harder when you start adding things like that feedback mechanism, once you start adding things like the relevancy, and then continue to do that analysis over time. Because when you make these improvements, you have to then measure whether those improvements actually make sense.

hugo: Yeah. And in that sense, we're not talking about anything different with LLMs as opposed to analytics or data; it's all that in disguise with generative [00:22:00] crap coming out of it. We've got another great question from Tony G: what are some properties of orgs that give you signal on them being ready for some type of RAG application?

jason: I think if they know that they're losing money, it means they really care about making these systems work. A lot of the companies that reach out a little too early are the ones that just think they need AI for some kind of leadership to be happy. But the companies that really want to make sure they can deploy this system to customers and keep customer satisfaction stable, those are the ones that benefit from more help. In terms of whether you should be building it at all, I think that's just a matter of whether the company itself believes in the effort.

hugo: We've got a great question from AJ, which is: how much does domain expertise matter in consulting for these companies? Or is your background in ML engineering and all of this stuff enough?

jason: Oh, domain expertise is super important, right? For the most part, [00:23:00] when I end up working with these folks, the first question is: okay, who is the domain expert? Can I just ask them a bunch of these questions? When I actually find these clusters, even without any kind of volume data, I should be able to send them to a domain expert and get feedback that says: oh yeah, this makes a lot of sense; that is surprising; let me explain why this topic is too general and why we need to split things up. The domain expert is incredibly important, especially once we start getting into a place where we are also generating the question-answer pairs. For the first one or two months of an engagement, we don't even think about the synthesis prompt. We're just trying to figure out which regions of the problem space we should be committing resources to. Then the second step is actually doing synthetic data generation: for the data we have right now, can we generate questions? And can we at least make sure that, for the data we generate questions for, those questions return that data, right? And there, again, the domain expert is incredibly valuable, because [00:24:00] they're the ones helping guide whether the synthetic data makes sense, and whether the evaluations and the clusters we find are reasonable. Usually they are, but sometimes they'll come back to you and say, hey, these types of scheduling questions are not the priority of the business, because those are things we have other software for. And you think you're going to be clever by building some knowledge graph over schedules and dates, but really you should just pop up a modal that says, for scheduling questions, check out this other part of the app.
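The synthetic-question check described here fits in a few lines; `generate_questions` and `search` below are hypothetical stand-ins for the LLM and retrieval calls in a real system.

```python
# Sketch of the synthetic-question check described above: for each chunk,
# generate questions from it, then verify the source chunk comes back when
# you search for those questions. `generate_questions` and `search` are
# hypothetical stand-ins for your LLM and retrieval calls.

def generate_questions(chunk_text: str, n: int = 3) -> list[str]:
    """Hypothetical LLM call: produce n questions answerable from the chunk."""
    raise NotImplementedError


def search(question: str, k: int = 5) -> list[str]:
    """Hypothetical retrieval call: return the ids of the top-k chunks."""
    raise NotImplementedError


def synthetic_recall(chunks: dict[str, str], k: int = 5) -> float:
    """Fraction of synthetic questions whose source chunk is retrieved in the top k."""
    hits, total = 0, 0
    for chunk_id, text in chunks.items():
        for question in generate_questions(text):
            total += 1
            if chunk_id in search(question, k=k):
                hits += 1
    return hits / total if total else 0.0
```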
hugo: I want to go back to the conversation we planned to have, but there are so many interesting questions in the chat. There's one that I think is nice to, forgive the pun, zoom in on, and it's asked by someone whose handle on YouTube is "chat quest, the NLP three adventure game." I really just wanted to say that out loud. But Chat Quest asks: how, for example, could this Zoom call on YouTube be qualified and quantified in an executive summary report?

jason: Yeah. So if we [00:25:00] just generated a very general summary, what would come out of that? The result might just be: there are two people, one of them is a consultant, and this call is about the importance of AI in businesses. But if you knew this was an advisory call, we could do something else. We could say, okay, what are the pieces of advice that the people in this conversation have given? For each one, generate a short memo or description of what the problem is, and give me the timestamps of when they discussed it. If you give that prompt, what might come out of it is a single page where we go over this runbook: the idea of finding the topics, doing the clustering, finding the cosine distances, and so on. And that's because we know ahead of time that we are looking for recommendations, and that we want to generate memos from those recommendations. If you do a general summary, maybe the thing it picks up on is my bio, which is useful if you're trying to source people to go on other podcasts, but then again, that becomes a different [00:26:00] tool. So which parts of the transcript you focus on really depends on what your use cases are, and again, that comes from having to know that a priori.

hugo: I love that you mentioned your bio, because you actually don't even have a bio, so that's even better. So, we promised to talk about how to build terrible AI systems, and of course the goal here is to learn, through this process of inverted thinking, how to build good, strong, robust, low-latency AI systems. You've got a great blog post, which I'll link to, about how to build terrible RAG systems, which you call the ingredients for disaster: don't worry about latency, never show intermediate results, avoid curating your data, forget A/B testing. But among these and other things, what are the top three to five things you would do to build a terrible AI system?

jason: The first one is basically: show no evidence, right? If you think about what these AI systems are doing, you're basically trying to do some kind of delegation. So if you want to be the worst IC [00:27:00] ever, when your manager gives you a request, you should never acknowledge that the message was received. If you get stuck, don't ask for help. If you know you're going to get a result soon, don't let them know, and just message them the moment you get the result. And what you realize, as someone who is a delegator now, is: I want acknowledgement that you've seen my message. I want you to give me an estimate of how long it will take.
I need you to reason about, when you're stuck, how much of that is going to be something you ask for help with versus how much of that time you're going to eat yourself. And I also want you to be able to give me a plan, right? Like: okay, this is a great idea, I think it's going to take me three days, here's why I want to do it this way, this other step might take a little longer, I'll let you know in two days if anything is blocking, and then by Friday there might be some results. If that communication happened throughout the process, I'd feel very at ease. But if I send a message, and then Friday comes around, and I'm just going: okay, [00:28:00] did the guy even see my message? Is he on vacation? I don't really know. Now I'm angsty. And then they send a result, and even if the result is good, I'm kind of unhappy, and the engineer's like, hey, why didn't you let me know? So I think that's one of the main things: being able to show your steps. And to add to that, there's a reason why we ask our engineers to write a spec, write requirements, and document things. Not only do I care about the result, sometimes I care about how you got the result. Sometimes I care about why you made a decision to focus on a certain topic. If you explain that to me and it's wrong, I can correct you, and I can put systems in place to correct you. Whereas, again, if you just give me a number, it becomes a situation where I don't know if I can trust that result. I think that is the biggest one.

hugo: Okay, cool. What advice would you have for people getting started building these types of applications?

jason: I would say start very simple. When you use these [00:29:00] LLM frameworks to build out these applications, you will get to a result much faster. But part of being a good engineer is suffering the consequences of your own design decisions, right? When you suffer the design decisions of other people, you might not realize that, and then you don't really know what assumptions you're making. For a RAG app, a lot can be done with one API call to a vector database, a for loop, and a single prompt that generates results. For a long time those systems were not built in a way that returned intermediate results. So even just building a simple RAG app with a bunch of print statements that says: okay, this is the plan; I've retrieved six documents; let me generate the question; let me generate the answer; let me synthesize this prompt. Just building that at least once will get you pretty far in understanding how these systems should work.
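A deliberately tiny version of that, with the intermediate steps printed, might look like the sketch below; `embed`, `vector_search`, and `complete` are hypothetical stand-ins for an embedding model, a vector database, and an LLM client.

```python
# A deliberately tiny RAG loop that prints every intermediate step.
# `embed`, `vector_search`, and `complete` are hypothetical stand-ins for
# your embedding model, vector database, and LLM client.

def embed(text: str) -> list[float]:
    raise NotImplementedError


def vector_search(vector: list[float], k: int = 6) -> list[str]:
    raise NotImplementedError


def complete(prompt: str) -> str:
    raise NotImplementedError


def answer(question: str, k: int = 6) -> str:
    print(f"plan: retrieve {k} chunks, then synthesize an answer")
    chunks = vector_search(embed(question), k=k)
    print(f"retrieved {len(chunks)} chunks:")
    for i, chunk in enumerate(chunks):
        print(f"  [{i}] {chunk[:80]}...")
    prompt = (
        "Answer the question using only the context below.\n\n"
        + "\n\n".join(chunks)
        + f"\n\nQuestion: {question}"
    )
    print("synthesizing answer...")
    return complete(prompt)
```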
jason: And the second thing is to try really hard to figure out what an interesting data set looks like. If you use something like Paul Graham's essays, whether you use full-text search or vector search, [00:30:00] you get something like 97 percent precision and recall, right? It's easy either way. Whereas if you do something like searching GitHub issues, that ends up being very hard, and you have to be very creative to figure out how to build a system that works well on that kind of data. So thinking hard about having a very specific data set will also teach you a lot about how to manage the idiosyncrasies of that data.

hugo: I love that you mentioned the example of Paul Graham's essays. But everyone listening: if you're getting started and you build a RAG app that processes Paul Graham's essays, don't tweet about it, because your inbox will be filled with VCs trying to throw cash at you. Jason, I love that as a way to get started. I am interested, though: there's so much more to be done, from implementing all types of observability and logging to evaluations, and it's an incredibly complex space. So how would you advise people to take baby steps in incrementally [00:31:00] making their applications more robust and building better systems?

jason: Yeah, it's hard to say, because at this point I've been doing it so long that I mostly just have an intuition. Again, if you end up trying to deploy these systems in production, where the quality of the results is something customers depend on, you will ultimately learn this the hard way, and I don't think there's anything wrong with that per se, unless you're doing something like answering questions about health records or legal litigation. I think learning the hard way is a very reasonable way to get started. This might also be a good time to plug the survey: I'm thinking about building a language model course on building RAG systems, and at some point in this video we'll push a link to the Typeform. The goal is really to help figure out, okay, what are the issues you're running into? What are your concerns? And then to build a curriculum that can help you address those things specifically, whether that's ingestion, whether that's latency and intermediate results, whether that's grounding or [00:32:00] search. One of my goals is to have the cohort figure out what they're interested in learning.

hugo: Exactly. So I'm actually going to share the link to the Typeform now, and for those listening to the podcast and not watching the live stream, I'll share it in the show notes as well. Someone did ask about your course in the chat too, so that's great timing. Maybe you could tell us a bit about your ideas behind this course, why you're doing it, and what you hope to achieve from it.

jason: So the course wouldn't necessarily be something a beginner should take; I think that can be left to a series of blog posts or maybe a simpler course. My goal really is to productize and distribute some of the knowledge I've developed from consulting with these larger companies. I think the playbook I've implemented is fairly effective and works very well in helping you understand that runbook: understand the queries, do the instrumentation, do the clustering, [00:33:00] and then, once you've run this suite of diagnostics, giving you the tools to reason about it and make recommendations. So I think the course would be especially for someone who is on a team that is building a RAG app: they already have user data, and they want to figure out the next steps to go from one to two, rather than from zero to one.

hugo: So a bit earlier I think we briefly mentioned Pydantic and instructor. You have a wonderful talk, which I'll link to (I'm not going to do it now because it'd take me too long to find it) called Pydantic Is All You Need.
I don't necessarily want to have a conversation about tools. What I do want to have a conversation about is what type of problems these tools solve and why they exist.

jason: It's funny you say that, because for me, the library instructor, and the talk Pydantic Is All You Need, is really just about the fact that all I want the language model to have is the ability [00:34:00] to return more structured data, right? Whether that's done with constrained sampling, whether that's going through OpenAI or Anthropic or some local model, I don't really care. I don't really care about the weights and the logits or anything like that. What I care about is that, instead of getting a string in and a string out, I can have structured data go in and structured data come out. And once you define this kind of magical pipe operator, you can basically build very simple, Unix-like functions that are backed by language models. What this means is that when you return a Pydantic object, what you're returning is a data structure. So if you have a plan, you can return a DAG, right? If you have conditional logic, you can define it in a data structure and have the language model output that data structure, whether it's a DAG, whether it's an AST, whether it's a graph of action items and assignees and status codes. By building the schemas you care about in the application you're building, you can have a [00:35:00] language model just naturally transform any form of data into those schemas. And once you have those schemas, you get a very happy IDE, a very happy type checker, you're able to write to databases very safely, because, again, you know the types coming out of these language models, and you're able to program with a little bit more creativity and a little bit more reliability at the same time.

hugo: Very cool. And for those who are new to the space, I'm wondering if we could step back a bit and think about how function calling can help people at the start, and then the happy path to using tools like Pydantic and instructor.

jason: Yeah. I think one of the reasons function calling stood out was that it was the first time there was a sort of LLM-approved way of generating structured outputs. The examples you get in the beginning are: okay, here's a language model; sometimes it can call the get-weather function, and sometimes it can call the search [00:36:00] Google function. What this allows you to do is not only provide a set of tools that a language model can use, but also have the language model reason about which tool to use and what arguments to pass. But what you actually find in practice is that you don't really need the function calling part. Really, what you just need is structured output. Because once you define that data structure, you get a Python object; you can define methods on that Python object; you can define procedures and functions that act on that object. And again, you get brought back into very plain and simple programming that everyone is already familiar with, right? You can build components that everyone is already familiar with, that are safe, that throw error messages when you expect them to, and that work across these large systems.

hugo: Awesome.
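As a rough sketch of what "structured data out" looks like in practice, assuming instructor's patched OpenAI client (whose exact setup call has varied across versions) and a placeholder model name, extracting the kind of action-items-and-assignees schema mentioned above might look like this:

```python
# Sketch: using a Pydantic schema to get structured data out of an LLM.
# Assumes the instructor library's patched OpenAI client; the exact setup
# call (instructor.from_openai vs the older instructor.patch) has varied
# across versions, and the model name is just a placeholder.
import instructor
from openai import OpenAI
from pydantic import BaseModel


class ActionItem(BaseModel):
    description: str
    assignee: str
    due_date: str | None = None


class MeetingNotes(BaseModel):
    decisions: list[str]
    action_items: list[ActionItem]


client = instructor.from_openai(OpenAI())

transcript = "..."  # hypothetical meeting transcript text

notes = client.chat.completions.create(
    model="gpt-4",  # placeholder
    response_model=MeetingNotes,  # the schema is the contract
    messages=[
        {"role": "user", "content": f"Extract decisions and action items:\n{transcript}"},
    ],
)
# `notes` is a validated MeetingNotes instance: a plain data structure you
# can type-check, store, or post-process like any other Python object.
```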
hugo: Matt Miller has a really interesting question: have you found it valuable to ask the LLM if its answer is correct once it returns it? He's found this helpful with ChatGPT. I do want to mention another pattern I've experimented with, particularly with [00:37:00] typed things. I suppose you could frame it as an agentic workflow: having two LLMs, getting one to write the code and one to examine the code, with a sort of not-quite-adversarial flow there, a communication back and forth. So how do you think about this for validation, evaluation, and verification?

jason: So one of the benefits of Pydantic is that it already has very powerful validation built in. And when we're using another language model to verify something, the fact that it's another language model is an implementation detail of the validator. When I enter a password and it doesn't have a special character, I throw a validation error. In that same sense, I can build a validator that checks whether the passwords match, whether it has uppercase letters, or whether it contains offensive language. And in terms of how you build these things: if you search for the LLM validator in the library docs, there are examples where the validator can be a Python function, or the validation can be a prompt. Behind the scenes, instructor [00:38:00] doesn't really care what the implementation is, right? Whether you generate Python and send it to Ruff, or send it to Pyright, or to a language model, these are implementation details. They are not the nature of the program that you're building.
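A small sketch of that idea, validation as an implementation detail of the schema, using a plain Pydantic validator; swapping the check for a model-backed one, as the instructor docs describe, changes nothing for the caller. The banned-phrase list is purely illustrative.

```python
# Sketch: validation as an implementation detail of the schema. This uses a
# plain Pydantic validator; swapping the check for a model-backed one (as
# instructor's docs describe) changes nothing for the caller. The banned
# phrase list is purely illustrative.
from pydantic import BaseModel, field_validator

BANNED_PHRASES = {"as an ai language model"}


class Memo(BaseModel):
    title: str
    body: str

    @field_validator("body")
    @classmethod
    def no_boilerplate(cls, value: str) -> str:
        lowered = value.lower()
        if any(phrase in lowered for phrase in BANNED_PHRASES):
            raise ValueError("body contains boilerplate that should be rewritten")
        return value


# When a model like this is used as a response_model, a failed validation
# can be fed back to the language model so it retries with the error message.
```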
hugo: We haven't used the term AI engineer a lot, but essentially this is a new role, which is something we're talking around, right? So we saw data analysts, data scientists, data engineers, all revolving around each other. Of course, the joke is, and I've worked in startup land for 10 years, that the first data science hire ends up being a data engineer for 18 to 24 months. Then machine learning engineers, and now we see the rise of the AI engineers. I'm wondering your thoughts on this role of AI engineering, whether it makes sense to you or is useful.

jason: It definitely makes sense. But yeah, having lived through the big data engineering world and the data science world and the ML engineering world, [00:39:00] it sounds to me like the same kind of title. Big data engineer, data scientist, AI engineer: you're just analyzing the data, really, but the data used to be generated by the customers themselves, and now that data is being generated by the language model. So in my mind, the AI engineer effectively is the data scientist. I think the nuance here is that a lot of full-stack developers are now able to be a lot more machine-learning adjacent, but I don't think it has really taken anything away from the machine learning engineer roles or the data science roles we have today. But I do recognize that data science is disappearing as a profession.

hugo: As soon as all the big schools rebranded their statistics and math departments as data science departments, as well. What advice do you have for people who are becoming AI engineers? And then maybe we can think through medium-level and then sophisticated AI engineers.

jason: I think a lot of it is just the skill that everyone needs, whether you're an AI [00:40:00] engineer or a software engineer in this new world, and it comes down to two things. One, the ability to be quantitative and have the technical skills to be quantitative. What that means is: can you write SQL queries? Can you run a clustering job? Are you able to make visualizations? I think one of the biggest surprises when working with engineering teams is that the way they present data is often terrible. So there are the baseline quantitative skills: can you work with data, can you present data, and can you find the stories? That's something you can learn pretty easily by taking a pandas course or learning a little more SQL. I think the harder thing, and the thing that requires the most critical thinking, is building the data sense. There are a lot of stories of, say, a veteran fire captain who just knows when a fire is going to explode or when a house is going to collapse, because they have enough [00:41:00] sensory experience of evaluating the risk of what's going on. As you become more experienced as a data scientist or AI engineer, the skill that's going to be the hardest to develop, but the most valuable, is that bullshit detector, that "oh shit" detector, where you can look at a system and figure out whether something is wrong. A really simple example: if you train a model and it's 99 percent accurate, chances are it's not that you were great, but that something was wrong, right? That's an easy one you learn on day one. But then things get more complicated, and it becomes things like: am I training on the validation set? Does it matter, given what the performance is? Does it matter when making these performance, maintenance, complexity, and latency trade-offs in deploying these systems? Those end up being the things you have to learn to think critically about over a long period of time, but those are the skills that take you very far in your career.

hugo: So [00:42:00] our mutual friend, sorry, this chat is just hilarious as well, you're going to have to check it out later. He says: ask Jason about his advice to stop coding. And I really want to hear your thoughts on why people should stop coding and what they should be doing instead. And this is after a certain level, right? Write as little code as possible, develop other skills. What's he talking about? What are we talking about?

jason: So I stopped coding, and my advice isn't necessarily "stop coding," but recognize that coding isn't everything there is. My story is that in 2021 I had a relatively career-ending hand injury that forced me out of my profession, and I took almost two years off before I came back to doing machine learning. When I came back, what I realized was: okay, I really cannot join a new company right now and work 60 hours a week coding every day. I have to figure out what other leverage points [00:43:00] I can apply. And maybe I was one tenth of the coder I was then.
But what I realized was that, with the skill I already had, developing other skills, like understanding marketing, sales, distribution, figuring out what value means, and how you communicate to executives, was like a thousand-x higher leverage than being a 10 percent better coder for where I was at that time. The way I describe it now is: if you're someone who is weak and you go to the climbing gym or the jiu-jitsu gym, you have to really learn technique right away. But if you're a very strong engineer, you're the person coming into these sports being very naturally gifted, and as a result you basically lack technique, right? It's actually very valuable to be a technical person, but you become even more valuable if you also learn to write just a little bit. Every L6 is an exceptional engineer, and every L7 is an exceptional communicator.

hugo: Amazing.

jason: And if you go to [00:44:00] L8, the gap between L7 and L8 is never going to be your technical abilities, right? That gap gets a little smaller. It's about how you get people to buy in, and how you convince whole teams and whole organizations to shift their focus in a direction. Again, that is a very valuable leverage point, and something I think you need to commit to developing if you're interested in getting to that level. And usually getting to that level is not about becoming a better programmer.

hugo: So...

jason: Are you just losing it?

hugo: I'm fucking losing it, man. So Hamel, among other people, keeps talking about how much rizz you have. Dude, Hamel's saying it's for real. So: having rizz is hard to develop. Hamel says "start developing rizz" is pretty much your basic message. And then he says, no, just be King Rizz. Somebody said, so be an S.A.? And Hamel's like, no, just be King Rizz; S.A. is like baby rizz. Okay, so let's step back, move away from solutions architect, and go back to it: you've told us that you have a degree in mathematical physics. [00:45:00] I'm sorry, but you clearly have a degree in mathematical rizz as well. No, but this is for real, man: your presence online, to be honest, you're a total baller in a lot of ways, and I don't throw that term around lightly.

jason: Twirling my hair.

hugo: You've gained a huge following online in the space of a couple of years by providing a lot of value.

jason: Oh, months. I have been on Twitter for less than a year, sir.

hugo: So can you take us through how you think about the role of charisma in your work and your own entrepreneurship?

jason: Yeah. This is a cheat, mostly because I went to art school before physics, and I consciously went to art school with a realization. When I was in middle school I basically wanted to win the Nobel Prize in physics, right? And my conclusion was, if I want to do that, I should not go to a technical high school; I should go to art school instead and develop other sides of my brain. Because if I went to a technical high school, maybe I'd skip one course credit, but going to art school interested me. And one of the [00:46:00] biggest things art school really did for me was give me the ability to do public speaking. I remember in grade six current events, I'd be giving some talk and I would just be holding the paper, shaking. I was, and I am, incredibly introverted, I'm incredibly anxious all the time, and I'm incredibly shy.
But in art school, what they did was make you do a ten-minute presentation any time you made a piece of art. You always had to document the work you did and how you did it, and then give a ten-minute talk about it, and you give that ten-minute talk something like 40 times over four years. What you realize, for me at least, is that confidence is just knowing that I did the work. Once you get to that point, everything else becomes very easy, because you are consistently doing the work. With Twitter it was just a matter of: okay, I have 400 followers, I get about 1.6 followers per tweet, I want 10,000 followers, so I should make 10,000 tweets. Let me just get that [00:47:00] out there. A lot of what I do with my writing, and a lot of how I write my tweets, is informed by watching MrBeast videos and thinking about how he thinks about editing and drawing hooks, and by reading books on that. So a lot of this rizz is effectively just learned through books. I'm sorry, Hamel, but that's what I accept to be how I got here: just putting in the work. At this point, I've just talked for a long time. Earlier this year I said, I'm going to do a podcast every other week to get more comfortable with talking on podcasts. And then you do six, and you're good. Here's something people might not believe: the Pydantic Is All You Need talk, the one that got 150,000 views on YouTube...

hugo: More.

jason: ...that was my first public speaking engagement.

hugo: It's also hilarious that you didn't realize it was going to be a keynote.

jason: Oh yeah, oh yeah, that's crazy. Everyone else was doing a VC-backed keynote, like, "oh, this is going to be a 15-billion-dollar problem, this is why we raised the money, we're happy to [00:48:00] announce it." And I was like, man, if someone made all my type hints return dictionaries, I would be super mad, and this is why we need something better. But again, at that point, that was the first public speaking I had ever done that wasn't for a school project, at the age of 29, and I just rehearsed that talk like 40 times. It's an 18-minute talk; that's just how long it took, right? But in terms of rizz, I was shaking. I didn't eat anything until I gave that talk, and then I just ate a pizza and took a nap. A lot of it is just practice.

hugo: Totally. So I want to come back to AI systems, but I want to go through a personal journey with you as well, because we've been talking about how you spend time to create value for organizations, for yourself, for your audience. We've also talked about how coding more and more may not provide a lot more marginal value. This is [00:49:00] what we're dancing around: something incredibly personal that happened to you with respect to your hands. So maybe you can just talk us through this journey of yours.

jason: Yeah. You know, being an immigrant, coming to the U.S., I came to the U.S. with like 400 bucks, right, borrowing money to pay rent. I had always valued myself as someone who was able to work really hard and think really hard, think for a very long time, and build interesting things. And when I had the hand injury, I was like: oh, if you have this farm animal and it can't pull the cart, you eat the animal or something. I don't know, what do you do with that?
You don't just let this thing hang around for a long time; it's a mouth to feed. And that was how I beat myself up for the first six months of being unable to work. I was like, oh man, my identity is in working hard, my confidence comes from working hard, and now...

hugo: And you couldn't type at all, pretty much.

jason: I couldn't put my shoes on. Socks were a lot of work, because you have to [00:50:00] put your thumbs in the sock hole. I couldn't put pants on, I couldn't hold a knife, I couldn't use chopsticks. I remember eating Korean barbecue and the meat at the end of the chopsticks was too heavy for me to pick up, so I'd ask for a fork, and I'm like, dude, my life is over. How will I make money? All I can do now is move to some random city, save my money, and live the most boring life possible. And that whole journey ended up being about understanding, obviously, why that's flawed, and then figuring out a systematic way of getting out of it, right? Just acknowledging to myself: hey, you know what, maybe I'm enough. Maybe what I have is enough. If I do get more, great; if I don't, that's also fine. And the line I remember writing to myself and thinking about over and over again was this idea that the greatest gift you can give yourself is the gift of being enough. When you have that feeling, everything else you do is because you deserve it, rather than approaching your whole life as, I [00:51:00] want these things because I don't have them. I think in the entrepreneurial world it's like: if I don't have the private jet and someone else does, it's because I didn't work hard enough, or something, right? These, I think, are really toxic thoughts that happen in Silicon Valley, especially if you hang out in high-net-worth circles. But I'm very glad that I was able to let go of that relatively early in my life and in my career.

hugo: So how did that result in your change of thinking about how to actually deliver value?

jason: Again, it's just the leverage, right? I still want to work hard, I still want to make a ton of money, but now it's coming from this place of: if I am coding ten times less, what I am building has to be a hundred times more leverage. If I'm going to write any code, it should be open source, and instructor was that, right? I wrote this library, it was like 600 lines of code, and now it's being used a quarter million times a month. That's leverage. People are like, wow, Jason, how much time do you spend on this library every week? And I'm like, four [00:52:00] hours. That's all I can muster. But I have to make sure that the abstraction is right and the documentation is good.

hugo: Yeah.

jason: And when you do that, you realize that most of the effort in building this library is going to be around documentation and SEO, and I have to nail that. That is more important than having more code in my library; that's what results in more downloads and more people benefiting from the systems I build. Same with the course: okay, I learned these skills through consulting with one or two companies, but this knowledge needs to be distributed across more people. Every video I post on Twitter, if it gets 100,000 views, I have to write a blog post about it.
And then that distribution and that leverage come from the writing and the audience that I build, rather than just being smart in a cave somewhere. And I think 25-year-old me would have been super happy being smart in a cave. But now I just can't do that, and I've accepted that. And I have to figure out, again, what are the leverage points that I can play with. hugo: Absolutely. Jeremy Howard always says that [00:53:00] he considers himself super lazy. And I think there's a joke in that, because he's an incredibly hard worker, but I think what he means is that he builds software so that future him doesn't have to do so much. It's creating leverage. The term leverage, and this is something you and I have talked about before, is about having levers. So instead of building muscle, we want to build levers. We want to build winches. We want to oil the levers. And, I suppose, quote unquote, scale ourselves in those ways as well, right? jason: Exactly, but you still need to be able to exert force. In this physical analogy, it is still very important to be physically strong. And then once you have that strength, your job is not to pull the same lever a thousand times; it's also to go figure out what's the longest lever that you can find, right? And the path to solving one kind of problem might not be the way that you solve all problems. But again, the skill is now to identify the levers, because you've already developed the skill of being very hard-working. You've got to [00:54:00] let that go at some point in your career. When you're 21, I think it's very much the case that you should just develop the hard-work skill. But by the time you're 35, if you think success is working 80 hours a week, you're going to be that guy who has those regrets in their 50s and 60s. That's almost guaranteed, right? hugo: So coming back to AI systems, there's so much complexity, right? How do we think about finding as much simplicity as possible in all the noise? jason: Yeah, there are two things. One is a me thing, which is that I just don't read what's going on in the AI world. I find that so much of it is noise, and even the quality of research has been very low: you put a for loop in a prompt and it becomes a paper. The chain-of-thought stuff I think is very important, but in terms of how easy it is to find new things, it's almost a little too easy. And [00:55:00] I'm going to let the market decide what the good research is, and I'll read that when the time is right. Personally, though, what I try to do is optimize for writing code that is easy to delete. The reason is that these systems are changing very quickly, new things are being discovered all the time, and your ability to adapt to these new, improved systems is effectively about how you can refactor these systems. Again, it's one of those things where, because you've done it for a long time, you really have a sense of what that means. You have a sense of what it means to write code that's very easy to delete. I really haven't been able to figure out how to capture that more concretely, and that's something I'm working on, but ultimately I think the simplicity comes from the ability to delete code. If you wanted to delete some system and it requires
400 lines across 17 files, you know that you messed up somewhere in that process, right? And being able to figure out where that is, and developing that taste, will give you the simplicity that lets you never [00:56:00] have to fall behind when new innovations are coming, or when you build this agentic system but GPT-5 can just do it in one shot. You don't want to be in that situation. And I think a lot of companies end up there because they tend to write code a little bit too quickly, and write code that they think will live forever. hugo: And how about the tooling landscape? It's just so complex and growing so quickly. How do you even think about adopting new tools or building new tools? jason: There, I just adopt no new tools. Because I think the instructor ethos has just been: it's just types. There's no such thing as an "instructor app." If you use instructor, what you really use is Pydantic and OpenAI. You would never say something was an instructor app, in the same way you would never say something is a Requests app, right? hugo: And I think it's that simplicity that makes things very attractive for a lot of folks. In terms of building tools themselves, I think I get a lot of inspiration from the Tidyverse. I don't know if people are [00:57:00] familiar with dplyr and ggplot2. jason: Exactly. What they do there is build very complex verbs and nouns, and they give the user the ability to compose these verbs and nouns to write and generate code. hugo: And what you're also saying is that it's Unix-like, with the pipe operator. That Unix-like philosophy, we just need more of that everywhere. jason: I think so. And again, this new generation really hasn't developed the taste for that Unix-like pattern, but there's a reason those patterns have aged so well. hugo: And I suppose in the Python landscape, in terms of these, I suppose we can think of them as human-centric, human-friendly APIs, we've got Requests, the all-timer, right? And spaCy in the NLP space, and Keras and scikit-learn. And speaking to documentation as well, part of the massive adoption of these technologies is not only the human-centric APIs but the incredible documentation, and a lot of the practice of documentation-driven [00:58:00] development as well. jason: Yeah. I think that's exactly where all the leverage is, right? Like I say, instructor is like 12 Python functions and 200 markdown files. And that's where I found the good ratio: you really want six times more markdown than Python code in your library. Otherwise it's probably going to be underdocumented and difficult to use. But you really only learn that when you're a lot older and you realize, oh, the goal of this library is to help people do things, not to convince them that I'm smart. You end up building more things because you want to feel smart. But once you let go of that, you just write terrible code that works for everybody. hugo: Yeah, I appreciate that. I'm interested in what the biggest challenges are that you think we're facing just as a community, trying to build these things and talk about these things and help others build these things.
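To make the "it's just types" ethos above concrete, here is a minimal sketch of using instructor with Pydantic and OpenAI. It assumes the current pattern of patching the OpenAI client; the model class, its fields, and the model name are illustrative, not taken from the conversation.

```python
# A minimal sketch of the "it's just types" idea: instructor patches the OpenAI
# client so a Pydantic model doubles as the output schema. The UserInfo class,
# its fields, and the model name are illustrative assumptions.
import instructor
from openai import OpenAI
from pydantic import BaseModel


class UserInfo(BaseModel):
    name: str
    age: int


# Patch the OpenAI client; everything else is the familiar chat call plus
# a response_model argument.
client = instructor.from_openai(OpenAI())

user = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=UserInfo,
    messages=[{"role": "user", "content": "Jason is 30 years old."}],
)
print(user.name, user.age)  # a validated UserInfo instance, not a raw dict
```

The point of the design shows in how little there is: the Pydantic class is the only new concept, and the rest is the OpenAI call you already know.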
jason: It's hard, only because I've really just been focused on my own community right now, and a lot of it has been around: hey, let's [00:59:00] keep everything super simple, focus on documentation, and recognize that it's the responsibility of the user of this code to figure out how to write code. A lot of the systems we're seeing right now almost remind me of pandas, where I would work with analysts who knew how to write pandas code but didn't know what a for loop was. And that's not to their discredit or anything. They're economists; they just know how to write pandas. If you gave them a df, they could do everything else, but they didn't realize you could actually write a for loop over a list of strings that accesses the columns to make a plot or whatever. And I think writing tools that are simple and modular is basically saying: hey, I trust you to know how to write code. I'm not going to abstract a for loop away from you. I'm not going to abstract saving data away from you, so you're responsible for figuring out serialization. But I'm going to give you really good primitives that let you do what you need. And so you don't [01:00:00] need to come to me and say, hey, can you build the Pydantic-to-YAML adapter? No, there are libraries out there that can do that; go install that library, right? Oh, can you do this? Can you do that? No, you should figure out how to do a for loop. You don't need me to build you opinionated ways to do logging; those are decisions you have to make. And taking the developer seriously, I think, is about respecting the fact that they can do that work. The libraries that run into a lot of problems, for example, are the ones that do too much in the beginning, and when the serious developers come, they go, oh, this library does not take me seriously at all: I can't change the prompts, it doesn't support async, it has no parallelization, these are the things I care about in production settings, and it won't even let me log anything. So yeah, taking the developer more seriously and assuming that they can write code, I think, is something that has gone backwards a little bit. hugo: Yeah. And there's a relation between those affordances and the ability to write and [01:01:00] read and understand code, and also to be able to do the same with data in a lot of ways. We came up in a time, and we haven't talked about this, my background's in math as well, in pure math, but then I worked in mathematical physics, in biology and cell biology, in biophysics and systems biology for a handful of years, and then I entered the machine learning space, also from Andrew Ng's Coursera course. Man, that guy's a baller. The reason this is relevant is that we came up in a time where it was a very model-centric approach, with a common task framework, with Kaggle, which was awesome in a lot of ways, but it put the focus on a lot of things which were important and made us forget about a bunch of other things. So that's why, several years ago, Andrew Ng, among other people, of course, not just him, helped popularize this idea of data-centric AI as opposed to model-centric AI. And I think, I need to
find this, but he even started a Kaggle-like competition where the model is [01:02:00] held fixed and you do a bunch of work with the model frozen, and instead you work with the data, which is cool. So we all started then to think in a data-centric fashion. Then this generative AI stuff started happening, and it became models all the way down again, yet we're generating more data than we've ever had before as well. So, I suppose this is barely a question, but the real question is: everywhere we look it's model-focused, but it always takes us back to the data. How do we get our heads around thinking about models while, in the end, trying to work with the data? And how do we learn to focus on the data again, instead of talking about the newest model? jason: Yeah. I think that's already the case, right? Because now switching between two different models is very easy. And so a lot of what I talk about with the companies I work with is just going: hey, you're asking me questions about whether we should do this method or that method. You're asking me questions about whether we should lowercase the strings. [01:03:00] These are what, in the old world, we called experiments. Because we did the work of defining these evals, we can spend 35 or 45 minutes debating whether or not we think an experiment will get a result, but if the experiment takes five minutes, just run the damn experiment, right? And a lot of the advice I've been giving has been around this: engineers think that you need 99 percent accuracy before you deploy a system. That's wrong. The second thing they think is that a couple of examples is enough to get a good result. That's also wrong, right? But they also think maybe that they need this four-hour evaluation suite to get a result. There are going to be trade-offs between how quickly you can iterate, how much data you can process, and how significant the results are going to be before you can make a decision on what the right experiment is. And learning to navigate that, I think, is what will happen in the next generation, right? These language models are definitely slow. Maybe the eval [01:04:00] costs 50 bucks. Maybe it takes 20 minutes, right? But 20 minutes is still very little time compared to how much time I've wasted in a standup debating whether or not to run an experiment that is like a three-line code change, where the meeting costs more than the 40 bucks the eval would have. And yeah, in terms of this data world, I think a lot of it is, man, we should be doing tons and tons of micro-experiments, because these micro-experiments are like character-level changes in a system, rather than retraining whole models on GPUs. And it's actually much easier, even if it costs the API credits; it's still much easier than it was before. We should be running way more experiments, and the metric we should care about isn't the metric of the model, but the velocity and the volume of experiments that we can run on these tiny systems. hugo: That makes sense. And also, when thinking about all the data, to your points earlier, I do think function calling, Pydantic, instructor, all of these [01:05:00] things allow us to focus on the data a lot more. And you've even written about the need to be able to do time filters and that type of thing when working with structured output from LLMs, right? jason: Yeah, exactly.
And not even just time. A funny example I ran into last year: we found that a whole cluster of search queries was severely underperforming, and we identified the cluster as queries asking for FY24, but it was November 2023. And we're like, oh, what's going on? It turns out that in this industry, FY24 is in the year 2023 because of how the fiscal calendar is shifted. And because we didn't normalize the financial year in the document search, we could not retrieve these documents correctly. But then you write a for loop that says: okay, if the industry is mining, the fiscal year starts in November, and all of a sudden the quality of the search results comes back up. And you're like, okay, again, this is about [01:06:00] that data intuition of just looking at data, looking at clusters, and going, huh, what is going on here? Why are there FY24 documents in November? Let me go talk to somebody, let me figure that out. I did the spelunking, and now I have a solution, and the experiment is very simple: you just rerun every query in that cluster and see if the results got better, by cosine distance. hugo: Totally. We've got Hamel dropping a truth bomb in the chat. I'm interested in your thoughts on his statement: you learn to look at data by actually trying to make an AI system work. jason: Yeah. No comment. hugo: Yeah, absolutely. AJ has a great question: is there a trade-off between doing many experiments versus doing a few good ones with good engineering code? jason: I think these experiments don't necessarily have to run on production systems. It really depends on what kind of results and what kind of recommendations you have. At Facebook, the experiments you run are almost never in [01:07:00] production systems. And one of the coaches I had there would basically say: hey, Jason, I will never give a shit about how you ran the experiment. The code does not need to be good, because it does not need to be rerun. So again, don't let perfect be the enemy of good. If you can run these experiments outside of production and they just give you information on what direction to take, that's the beauty of Python: you can serve spaghetti, right? And again, the goal is decision making, the goal is to figure out how to allocate the resources I have toward implementing some of these things in production. And if I can do 10 experiments in one day and one of them is good, that's when I go and improve the production system and deploy something. hugo: Nice. jason: The velocity really matters, and the cycle time really matters. hugo: Also, speaking of looking at your data, introspecting into your data, that type of stuff, I'm going to paste a link to Simon Willison's CLI [01:08:00] utility, LLM, in the chat and in the show notes. Also speaking to a Unix-like philosophy, this is a really fun way to play with a lot of different LLMs from the command line, and it logs all the conversations to a local SQLite database, but then you can immediately play around with that in Datasette, which is something else Simon made, which is really a lot of fun. So definitely recommended. Yeah. And Eugene has just written in the chat: hands down, rate of iteration trumps most other factors. jason: Yeah. Oh, all the goats are in the chat. I can't even see it.
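A hedged sketch of the kind of fix Jason describes above: a small, industry-aware fiscal-year normalizer applied to queries before retrieval. The industry rules, offsets, and function names here are illustrative assumptions, not the actual system.

```python
# Hedged sketch of the fiscal-year fix described above: normalize "FY24"-style
# query terms to an industry-aware start date before retrieval. The offsets and
# industry table are illustrative, not the real system.
import re
from datetime import date

# Illustrative assumption: mining's FY24 starts in November 2023, while the
# default fiscal year matches the calendar year. Values are (start_month, year_offset).
FISCAL_YEAR_START = {"mining": (11, -1), "default": (1, 0)}


def fiscal_year_start(fy_label: str, industry: str) -> date:
    """Map a label like 'FY24' to its start date for a given industry."""
    fy = 2000 + int(re.search(r"FY(\d{2})", fy_label, re.IGNORECASE).group(1))
    month, offset = FISCAL_YEAR_START.get(industry, FISCAL_YEAR_START["default"])
    return date(fy + offset, month, 1)


# FY24 for mining begins 2023-11-01, so a November 2023 query falls in range.
print(fiscal_year_start("FY24", "mining"))   # 2023-11-01
print(fiscal_year_start("FY24", "default"))  # 2024-01-01
```

The experiment Jason mentions is then just rerunning the affected query cluster with and without this normalization and comparing retrieval quality.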
hugo: Someone's written that Datasette is so much fun, especially with the plugins. Absolutely. And the plugins that the community has built for LLM, the CLI utility that Simon made, it's just great. And Simon's amazing as well. A new model will drop, and three minutes later he's got a blog post with a plugin for LLM, and you're like, what? So that's super cool. Eugene... jason: And again, that speaks to velocity. hugo: Exactly. Eugene did drop something in the chat earlier; you mentioned we've got a lot of goats [01:09:00] here. Eugene dropped: Jason is the goat, for what that's worth. So there's that, yeah. jason: I remember starting Twitter in 2023, looking at Hamel's Twitter and Eugene's Twitter and going, oh, those are some smart dudes, right? And to me it's wild that a year later we're in group chats. It still blows my mind. hugo: Absolutely. So I want to... jason: We're posting some crazy things again. hugo: Constantly. Where do you think the field is going in the next one, five, and even 10 years? jason: It's hard to say in a way that isn't... I basically think that history will repeat itself. I think exactly what happened to the data science profession will happen in the engineering world, right? In the beginning, the data scientists could be writing Spark jobs, they could be training models, and they could be doing analytics and dashboards. And I think the AI world is going to be there for a while, and [01:10:00] then we're going to see specialization. Now, data scientists are data scientists; they're not data engineers, and they're not machine learning engineers, right? I very much went down the path of 70 percent machine learning, 20 percent analytics, and 10 percent data engineering. I think it will be the same for the engineering role. We're probably going to specialize into people building evaluation sets, people training language models, and people doing prompt engineering, and it just makes sense to me that these are going to be separate specialties. Because I do think RAG is going to be here for a very long time. I don't think search will ever go away. I don't think prompt engineering is going to go away, because the easier it is to prompt, the more complex the things we're going to try to prompt, and the more context we're going to want to put in afterwards. It's not the case that when things get simpler, we don't need them anymore. It's usually the case that when things get cheaper, the demand increases. The milliamp-hours of the iPhone battery have [01:11:00] been increasing every goddamn year, but the battery life is the same, because we just keep pushing more and more complexity into the apps we build. hugo: Well... jason: Yeah, and so in terms of prompting and RAG and all those things, I think these things are effectively here forever. hugo: We've talked a bunch about prompting and RAG, but not a lot about fine-tuning, and we're both involved in a wonderful course at the moment, which is around fine-tuning, among other things. But in your consulting work, is fine-tuning important? When would you encourage people to fine-tune, and when not? jason: I think actually a lot of the great instructor applications in production are focused around fine-tuning.
And this is because usually, once we have enough validation logic in there, we will identify maybe three or four percent of API calls that fail the validation. A simple example: we write sales emails, and not only do we write sales emails, we attach URLs and documents and reference them in the email as [01:12:00] attachments. It's really simple. Our URLs have UUIDs, and when we actually paste the URLs in, about 2 percent of the URLs don't exist. Half of that is because the UUID is incorrect; the other half is because the URL simply does not exist. And so we have validators that make sure, by re-asking, that every URL returns a 200 status, using something like a POST request, and we verify that all the UUIDs are ones that come from the context we give the language model. So day one, we have 2 percent errors. We add the validations, we do the re-asking, and now we get to a hundred percent accuracy. We run this for a couple of weeks, and then we have thousands of requests, and we can turn that request data into a fine-tuning dataset. And what this allows us to do is go from GPT-4 Turbo to 3.5 with zero hallucinations of URLs. And now your system is three times faster and [01:13:00] basically correct every time. Those are the cases where I've seen fine-tuning be very effective in the instructor context. Same with report generation, right? We will generate a report that's a very opinionated Pydantic object. We have validation that says: hey, every pain point should be real; if any of these pain points aren't real, throw them out. And again, we hit the re-ask logic maybe 2 percent of the time. And then we just say, okay, given the final objects, fine-tune. And these, again, have been incredibly effective in production settings. hugo: So John Biz has written, my dataset is too small for that. Hamel has replied, no dataset is too small, use an LLM to expand it. Something he's getting at there is, among other things, synthetic data generation. I know you've worked on this a lot, thought about it a lot, written about it a lot. So maybe you can tell us a bit about that. jason: Yeah. I can give the analogy of synthetic data generation in computer vision first, because maybe that makes it a lot easier to think about, and then I'll extend it to language models. So [01:14:00] in the computer vision world, you basically always needed a ton of data. And if you didn't have enough vision data, what you would do is make small perturbations to the images, because you wanted the model to have some robustness. So maybe you have a picture of a cat. Maybe I take that cat and rotate it 20 degrees and skew it and zoom in. And what you end up doing is, for every image you have, you can generate like 20 different perturbations: flip it on an axis, make it black and white, rotate it upside down, whatever kind of invariance you want. So from that same analogy, it follows that synthetic data is going to work very well. The interesting thing with synthetic data now is that, because these parameter counts are so high, you already need fewer data points to fine-tune these models. So instead of having a hundred thousand images and needing to generate a million images, you could have 50 examples and only have to generate 200 high-quality examples.
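Two hedged sketches to make this answer concrete. First, the "use an LLM to expand it" idea for small datasets: generate variations of a seed set, the text analogue of the image perturbations just described. `call_llm` is a stand-in for whatever client you already use, and the seed examples and prompt are illustrative assumptions.

```python
# Hedged sketch of expanding a small seed set with an LLM, the text analogue of
# image perturbations. `call_llm` is a stand-in for your client; seeds and the
# prompt are illustrative.
from typing import Callable

SEEDS = [
    "Customer asks whether the FY24 report covers November 2023.",
    "Prospect wants a comparison of the basic and pro plans.",
]

PROMPT = (
    "Rewrite the following example {n} different ways, varying wording and detail "
    "but keeping the same intent. Return one rewrite per line.\n\n{example}"
)


def expand_dataset(call_llm: Callable[[str], str], n_variants: int = 4) -> list[str]:
    """Turn a handful of seed examples into a few hundred candidates; filter for quality afterwards."""
    expanded = list(SEEDS)
    for example in SEEDS:
        reply = call_llm(PROMPT.format(n=n_variants, example=example))
        expanded.extend(line.strip() for line in reply.splitlines() if line.strip())
    return expanded
```

Second, looping back to the URL-validation and re-ask flow Jason describes at the start of this answer: Pydantic validators reject bad outputs, and instructor's retry mechanism feeds the error back to the model. The field names, the GET-based URL check (the transcript mentions a POST), the retry count, and the email schema are all illustrative assumptions, not the production code.

```python
# Hedged sketch of the validate-and-re-ask pattern: a validator rejects URLs that
# don't resolve, and instructor re-asks the model with the error message.
import requests
import instructor
from openai import OpenAI
from pydantic import BaseModel, field_validator


class SalesEmail(BaseModel):
    body: str
    attachment_urls: list[str]

    @field_validator("attachment_urls")
    @classmethod
    def urls_must_resolve(cls, urls: list[str]) -> list[str]:
        # Any failing URL raises, which triggers a re-ask that includes the error.
        for url in urls:
            if requests.get(url, timeout=5).status_code != 200:
                raise ValueError(f"URL does not resolve: {url}")
        return urls


client = instructor.from_openai(OpenAI())

email = client.chat.completions.create(
    model="gpt-4-turbo",
    response_model=SalesEmail,
    max_retries=2,  # re-ask up to two more times when validation fails
    messages=[{"role": "user", "content": "Draft a short follow-up email with the attachment links from the context."}],
)
print(email.attachment_urls)
```

In the flow described above, the requests that pass validation over a few weeks then become the fine-tuning dataset used to move from GPT-4 Turbo to a cheaper 3.5 model.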
And I think that's where you will get a lot [01:15:00] of the value. I have not done anything beyond very task-specific fine-tunes, and data quality always matters. So even if it goes from 50 examples to a thousand examples and you subset that down to 300, that's still probably enough to fine-tune a 3.5. hugo: How about this: we're talking about moving from vendor-based API models, models that you ping with an API, and downgrading, fine-tuning to get the same performance cheaper. What about switching to fine-tuned open-source models? jason: There, you have to separate the curiosity from whether you want to use something like Anyscale or Together, or whether you want to do your own inference. Because for the most part, I'll say: hey, I'll take your money, I'll put this up on Modal, but honestly, there's no way you as a business will be profiting in the near [01:16:00] term if you do this inference yourself. Obviously a 7-billion-parameter model can be fine-tuned and run well, but for most businesses, the talent you need to build that is usually going to be much more expensive than using a fine-tuned 3.5. So we really have to figure out what the motivations are for wanting to do something open source and also own that inference. It's usually never going to be the cheaper option in the near term. Obviously, if you're processing millions of documents, it will be cheaper in the long run. But for a lot of startups, when people ask me about fine-tuning, usually my answer is: if you're worried about price, increase your prices and focus on the value aspect of things, rather than trying to save a dollar per user per month, because it's going to be painful either way. hugo: So we're nearly at time. I was on vacation recently in Germany, devouring your blog posts [01:17:00] and podcasts and a lot of other stuff, to inform this conversation but also because it's super interesting. And there were two books that you mentioned in a blog post, by Byung-Chul Han: Psychopolitics and The Burnout Society. And I think these are super relevant when thinking about the way we approach work, the way we approach entrepreneurship, the way we approach consulting and contracting. One takeaway I had, and this is getting, I suppose, significantly philosophical, was that in this current age of individual, very metric-driven, measurable entrepreneurship, instead of being in a society like industrial societies, where power is exerted from above, we've actually internalized systems of power. As entrepreneurs, we may think we have freedom, but we're actually in this strange zone where we've internalized people exerting power on others, and [01:18:00] we exert it upon ourselves in a deeply metric-driven way. And we can never forget the chase now, because everything's measurable and on devices as well. So I'm wondering, firstly, what made you write about these books, and how does this metric-driven internalization of power structures resonate with you? jason: I basically read these books after my friend Ava recommended them to me, after my hand injury.
And I think the word that Psychopolitics really burned into me was auto-exploitation, right? In this society, because you are told that you own your own fate, it also means that everything you don't have is your fault, and you will always push yourself to try to do better. The reason I didn't raise the money was because of something internal to me; I need to be better, right? If I don't have this, it's because I didn't want it enough. And sadly, I think that's true. hugo: It is. jason: I personally believe that, and the way I've gotten [01:19:00] around it is just acknowledging that I don't want it, right? I definitely believe I could be in the Olympics, but I don't think I'm willing to eat the concussions and the broken arms and missing all my friends and family, and pushing myself to the point where I'm burnt out and injured and whatnot. So I'm going to choose not to have these things, and I will be happy because I chose not to have them. So that's one big aspect that really stuck out to me. The second thing about Psychopolitics that stuck out was this: when there is the worker-and-boss relationship, you can protest, you can go on strike; but if you are your own boss, there is no external power struggle, there's only internal conflict. I think that's also true, and that's also something I had to work through. But there, I think the gift of being enough is the thing that really got me through it. And, man, that took two years. Yeah. hugo: Yeah, for most of us, we've barely started the process, and for me it's definitely ongoing. I am interested, and this is [01:20:00] getting a bit self-helpy, but I mean that in the most beautiful way: it's easy to rationally know that I'm enough or whatever, but a lot of things in the world, and internally, have been conditioned in a lot of ways to make it really challenging to internalize that and wake up believing it, to be honest. And we work in a field which is known for everyone having imposter syndrome as well, right? So, any words of advice on self-love? jason: I don't know about self-love. The inversion I wrote about in my advice article was basically: hey, I'm unsure what self-love looks like, but I do know everything I would do if I hated myself. So I'm just going to list that out and do the opposite, and that should be good enough, right? That's how I think about these things now, and in particular with this feeling of being enough. I've been able to demonstrate to myself that I can have everything that I've [01:21:00] wanted, and now I can choose to not want things. And I am confident enough now that when I don't have something, I believe it is because I don't want it. And that is really relieving, right? Yeah, I'm not working at OpenAI because I really do not want to do LeetCode. That's the end of it. I feel very good about that, right? And I think it takes a lot of time to build the confidence to be able to actually say that everything I know I want, I will have. That is a lot of confidence, and it is probably delusional, but again, my advice article is basically lies I tell myself, and these days I truly believe it. And when I don't have these things, I'm choosing not to. I'm very confident that I can make $100,000 a month, but I recognize how much work that's going to be.
I recognize that over the summer I want to spend time with my friends, so I'm going to make the decision to not have that. And there's no doubt in my mind that I could have it if I wanted to. And I don't, and it sounds a [01:22:00] little crazy, but it's quite effective. hugo: I do love this idea of listing the things you'd do if you didn't like yourself. Could you just talk us through a few of those? jason: Yeah, off the bat, I think I'd stay home all day. I wouldn't go outside. I would probably just work all the time, right? I would work to keep the demons away. I would stay at home. I would probably not get my O-1 and just stay home and be mopey. I definitely know I wouldn't exercise, right? Those are some really simple things. I definitely know I wouldn't try to socialize. If students reached out to me for help and I hated myself, I definitely don't think I would reach out and try to help them, because I wouldn't believe that I could do it. And it's really easy to come up with this stuff. If you ask, what do you do to love yourself, it's, I guess I would exercise? That feels silly. But knowing that you wouldn't, and doing the opposite, now I just have to exercise, right? And so I think a lot of it ends up being the fact that people [01:23:00] like me, who are paid to identify the risks and edge cases of systems, can find problems all the time. So let's just find all the problems and then do the opposite. Yes. hugo: Fantastic. I want to bring this back and wrap up with some data stuff. We have a really nice question from Vijay, which I want to give a brief answer to myself, actually, because I've thought about this far too much over time. Do you ever find yourself needing to use linear algebra or probability in your work, or is it largely independent from math? I'm wondering if I should invest the time needed to maintain these skills. Now, my zeroth-order advice there is: focus on data problems, and that will lead you down a path where you start to pick up statistical skills, pick up probabilistic skills. I think thinking probabilistically and statistically is one of the most important things in our discipline, and if you focus on data, you'll get there. If you get interested in linear [01:24:00] algebra, let's say you're designing architectures for neural networks or whatever, but maybe you're not, maybe you're just using models and you don't need to know that stuff at all. But if you're designing architectures, or want to learn a bit more about fine-tuning with LoRA or something like that, and you want to learn a bit more about matrix multiplication, it all comes out in the wash from your interests. So follow the data, follow some of the models, and the statistics and linear algebra that you need will reveal themselves. I wouldn't read a textbook on multivariate calculus necessarily, unless you want to. So that's my zeroth-order advice. I've actually got a lot more thoughts on that, but I thought I'd set the seed there, and Jason, feel free to disagree with me completely. jason: I think the key word is maintain. So my question would be: how much do you know right now, and how much do you think you need to maintain? I maintain nothing, but as a result, there are three things that I know, and I use them every day.
One of them is just singular value decomposition, very useful: this idea that there's [01:25:00] such a thing as effective degrees of freedom, that a lot of data can be described by eigenvalues, and regularization, the concepts that go with that. Bayes' theorem, super useful, right? And just a general understanding of, almost just being able to do, Fermi-type problems. Are people familiar with those? hugo: Yeah, like napkin math. And in fact, in the course we're teaching, Jono Whitaker, who works at Answer.AI, legend, and we did an episode here, I was going to say we did a great episode, and it was great because Jono was there, he's giving a talk on napkin math for fine-tuning, just to promote that. But yes. jason: Yeah, and then the last thing is understanding empirical Bayes. There's this idea that if there was a restaurant that was 4 stars but had 3 reviews, and a restaurant that was 3.8 stars but had a thousand reviews, there's some notion that the 3.8-star one is better, right? That, plus some multi-armed bandit stuff. I think that's really all I've ever used in my life, but it is [01:26:00] incredibly useful in the sense that, for all the probability work, I use those tools to estimate how I should allocate resources, right? It's never because I needed to do a proof. It's always: okay, when I do this, is it 50-50 odds, or is it one-tenth-of-a-percent odds? And just understanding how to do the napkin math to estimate those things has been super helpful, right? And just having an understanding of priors, without deriving anything. I think that's probably very underutilized. But if you already have it, I don't think you need to think about maintaining it; you will just consistently be using it every day, and using it every day will be enough, right? I think forgetting things is incredibly important. When I did jiu-jitsu, I took a year off, and when I came back, I forgot everything that didn't work. My jiu-jitsu got better, because I just remember the six moves that worked all the time, and now I only use those six [01:27:00] moves, and I'm winning more than ever because I'm not trying to do some fancy thing I saw in a textbook. hugo: I, for one, could definitely do with some help forgetting. There's stuff I need to remember that I can't, but I can still remember the quadratic formula, which I'll probably... jason: Oh, I forgot that. I'm blessed. hugo: Yeah, I need the blessings. Look, I've just shared your website in the chat, and I'll do so in the show notes. Your Twitter and your GitHub and all of that are available from there. I've shared the Typeform for your course as well. Are there any other ways people can get in touch, or is that pretty good? jason: I think that's basically it. Yeah. If you guys want to get in touch, it's the website, jxnl.co, and there are some contact forms there that people can fill out. But I think that's all for me.
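To close, a minimal worked sketch of the empirical-Bayes restaurant example above: shrink each rating toward a prior mean, weighted by review count, so the 3.8-star place with a thousand reviews comes out ahead of the 4-star place with three. The prior mean and the pseudo-review weight are illustrative assumptions.

```python
# Minimal sketch of the empirical-Bayes restaurant example: shrink ratings toward
# a prior mean, weighted by review count. Prior values are illustrative.
PRIOR_MEAN = 3.5
PRIOR_WEIGHT = 20  # pseudo-review count controlling how hard we shrink


def shrunk_rating(avg_stars: float, n_reviews: int) -> float:
    """Bayesian-average rating: few reviews get pulled toward the prior."""
    return (PRIOR_WEIGHT * PRIOR_MEAN + n_reviews * avg_stars) / (PRIOR_WEIGHT + n_reviews)


print(round(shrunk_rating(4.0, 3), 2))     # ~3.57: barely above the prior
print(round(shrunk_rating(3.8, 1000), 2))  # ~3.79: the 3.8-star place wins
```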