00:00.28 James Welcome back, everyone, to Merge Conflict. Frank, I forgot to put you on the other one over here. Welcome back, everyone, to Merge Conflict. I know how to use a computer. I am one of your hosts, James Montemagno, and with me, as always, the one, the only, the most gorgeous man that I'm looking at right now, Frank Krueger. 00:19.83 Frank Okay. 00:19.96 James How's it going, buddy? 00:21.14 Frank Out of all the men that you're looking at, I'm the most gorgeous. Great. Thanks, James. Thanks for the backhanded compliment. Hi, James. It's wonderful to see you. Welcome, everyone, back to Merge Conflict. I missed you all. It's been a week. 00:33.38 Frank How's your life? 00:35.03 James It's been a week. I was in Guatemala for a wedding and it was absolutely fantastic. It's good to be home. But yeah, Guatemala, absolutely awesome. And the wedding was the craziest wedding I've ever been to. There were stilt dancers. 00:48.76 Frank Oh, that's quite a skill. I didn't even know that was a skill to have, but I'm somewhat envious now. 00:55.35 James There were pyrotechnics. 00:59.10 Frank See, the stilt dancers I get, because I've been doing housework and I kind of wanted stilts the whole time. So that's a skill. I don't really need the pyrotechnics. Don't need to set the plates on fire. 01:09.34 James There were really cool LED blinking sunglasses that you got, and cool foam hats. 01:10.05 Frank Yeah. 01:13.37 Frank Yeah. 01:14.58 James It was wild. 01:15.51 Frank Are you sure you didn't go to a rave? 01:16.02 James It was. 01:17.15 Frank Are you misremembering what just happened in the last week? 01:20.73 James There was a lot of dancing. There was also a bunch of live bands, and a live DJ set, and it was absolutely amazing. 01:30.82 Frank Cool. 01:31.60 James The couple that we went for, you know, we just became friends with them in the last year. And yeah, it was so cool and the wedding was awesome. So I'm glad we made the trip down there. It wasn't too bad of a flight, actually. You know, Guatemala actually isn't that far from us. 01:44.38 Frank Oh, yeah. 01:45.61 James It's not that far. 01:47.06 Frank It turns out the Americas tend to stick together. And it turns out you can just go up and down. I don't know. I should do it more often myself. 01:54.48 James Yeah. 01:55.64 Frank But I do love... I've got to see more of the Americas myself. You're making me a little jelly. 02:01.30 James Yeah, we really want to go over there. There's Belize and a few other countries that we want to go to. 02:07.67 Frank Mm-hmm. 02:07.80 James Once you're over there, well, maybe we'll go to some other countries. So anyways, if we have any listeners in Guatemala, thanks for being a listener. It's a great country and we're excited to come back one day. 02:13.46 Frank Mm-hmm. 02:17.14 James There's lots of volcanoes, lots of hiking volcanoes. We didn't get to do that, but they're active. 02:22.11 Frank Right. 02:22.59 James It's happening. 02:25.30 Frank Okay, again with the pyrotechnics, man. I don't need that. I need the calm water, like an oasis. Just nice and calm. 02:33.96 James Yeah. 02:35.18 Frank Maybe a waterfall. 02:36.86 James Yeah. 02:37.20 Frank That's as spicy as I get. 02:39.03 James Well, it has been a week. There's a whole beautiful list of things that we're going to talk about today.
Is there any Internet controversy that you want to talk about, or should we skip over any Internet controversy? 02:53.48 Frank Oh, God, have I not been keeping up? I can't even think. You know, the controversies come and go in waves. I think there's a trial going on. I'm trying not to pay too much attention to it, but it's hard not to watch a little bit of high-profile trials with some companies that start with an O and end with an AI. 03:12.35 Frank It's kind of fun to watch that stuff go. 03:13.05 James I have seen those. 03:15.29 Frank Yeah. 03:15.80 James That commentary has been interesting. I've been trying to watch the dailies of that. Very fascinating. I have missed the first few, but it does seem that there's lots of distillation happening and people are confessing to things and all this stuff, which is fascinating. 03:29.78 James Yeah, that's been of interest. 03:30.84 Frank Yeah. But no, I don't think there are any blue-and-black, white-and-gold dresses happening or anything like that. 03:32.46 James That's for sure. 03:39.41 Frank I think, yeah, the controversies aren't too crazy. The AI world has completely moved on. You know, everything you knew last week is irrelevant. It's all changed. 03:50.49 James As always. I mean, it's a new week. There are new skills to be learned. I think someone maybe asked us what skills we're using. I haven't looked it up, but we'll do a listener mailbag. Maybe if you leave comments on any of our videos on YouTube, we'll do a listener mailbag. 04:00.12 Frank Yeah. Okay. 04:02.01 James We have some good questions and good tweets coming in, I think. 04:04.47 Frank Yeah. 04:04.97 James But someone did ask us a little bit based off the episode that we just did, 04:05.91 Frank I think. 04:08.97 James where we were talking about some of the pricing changes around AI. 04:12.93 Frank Mm-hmm. 04:12.98 James And one of the things that I had mentioned was that one of the kind of cool benefits of some of these harnesses is not only the ability to use frontier models through the service that you're paying for, 04:29.18 James that are readily available. So all of them have them, right? Whether you're using Codex or GitHub Copilot or Claude Code or OpenCode or Cursor or whatever, they all have them. 04:36.96 Frank Kimi! 04:38.81 James Yeah, and many are the same. Obviously, some are also multi-provider, right? With GitHub Copilot, you get a bunch of different stuff from the different providers that are out there, and custom models as well. But one of the really fascinating things seems to be that you can do something called bring your own key when you're using this service. 04:57.14 Frank Yeah. Yeah. 04:58.58 James And bring your own key, BYOK as we often call it, is the ability to say, I would like to use this other service over here to route all my queries through. 05:13.18 James And that may be, specifically, an API key with Anthropic or with OpenAI or maybe Microsoft Foundry. 05:20.76 Frank Yeah. 05:22.77 James And you would be paying those providers for the token usage, however their billing is, right? And you would just use GitHub Copilot. You wouldn't use the models from Microsoft, right, that are available through GitHub Copilot; you'd use them through these other services. 05:38.36 James And you might be saying, well, those all three sound like models that I might already have. So why would I do that?
Well, the real reason, I think, is also to use a bunch of other models out there that maybe you don't necessarily have access to. 05:50.24 James And a very popular one is called OpenRouter. And OpenRouter allows you to navigate through and have a whole slew of things. Now, obviously with bring your own key, you know, 05:58.30 Frank Yeah. 06:04.28 James the tuning... I'm very interested in your usage of the next thing I'll talk about, but I'm gonna go long here, and then we'll get into the thing, which is the biggest thing. 06:11.13 Frank All right. Preach, preach. 06:13.75 James Preach. But the system prompts, right, the prompts that are being sent, like the VS Code team, the Copilot CLI teams, are finely tuning those, and the tools and stuff like that, right, as much as they can. They're open, you can see them. Obviously, there's going to be a general system prompt for these other providers. They don't know exactly what you're picking, necessarily. But then there's this other option, which, ah, Frank Krueger, I did a video on. I'm gonna look it up here. 06:42.26 James Let me go through my video history. Let's see, Copilot CLI, this, that. I've got to scroll for a while. I'm scrolling, I'm scrolling, I'm scrolling. 06:50.14 Frank He produces so many videos, everyone. Who keeps up? 06:52.60 James Um... 06:52.95 Frank Raise your hands if you keep up with James. 06:55.03 James Ah, here it is. One year ago, it was called Bring Your Own Key and Models to GitHub Copilot in Visual Studio Code. 06:57.79 Frank Wow. 07:03.77 James And I talked about bring your own key, but also running local models with Ollama, or now you could also use Foundry Local. 07:11.45 Frank Okay. 07:13.25 James So, running local agents and local models on your machine, with which you wouldn't have to pay for anything, because it's on your machine. 07:22.78 Frank Yeah. I've actually gone down every single one of these rabbit holes now, with the great pricing apocalypse. 07:29.41 James Hmm. 07:32.08 Frank I just wanted to feel out the world and see what kind of pricing I can expect out of different services and all that stuff. I did want to make one joke, though, about OpenRouter. OpenRouter is bring your own key, but within OpenRouter, you can bring your own key. 07:46.55 Frank So you can bring your own OpenRouter key, but within OpenRouter, you can bring your own key in there too. It is such a weird, messed-up world, because OpenRouter actually merges all these different providers together. It's very funky, to be honest. 08:00.86 Frank I've been rocking the DeepSeek model right now. Because it is stupidly cheap. They're running this promotion where it's like 75% off the price. And the price is already cheap. And so, 75% off that. I was just curious what I was doing. 08:18.97 Frank So I did a full day of work on it. And it cost me like $2. I was like... 08:23.21 James Wow. Nice. 08:23.74 Frank That's a pretty good deal. That was like 60 or 80 million tokens total. I was burning through the tokens, and they have a good context length, and it was just a couple bucks. 08:34.68 Frank So I was trying the bring your own key thing. So I still rock everything through VS Code chat, AI chat, agent chat. I don't know what the feature's called anymore, because I used to call it Copilot Chat. 08:46.04 Frank Is it Copilot within VS Code? I can't keep all this straight, James, because it's just chat. 08:50.42 James It's just chat. Just chat. 08:52.63 Frank Just chatting, man.
08:53.05 James Just chat. 08:53.98 Frank Ah. 08:54.04 James Just chatting. 08:55.19 Frank Yeah, because Copilot's GitHub's thing. VS Code is more generic. VS Code has an interesting implementation, though, where, in general, you have to get an extension if you want to add models to it. 09:09.94 Frank So it's a little bit funny, because you were mentioning generic prompts and such. All the model providers out there have kind of adopted the same API. They all just copied OpenAI's chat completions API. 09:23.26 Frank But oddly enough, VS Code doesn't allow you to just plug into that. They really want you to get extensions and do funky stuff, because they assume that every model wants to be tailored. It's all a bit of a joke, but whatever. 09:34.51 Frank So, yeah. 09:34.94 James Are you sure, Frank? Because you can go into model, manage, manage, manage model, model, manage, manage your models. 09:38.17 Frank Model manager, manage model. Yeah, manage your model models. 09:46.06 James Now, that being said, I will say that extensions can bring their own models into the mix as well. 09:51.00 Frank Yes, yeah. So hit add model and there's going to be a fixed list. 09:52.09 James So you go in, 09:55.70 James you have your language models, and it has all of them in there. So you have your Opuses, your Sonnets, your GPTs, but there is an add models, which has a dropdown for Anthropic, xAI, Google, OpenRouter, OpenAI, Ollama, OpenAI compatible, and Azure. 10:03.98 Frank Yeah. Yeah. Yeah. Sure. Yep. 10:18.26 Frank Oh, maybe that's an Insiders one. I'll have to check out the OpenAI compatible one. I hadn't noticed that. So that's good. Thank you. Thank you, VS Code people, for doing that. 10:28.66 James So my assumption is, if you install an extension from a provider like Cerebras or something like that, then they would just basically pre-configure with that. 10:39.22 Frank Yeah, and honestly, it was getting a little bit annoying, because we're to the point where there are so many AI shops out there offering models as services too. 10:50.31 Frank They're providing the full everything, and they've all adopted the OpenAI API. So, yeah, I guess they're all still a little bit weird around how they do thinking and stuff like that. 11:03.66 Frank But in general, they all kind of provide the same API, and you can just plug into them. So yeah, I've been trying out all the models, man. Like, I have put $10 of credits into every provider out there. 11:15.39 James Hmm. 11:15.98 Frank I probably should be doing everything through OpenRouter, but I wanted to get a feel for all the different providers and see how they work. And I've been pretty pleased, I've got to be honest. I think I'm still falling back to the Codexes and the Sonnets of the world. 11:33.02 Frank But I've really been enjoying taking a break from the norm and going out and trying the more wild side of everything. But BYOK has been fun for me. 11:44.38 James Okay, so when you go in and you add a model, let's say you're adding this, I don't know, how'd you do DeepSeek? Was that an extension or something? 11:52.92 Frank Yeah, so DeepSeek has their own extension, but it's just a plugin for the chat window, basically. 12:00.41 James Oh, that's what I was going to ask. 12:00.68 Frank That's... 12:02.78 James So it just shows up in the model selection.
12:06.87 Frank Yeah, it shows up... it shows up as too many models in the model selection, to be thoroughly honest, because DeepSeek decided to be a proper provider and offer many kinds of models, not just the DeepSeek models. 12:20.20 James Oh. 12:20.28 Frank So honestly, the UI needs a bit of work, because DeepSeek just threw like a billion models into the list. And you can just scroll and scroll and scroll and scroll to find all the stuff that you want. 12:30.92 Frank But yeah, that's the general idea. 12:31.80 James Interesting. 12:32.60 Frank These extensions can just add models to that list. And I don't fully understand the rules, but sometimes models added to that list are hidden by default. Other times they're visible by default. I don't fully get it. So if you add an extension looking for a model and you don't find it, go to the model manager and you can make it visible. 12:51.99 Frank Or if you have too many listed there, go make some invisible, because there can easily be too many. 12:57.11 James Gotcha, that makes sense. So I think that's the important factor: in general, the chat window in VS Code is a chat window, like, all up, and it has all these integrations and this ecosystem in it. 13:11.22 James I still think you need a GitHub Copilot subscription even to use that. 13:16.49 Frank Perhaps. It's hard to say. I don't know, because I do have a GitHub Copilot subscription. So it's hard for me to say, but I do watch my GitHub Copilot usage, and the number is not going up. Like, the requests aren't going up, but I don't know. 13:26.96 James Correct. 13:29.42 Frank Like, sub-agents are always kind of a weird thing. I don't know if it always uses the model I chose for sub-agents, or does it ever use something else as sub-agents? You know, I wonder about little details like that that, honestly, I just don't know myself yet. 13:45.85 Frank I think VS Code being VS Code, there's a million settings you can choose. I just don't know how the defaults work exactly, to be honest. 13:55.16 James Let's see, Copilot Search says: yes, bring your own key in GitHub Copilot VS Code, only available to certain tiers. 14:04.79 Frank Interesting. Okay. 14:07.77 James Yeah, but you connect to your model provider and you're billed through them. You do not consume your GitHub Copilot quota. 14:12.49 Frank Yeah. 14:14.84 Frank Yeah. 14:15.00 James Perfect, gotcha. 14:16.09 Frank Fascinating. 14:16.20 James Okay, so yeah. 14:16.33 Frank Yeah. And this is how a lot of the other apps work, like OpenCode and Claude Code and all the... I'm not sure, can you do it in Codex? I'm not sure if that's available. 14:26.90 Frank Ah. 14:26.92 James I don't know. 14:27.93 Frank Yeah, me neither. Sorry. Sorry, everyone. I'm still living the VS Code life. I love VS Code. I'm sticking around in there. But VS Code is more than just the chat window. It is what everyone's calling the harness these days. I hate that term. So I tend to just call it agent, because it took me a while to come around to agent, and I prefer calling things agent. And what the agent is, basically, is a bunch of system prompts and a bunch of tools for editing files, and other little basic things like memories and plans, modes for plans and things like that. 15:04.25 Frank So that's the agent to me. But a lot of people call that the harness, because the word agent got overused a bit.
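To illustrate the point about everyone copying OpenAI's chat completions API: most of these providers, and the local runners too, can be reached with the standard OpenAI client just by swapping the base URL. A minimal sketch; the endpoint, model name, and key below are placeholders, not any particular provider's real values.

```python
# Minimal sketch of the "OpenAI compatible" pattern: the same client code
# talks to any provider that implements the chat completions API, just by
# changing base_url. Endpoint, model, and key here are hypothetical.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example-provider.com/v1",  # your provider's endpoint
    api_key="YOUR_API_KEY",                          # your provider's key
)

resp = client.chat.completions.create(
    model="some-model-name",  # whatever the provider calls its model
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Explain what this stack trace means."},
    ],
)
print(resp.choices[0].message.content)
```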
So what's interesting is, when you do choose a model, and I'll just keep saying DeepSeek because my brain's stuck there for some reason, you're still using the VS Code coding 15:24.29 Frank harness. It's still their system prompts, it's their tools and all that stuff. So my little MCP tools that I've added to my settings file, they all work just fine. So the models really have been commoditized quite a bit. 15:42.06 James That's pretty neat. Yeah, I think the ability to just go in and do this, and then even run things locally, like we'll talk about, is very fascinating. Because obviously, you know, you decide to get billed in one way or another, or ideally even not get billed at all. 16:04.41 James I actually just installed the Foundry toolkit for VS Code, which is for Microsoft Foundry. 16:11.68 Frank Right. 16:11.90 James And this is one of those, like DeepSeek, that has a bajillion models. 16:16.12 Frank Yeah. 16:19.67 Frank Right. 16:20.15 James So I'm looking here. We have a Codestral, Cohere Command, DeepSeek R1, V3, 3.2, a Flash model, Llama 4, Meta Llama, Mistral 16:20.82 Frank Yeah. 16:29.75 Frank Yeah. 16:33.43 James Large, all the GPT models. 16:35.39 Frank Yeah. 16:36.31 James So I could get billed through this stuff, basically, I assume. 16:39.45 Frank Yeah. 16:40.19 James And those are GitHub models via the toolkit. And then there are Microsoft Foundry models, which are like Haiku and Opus and Sonnet and Codex and more. There's, oh my gosh, there's Grok and Llama. 16:52.74 James There's a bajillion there. Yeah, there are tons. That's crazy. So they're all in there. 16:56.92 Frank Yeah, and if it's a bit overwhelming, I agree with you. 16:58.63 James Wow. 17:01.04 Frank It's a bit overwhelming. So you can go to multiple websites out there that have kind of like shootouts, where they try to rank these models for different tasks. Obviously, the only task I care about is coding. And the normal ones generally win. But, you know, there are some upstarts doing pretty good. Kimi gets a lot of good reviews, MiniMax, MiMo, 17:23.67 Frank all sorts of... throw a consonant and a vowel together, and there's probably a model out there with that name on it, and you can try it out. But yeah, those are all still paying for stuff. You're still paying for a service. You're going over the internet. You said something I got interested in, which was, well, what if I don't want to pay anything? 17:44.70 Frank And what if I just want to turn off my internet? Can all this stuff still work? And it can, James. It can. 17:52.98 James It can. Yeah, okay. I'm interested, because I did also have that in my video, but I feel like I made that video a year ago, and my assumption here is that the models that I can run on my machine have dramatically changed between then and now. 18:12.79 James So I'm really fascinated to see how that goes, because I also literally just installed, again, that Foundry toolkit. 18:15.99 Frank Yeah. 18:22.18 James I feel like I installed it a while ago, but I didn't go into the model selector. 18:22.74 Frank Yeah. 18:25.06 James Now it makes sense as a model provider, and 18:25.43 Frank Right. 18:29.08 James there are a bunch of groupings, and one is called Foundry Local via AI Toolkit. And there are all the DeepSeek models, GPT-OSS, Mistral, Phi Silica, all the Phi models, Qwen, all the way up to 3. 18:35.09 Frank Cool. 18:41.16 Frank Yep.
Qwen, my buddy Qwen. 18:44.85 James Yeah. And it shows you if they have capabilities of vision or tools, or, you know, what their context window is. 18:45.78 Frank Yeah. 18:50.51 Frank Yeah. 18:51.66 James So help us, Frank Krueger, understand 18:54.87 Frank Okay. 18:56.44 James what you needed to do and what that experience was, and what type of hardware you're running it on. Because right now I'm on my Windows laptop, which is running something, I don't even know. 19:08.74 James I can see what it's running. Let me go to the task manager, and it is currently running a, what is that, CPU, an AMD Ryzen AI 7 Pro 350, with 19:13.30 Frank Huh. 19:27.54 Frank I don't know my Radeons very well. I apologize. 19:29.47 James I do have an NPU. 19:31.22 Frank So... 19:31.43 James Yeah, 19:33.19 Frank NPU? 19:34.33 James yeah, neural processing unit, yeah. 19:34.58 Frank Motion Processing Unit? Oh, NPU. Yeah. Okay. So, I mean, in some ways, what you probably did a year ago is not too different from today, because in the world of open source models, we all still kind of upload our open source models to Hugging Face, and there are billions upon billions of them up there, 19:56.86 Frank all with the big names and everything. I would say the biggest change, James, is that the models have just gotten better, to a scary level. 20:08.50 Frank So I remember, just a few years ago, all I could run on my kind of Macs were the 7 billion parameter models. And even those were kind of slow, because the software wasn't tuned for them. 20:21.65 Frank People were happy to just get them working. They're like, oh my God, look, a local large language model running on my computer. 20:26.61 James Yeah. Hmm. 20:27.25 Frank Isn't that cool? Yeah. Well, this past year, people have just been obsessed with the engineering aspect of making bigger models run faster. And it's glorious. I've been waiting for this time to happen for so long, because, yeah, all you've got to do is throw some engineering prowess against these things. They're just big calculators, you know. People love writing calculator code; just make better calculator code, people. And so what has happened is, the models that are available are better, 20:58.50 Frank and the things that can run them can run bigger models in a more sophisticated fashion. And I was a bit out of the loop also. So I wanted to take this last week, try some things out, and just see how they feel. So I was living in the Opus world, just for context. I spent all of April, checking what month it is, I spent all of April living that Opus life. 21:25.33 Frank It's my buddy. It's my friend. We get along great. But then I spent a few days running DeepSeek, and now I've spent a few days running Qwen. 21:37.02 Frank Qwen, Q-W-E-N. It's a 27 billion parameter model. The ones that people like are Qwen version 3.5 and Qwen version 3.6. It's well understood these are not the greatest coding models out there. 21:55.00 Frank But on my hardware, which I'll get to, you can run 256K context windows, which is huge. 22:02.20 James That's pretty good. 22:03.34 Frank Because even with Opus, VS Code was limiting me to 192K before it did compaction and that kind of stuff. 22:03.74 James Yeah. 22:12.50 Frank So a 256K window locally is really impressive. It blows my mind. James, you know I work on that cuneiform project where I'm training large language models.
You know what the largest context window I could put onto that was? 22:26.42 Frank Just guess. 22:28.95 James Ah, maybe like 20K? 22:32.86 Frank 512. 22:34.01 James No. No! 22:37.02 Frank That was the biggest I could get on there. Well, because that was 32-bit floating point math, and it was doing all sorts of complicated things. And the engineering has gotten to the point where I can do 262K. It's just impressive. 22:51.24 James That's crazy. 22:51.86 Frank So, we both have Mac minis, kind of overpowered Mac minis. So we can talk about those, but I'll talk about what I've... yeah, there she is. Isn't she cute? Yeah. What I've actually been running mine on is, I've had an RTX 3090. 23:05.52 Frank Anyone who's ever listened to me talk about things, it's my favorite GPU ever. 23:06.37 James Right. 23:09.92 Frank I've had it for a few years now. I love using it for everything. But truth is, it's just been sitting there idling, doing absolutely nothing for the past few months, just because I haven't been training neural networks or anything. So I'm like... 23:23.29 Frank The reason I like the 3090 is it has 24 gigabytes of VRAM, RAM, whatever you want to call it, fast RAM built into it. And 24 gigabytes can actually run these 31 billion parameter models if you use 4-bit quantization. So that is, we are using 4 bits for every parameter in the model. 23:46.97 Frank So you need roughly 15 gigabytes in order to run these models. And then you need the other nine gigabytes for your context and your output and all that kind of stuff. 24:02.42 Frank So I went through the process of installing llama.cpp. It's one of my favorite runners out there. You've already mentioned Ollama as a good choice. Kind of the industry standard is another one called vLLM. 24:18.24 Frank In the Apple world, there's MLX now. There are runners everywhere. People have gotten the religion of optimizing these things and installing them. So I went through the process, and I wrote a little blog entry about it. Everyone can follow along if they want to go to my blog, praeclarum.org. 24:34.36 Frank And I give all the setup instructions. It was easy, James. Download the model, pass a few command line arguments, be amazed at how port forwarding still doesn't work in the current year. Yeah. 24:46.07 James Yeah. 24:47.64 Frank Try to get your stupid network into good shape. Try to figure out what incantations VS Code wants. And all of a sudden... I chose Qwen to start with. 24:58.42 Frank It was there. It was just VS Code, acting like an agent, running sub-agents, doing planning mode, running with autopilot, because life's too short to care about permissions and all that kind of stuff. 25:13.08 Frank And I had it analyzing my code right away. And I want to talk about my experience with it, but that's pretty cool, that I was pretty easily able to just get this thing up and running. This software has really advanced, I think. 25:26.87 James Yeah, no, I think when the team first started adding that feature, you know, it seemed like an advanced feature, because you were talking about hardware requirements and GPUs. You know, the 3090s aren't cheap nowadays, maybe back in the days of yore. 25:37.04 Frank Yeah. 25:39.52 Frank No. 25:41.78 James But, you know, being able to plug in an API key or plug in a thing, that's relatively straightforward, but it's still running remotely.
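A rough sketch of the setup and the VRAM arithmetic Frank walks through here. The launch command in the comment uses llama.cpp's llama-server flags, but the model filename and exact values are illustrative guesses, not the specific command from his blog post; llama-server then serves an OpenAI-compatible endpoint on localhost, which is what the editor ends up talking to.

```python
# Example launch, roughly (llama.cpp's llama-server; model path is hypothetical):
#   llama-server -m qwen-30b-q4_k_m.gguf -c 262144 -ngl 99 --port 8080
# -m = model file, -c = context size in tokens, -ngl = layers offloaded to GPU.

# The VRAM budget he describes, as back-of-the-envelope math:
params = 30e9            # a ~30 billion parameter model
bits_per_param = 4       # 4-bit quantization (vs. 16-bit full-precision weights)

weights_gb = params * bits_per_param / 8 / 1e9
print(f"weights: ~{weights_gb:.0f} GB")          # ~15 GB

vram_gb = 24             # RTX 3090
print(f"left for context / KV cache / output: ~{vram_gb - weights_gb:.0f} GB")  # ~9 GB
```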
But having something running on your machine, where you could be on a plane, you could be disconnected, and it's just sitting there ready to go, is really magical. 26:02.14 James That being said, there is still some edumacation, because, like you said, you have to install a thing to get this thing, and then do some port forwarding stuff. But that being said, 26:15.38 James as far as the steps required go, when it just kind of works, it is a magical feeling at the end of the day. It's kind of like when you get an Android emulator up and running. You're like, whoa, look at that thing. 26:24.36 Frank Yeah. 26:24.68 James And then you're like, wow, that only took a thousand steps and a bunch of pieces of software installed and a bunch of SDKs to download. Then boom, there it is. Great. I did it, right? 26:32.25 Frank Yeah. 26:33.52 James But that's really neat: one, that it's built in, it's just ready to go, 26:33.78 Frank Yeah. 26:39.16 James and then, that you were able to figure it out. It seems relatively straightforward. 26:44.92 Frank Yeah, I think what made it so comfortable is that I didn't have to switch out of my normal workflow, because over the past whatever months, I've gotten very used to using the agent chat and everything in VS Code. I know how it works. I know what to expect out of it. 26:58.23 Frank And it was so weird to just see a model that I knew was running, because, you know how I knew it was running? Because I could hear the jet engine on the other side of the room as every fan on that RTX spun up. 27:05.24 James Yeah. Yeah. 27:09.55 Frank You know what optimizing a model runner really means? They are pushing that video card as hard as it can possibly be pushed. And that means power usage. And that means fans, thermal regulation. So, like, I do a lot of network training, and I hear the fans on that thing. I have never heard it screaming the way it screams when I ask the VS Code chat thing to go analyze every single file in my repo and report back every bug. 27:40.22 Frank It just starts howling. So that was just such a satisfying feeling. And I even did that funny trick where I disconnected from the internet. Obviously, my network was still up, but I disconnected from the internet just to prove that everything was happening locally. And it was. It was beautiful. 28:01.88 Frank So, model quality, though. I think this is kind of the big one. We are running compromised models here. For one, they're much smaller than even, like, Sonnet. I don't know how big Sonnet is, but it's probably five to ten times larger than the Qwen 27 billion parameters. It's probably like 300 billion, something out there. 28:24.34 Frank And on top of that, I'm running a compromised version of the model. I'm using a quantized version of the model. So 4 bits per parameter, when usually you have 16 bits per parameter. 28:38.51 Frank And then I'm compromising it even more. The context that it's keeping and the output buffer it's using are also quantized. Those should all be 16-bit. 28:48.83 Frank They ain't. They're 4-bit also. 28:50.46 James Mm. 28:51.43 Frank Yeah. That's how you get the big context out of it. You quantize everything. You really compress it down. But at the same time, James, I can't really tell the difference between Opus, DeepSeek, and the stupid 27 billion parameter model running on my other machine. 29:12.38 Frank And I don't know what it is, because if you asked me a month ago, I'd say Opus forever.
Don't you take the Opus from me ever. And now I'm just like, can I even tell the difference between any of these models? I'm starting to get very suspicious of myself. 29:29.40 Frank So honestly, for the past day, I've just been running the same prompt on, like, six different models to just see what happens with all of them. So I do want to say, okay, it's slower, for sure. 29:40.55 Frank I'm getting like 40 tokens a second. I think with DeepSeek you get like 60 tokens a second. Opus, you get three tokens per second. Just kidding. Opus is fast when Anthropic wants it to be fast. Other times it's horrendously slow. 29:54.71 Frank It's a big jet engine over there, and it does make mistakes once in a while. But you know what? Opus makes mistakes once in a while. It's sometimes bad at file merging. This is where you can tell the VS Code 30:11.62 Frank harness and prompts aren't perfect, because it keeps messing up how it uses the file read tools. Like, oh, the file read tool needs line numbers? I guess I'll pass it line numbers. Like, yeah, okay, maybe you should have just done that in the beginning. There are these funny little things, and it has to figure them out every session, but it figures them out. 30:21.79 James Oh, yeah. 30:33.08 Frank And I just... I don't know what's wrong with me or the world or what's going on, but I have been so satisfied with these tiny little models. And I'm pretty sure I'm not deluding myself. I think it's just the systems that we've developed: good agents files, doing planning, asking questions. And then the coding part just becomes kind of rote once you've approved all the plans and everything. 30:50.52 James Hmm. 30:58.34 Frank So, yeah. Aside from the slowness, I honestly can't tell the difference between my little local model and a Sonnet or something like that out there, or a small Codex. 31:11.45 Frank And I've also been trying... Google has Gemma, or Jemma, I'm not sure how you're supposed to pronounce it. Gemma 3. And that's a great little model, too. It's hard for me to tell the difference between Qwen and Gemma. You go to all the websites that benchmark these things. They all use that stupid SWE-bench. 31:31.58 Frank Benchmarks are all a joke, everyone. Just putting that out there. But... 31:34.76 James That's... 31:36.28 Frank you know, the graph says this one's way better than the other one. But then in practice, you're like, I don't know. They're all making my job better and my life happier. And aside from the fans roaring, it's really hard to tell the difference. So I've been really excited by all of this. 31:53.36 Frank It's, it's, yeah. 31:53.91 James Pretty... Yeah, it's pretty neat. I mean, I think the biggest thing for me a while ago was the speed of it, but I was also, I forget what hardware, maybe I was running it on my Mac mini, I'm not positive. 32:00.73 Frank Yeah. 32:04.66 James But mostly the speed of it. You know, I use a lot of GPT models, which are quite fast, especially in Copilot; they're hosted in Azure, right? So they've got speediness to them. And then some of the smaller models, like a Sonnet or a Haiku, are quite quick, right, compared to an Opus. So I use a lot of those models all the time. And to me, the speed is one factor, but also correctness, and being able to read it. And just the answer at the end of the day: is it correct or is it not? I've been doing this
32:36.22 James trial since the beginning of the month. I posted about it. I'm not picking any model manually by hand anymore. 32:43.16 Frank Oh, God. 32:44.18 James I'm only using auto model. And auto model, well, it seems to be different for each tool. But on the trip to Guatemala, I only had my phone. 32:58.00 James And as you know, on flights and long layovers, now that I have Starlink on Alaska, I do a crap ton of coding. 33:02.69 Frank Yeah. 33:04.48 James And I do it all with the cloud agent, the GitHub coding agent, the cloud agent. 33:08.44 Frank Yeah, yeah. 33:09.23 James And the default is auto. So I just let it YOLO. And then when I got home, I said, I'm only going to use... I mean, when I say only, I mean like 95% of the time, unless I need to switch something because something is just not clicking. 33:20.50 James But inside of VS Code, and now inside the CLI, you can just hit auto. And it's not based on your prompt, it's based on availability and a few other factors of the model; it'll pick a model. Often this is like a Sonnet model or a GPT model, but it really just depends on time of day, XYZ, and they give you a little bit of a discount, 10% off, for using auto. 33:44.28 James And what I like about auto is that if it's based on availability, that means it's probably pretty quick, and it's moving really quick too, because it's not being stressed. 33:50.74 Frank Right. Yeah. 33:53.91 James And I'm trying to figure out, and the reason for this experiment is: how much does the model matter? And this is exactly what you're doing, right? 34:02.84 Frank Yeah. 34:04.57 James And I do believe that the models do matter in some instances. 34:04.89 Frank Yeah. 34:07.93 James So what I'm trying to comprehend in my mind is, when does the model matter? When does the reasoning matter? 34:13.08 Frank Right. 34:15.00 James Because it's so easy to default into models that we think are the thing that's good for this thing, 34:19.62 Frank Mm-hmm. 34:20.81 James compared to just saying, maybe they're all pretty good, and then just kind of YOLOing it on the auto model selection. Now, when I say auto, I don't mean autopilot, I don't mean bypass. I mean there is a dropdown that says auto. Just go, and it will pick from the available models. 34:33.16 Frank 0.9x, baby. 34:34.47 James 0.9x, and just let it rip, right? So you get that little sweet discount and go to town there. So I'm trying to do almost the same experiment, but obviously running, you know, normal foundation models in the cloud, but kind of the same thing. 34:41.96 Frank Mm-hmm. 34:48.42 James Does the Gemma, does the Qwen, does the DeepSeek model... how much of a difference does it make? I'm interested in this: was there a point, I know it's only been like a week or so, but has there been a point where you did switch back to Opus or Sonnet? 35:03.39 James Because you said in the beginning that you were still favoring these models. Favoring is not the same as using or having to use 35:06.70 Frank Right. 35:11.54 James them. Has there been a point so far where you're like, you know what, I'm actually going to switch to this other model, because my house is getting too hot? 35:21.53 Frank You know, I kind of like hearing it spin up. I keep making fun of the fan noise, but you feel like you're a power user. You're like, hey, rename this variable, and then the jet engine starts up.
You're like, oh boy, that's getting really renamed over there. 35:36.38 Frank First, I want to say I really appreciate the auto thing, because I think that's the biggest lesson I want to take out of this. I don't even want to promote that local models are the way to go, necessarily. It's more that I'm just realizing how important your processes and systems and the information we give these models are; just how important that all is. 36:01.27 Frank But to answer your question, I've had zero interest in going back to the big provider models. Every so often you have a little Tweety Bird in the back of your head saying, I wonder if Opus would have solved this already. 36:14.66 James Mm-hmm. 36:17.23 Frank You know, every time it introduces a little bug or something like that, you wonder, would Opus have made that mistake? Or would Sonnet, or would Codex have made that mistake? 36:24.60 James Yeah. 36:27.96 Frank But in the end, who cares? It's free. I just tell it to go fix the mistake, and it goes and fixes the mistake. You know, it's the same UI and everything. So I think maybe it's too soon to know whether I'll switch back, and probably I will switch back, because I'm going to keep paying for Copilot and I'll use some credits there. You know where I'm really going to use Copilot? It's the cloud stuff, like you mentioned. 36:52.18 Frank Because what I found is, when I'm being truly productive, I'm using issues on GitHub and PRs on GitHub and using the cloud models. That's when my six things are happening at once and I'm being ultra productive. 37:08.44 Frank The model I'm using on my dev computer, it's okay, because I'm usually thinking through a problem, or I'm in a greenfield application and I'm doing design work. You know, I'm not trying to implement a feature or fix a bug. 37:19.60 James Yeah. 37:22.82 Frank That's all happening on the web. 37:25.11 James Yeah. 37:25.78 Frank That's just happening in the background. Here, though, I'm having a discussion with the AI, and it's okay if it takes a second or two to have its little discussion; I don't mind that. 37:36.12 Frank So I think that's actually where I'm going to kind of settle. For my dev machine, I might just keep cranking on the 3090 and just enjoying it, from trying different models to just getting out of the hegemony of Anthropic and OpenAI. It's fun to use these open source models. They're all different. They have different capabilities, you know, what they'll allow you to do and that kind of stuff. 38:03.77 Frank So, yeah, I'll put it this way. We are at the 5th of May, a little behind-the-scenes on our recording schedule. 38:15.03 Frank And I have used 0.6% of my Copilot credits, because I've just been rocking these local models, and it's been fine. And I have felt very little need to go to the bigger models. Yeah. 38:29.85 Frank That said, when I use the cloud-based models, that's when I'll probably be digging into my Copilot credits, or tokens, whatever they're going to call them after June. 38:43.49 James Usage. 38:45.06 Frank Usage. 38:46.24 James Usage. 38:46.37 Frank Yeah. 38:47.22 James So yeah, that really kind of gets back to that ecosystem play, right?
And, yeah, we're doing those GitHub Copilot dev days, and I kind of talk about that: the ability to 38:58.71 James tap into the ecosystem, tap into other model providers, tap into other, you know, harnesses, if you will, and being able to use things on the plane, on your computer, in the terminal, on your phone, in all these different areas; assign issues, do these things. 39:12.57 Frank Yeah. 39:14.86 James And that's how I use it. So to me, it's not about one tool. It's about the combination of all the tools. Now, what's fascinating is that we're going to use those tools, and then we're going to use different models in different instances based on where those tools are running, 39:27.06 James right? Because if you're using a cloud agent, it's going to be running and needs to talk to cloud stuff, right? So that's really, really interesting in general to think about. 39:38.91 James Um, yeah. 39:40.63 Frank I want to talk about performance a little. Sorry, didn't mean to interrupt. 39:43.90 James Yeah, yeah. I was going to ask, did you have a specific thing around models, performance, outputs that you were leaning towards? 39:45.59 Frank Okay, so, yeah. 39:52.80 James Because you talked about the different ones and that you're wanting to explore more, but yeah. 39:57.72 Frank Yeah, and I've been doing AI forever, and people keep talking about tokens per second, and I honestly don't know, like, what's a good number? You know, obviously bigger is better. 40:08.34 James Yeah. 40:09.43 Frank That much I know. But is 40 tokens per second tolerable? And what I've discovered is, it very much is. You definitely get annoyed with the thinking loops that a lot of these reasoning models can get into. 40:22.33 James Yeah. 40:24.54 Frank There are a lot of people who are just turning reasoning off, because it's faster for it to make a mistake and for you to correct that mistake than to watch it think in loops forever. 40:35.43 Frank These things really seem to get into cyclic loops and all that kind of stuff. So I think there are benefits to turning that off. But 40 tokens per second is still faster than I can read. 40:47.22 Frank So when it's editing code, it's plenty fast. When it's pumping out text for me to read, it's faster than I can read. So it's fine. Now, that 40 is, again, with the model quantized and all that, running on, what is it today, the 3090, a $1,500 GPU. 41:08.95 Frank But we both have Macs. And you can take that exact model, same exact model, same quantization, everything, run it on the Mac, and you're probably going to get like 17 tokens per second. 41:24.21 Frank Which, again, is fine for text. But when you have sub-agents going and reading your entire repository? Too slow. 41:35.42 Frank When you have it thinking in loops? Too slow. So the 17 is really bad. The good news is, James, in the last week, all of this has really been improving. There's a new technology out there, MTP, and I'm trying to remember what the stupid thing stands for. Multi-token magic, MTM? I forget what the P stands for. 42:00.57 Frank It's this predictive, oh, maybe it stands for predictive, where these models actually have smaller models that are just kind of dumber, and they're just guessing what the next few tokens are going to be. 42:17.11 Frank And then the big model just checks whether the next few tokens should have been those things.
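What Frank is reaching for is multi-token prediction, in the same family as speculative, draft-and-verify decoding. A toy sketch of the greedy-decoding version follows; `draft_model`, `big_model`, and their methods are stand-ins for the idea, not any real library's API.

```python
def speculative_step(big_model, draft_model, tokens, k=4):
    """One draft-and-verify step; both model objects are hypothetical stand-ins."""
    # 1. The cheap draft model proposes k tokens, one at a time.
    guesses = []
    ctx = list(tokens)
    for _ in range(k):
        t = draft_model.next_token(ctx)
        guesses.append(t)
        ctx.append(t)

    # 2. The big model scores all k positions in a single batched forward
    #    pass over tokens + guesses (that one pass is the speedup): at each
    #    position it returns its own greedy choice given the draft's prefix.
    verified = big_model.greedy_at_each_position(tokens, guesses)

    # 3. Keep guesses up to the first disagreement; at the mismatch, take
    #    the big model's token instead, so the output matches what the big
    #    model alone would have produced, just computed in fewer passes.
    accepted = []
    for g, v in zip(guesses, verified):
        if g == v:
            accepted.append(g)
        else:
            accepted.append(v)
            break
    return tokens + accepted
```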
And it turns out this cooperative way of doing inference is faster. 42:31.61 Frank So you have a little stupid model doing a bunch of predictions, and then the bigger model is actually validating those predictions. And both Qwen and Gemma support this. 42:47.03 Frank And the new software out there... on Mac, there's MTPLX, which uses, oh my God, it's too many libraries, the MLX library, 42:47.04 James Oh, cool. 42:58.84 Frank like a fork of it, to do this. Where it does the intelligent thing, where it has a stupid model running ahead and doing stupid predictions, and then the bigger model validating. 43:12.59 Frank And this is something Google invented, by the way. 43:13.01 James Cleaning it up. 43:16.43 Frank So everyone was just like, oh, that's a good idea, let's take that. And so Google just released that feature for Gemma, which already was a powerful model, but now it's fast. 43:26.65 Frank So now you can run these models, these 30-ish billion parameter models, at 60 tokens per second on our Macs, James. 43:38.71 Frank So I'm telling you, that's plenty fast, because even a lot of the service providers out there for the big, big models, they're only giving you about 60 tokens per second, between network bandwidth and them being overloaded by DoS attacks and all that kind of terrible stuff that they have to deal with. 43:58.98 Frank 60 tokens per second is plenty fast enough. And it's really cool that just in the last month, I would say, this new technology has come out and is really revolutionizing inference speed. 44:12.95 James That's crazy. Well, I do want to ask one thing, because not everyone has a 3090 sitting around, Frank Krueger. 44:17.70 Frank Nice. 44:18.52 James And the real question becomes, what if I just have my Mac mini, or, more realistically, maybe a MacBook Air that's sitting over here? 44:30.71 James I think I almost want to follow up next week, which is: what can you do on your MacBook Air inside of a coffee shop? 44:30.87 Frank Yeah. 44:36.72 James Right? 44:37.56 Frank Yeah. 44:37.72 James What could you do on your Mac mini that's over here? Because I don't think, realistically, that going out and asking people to buy a 3090 to run these models is going to be realistic, because then I would just spend that on token usage, maybe, right? 44:48.61 Frank Right. Yeah. 44:55.05 Frank Yeah. 44:55.03 James More realistically, what I'm really fascinated about is maybe running smaller, tiny models as sub-agents automatically, so they don't use tokens. 45:04.63 Frank Right. Yes. 45:05.86 James Or maybe there is a, hey, this one works really well on the Mac mini. And so I'd love to see a follow-up article, which is: hey, you have this, but here is how you can use an optimized model based on the hardware that you have. Because some people like us have our beefy, what, how many gigs of RAM does this thing have, like 64, 128? 45:25.11 Frank I went with 64. I can't remember what you went with. I think you went the same. 45:28.16 James 64 then, whatever you got. 45:28.86 Frank Yeah, 64. Yeah, cool. 45:29.66 James And then I got the two terabyte, you got the four terabyte. 45:30.17 Frank Yeah, cool. 45:32.70 James So we got this, right? So this isn't even realistic, because many people aren't going to buy this configuration. 45:34.51 Frank Yeah. Yeah. 45:37.50 James They're going to buy the base model, which is 16 gigs of RAM. Which means it's more realistic on our M1.
I know it's a different chip, M4 Pro versus M1, but the M1 with 16 gigs of RAM on our little MacBook Air. So the question is, what could be a good optimized model? How would you run it here locally versus running it on this thing over here? Because I think that is a more realistic scenario. 46:03.83 Frank Yeah, 100%. And I even make that point in my blog, because I price out what it would cost to build a 3090 machine these days. And it's about $2,000. 46:13.75 Frank And $2,000 gives you 10 months of the highest tier of Claude Code, you know, that... yeah, I don't know. 46:13.94 James Oof. 46:21.08 James Yeah. 46:24.69 Frank What's the most expensive Copilot, like $40 a month? So that's 50 46:29.82 James A lot. Yeah. 46:30.36 Frank months. So, yeah. I'm not even going out there and saying go build a machine or anything like this. But what I do say is, if you have a machine like that, you'll probably be using it. And then, yeah. 46:32.13 James Yeah, 46:42.97 James yeah, there you go. But I'm also interested in... I do have a bunch of machines. I don't have that machine, but I do have little ones. I mean, I have to imagine a Mac mini, it's got to be doing something. 46:53.21 James Did you run any models on the Mac mini directly? 46:54.23 Frank Yeah, 100%. Yeah, I have been. So I have mixed feelings about it, because our Macs do a great job running it, but it is definitely eating at the memory and it's eating at the GPU. So you're slowing down basically everything else on the computer. And the fan turns on on my Mac mini. Every time the fan turns on in the Mac mini, I'm like, oh, I'm sorry. 47:17.10 Frank I'm sorry. I don't mean to hurt you. Now, the RTX, I don't care. 47:19.29 James Yeah. 47:20.52 Frank The RTX can start a fire back there. I don't care. You know, burn, baby, burn. But my little Mac mini, I'm like, oh, I'm sorry, are you getting too hot over there? So that's probably stupid of me. 47:32.98 Frank But probably the biggest, weirdest change that I've seen in AI since I started is, in the beginning, we were all compute obsessed. How fast is your computer? 47:43.31 Frank How many flops can you do? 47:44.00 James Mm. 47:45.56 Frank Today, that's almost irrelevant. All anyone cares about is: how much RAM do you have, and how fast is that RAM? Because these models are so big. The reason the 3090s and 4090s are nice is because they have 24 gigabytes of that RAM. 48:02.14 Frank And therefore these small models, small being 30 billion parameters, can fit on them. Because we have 64 gigabytes, you can fit some even bigger models. But it turns out, with the model makers out there, the small models are about 30 billion parameters, 48:22.26 Frank and then the next step up is 300 billion. There's not really an in-between ground there. 48:25.09 James Wow. 48:27.26 Frank So you definitely can take, like, the Qwen 27 billion, run it on your Mac, and it's going to run great, but it is going to use all the resources, and you are depleting your dev machine, and you'll notice hiccups here and there. Like, I had YouTube Music playing in the background, and every so often you'd get some static, because the poor little processor is cranking away. 48:53.37 Frank And when you go to Activity Monitor, you see the GPU usage at 90%. You're like, ooh, okay, we are burning this puppy. And then it becomes a thermal game. 49:04.30 Frank Like everything else, it becomes a thermal game. So yeah, it's a mixed bag running these things yourself.
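The RAM-is-king point is the same weights arithmetic as before, applied across machine tiers. A rough sketch, assuming 4-bit weights and reserving about a quarter of RAM for the OS, KV cache, and everything else; that headroom factor is a guess, not a rule.

```python
# "Will it fit?" math for the RAM tiers discussed: weights-only footprint
# at 4-bit quantization, with an assumed 25% of RAM held back for the OS,
# KV cache, and other processes.
def weights_gb(params_billion, bits=4):
    return params_billion * 1e9 * bits / 8 / 1e9

for ram in (16, 24, 64):          # base Mac mini, RTX 3090, their Mac minis
    for size in (7, 30, 300):     # the model-size tiers that actually exist
        fits = "fits" if weights_gb(size) < ram * 0.75 else "too big"
        print(f"{ram:>3} GB RAM, {size:>3}B model @ 4-bit: "
              f"~{weights_gb(size):5.1f} GB -> {fits}")
```

Run as written, this reproduces the shape of the conversation: 7B models fit in 16 GB, the ~30B tier needs the 24 GB card or a 64 GB Mac, and the 300B tier fits in none of them.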
49:07.26 James I think, yeah, I think that the bummer part here may be, and I could be wrong, but the Mac mini does not support eGPUs. So you can't just plug an external GPU into your Mac mini. 49:29.37 Frank Yeah, I agree. And, you know, I've actually run eGPUs with Macs in the past, and that's even a bit of a mess, to be thoroughly honest. 49:38.41 James Hmm. 49:39.06 Frank Thunderbolt is fast, but you're sending so much data so fast to make these models work. It's a little bit weird. One of the reasons MTP is such a good technology is it actually keeps a lot more of the data on the video card before it ever has to get back to the CPU. 49:59.70 Frank They try to do their sampling on the video card. 49:59.65 James Hmm. 50:03.86 Frank People don't think about it, but these are probabilistic models. For every vocabulary word out there, it's giving a probability that it's the next word. And something has to take that list, sort that list, take the top K, and do the top P off of that top K. And that is a serial operation. That is not efficient on a GPU. GPUs are parallel devices. And that is a giant serial operation that has to happen. And so for unoptimized models, there is a very huge cost we all have to pay if you're running these things yourself, where lots of memory has to be copied down from the video buffers into CPU buffers. 50:47.58 Frank Unless you have a unified memory architecture, like on a Mac, and then 50:51.61 James Yeah. 50:52.18 Frank life is good. I think some of the Windows ARM computers have unified memory architectures too. I just know less about them than the M series. 50:58.90 James Interesting. 51:03.60 James All right. 51:05.18 Frank All right. 51:05.72 James I am going to try something on my Mac, that's for sure, because I do have it just sitting here ready to go. 51:06.20 Frank Well... 51:13.58 James I think it's fascinating, and I do want that little fan to spin up on occasion. 51:17.88 Frank Yeah. 51:18.05 James But I do also want to point out that you can actually do this in the CLI now, too. 51:18.84 Frank Hmm. 51:22.41 James You can actually configure it. 51:23.68 Frank Hmm. 51:23.65 James I don't think it's as simple, but I know Kayla Cinnamon has a whole thing on how to configure Ollama and select the models that are running there inside the CLI, if you want to run in the CLI. 51:39.24 James So it could be cool to give it a try. 51:42.49 Frank Yeah, and I think it's just one of those... Here's the real deal, James. I never want to go back to writing code in the old way anymore. 51:53.53 James Mm. Yeah. 51:54.49 Frank But I don't like the idea that my ability to code is dependent upon paying someone a service fee every month. I don't love that. 52:03.64 James Mm. 52:05.30 Frank So what I like about all of this, and it's really more of a mental thing than anything else, to be thoroughly honest, is I feel like I'm back to where I, myself, in my little office, without an internet connection, can still code again in the way I prefer to code these days, as coding has gone through a transformation in this past year. 52:26.49 Frank And I don't want to go back. I'm over it. I want to stay in this world. 52:30.20 James Yeah, me too. 52:31.74 Frank And it's a little bit of a liberation to run these models yourself and to see, okay, they're good enough. I can keep doing my planning, keep doing my agentic workflows. Feels good.
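In miniature, the sampling step Frank describes earlier in that exchange is a sort-and-slice over the whole vocabulary. A toy sketch on plain Python lists; real runners do this over tensors with a vocabulary of 100k-plus entries, which is exactly why keeping it on the GPU matters.

```python
# Toy top-k / top-p (nucleus) sampling: the model gives a probability for
# every vocabulary token; something has to sort that list, keep the top-k,
# take the smallest prefix covering probability mass p, then draw one token.
# This is the inherently serial step that's awkward on a parallel GPU.
import random

def sample_top_k_top_p(probs, k=40, p=0.9):
    # probs: list of (token_id, probability) over the whole vocabulary
    ranked = sorted(probs, key=lambda x: x[1], reverse=True)[:k]  # top-k

    nucleus, total = [], 0.0
    for tok, pr in ranked:          # top-p: shortest prefix with mass >= p
        nucleus.append((tok, pr))
        total += pr
        if total >= p:
            break

    # Renormalize over the nucleus and sample a single token from it.
    norm = sum(pr for _, pr in nucleus)
    r, acc = random.random() * norm, 0.0
    for tok, pr in nucleus:
        acc += pr
        if acc >= r:
            return tok
    return nucleus[-1][0]
```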
52:45.67 James I'm glad that we changed you over. Took a little bit, but we got there, Frank. And yeah, I'm excited for the year ahead. I mean, it's only May, and who knows? I mean, for me, I think the next steps here, as we wrap up: it's more feasible than ever to use these models locally and combine them with the ones that you're using in the cloud, and the different services of where you're doing your coding and how you're coding. 53:05.94 James And what I'd love to see progress even more is for all of these bits and pieces to come together. And I think they have come together a little bit in the form of extensions, but I would love to see that ecosystem unify even more. Like, I feel like we have the capability of building some sort of GUI or some sort of thing that really streamlines these processes and makes it easy to configure and connect all these different harnesses and tools that we're using out there, to make it easier than ever. That's what I would kind of expect, because that would then enable more individuals to do this. Or maybe just build it into the tooling. Like, I do think that there is 53:40.95 James something good and something bad about what Apple did on their machines. And I guess, technically, even on my Windows machine, there are models that are running locally for all the AI stuff. But it is like, do I want to install this thing? 53:52.95 James And one thing that might be fascinating is, is there a future that I see, which is streamlining this process to say, okay, download this model. Like, an easier configuration wizard is what I'm saying. 54:05.25 James There are all these libraries and all these things; how do we streamline that bit and piece? 54:05.43 Frank Yeah. 54:10.86 Frank Unified? I'll take it, because I'll be honest, I have like six different model runners on this computer, and they all store the models in different directories. And I have models everywhere on this hard drive, and it's just eating all my gigabytes, and I want my gigabytes back. 54:22.36 James Yeah. 54:27.01 Frank But I'm not going to go organize these. 54:27.38 James Yeah. 54:29.09 Frank I'd have to run my GrandPerspective to go find all these stupid models. So I'm hoping at this WWDC, Apple comes up with some unification for having background services and good things that can... 54:33.35 James That's great. 54:44.31 James Yeah. 54:47.23 Frank It's tough because, sorry, we are trying to wrap up, but we're still in the wild, wild west here. You know, innovation is happening. So you don't want to prematurely unify and prematurely standardize this stuff. 54:59.35 James Yeah. 55:01.01 Frank It's fun having everyone competing against each other for speed and all that. And I don't want to lose that competitive spirit. But it is a little bit exhausting. So hopefully, within a couple of years, it'll all be unified and standardized. 55:13.84 James I'd like to see it. Well, let us know if you're running a local model, and where and how you're running it. Maybe you have a 3090, maybe you don't, like me. And then, what are you running it on? Let us know. Go head over to our YouTube, youtube.com/@mergeconflictfm, and that's a great way to leave a comment on this episode or any of our past episodes. And we'll do a listener mailbag episode. Quite a few have chimed in, so we'll get it going. So let us know. But that's it for your local edition of Merge Conflict. Until next time, I'm James Montemagno. 55:41.24 Frank And I'm Frank Krueger.
Thanks for watching and listening. 55:44.89 James Peace.