00:00.28 James Welcome back, everyone, to Merge Conflict. Frank, I forgot to put you on the other one over here. Welcome back, everyone, to Merge Conflict. I know how to use a computer. I am one of your hosts, James Montemagno, and with me, as always, the one, the only, the most gorgeous man that I'm looking at right now, Frank Krueger. 00:19.83 Frank Okay. 00:19.96 James How's it going, buddy? 00:21.14 Frank Out of all the men that you're looking at, I'm the most gorgeous. Great. Thanks, James. Thanks for the backhanded compliment. Hi, James. It's wonderful to see you. Welcome, everyone, back to Merge Conflict. I missed you all. It's been a week. 00:33.38 Frank How's your life? 00:35.03 James It's been a week. I was in Guatemala for a wedding and it was absolutely fantastic. It's good to be home. But yeah, Guatemala, absolutely awesome. And the wedding was the craziest wedding I've ever been to. There were stilt dancers. 00:48.76 Frank Oh, that's quite a skill. I didn't even know that was a skill to have, but I'm somewhat envious now. 00:55.35 James There were pyrotechnics. 00:59.10 Frank See, the stilt dancers I get, because I've been doing housework and I kind of wanted stilts the whole time. So that's a skill. I don't really need the pyrotechnics. Don't need to set the plates on fire. 01:09.34 James There were really cool LED blinking sunglasses that you got, and cool foam hats. 01:10.05 Frank Yeah. 01:13.37 Frank Yeah. 01:14.58 James It was wild. 01:15.51 Frank Are you sure you didn't go to a rave? 01:16.02 James It was. 01:17.15 Frank Are you misremembering what just happened in the last week? 01:20.73 James There was a lot of dancing. There was also a bunch of live bands, and a live DJ set, and it was absolutely amazing. 01:30.82 Frank Cool. 01:31.60 James The couple that we went for, you know, we just became friends with them in the last year. And yeah, it was so cool and the wedding was awesome. So I'm glad we made the trip down there. It wasn't too bad of a flight, actually. You know, Guatemala actually isn't that far from us. 01:44.38 Frank Oh, yeah. 01:45.61 James It's not that far. 01:47.06 Frank It turns out the Americas tend to stick together. And it turns out you can just go up and down. I don't know. I should do it more often myself. 01:54.48 James Yeah. 01:55.64 Frank But I do love... I've got to see more of the Americas myself. You're making me a little jelly. 02:01.30 James Yeah, we really want to go over there. There's Belize and a few other countries that we want to go to. 02:07.67 Frank Mm-hmm. 02:07.80 James Once you're over there, well, maybe we'll go to some other countries. So anyways, if we have any listeners in Guatemala, thanks for being a listener. It's a great country and we're excited to come back one day. 02:13.46 Frank Mm-hmm. 02:17.14 James There's lots of volcanoes, lots of hiking volcanoes. We didn't get to do that, but they're active. 02:22.11 Frank Right. 02:22.59 James It's happening. 02:25.30 Frank Okay, again with the pyrotechnics, man. I don't need that. I need the calm water, like an oasis. Just nice and calm. 02:33.96 James Yeah. 02:35.18 Frank Maybe a waterfall. 02:36.86 James Yeah. 02:37.20 Frank That's as spicy as I get. 02:39.03 James Well, it has been a week. There's a whole beautiful list of things that we're going to talk about today.
Is there any Internet controversy that you want to talk about, or should we skip over any Internet controversy? 02:53.48 Frank Oh, God, have I not been keeping up? I can't even think. You know, the controversies come and go in waves. I think there's a trial going on. I'm trying not to pay too much attention to it, but it's hard not to watch a little bit of high-profile trials with some companies that start with an O and end with an AI. 03:12.35 Frank It's kind of fun to watch that stuff go. 03:13.05 James I have seen those. 03:15.29 Frank Yeah. 03:15.80 James That commentary has been interesting. I've been trying to watch the dailies of that. Very fascinating. I have missed the first few, but it does seem that there's lots of distillation happening and people are confessing to things and all this stuff, which is fascinating. 03:29.78 James Yeah, that's been of interest. 03:30.84 Frank Yeah. But no, I don't think there are any blue-and-black, white-and-gold dresses happening or anything like that. 03:32.46 James That's for sure. 03:39.41 Frank I think, yeah, the controversies aren't too crazy. The AI world has completely moved on. You know, everything you knew last week is irrelevant. It's all changed. 03:50.49 James As always. I mean, it's a new week. There are new skills to be learned. I think someone maybe asked us what skills we're using. I haven't looked it up, but we'll do a listener mailbag. Maybe if you leave comments on any of our videos on YouTube, we'll do a listener mailbag. 04:00.12 Frank Yeah. Okay. 04:02.01 James We have some good questions and good tweets coming in, I think. 04:04.47 Frank Yeah. 04:04.97 James But someone did ask us a little bit based off the episode that we just did, 04:05.91 Frank I think. 04:08.97 James where we were talking about some of the pricing changes around AI. 04:12.93 Frank Mm-hmm. 04:12.98 James And one of the things that I had mentioned was that one of the kind of cool benefits of some of these harnesses is not only the ability to use frontier models through the service that you're paying for, 04:29.18 James that are readily available. So all of them have them, right? Whether you're using Codex or GitHub Copilot or Claude Code or OpenCode or Cursor or whatever, they all have them. 04:36.96 Frank Kimi! 04:38.81 James Yeah, and many are the same. Obviously, some are also multi-provider, right? With GitHub Copilot, you get a bunch of different stuff from the different providers that are out there, and custom models as well. But one of the really fascinating things seems to be that you can do something called bring your own key when you're using this service. 04:57.14 Frank Yeah. Yeah. 04:58.58 James And bring your own key, BYOK as we often call it, is the ability to say, I would like to use this other service over here to route all my queries through. 05:13.18 James And that may be, specifically, an API key with Anthropic or with OpenAI or maybe Microsoft Foundry. 05:20.76 Frank Yeah. 05:22.77 James And you would be paying those providers for the token usage, however their billing is, right? And you would just use GitHub Copilot. You wouldn't use the models from Microsoft, right, that are available through GitHub Copilot; you'd use them through these other services. 05:38.36 James And you might be saying, well, those all three sound like models that I might already have. So why would I do that?
Well, the real reason, I think, is also to use a bunch of other models out there that maybe you don't necessarily have access to. 05:50.24 James And a very popular one is called OpenRouter. And OpenRouter allows you to navigate through and have a whole slew of things. Now, obviously with bring your own key, you know, 05:58.30 Frank Yeah. 06:04.28 James the tuning... I'm very interested in your usage of the next thing I'll talk about, but I'm gonna go long here, and then we'll get into the thing, which is the biggest thing. 06:11.13 Frank All right. Preach, preach. 06:13.75 James Preach. But the system prompts, right, the prompts that are being sent, like the VS Code team, the Copilot CLI teams, are finely tuning those, and the tools and stuff like that, right, as much as they can. They're open, you can see them. Obviously, there's going to be a general system prompt for these other providers. They don't know exactly what you're picking, necessarily. But then there's this other option, which, ah, Frank Krueger, I did a video on. I'm gonna look it up here. 06:42.26 James Let me go through my video history. Let's see, Copilot CLI, this, that. I've got to scroll for a while. I'm scrolling, I'm scrolling, I'm scrolling. 06:50.14 Frank He produces so many videos, everyone. Who keeps up? 06:52.60 James Um... 06:52.95 Frank Raise your hands if you keep up with James. 06:55.03 James Ah, here it is. One year ago, it was called Bring Your Own Key and Models to GitHub Copilot in Visual Studio Code. 06:57.79 Frank Wow. 07:03.77 James And I talked about bring your own key, but also running local models with Ollama, or now you could also use Foundry Local. 07:11.45 Frank Okay. 07:13.25 James So, running local agents and local models on your machine, with which you wouldn't have to pay for anything, because it's on your machine. 07:22.78 Frank Yeah. I've actually gone down every single one of these rabbit holes now, with the great pricing apocalypse. 07:29.41 James Hmm. 07:32.08 Frank I just wanted to feel out the world and see what kind of pricing I can expect out of different services and all that stuff. I did want to make one joke, though, about OpenRouter. OpenRouter is bring your own key, but within OpenRouter, you can bring your own key. 07:46.55 Frank So you can bring your own OpenRouter key, but within OpenRouter, you can bring your own key in there too. It is such a weird, messed-up world, because OpenRouter actually merges all these different providers together. It's very funky, to be honest. 08:00.86 Frank I've been rocking the DeepSeek model right now. Because it is stupidly cheap. They're running this promotion where it's like 75% off the price. And the price is already cheap. And so, 75% off that. I was just curious what I was doing. 08:18.97 Frank So I did a full day of work on it. And it cost me like $2. I was like... 08:23.21 James Wow. Nice. 08:23.74 Frank That's a pretty good deal. That was like 60 or 80 million tokens total. I was burning through the tokens, and they have a good context length, and it was just a couple bucks. 08:34.68 Frank So I was trying the bring your own key thing. So I still rock everything through VS Code chat, AI chat, agent chat. I don't know what the feature's called anymore, because I used to call it Copilot Chat. 08:46.04 Frank Is it Copilot within VS Code? I can't keep all this straight, James, because it's just chat. 08:50.42 James It's just chat. Just chat. 08:52.63 Frank Just chatting, man.
08:53.05 James Just chat. 08:53.98 Frank Ah. 08:54.04 James Just chatting. 08:55.19 Frank Yeah, because Copilot's GitHub's thing. VS Code is more generic. VS Code has an interesting implementation, though, where, in general, you have to get an extension if you want to add models to it. 09:09.94 Frank So it's a little bit funny, because you were mentioning generic prompts and such. All the model providers out there have kind of adopted the same API. They all just copied OpenAI's chat completions API. 09:23.26 Frank But oddly enough, VS Code doesn't allow you to just plug into that. They really want you to get extensions and do funky stuff, because they assume that every model wants to be tailored. It's all a bit of a joke, but whatever. 09:34.51 Frank So, yeah. 09:34.94 James Are you sure, Frank? Because you can go into model, manage, manage, manage model, model, manage, manage your models. 09:38.17 Frank Model manager, manage model. Yeah, manage your model models. 09:46.06 James Now, that being said, I will say that extensions can bring their own models into the mix as well. 09:51.00 Frank Yes, yeah. So hit add model and there's going to be a fixed list. 09:52.09 James So you go in, 09:55.70 James you have your language models, and it has all of them in there. So you have your Opuses, your Sonnets, your GPTs, but there is an add models, which has a dropdown for Anthropic, xAI, Google, OpenRouter, OpenAI, Ollama, OpenAI compatible, and Azure. 10:03.98 Frank Yeah. Yeah. Yeah. Sure. Yep. 10:18.26 Frank Oh, maybe that's an Insiders one. I'll have to check out the OpenAI compatible one. I hadn't noticed that. So that's good. Thank you. Thank you, VS Code people, for doing that. 10:28.66 James So my assumption is, if you install an extension from a provider like Cerebras or something like that, then they would just basically pre-configure with that. 10:39.22 Frank Yeah, and honestly, it was getting a little bit annoying, because we're to the point where there are so many AI shops out there offering models as services too. 10:50.31 Frank They're providing the full everything, and they've all adopted the OpenAI API. So, yeah, I guess they're all still a little bit weird around how they do thinking and stuff like that. 11:03.66 Frank But in general, they all kind of provide the same API, and you can just plug into them. So yeah, I've been trying out all the models, man. Like, I have put $10 of credits into every provider out there. 11:15.39 James Hmm. 11:15.98 Frank I probably should be doing everything through OpenRouter, but I wanted to get a feel for all the different providers and see how they work. And I've been pretty pleased, I've got to be honest. I think I'm still falling back to the Codexes and the Sonnets of the world. 11:33.02 Frank But I've really been enjoying taking a break from the norm and going out and trying the more wild side of everything. But BYOK has been fun for me. 11:44.38 James Okay, so when you go in and you add a model, let's say you're adding this, I don't know, how'd you do DeepSeek? Was that an extension or something? 11:52.92 Frank Yeah, so DeepSeek has their own extension, but it's just a plugin for the chat window, basically. 12:00.41 James Oh, that's what I was going to ask. 12:00.68 Frank That's... 12:02.78 James So it just shows up in the model selection.
12:06.87 Frank Yeah, it shows up... it shows up as too many models in the model selection, to be thoroughly honest, because DeepSeek decided to be a proper provider and offer many kinds of models, not just the DeepSeek models. 12:20.20 James Oh. 12:20.28 Frank So honestly, the UI needs a bit of work, because DeepSeek just threw like a billion models into the list. And you can just scroll and scroll and scroll and scroll to find all the stuff that you want. 12:30.92 Frank But yeah, that's the general idea. 12:31.80 James Interesting. 12:32.60 Frank These extensions can just add models to that list. And I don't fully understand the rules, but sometimes models added to that list are hidden by default. Other times they're visible by default. I don't fully get it. So if you add an extension looking for a model and you don't find it, go to the model manager and you can make it visible. 12:51.99 Frank Or if you have too many listed there, go make some invisible, because there can easily be too many. 12:57.11 James Gotcha, that makes sense. So I think that's the important factor: in general, the chat window in VS Code is a chat window, like, all up, and it has all these integrations and this ecosystem in it. 13:11.22 James I still think you need a GitHub Copilot subscription even to use that. 13:16.49 Frank Perhaps. It's hard to say. I don't know, because I do have a GitHub Copilot subscription. So it's hard for me to say, but I do watch my GitHub Copilot usage, and the number is not going up. Like, the requests aren't going up, but I don't know. 13:26.96 James Correct. 13:29.42 Frank Like, sub-agents are always kind of a weird thing. I don't know if it always uses the model I chose for sub-agents, or does it ever use something else as sub-agents? You know, I wonder about little details like that that, honestly, I just don't know myself yet. 13:45.85 Frank I think VS Code being VS Code, there's a million settings you can choose. I just don't know how the defaults work exactly, to be honest. 13:55.16 James Let's see, Copilot Search says: yes, bring your own key in GitHub Copilot VS Code, only available to certain tiers. 14:04.79 Frank Interesting. Okay. 14:07.77 James Yeah, but you connect to your model provider and you're billed through them. You do not consume your GitHub Copilot quota. 14:12.49 Frank Yeah. 14:14.84 Frank Yeah. 14:15.00 James Perfect, gotcha. 14:16.09 Frank Fascinating. 14:16.20 James Okay, so yeah. 14:16.33 Frank Yeah. And this is how a lot of the other apps work, like OpenCode and Claude Code and all the... I'm not sure, can you do it in Codex? I'm not sure if that's available. 14:26.90 Frank Ah. 14:26.92 James I don't know. 14:27.93 Frank Yeah, me neither. Sorry. Sorry, everyone. I'm still living the VS Code life. I love VS Code. I'm sticking around in there. But VS Code is more than just the chat window. It is what everyone's calling the harness these days. I hate that term. So I tend to just call it agent, because it took me a while to come around to agent, and I prefer calling things agent. And what the agent is, basically, is a bunch of system prompts and a bunch of tools for editing files, and other little basic things like memories and plans, modes for plans and things like that. 15:04.25 Frank So that's the agent to me. But a lot of people call that the harness, because the word agent got overused a bit.
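To illustrate the point about everyone copying OpenAI's chat completions API: most of these providers, and the local runners too, can be reached with the standard OpenAI client just by swapping the base URL. A minimal sketch; the endpoint, model name, and key below are placeholders, not any particular provider's real values.

```python
# Minimal sketch of the "OpenAI compatible" pattern: the same client code
# talks to any provider that implements the chat completions API, just by
# changing base_url. Endpoint, model, and key here are hypothetical.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example-provider.com/v1",  # your provider's endpoint
    api_key="YOUR_API_KEY",                          # your provider's key
)

resp = client.chat.completions.create(
    model="some-model-name",  # whatever the provider calls its model
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Explain what this stack trace means."},
    ],
)
print(resp.choices[0].message.content)
```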
So what's interesting is, when you do choose a model, and I'll just keep saying DeepSeek because my brain's stuck there for some reason, you're still using the VS Code coding 15:24.29 Frank harness. It's still their system prompts, it's their tools and all that stuff. So my little MCP tools that I've added to my settings file, they all work just fine. So the models really have been commoditized quite a bit. 15:42.06 James That's pretty neat. Yeah, I think the ability to just go in and do this, and then even run things locally, like we'll talk about, is very fascinating. Because obviously, you know, you decide to get billed in one way or another, or ideally even not get billed at all. 16:04.41 James I actually just installed the Foundry toolkit for VS Code, which is for Microsoft Foundry. 16:11.68 Frank Right. 16:11.90 James And this is one of those, like DeepSeek, that has a bajillion models. 16:16.12 Frank Yeah. 16:19.67 Frank Right. 16:20.15 James So I'm looking here. We have a Codestral, Cohere Command, DeepSeek R1, V3, 3.2, a Flash model, Llama 4, Meta Llama, Mistral 16:20.82 Frank Yeah. 16:29.75 Frank Yeah. 16:33.43 James Large, all the GPT models. 16:35.39 Frank Yeah. 16:36.31 James So I could get billed through this stuff, basically, I assume. 16:39.45 Frank Yeah. 16:40.19 James And those are GitHub models via the toolkit. And then there are Microsoft Foundry models, which are like Haiku and Opus and Sonnet and Codex and more. There's, oh my gosh, there's Grok and Llama. 16:52.74 James There's a bajillion there. Yeah, there are tons. That's crazy. So they're all in there. 16:56.92 Frank Yeah, and if it's a bit overwhelming, I agree with you. 16:58.63 James Wow. 17:01.04 Frank It's a bit overwhelming. So you can go to multiple websites out there that have kind of like shootouts, where they try to rank these models for different tasks. Obviously, the only task I care about is coding. And the normal ones generally win. But, you know, there are some upstarts doing pretty good. Kimi gets a lot of good reviews, MiniMax, MiMo, 17:23.67 Frank all sorts of... throw a consonant and a vowel together, and there's probably a model out there with that name on it, and you can try it out. But yeah, those are all still paying for stuff. You're still paying for a service. You're going over the internet. You said something I got interested in, which was, well, what if I don't want to pay anything? 17:44.70 Frank And what if I just want to turn off my internet? Can all this stuff still work? And it can, James. It can. 17:52.98 James It can. Yeah, okay. I'm interested, because I did also have that in my video, but I feel like I made that video a year ago, and my assumption here is that the models that I can run on my machine have dramatically changed between then and now. 18:12.79 James So I'm really fascinated to see how that goes, because I also literally just installed, again, that Foundry toolkit. 18:15.99 Frank Yeah. 18:22.18 James I feel like I installed it a while ago, but I didn't go into the model selector. 18:22.74 Frank Yeah. 18:25.06 James Now it makes sense as a model provider, and 18:25.43 Frank Right. 18:29.08 James there are a bunch of groupings, and one is called Foundry Local via AI Toolkit. And there are all the DeepSeek models, GPT-OSS, Mistral, Phi Silica, all the Phi models, Qwen, all the way up to 3. 18:35.09 Frank Cool. 18:41.16 Frank Yep.
Qwen, my buddy Qwen. 18:44.85 James Yeah. And it shows you if they have capabilities of vision or tools, or, you know, what their context window is. 18:45.78 Frank Yeah. 18:50.51 Frank Yeah. 18:51.66 James So help us, Frank Krueger, understand 18:54.87 Frank Okay. 18:56.44 James what you needed to do and what that experience was, and what type of hardware you're running it on. Because right now I'm on my Windows laptop, which is running something, I don't even know. 19:08.74 James I can see what it's running. Let me go to the task manager, and it is currently running a, what is that, CPU, an AMD Ryzen AI 7 Pro 350, with 19:13.30 Frank Huh. 19:27.54 Frank I don't know my Radeons very well. I apologize. 19:29.47 James I do have an NPU. 19:31.22 Frank So... 19:31.43 James Yeah, 19:33.19 Frank NPU? 19:34.33 James yeah, neural processing unit, yeah. 19:34.58 Frank Motion Processing Unit? Oh, NPU. Yeah. Okay. So, I mean, in some ways, what you probably did a year ago is not too different from today, because in the world of open source models, we all still kind of upload our open source models to Hugging Face, and there are billions upon billions of them up there, 19:56.86 Frank all with the big names and everything. I would say the biggest change, James, is that the models have just gotten better, to a scary level. 20:08.50 Frank So I remember, just a few years ago, all I could run on my kind of Macs were the 7 billion parameter models. And even those were kind of slow, because the software wasn't tuned for them. 20:21.65 Frank People were happy to just get them working. They're like, oh my God, look, a local large language model running on my computer. 20:26.61 James Yeah. Hmm. 20:27.25 Frank Isn't that cool? Yeah. Well, this past year, people have just been obsessed with the engineering aspect of making bigger models run faster. And it's glorious. I've been waiting for this time to happen for so long, because, yeah, all you've got to do is throw some engineering prowess against these things. They're just big calculators, you know. People love writing calculator code; just make better calculator code, people. And so what has happened is, the models that are available are better, 20:58.50 Frank and the things that can run them can run bigger models in a more sophisticated fashion. And I was a bit out of the loop also. So I wanted to take this last week, try some things out, and just see how they feel. So I was living in the Opus world, just for context. I spent all of April, checking what month it is, I spent all of April living that Opus life. 21:25.33 Frank It's my buddy. It's my friend. We get along great. But then I spent a few days running DeepSeek, and now I've spent a few days running Qwen. 21:37.02 Frank Qwen, Q-W-E-N. It's a 27 billion parameter model. The ones that people like are Qwen version 3.5 and Qwen version 3.6. It's well understood these are not the greatest coding models out there. 21:55.00 Frank But on my hardware, which I'll get to, you can run 256K context windows, which is huge. 22:02.20 James That's pretty good. 22:03.34 Frank Because even with Opus, VS Code was limiting me to 192K before it did compaction and that kind of stuff. 22:03.74 James Yeah. 22:12.50 Frank So a 256K window locally is really impressive. It blows my mind. James, you know I work on that cuneiform project where I'm training large language models.
You know what the largest context window I could put onto that was? 22:26.42 Frank Just guess. 22:28.95 James Ah, maybe like 20K? 22:32.86 Frank 512. 22:34.01 James No. No! 22:37.02 Frank That was the biggest I could get on there. Well, because that was 32-bit floating point math, and it was doing all sorts of complicated things. And the engineering has gotten to the point where I can do 262K. It's just impressive. 22:51.24 James That's crazy. 22:51.86 Frank So, we both have Mac minis, kind of overpowered Mac minis. So we can talk about those, but I'll talk about what I've... yeah, there she is. Isn't she cute? Yeah. What I've actually been running mine on is, I've had an RTX 3090. 23:05.52 Frank Anyone who's ever listened to me talk about things, it's my favorite GPU ever. 23:06.37 James Right. 23:09.92 Frank I've had it for a few years now. I love using it for everything. But truth is, it's just been sitting there idling, doing absolutely nothing for the past few months, just because I haven't been training neural networks or anything. So I'm like... 23:23.29 Frank The reason I like the 3090 is it has 24 gigabytes of VRAM, RAM, whatever you want to call it, fast RAM built into it. And 24 gigabytes can actually run these 31 billion parameter models if you use 4-bit quantization. So that is, we are using 4 bits for every parameter in the model. 23:46.97 Frank So you need roughly 15 gigabytes in order to run these models. And then you need the other nine gigabytes for your context and your output and all that kind of stuff. 24:02.42 Frank So I went through the process of installing llama.cpp. It's one of my favorite runners out there. You've already mentioned Ollama as a good choice. Kind of the industry standard is another one called vLLM. 24:18.24 Frank In the Apple world, there's MLX now. There are runners everywhere. People have gotten the religion of optimizing these things and installing them. So I went through the process, and I wrote a little blog entry about it. Everyone can follow along if they want to go to my blog, praeclarum.org. 24:34.36 Frank And I give all the setup instructions. It was easy, James. Download the model, pass a few command line arguments, be amazed at how port forwarding still doesn't work in the current year. Yeah. 24:46.07 James Yeah. 24:47.64 Frank Try to get your stupid network into good shape. Try to figure out what incantations VS Code wants. And all of a sudden... I chose Qwen to start with. 24:58.42 Frank It was there. It was just VS Code, acting like an agent, running sub-agents, doing planning mode, running with autopilot, because life's too short to care about permissions and all that kind of stuff. 25:13.08 Frank And I had it analyzing my code right away. And I want to talk about my experience with it, but that's pretty cool, that I was pretty easily able to just get this thing up and running. This software has really advanced, I think. 25:26.87 James Yeah, no, I think when the team first started adding that feature, you know, it seemed like an advanced feature, because you were talking about hardware requirements and GPUs. You know, the 3090s aren't cheap nowadays, maybe back in the days of yore. 25:37.04 Frank Yeah. 25:39.52 Frank No. 25:41.78 James But, you know, being able to plug in an API key or plug in a thing, that's relatively straightforward, but it's still running remotely.
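A rough sketch of the setup and the VRAM arithmetic Frank walks through here. The launch command in the comment uses llama.cpp's llama-server flags, but the model filename and exact values are illustrative guesses, not the specific command from his blog post; llama-server then serves an OpenAI-compatible endpoint on localhost, which is what the editor ends up talking to.

```python
# Example launch, roughly (llama.cpp's llama-server; model path is hypothetical):
#   llama-server -m qwen-30b-q4_k_m.gguf -c 262144 -ngl 99 --port 8080
# -m = model file, -c = context size in tokens, -ngl = layers offloaded to GPU.

# The VRAM budget he describes, as back-of-the-envelope math:
params = 30e9            # a ~30 billion parameter model
bits_per_param = 4       # 4-bit quantization (vs. 16-bit full-precision weights)

weights_gb = params * bits_per_param / 8 / 1e9
print(f"weights: ~{weights_gb:.0f} GB")          # ~15 GB

vram_gb = 24             # RTX 3090
print(f"left for context / KV cache / output: ~{vram_gb - weights_gb:.0f} GB")  # ~9 GB
```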
But having something running on your machine, where you could be on a plane, you could be disconnected, and it's just sitting there ready to go, is really magical. 26:02.14 James That being said, there is still some edumacation, because, like you said, you have to install a thing to get this thing, and then do some port forwarding stuff. But that being said, 26:15.38 James as far as the steps required go, when it just kind of works, it is a magical feeling at the end of the day. It's kind of like when you get an Android emulator up and running. You're like, whoa, look at that thing. 26:24.36 Frank Yeah. 26:24.68 James And then you're like, wow, that only took a thousand steps and a bunch of pieces of software installed and a bunch of SDKs to download. Then boom, there it is. Great. I did it, right? 26:32.25 Frank Yeah. 26:33.52 James But that's really neat: one, that it's built in, it's just ready to go, 26:33.78 Frank Yeah. 26:39.16 James and then, that you were able to figure it out. It seems relatively straightforward. 26:44.92 Frank Yeah, I think what made it so comfortable is that I didn't have to switch out of my normal workflow, because over the past whatever months, I've gotten very used to using the agent chat and everything in VS Code. I know how it works. I know what to expect out of it. 26:58.23 Frank And it was so weird to just see a model that I knew was running, because, you know how I knew it was running? Because I could hear the jet engine on the other side of the room as every fan on that RTX spun up. 27:05.24 James Yeah. Yeah. 27:09.55 Frank You know what optimizing a model runner really means? They are pushing that video card as hard as it can possibly be pushed. And that means power usage. And that means fans, thermal regulation. So, like, I do a lot of network training, and I hear the fans on that thing. I have never heard it screaming the way it screams when I ask the VS Code chat thing to go analyze every single file in my repo and report back every bug. 27:40.22 Frank It just starts howling. So that was just such a satisfying feeling. And I even did that funny trick where I disconnected from the internet. Obviously, my network was still up, but I disconnected from the internet just to prove that everything was happening locally. And it was. It was beautiful. 28:01.88 Frank So, model quality, though. I think this is kind of the big one. We are running compromised models here. For one, they're much smaller than even, like, Sonnet. I don't know how big Sonnet is, but it's probably five to ten times larger than the Qwen 27 billion parameters. It's probably like 300 billion, something out there. 28:24.34 Frank And on top of that, I'm running a compromised version of the model. I'm using a quantized version of the model. So 4 bits per parameter, when usually you have 16 bits per parameter. 28:38.51 Frank And then I'm compromising it even more. The context that it's keeping and the output buffer it's using are also quantized. Those should all be 16-bit. 28:48.83 Frank They ain't. They're 4-bit also. 28:50.46 James Mm. 28:51.43 Frank Yeah. That's how you get the big context out of it. You quantize everything. You really compress it down. But at the same time, James, I can't really tell the difference between Opus, DeepSeek, and the stupid 27 billion parameter model running on my other machine. 29:12.38 Frank And I don't know what it is, because if you asked me a month ago, I'd say Opus forever.
Don't you take the Opus from me ever. And now I'm just like, can I even tell the difference between any of these models? I'm starting to get very suspicious of myself. 29:29.40 Frank So honestly, for the past day, I've just been running the same prompt on, like, six different models to just see what happens with all of them. So I do want to say, okay, it's slower, for sure. 29:40.55 Frank I'm getting like 40 tokens a second. I think with DeepSeek you get like 60 tokens a second. Opus, you get three tokens per second. Just kidding. Opus is fast when Anthropic wants it to be fast. Other times it's horrendously slow. 29:54.71 Frank It's a big jet engine over there, and it does make mistakes once in a while. But you know what? Opus makes mistakes once in a while. It's sometimes bad at file merging. This is where you can tell the VS Code 30:11.62 Frank harness and prompts aren't perfect, because it keeps messing up how it uses the file read tools. Like, oh, the file read tool needs line numbers? I guess I'll pass it line numbers. Like, yeah, okay, maybe you should have just done that in the beginning. There are these funny little things, and it has to figure them out every session, but it figures them out. 30:21.79 James Oh, yeah. 30:33.08 Frank And I just... I don't know what's wrong with me or the world or what's going on, but I have been so satisfied with these tiny little models. And I'm pretty sure I'm not deluding myself. I think it's just the systems that we've developed: good agents files, doing planning, asking questions. And then the coding part just becomes kind of rote once you've approved all the plans and everything. 30:50.52 James Hmm. 30:58.34 Frank So, yeah. Aside from the slowness, I honestly can't tell the difference between my little local model and a Sonnet or something like that out there, or a small Codex. 31:11.45 Frank And I've also been trying... Google has Gemma, or Jemma, I'm not sure how you're supposed to pronounce it. Gemma 3. And that's a great little model, too. It's hard for me to tell the difference between Qwen and Gemma. You go to all the websites that benchmark these things. They all use that stupid SWE-bench. 31:31.58 Frank Benchmarks are all a joke, everyone. Just putting that out there. But... 31:34.76 James That's... 31:36.28 Frank you know, the graph says this one's way better than the other one. But then in practice, you're like, I don't know. They're all making my job better and my life happier. And aside from the fans roaring, it's really hard to tell the difference. So I've been really excited by all of this. 31:53.36 Frank It's, it's, yeah. 31:53.91 James Pretty... Yeah, it's pretty neat. I mean, I think the biggest thing for me a while ago was the speed of it, but I was also, I forget what hardware, maybe I was running it on my Mac mini, I'm not positive. 32:00.73 Frank Yeah. 32:04.66 James But mostly the speed of it. You know, I use a lot of GPT models, which are quite fast, especially in Copilot; they're hosted in Azure, right? So they've got speediness to them. And then some of the smaller models, like a Sonnet or a Haiku, are quite quick, right, compared to an Opus. So I use a lot of those models all the time. And to me, the speed is one factor, but also correctness, and being able to read it. And just the answer at the end of the day: is it correct or is it not? I've been doing this
32:36.22 James trial since the beginning of the month. I posted about it. I'm not picking any model manually by hand anymore. 32:43.16 Frank Oh, God. 32:44.18 James I'm only using auto model. And auto model, well, it seems to be different for each tool. But on the trip to Guatemala, I only had my phone. 32:58.00 James And as you know, on flights and long layovers, now that I have Starlink on Alaska, I do a crap ton of coding. 33:02.69 Frank Yeah. 33:04.48 James And I do it all with the cloud agent, the GitHub coding agent, the cloud agent. 33:08.44 Frank Yeah, yeah. 33:09.23 James And the default is auto. So I just let it YOLO. And then when I got home, I said, I'm only going to use... I mean, when I say only, I mean like 95% of the time, unless I need to switch something because something is just not clicking. 33:20.50 James But inside of VS Code, and now inside the CLI, you can just hit auto. And it's not based on your prompt, it's based on availability and a few other factors of the model; it'll pick a model. Often this is like a Sonnet model or a GPT model, but it really just depends on time of day, XYZ, and they give you a little bit of a discount, 10% off, for using auto. 33:44.28 James And what I like about auto is that if it's based on availability, that means it's probably pretty quick, and it's moving really quick too, because it's not being stressed. 33:50.74 Frank Right. Yeah. 33:53.91 James And I'm trying to figure out, and the reason for this experiment is: how much does the model matter? And this is exactly what you're doing, right? 34:02.84 Frank Yeah. 34:04.57 James And I do believe that the models do matter in some instances. 34:04.89 Frank Yeah. 34:07.93 James So what I'm trying to comprehend in my mind is, when does the model matter? When does the reasoning matter? 34:13.08 Frank Right. 34:15.00 James Because it's so easy to default into models that we think are the thing that's good for this thing, 34:19.62 Frank Mm-hmm. 34:20.81 James compared to just saying, maybe they're all pretty good, and then just kind of YOLOing it on the auto model selection. Now, when I say auto, I don't mean autopilot, I don't mean bypass. I mean there is a dropdown that says auto. Just go, and it will pick from the available models. 34:33.16 Frank 0.9x, baby. 34:34.47 James 0.9x, and just let it rip, right? So you get that little sweet discount and go to town there. So I'm trying to do almost the same experiment, but obviously running, you know, normal foundation models in the cloud, but kind of the same thing. 34:41.96 Frank Mm-hmm. 34:48.42 James Does the Gemma, does the Qwen, does the DeepSeek model... how much of a difference does it make? I'm interested in this: was there a point, I know it's only been like a week or so, but has there been a point where you did switch back to Opus or Sonnet? 35:03.39 James Because you said in the beginning that you were still favoring these models. Favoring is not the same as using or having to use 35:06.70 Frank Right. 35:11.54 James them. Has there been a point so far where you're like, you know what, I'm actually going to switch to this other model, because my house is getting too hot? 35:21.53 Frank You know, I kind of like hearing it spin up. I keep making fun of the fan noise, but you feel like you're a power user. You're like, hey, rename this variable, and then the jet engine starts up.
You're like, oh boy, that's getting really renamed over there. 35:36.38 Frank First, I want to say I really appreciate the auto thing, because I think that's the biggest lesson I want to take out of this. I don't even want to promote that local models are the way to go, necessarily. It's more that I'm just realizing how important your processes and systems and the information we give these models are; just how important that all is. 36:01.27 Frank But to answer your question, I've had zero interest in going back to the big provider models. Every so often you have a little Tweety Bird in the back of your head saying, I wonder if Opus would have solved this already. 36:14.66 James Mm-hmm. 36:17.23 Frank You know, every time it introduces a little bug or something like that, you wonder, would Opus have made that mistake? Or would Sonnet, or would Codex have made that mistake? 36:24.60 James Yeah. 36:27.96 Frank But in the end, who cares? It's free. I just tell it to go fix the mistake, and it goes and fixes the mistake. You know, it's the same UI and everything. So I think maybe it's too soon to know whether I'll switch back, and probably I will switch back, because I'm going to keep paying for Copilot and I'll use some credits there. You know where I'm really going to use Copilot? It's the cloud stuff, like you mentioned. 36:52.18 Frank Because what I found is, when I'm being truly productive, I'm using issues on GitHub and PRs on GitHub and using the cloud models. That's when my six things are happening at once and I'm being ultra productive. 37:08.44 Frank The model I'm using on my dev computer, it's okay, because I'm usually thinking through a problem, or I'm in a greenfield application and I'm doing design work. You know, I'm not trying to implement a feature or fix a bug. 37:19.60 James Yeah. 37:22.82 Frank That's all happening on the web. 37:25.11 James Yeah. 37:25.78 Frank That's just happening in the background. Here, though, I'm having a discussion with the AI, and it's okay if it takes a second or two to have its little discussion; I don't mind that. 37:36.12 Frank So I think that's actually where I'm going to kind of settle. For my dev machine, I might just keep cranking on the 3090 and just enjoying it, from trying different models to just getting out of the hegemony of Anthropic and OpenAI. It's fun to use these open source models. They're all different. They have different capabilities, you know, what they'll allow you to do and that kind of stuff. 38:03.77 Frank So, yeah, I'll put it this way. We are at the 5th of May, a little behind-the-scenes on our recording schedule. 38:15.03 Frank And I have used 0.6% of my Copilot credits, because I've just been rocking these local models, and it's been fine. And I have felt very little need to go to the bigger models. Yeah. 38:29.85 Frank That said, when I use the cloud-based models, that's when I'll probably be digging into my Copilot credits, or tokens, whatever they're going to call them after June. 38:43.49 James Usage. 38:45.06 Frank Usage. 38:46.24 James Usage. 38:46.37 Frank Yeah. 38:47.22 James So yeah, that really kind of gets back to that ecosystem play, right?
And, yeah, we're doing those GitHub Copilot dev days, and I kind of talk about that: the ability to 38:58.71 James tap into the ecosystem, tap into other model providers, tap into other, you know, harnesses, if you will, and being able to use things on the plane, on your computer, in the terminal, on your phone, in all these different areas; assign issues, do these things. 39:12.57 Frank Yeah. 39:14.86 James And that's how I use it. So to me, it's not about one tool. It's about the combination of all the tools. Now, what's fascinating is that we're going to use those tools, and then we're going to use different models in different instances based on where those tools are running, 39:27.06 James right? Because if you're using a cloud agent, it's going to be running and needs to talk to cloud stuff, right? So that's really, really interesting in general to think about. 39:38.91 James Um, yeah. 39:40.63 Frank I want to talk about performance a little. Sorry, didn't mean to interrupt. 39:43.90 James Yeah, yeah. I was going to ask, did you have a specific thing around models, performance, outputs that you were leaning towards? 39:45.59 Frank Okay, so, yeah. 39:52.80 James Because you talked about the different ones and that you're wanting to explore more, but yeah. 39:57.72 Frank Yeah, and I've been doing AI forever, and people keep talking about tokens per second, and I honestly don't know, like, what's a good number? You know, obviously bigger is better. 40:08.34 James Yeah. 40:09.43 Frank That much I know. But is 40 tokens per second tolerable? And what I've discovered is, it very much is. You definitely get annoyed with the thinking loops that a lot of these reasoning models can get into. 40:22.33 James Yeah. 40:24.54 Frank There are a lot of people who are just turning reasoning off, because it's faster for it to make a mistake and for you to correct that mistake than to watch it think in loops forever. 40:35.43 Frank These things really seem to get into cyclic loops and all that kind of stuff. So I think there are benefits to turning that off. But 40 tokens per second is still faster than I can read. 40:47.22 Frank So when it's editing code, it's plenty fast. When it's pumping out text for me to read, it's faster than I can read. So it's fine. Now, that 40 is, again, with the model quantized and all that, running on, what is it today, the 3090, a $1,500 GPU. 41:08.95 Frank But we both have Macs. And you can take that exact model, same exact model, same quantization, everything, run it on the Mac, and you're probably going to get like 17 tokens per second. 41:24.21 Frank Which, again, is fine for text. But when you have sub-agents going and reading your entire repository? Too slow. 41:35.42 Frank When you have it thinking in loops? Too slow. So the 17 is really bad. The good news is, James, in the last week, all of this has really been improving. There's a new technology out there, MTP, and I'm trying to remember what the stupid thing stands for. Multi-token magic, MTM? I forget what the P stands for. 42:00.57 Frank It's this predictive, oh, maybe it stands for predictive, where these models actually have smaller models that are just kind of dumber, and they're just guessing what the next few tokens are going to be. 42:17.11 Frank And then the big model just checks whether the next few tokens should have been those things.
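What Frank is reaching for is multi-token prediction, in the same family as speculative, draft-and-verify decoding. A toy sketch of the greedy-decoding version follows; `draft_model`, `big_model`, and their methods are stand-ins for the idea, not any real library's API.

```python
def speculative_step(big_model, draft_model, tokens, k=4):
    """One draft-and-verify step; both model objects are hypothetical stand-ins."""
    # 1. The cheap draft model proposes k tokens, one at a time.
    guesses = []
    ctx = list(tokens)
    for _ in range(k):
        t = draft_model.next_token(ctx)
        guesses.append(t)
        ctx.append(t)

    # 2. The big model scores all k positions in a single batched forward
    #    pass over tokens + guesses (that one pass is the speedup): at each
    #    position it returns its own greedy choice given the draft's prefix.
    verified = big_model.greedy_at_each_position(tokens, guesses)

    # 3. Keep guesses up to the first disagreement; at the mismatch, take
    #    the big model's token instead, so the output matches what the big
    #    model alone would have produced, just computed in fewer passes.
    accepted = []
    for g, v in zip(guesses, verified):
        if g == v:
            accepted.append(g)
        else:
            accepted.append(v)
            break
    return tokens + accepted
```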
And it turns out this cooperative way of doing inference is faster. 42:31.61 Frank So you have a little stupid model doing a bunch of predictions, and then the bigger model is actually validating those predictions. And both Qwen and Gemma support this. 42:47.03 Frank And the new software out there... on Mac, there's MTPLX, which uses, oh my God, it's too many libraries, the MLX library, 42:47.04 James Oh, cool. 42:58.84 Frank like a fork of it, to do this. Where it does the intelligent thing, where it has a stupid model running ahead and doing stupid predictions, and then the bigger model validating. 43:12.59 Frank And this is something Google invented, by the way. 43:13.01 James Cleaning it up. 43:16.43 Frank So everyone was just like, oh, that's a good idea, let's take that. And so Google just released that feature for Gemma, which already was a powerful model, but now it's fast. 43:26.65 Frank So now you can run these models, these 30-ish billion parameter models, at 60 tokens per second on our Macs, James. 43:38.71 Frank So I'm telling you, that's plenty fast, because even a lot of the service providers out there for the big, big models, they're only giving you about 60 tokens per second, between network bandwidth and them being overloaded by DoS attacks and all that kind of terrible stuff that they have to deal with. 43:58.98 Frank 60 tokens per second is plenty fast enough. And it's really cool that just in the last month, I would say, this new technology has come out and is really revolutionizing inference speed. 44:12.95 James That's crazy. Well, I do want to ask one thing, because not everyone has a 3090 sitting around, Frank Krueger. 44:17.70 Frank Nice. 44:18.52 James And the real question becomes, what if I just have my Mac mini, or, more realistically, maybe a MacBook Air that's sitting over here? 44:30.71 James I think I almost want to follow up next week, which is: what can you do on your MacBook Air inside of a coffee shop? 44:30.87 Frank Yeah. 44:36.72 James Right? 44:37.56 Frank Yeah. 44:37.72 James What could you do on your Mac mini that's over here? Because I don't think, realistically, that going out and asking people to buy a 3090 to run these models is going to be realistic, because then I would just spend that on token usage, maybe, right? 44:48.61 Frank Right. Yeah. 44:55.05 Frank Yeah. 44:55.03 James More realistically, what I'm really fascinated about is maybe running smaller, tiny models as sub-agents automatically, so they don't use tokens. 45:04.63 Frank Right. Yes. 45:05.86 James Or maybe there is a, hey, this one works really well on the Mac mini. And so I'd love to see a follow-up article, which is: hey, you have this, but here is how you can use an optimized model based on the hardware that you have. Because some people like us have our beefy, what, how many gigs of RAM does this thing have, like 64, 128? 45:25.11 Frank I went with 64. I can't remember what you went with. I think you went the same. 45:28.16 James 64 then, whatever you got. 45:28.86 Frank Yeah, 64. Yeah, cool. 45:29.66 James And then I got the two terabyte, you got the four terabyte. 45:30.17 Frank Yeah, cool. 45:32.70 James So we got this, right? So this isn't even realistic, because many people aren't going to buy this configuration. 45:34.51 Frank Yeah. Yeah. 45:37.50 James They're going to buy the base model, which is 16 gigs of RAM. Which means it's more realistic on our M1.
I know it's a different chip, M4 Pro versus M1, but the M1 with 16 gigs of RAM on our little MacBook Air. So the question is, what could be a good optimized model? How would you run it here locally versus running it on this thing over here? Because I think that is a more realistic scenario. 46:03.83 Frank Yeah, 100%. And I even make that point in my blog, because I price out what it would cost to build a 3090 machine these days. And it's about $2,000. 46:13.75 Frank And $2,000 gives you 10 months of the highest tier of Claude Code, you know, that... yeah, I don't know. 46:13.94 James Oof. 46:21.08 James Yeah. 46:24.69 Frank What's the most expensive Copilot, like $40 a month? So that's 50 46:29.82 James A lot. Yeah. 46:30.36 Frank months. So, yeah. I'm not even going out there and saying go build a machine or anything like this. But what I do say is, if you have a machine like that, you'll probably be using it. And then, yeah. 46:32.13 James Yeah, 46:42.97 James yeah, there you go. But I'm also interested in... I do have a bunch of machines. I don't have that machine, but I do have little ones. I mean, I have to imagine a Mac mini, it's got to be doing something. 46:53.21 James Did you run any models on the Mac mini directly? 46:54.23 Frank Yeah, 100%. Yeah, I have been. So I have mixed feelings about it, because our Macs do a great job running it, but it is definitely eating at the memory and it's eating at the GPU. So you're slowing down basically everything else on the computer. And the fan turns on on my Mac mini. Every time the fan turns on in the Mac mini, I'm like, oh, I'm sorry. 47:17.10 Frank I'm sorry. I don't mean to hurt you. Now, the RTX, I don't care. 47:19.29 James Yeah. 47:20.52 Frank The RTX can start a fire back there. I don't care. You know, burn, baby, burn. But my little Mac mini, I'm like, oh, I'm sorry, are you getting too hot over there? So that's probably stupid of me. 47:32.98 Frank But probably the biggest, weirdest change that I've seen in AI since I started is, in the beginning, we were all compute obsessed. How fast is your computer? 47:43.31 Frank How many flops can you do? 47:44.00 James Mm. 47:45.56 Frank Today, that's almost irrelevant. All anyone cares about is: how much RAM do you have, and how fast is that RAM? Because these models are so big. The reason the 3090s and 4090s are nice is because they have 24 gigabytes of that RAM. 48:02.14 Frank And therefore these small models, small being 30 billion parameters, can fit on them. Because we have 64 gigabytes, you can fit some even bigger models. But it turns out, with the model makers out there, the small models are about 30 billion parameters, 48:22.26 Frank and then the next step up is 300 billion. There's not really an in-between ground there. 48:25.09 James Wow. 48:27.26 Frank So you definitely can take, like, the Qwen 27 billion, run it on your Mac, and it's going to run great, but it is going to use all the resources, and you are depleting your dev machine, and you'll notice hiccups here and there. Like, I had YouTube Music playing in the background, and every so often you'd get some static, because the poor little processor is cranking away. 48:53.37 Frank And when you go to Activity Monitor, you see the GPU usage at 90%. You're like, ooh, okay, we are burning this puppy. And then it becomes a thermal game. 49:04.30 Frank Like everything else, it becomes a thermal game. So yeah, it's a mixed bag running these things yourself.
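The RAM-is-king point is the same weights arithmetic as before, applied across machine tiers. A rough sketch, assuming 4-bit weights and reserving about a quarter of RAM for the OS, KV cache, and everything else; that headroom factor is a guess, not a rule.

```python
# "Will it fit?" math for the RAM tiers discussed: weights-only footprint
# at 4-bit quantization, with an assumed 25% of RAM held back for the OS,
# KV cache, and other processes.
def weights_gb(params_billion, bits=4):
    return params_billion * 1e9 * bits / 8 / 1e9

for ram in (16, 24, 64):          # base Mac mini, RTX 3090, their Mac minis
    for size in (7, 30, 300):     # the model-size tiers that actually exist
        fits = "fits" if weights_gb(size) < ram * 0.75 else "too big"
        print(f"{ram:>3} GB RAM, {size:>3}B model @ 4-bit: "
              f"~{weights_gb(size):5.1f} GB -> {fits}")
```

Run as written, this reproduces the shape of the conversation: 7B models fit in 16 GB, the ~30B tier needs the 24 GB card or a 64 GB Mac, and the 300B tier fits in none of them.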
49:07.26 James I think, yeah, I think that the bummer part here may be, and I could be wrong, but the Mac mini does not support eGPUs. So you can't just plug an external GPU into your Mac mini. 49:29.37 Frank Yeah, I agree. And, you know, I've actually run eGPUs with Macs in the past, and that's even a bit of a mess, to be thoroughly honest. 49:38.41 James Hmm. 49:39.06 Frank Thunderbolt is fast, but you're sending so much data so fast to make these models work. It's a little bit weird. One of the reasons MTP is such a good technology is it actually keeps a lot more of the data on the video card before it ever has to get back to the CPU. 49:59.70 Frank They try to do their sampling on the video card. 49:59.65 James Hmm. 50:03.86 Frank People don't think about it, but these are probabilistic models. For every vocabulary word out there, it's giving a probability that it's the next word. And something has to take that list, sort that list, take the top K, and do the top P off of that top K. And that is a serial operation. That is not efficient on a GPU. GPUs are parallel devices. And that is a giant serial operation that has to happen. And so for unoptimized models, there is a very huge cost we all have to pay if you're running these things yourself, where lots of memory has to be copied down from the video buffers into CPU buffers. 50:47.58 Frank Unless you have a unified memory architecture, like on a Mac, and then 50:51.61 James Yeah. 50:52.18 Frank life is good. I think some of the Windows ARM computers have unified memory architectures too. I just know less about them than the M series. 50:58.90 James Interesting. 51:03.60 James All right. 51:05.18 Frank All right. 51:05.72 James I am going to try something on my Mac, that's for sure, because I do have it just sitting here ready to go. 51:06.20 Frank Well... 51:13.58 James I think it's fascinating, and I do want that little fan to spin up on occasion. 51:17.88 Frank Yeah. 51:18.05 James But I do also want to point out that you can actually do this in the CLI now, too. 51:18.84 Frank Hmm. 51:22.41 James You can actually configure it. 51:23.68 Frank Hmm. 51:23.65 James I don't think it's as simple, but I know Kayla Cinnamon has a whole thing on how to configure Ollama and select the models that are running there inside the CLI, if you want to run in the CLI. 51:39.24 James So it could be cool to give it a try. 51:42.49 Frank Yeah, and I think it's just one of those... Here's the real deal, James. I never want to go back to writing code in the old way anymore. 51:53.53 James Mm. Yeah. 51:54.49 Frank But I don't like the idea that my ability to code is dependent upon paying someone a service fee every month. I don't love that. 52:03.64 James Mm. 52:05.30 Frank So what I like about all of this, and it's really more of a mental thing than anything else, to be thoroughly honest, is I feel like I'm back to where I, myself, in my little office, without an internet connection, can still code again in the way I prefer to code these days, as coding has gone through a transformation in this past year. 52:26.49 Frank And I don't want to go back. I'm over it. I want to stay in this world. 52:30.20 James Yeah, me too. 52:31.74 Frank And it's a little bit of a liberation to run these models yourself and to see, okay, they're good enough. I can keep doing my planning, keep doing my agentic workflows. Feels good.
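In miniature, the sampling step Frank describes earlier in that exchange is a sort-and-slice over the whole vocabulary. A toy sketch on plain Python lists; real runners do this over tensors with a vocabulary of 100k-plus entries, which is exactly why keeping it on the GPU matters.

```python
# Toy top-k / top-p (nucleus) sampling: the model gives a probability for
# every vocabulary token; something has to sort that list, keep the top-k,
# take the smallest prefix covering probability mass p, then draw one token.
# This is the inherently serial step that's awkward on a parallel GPU.
import random

def sample_top_k_top_p(probs, k=40, p=0.9):
    # probs: list of (token_id, probability) over the whole vocabulary
    ranked = sorted(probs, key=lambda x: x[1], reverse=True)[:k]  # top-k

    nucleus, total = [], 0.0
    for tok, pr in ranked:          # top-p: shortest prefix with mass >= p
        nucleus.append((tok, pr))
        total += pr
        if total >= p:
            break

    # Renormalize over the nucleus and sample a single token from it.
    norm = sum(pr for _, pr in nucleus)
    r, acc = random.random() * norm, 0.0
    for tok, pr in nucleus:
        acc += pr
        if acc >= r:
            return tok
    return nucleus[-1][0]
```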
52:45.67 James I'm glad that we changed you over. Took a little bit, but we got there, Frank. And yeah, I'm excited for the year ahead. I mean, it's only May, and who knows? I mean, for me, I think the next steps here, as we wrap up: it's more feasible than ever to use these models locally and combine them with the ones that you're using in the cloud, and the different services of where you're doing your coding and how you're coding. 53:05.94 James And what I'd love to see progress even more is for all of these bits and pieces to come together. And I think they have come together a little bit in the form of extensions, but I would love to see that ecosystem unify even more. Like, I feel like we have the capability of building some sort of GUI or some sort of thing that really streamlines these processes and makes it easy to configure and connect all these different harnesses and tools that we're using out there, to make it easier than ever. That's what I would kind of expect, because that would then enable more individuals to do this. Or maybe just build it into the tooling. Like, I do think that there is 53:40.95 James something good and something bad about what Apple did on their machines. And I guess, technically, even on my Windows machine, there are models that are running locally for all the AI stuff. But it is like, do I want to install this thing? 53:52.95 James And one thing that might be fascinating is, is there a future that I see, which is streamlining this process to say, okay, download this model. Like, an easier configuration wizard is what I'm saying. 54:05.25 James There are all these libraries and all these things; how do we streamline that bit and piece? 54:05.43 Frank Yeah. 54:10.86 Frank Unified? I'll take it, because I'll be honest, I have like six different model runners on this computer, and they all store the models in different directories. And I have models everywhere on this hard drive, and it's just eating all my gigabytes, and I want my gigabytes back. 54:22.36 James Yeah. 54:27.01 Frank But I'm not going to go organize these. 54:27.38 James Yeah. 54:29.09 Frank I'd have to run my GrandPerspective to go find all these stupid models. So I'm hoping at this WWDC, Apple comes up with some unification for having background services and good things that can... 54:33.35 James That's great. 54:44.31 James Yeah. 54:47.23 Frank It's tough because, sorry, we are trying to wrap up, but we're still in the wild, wild west here. You know, innovation is happening. So you don't want to prematurely unify and prematurely standardize this stuff. 54:59.35 James Yeah. 55:01.01 Frank It's fun having everyone competing against each other for speed and all that. And I don't want to lose that competitive spirit. But it is a little bit exhausting. So hopefully, within a couple of years, it'll all be unified and standardized. 55:13.84 James I'd like to see it. Well, let us know if you're running a local model, and where and how you're running it. Maybe you have a 3090, maybe you don't, like me. And then, what are you running it on? Let us know. Go head over to our YouTube, youtube.com/@mergeconflictfm, and that's a great way to leave a comment on this episode or any of our past episodes. And we'll do a listener mailbag episode. Quite a few have chimed in, so we'll get it going. So let us know. But that's it for your local edition of Merge Conflict. Until next time, I'm James Montemagno. 55:41.24 Frank And I'm Frank Krueger.
Thanks for watching and listening. 55:44.89 James Peace.