00:00.60
James
Welcome back everyone to Merge Conflict, your weekly developer podcast, breaking down the latest and greatest in the world of software development and AI and models and conferences and all those goodies. Frank, on the road, Krueger, how's it going, buddy?

00:14.71
Frank
James, coming to you live from... actually, I don't know where I am, so from a motel. Hi James, it's wonderful to be here. I didn't think I was going to make it. We have this thing where, no matter where we are in the world, we try to record the show, and this is one of those episodes. I don't know where I am in the world, but here I am recording.

00:35.67
James
Frank is live from Willits, California.

00:38.51
Frank
Oh, hi. Hi, California. That would explain all the California license plates I've seen lately.

00:44.59
James
They're everywhere. You can't get away from them. Like, oh, what's going on? Yeah, Frank is doing this big cross-country tour up and down, north to south and back up again.

00:57.56
James
And Frank stayed with us, which was great, which was fantastic. And now he's on the road. Maybe we'll see him again, we'll see. But it's been pretty cool to follow you, because, like I was telling you, every five or six hours or so Heather's like, you know, and I'm like, where's Frank? Where's Frank? Is he going to make it? Is he here? Where's he at? And then we're like, is he stopping? Is he going? Why is he going through Death Valley? We told him not to go through Death Valley. What's he doing? Is he okay? Is he alive? So it's been a joy to watch you from a distance.

01:28.89
Frank
Yeah, it's funny. You know, I never understood what the Find My feature is for on iPhones, because every time I lose my iPhone, it's, like, in an ocean or something.

01:39.21
Frank
So it's not good for it.

01:39.70
James
Bye.

01:40.66
Frank
The find-my-device part doesn't work. But I do love tracking people and living vicariously through other people. And I don't want to be snoopy or anything, but I love it when other people do it to me too. They're like, hey, you're in this place.

01:52.10
Frank
And I'm like, I am in that place. How did you know? Oh yeah, it's 2026 and all these features are there. But honestly, it's also comforting knowing that people are watching me, because, you know, if my dot stopped in Death Valley for, you know, 12 hours, maybe you should call someone, James, and I will hold you to that.

02:11.74
Frank
That's your job. If you see my dot stop for too long.

02:14.62
James
I may have been checking it a minimum of twice a day just to make sure the dot was moving. I'm not going to lie about it. That is factual. Well, because it is scary. It's a big...

02:25.10
James
You're out there by yourself doing your own thing. And I don't know, it's a long distance. So, no, it's funny. I have some coworkers that are Gen Z, and they were telling me that that is...

02:30.24
Frank
Yeah.

02:35.16
James
Find My is like a social norm. Pretty much the friend group, the neighborhood, everybody is on it.

02:40.18
Frank
think

02:43.51
James
Like, I had some of my coworkers show me their Find My, and they would have like 70 people, like everybody, every single person. And it was not just because they're friends, but they're checking on each other. They're doing stuff. There's kind of this subculture to it.

03:00.18
James
But it was almost like their, you know, their iMessage. I guess they're all on WhatsApp, but it's like their iMessage, you know what I mean? That's how they communicate.

03:04.98
Frank
Yeah. That's funny. I just got the FOMO thing of, like, well, I've only got like 20 friends on Find My. I've got to collect more. I've got to find more people that will let me snoop on them constantly.

03:16.82
James
Ah, yeah, I've got four. I've got you, Craig, Heather, and I think I've got Scott now, because he accidentally did, you know...

03:19.17
Frank
You've got four.

03:24.70
Frank
You're like, you can't have it back, Scott. I can know where you are forever now.

03:27.99
James
Well, because, you know, when I go into Find My, because I do the thing when I'm at, like, I was just at Microsoft Build 2026, which we'll talk about today. But, you know, when you go into Find My, I do have one, two, three, four, five people.

03:41.36
James
I have more devices than people. But it does say, Scott Hanselman, this person shares their location with you.

03:44.32
Frank
We'll get you more followers, James.

03:48.64
James
Do you want to share back? You know what I mean? They kind of guilt you into it.

03:51.29
Frank
And you're like, nope, nope.

03:52.28
James
Uh, no.

03:53.02
Frank
Scott can't know where I am.

03:53.04
James
And Scott is in his, uh... oh, Scott's at a movie theater. That's what he's doing.

03:59.37
Frank
Don't be creepy.

03:59.60
James
Uh, I'll do it. I know that's the... well, now I can. Well, so when we go to conferences, I always do the share until the end of the day, you know, for my coworkers, because it would be...

04:06.05
Frank
Yeah. Right. But the buttons are right next to each other. You never know which one you're going to click.

04:12.63
James
You've got to audit. You've got to audit. That is for sure. Well, I was at Microsoft Build 2026 and it was fantastic. It was in the same place where GitHub Universe was last year and the years before that, at Fort Mason, which is over in the Marina area on the far west side, right by the Alcatraz area.

04:19.52
Frank
Oh,

04:31.56
James
It's an old fort, Fort Mason.

04:31.80
Frank
OK.

04:33.12
James
It's in the name. And it's kind of like a pier thing. It's very fascinating. It's very, very cool. You should probably register for Universe at the end of October. It's like 600 bucks or whatever. You get a coupon code. It's a pretty good deal. Yeah, you could crash on my couch.

04:46.52
Frank
That's funny. We're playing like a little California dance. It seems like wherever you go, I come the next day. We're just kind of doing a do-si-do around the West Coast here. I love visiting San Francisco. In fact, I was going to try to do it on this trip, but the fantastic app Waze by Google totally led me astray. And I'm 100% blaming them and not the user, not me. It's the app's fault that it did not take me to San Francisco when I asked it to. But it's a fun city to visit.

05:15.49
James
Yeah, it was good.

05:15.69
Frank
It's always fun visiting.

05:17.18
James
My sister lives there. I forgot. So then I remembered and then I was like, we should probably get breakfast or something. And then I told her that I forgot, and she'll never let me live that down.

05:23.48
Frank
Make me choke.

05:26.74
James
But, you know, she moves around. She moves around. So it's like, you know, she's in and around San Francisco.

05:30.17
Frank
It's her fault. Yeah.

05:32.97
James
You know, it's like it's a big, you know, it's around the area. So I was like, oh, I don't know.

05:36.06
Frank
Who doesn't live in San Francisco?

05:37.09
James
Yeah, I mean, you've moved around so much, I don't even remember. I was like, also, if it's like five miles in San Francisco, that might as well be two hours away, you know?

05:45.94
Frank
Yeah.

05:46.00
James
So, um yeah, it was good overall.

05:48.37
Frank
That's cool.

05:48.77
James
But I figured what we could do today is, you know, there's tons of Windows news that I really want to talk about, but I'm really fascinated by the models. But let me really quickly shout out a few things.

06:01.41
Frank
Oh, yeah. Okay.

06:02.01
James
For Windows, really quick. So Windows was first, and Kayla on my team got to do the big demo after Satya, so she was first up. There are a few really cool things that they showed off. First and foremost, there is new built-in container support on Windows called WSL containers, which means that for folks who maybe don't want Docker or Podman or some other containerization or virtualization software, you can now just use WSL, which is just right there. And it's the same commands: you've got container images, or images list, or whatever the command is, and then boom, here are all of your images and your containers running. And it all just works brilliantly out of the box, which is really nice. You know, Apple did this not too long ago.

06:52.89
Frank
Yeah, so these are Linux containers, right? Because it's WSL, not Windows containers.

06:56.60
James
That is correct. WSL.

06:58.52
Frank
Is that right?

06:58.86
James
Yes. Yep. Yep.

06:59.79
Frank
Yeah.

07:00.21
James
That's correct.

07:00.41
Frank
Okay. I mean, that makes 100% sense. So much so that I guess I just kind of assumed they already had that.

07:11.58
Frank
So that's good that they got that going. And, yeah, it's something a lot of people like. Docker containers are just namespace limiters. Most operating systems support that feature.

07:23.71
Frank
You don't even need virtualization to make containers work. So it 100% makes sense that Microsoft would get on this bandwagon, especially with all the agent coding stuff they seem to be hooking it up to, but I'm sure you'll get to that.
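
To make the "namespace limiter" point concrete, here is a minimal Python sketch, assuming a Linux machine with a standard /proc filesystem (on other systems it simply returns an empty result). It only inspects the namespaces the current process already belongs to; it does not create a container. The function name is made up for illustration.

```python
import os
from pathlib import Path


def current_namespaces(proc_ns_dir="/proc/self/ns"):
    """Map namespace kind (e.g. 'pid', 'net', 'mnt') to its kernel identifier.

    Returns an empty dict on systems without a Linux-style /proc.
    """
    ns_dir = Path(proc_ns_dir)
    if not ns_dir.is_dir():
        return {}
    namespaces = {}
    for entry in sorted(ns_dir.iterdir()):
        try:
            # Each entry is a symlink whose target names the namespace,
            # for example 'pid:[4026531836]'.
            namespaces[entry.name] = os.readlink(entry)
        except OSError:
            continue  # not a symlink, or not readable by this user
    return namespaces


if __name__ == "__main__":
    for kind, ident in current_namespaces().items():
        print(f"{kind:12} {ident}")
```

Two processes that print the same identifier for a given kind share that namespace. Roughly speaking, a container runtime starts a process with fresh identifiers for several of these kinds, so it sees its own process list, mounts, and network stack without a virtual machine.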

07:39.57
James
Well, I'm glad you talked about that because Microsoft also announced something different, which is actually something cross-platform. We're big fans of cross-platform software here, Frank Krueger. But it is something called MXC, the Microsoft Execution Container.

07:49.46
Frank
Wow.

07:55.48
James
And this is a sandboxed code execution system for running untrusted code, so model output, plugins, tools, on Windows, Linux, and macOS. So it's JSON config, of course, and it's policy driven. So you can give access to file system, network, UI policy, things like that. So: do you have access to the clipboard, do you have access to the network, XYZ.
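
As a rough picture of what "policy driven" could mean, here is a small Python sketch. Everything in it is an assumption for illustration: the JSON key names, the `Policy` class, and the `is_allowed` function are invented and are not the real MXC configuration schema or API. It also only models the allow/deny decision; a real sandbox enforces that decision at the operating system level.

```python
import json
import posixpath
from dataclasses import dataclass

# Hypothetical policy document. These key names are invented for this
# example and are not the actual MXC schema.
POLICY_JSON = """
{
  "filesystem": {"read": ["/workspace"], "write": ["/workspace/out"]},
  "network": {"allow": false},
  "ui": {"clipboard": false}
}
"""


@dataclass(frozen=True)
class Policy:
    read_roots: tuple
    write_roots: tuple
    network: bool
    clipboard: bool


def load_policy(text):
    raw = json.loads(text)
    return Policy(
        read_roots=tuple(raw["filesystem"]["read"]),
        write_roots=tuple(raw["filesystem"]["write"]),
        network=bool(raw["network"]["allow"]),
        clipboard=bool(raw["ui"]["clipboard"]),
    )


def _inside(path, roots):
    """True if path, after collapsing '..' segments, sits under one of roots."""
    clean = posixpath.normpath(path)
    for root in roots:
        root = posixpath.normpath(root)
        if clean == root or clean.startswith(root + "/"):
            return True
    return False


def is_allowed(policy, action, target=None):
    """Deny by default; allow only what the policy explicitly grants."""
    if action == "read":
        return target is not None and _inside(target, policy.read_roots)
    if action == "write":
        return target is not None and _inside(target, policy.write_roots)
    if action == "network":
        return policy.network
    if action == "clipboard":
        return policy.clipboard
    return False  # unknown actions are refused


policy = load_policy(POLICY_JSON)
```

With this policy, a tool call may read anywhere under /workspace, may write only under /workspace/out, and gets no network or clipboard. Paths are normalized first so that something like /workspace/../etc/passwd is refused.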

08:18.68
James
And it basically sits on different containment backends, which are all sort of micro-VM based. So you can either use,

08:29.75
James
Process container, Windows Sandbox, LXC, Bubblewrap, Seatbelt, micro VM, Hyperlight, isolation session, or WSLC as well, which was just announced too, which is cool. And Scott Hanselman and Sam showed this off, which was the OpenClaw support, which is running the tool calls inside of MXC. So it has the policy. The app isn't running inside, but it is...

08:54.90
James
doing the tool calling, right?

08:57.05
Frank
Interesting.

08:57.22
James
So, inside of it, this is familiar if you're inside of VS Code or something and you see a sandbox. This would be something that now would be running on Windows, which is very exciting for me.

08:57.69
Frank
Yeah. And I like this one because honestly, I missed the cross-platform part of the announcement.

09:16.73
Frank
I just saw it was sandboxing for code execution. And I somehow totally missed the fact that it can run on Macs. And sorry, you just listed all those technologies that it can use. So I'm like, well, yeah, that just brings home my point that these container things are just namespace limiters. It's actually kind of easy to do this in most modern operating systems. So this thing is just kind of joining it all together.

09:42.97
Frank
I do have... mixed feelings. Definitely for running just random code that the agent is generating, 100%, let's run that in a constrained environment.

09:54.17
Frank
For OpenClaw and things, I'm a little bit confused, to be honest, because I thought the power of OpenClaw was you are just logged into all your accounts and everything, and it just impersonates you and pretends to be you and all that stuff.

10:07.61
Frank
But maybe Claw usage has become more advanced, and I'm behind the times here. But I thought it was a real YOLO environment of: I'm logged into everything, go have fun, AI. So I guess there are different uses of Claw out there.

10:23.58
James
Yeah, I think they are thinking about, like, enterprise Claw, right? Hey, I want to run this on my machine and it's not going to delete everything or get access to my internal network, for example. You can sandbox those calls specifically, which is great.

10:36.09
Frank
Yeah, I'm curious to see what the configuration language looks like. I mean, you said it's JSON, but I'm curious to see what all the dials actually are. Because again, I have mixed feelings. I set up a development environment. I want all the tools to be able to use that development environment without having to mess around too much. So I'm curious how easy they made it to access built-in stuff you already have installed.

11:02.10
James
Yeah, totally. And I would say, I haven't really tried out a lot of it, but there's now, like, a new Windows native app, basically. So I'm interested in that and actually giving it a spin finally.

11:14.20
James
But I will say, do you know about, like, coreutils or grep? Have you heard of these things, you know, different commands available on Unix?

11:21.49
Frank
Yeah, I've heard of grep, mostly because I've seen agents be like, hey, you haven't installed rg, rg is the better grep. I'm just like, just use grep.

11:32.04
Frank
That's how I know I've used grep. Coreutils, is that a GNU thing? I mean, a part of POSIX is a set of utilities that are a part of any POSIX operating system. POSIX is what Unix and Linux both implement. Coreutils, I thought, were just the GNU versions of the POSIX tools. Am I right or wrong anywhere?

11:56.22
Frank
Am I close to home?

11:57.90
James
I think you're... so, I mean, it's Unix style. So it's Unix commands, basically, that are there.

12:03.21
Frank
Yeah.

12:03.58
James
It's... but uutils is the org or whatever that does it.

12:04.37
Frank
So

12:07.50
James
But you've got things like the rm, the mv, the dirs, the cats, the greps, things like that.

12:12.67
Frank
yeah.

12:17.94
James
You know what I mean? You've got all the commands that are there. The POSIX, all those things. But basically, yeah.

12:24.30
Frank
That's what I mean. Yeah. There's usually a suite of tools. To become a POSIX operating system, you basically just have to have these tools installed. It actually has nothing to do with the kernel, or very little to do with the kernel. It's more about what your threading model is, what your process model is, and what tools you have available. But I interrupted you. I think Microsoft released something written in a funny language called Rusty or something like that.

12:49.27
James
Probably Rust. Yes. Coreutils for Windows is now available. So if you're just inside of PowerShell, you can grep away all you would like, which is fantastic.

12:55.54
Frank
specific

12:59.50
Frank
It's ls. And I know PowerShell implements it now, but it's still one of my favorite things, every time I switch to Windows, because I'm still a cmd.exe person.

13:04.67
James
Yeah.

13:09.78
Frank
Judge me. Don't judge me. You can judge me.

13:11.77
James
Yeah.

13:12.34
Frank
I still type ls.

13:13.97
James
It's there. Yes. Well, the interesting part is the other thing that they announced was a new profile for WSL that they call Comfy Shell, I want to say.

13:26.04
James
But for all intents and purposes, they've ported over like Starship and Homebrew and all the keyboard command shortcuts.

13:26.61
Frank
OK. Oh.

13:29.43
Frank
Huh?

13:34.43
James
If you're coming from macOS, for example, I don't think they said this on stage, but I'll say it because it's our podcast, damn it, is that if you're coming from macOS, you now have a shell that is pretty much exactly the same. You have all the same things that are available to you. And obviously, you can't just install like a Homebrew macOS application or something like that, or Linux application. That's not going to work. But if you have a Linux app,

14:03.03
James
you know, a command-line application, that's going to work. So for example, she showed off running btop, and it's like, here it is, here's all of btop or whatever, and here's all of your Homebrew commands, and everything just works.

14:16.18
Frank
Well, I'm sorry, Windows users, that you're now going to be having access to Homebrew.

14:16.70
James
You know, your tops, your things.

14:23.65
Frank
I'm terrible. I hate Homebrew. I think it's one of the worst package managers. It's all we've got on Mac. It's what we all use. Thank you, Homebrew, and everyone who works on it. Thank you.

14:33.90
Frank
We need you. I love your packages. But my god, Homebrew is the worst package manager on the whole planet.

14:38.99
James
It's there. I like Winget. I like Winget personally on Windows. It's fantastic. So I'm just saying. Well, two more things really quick. One is that there is a new experimental terminal.

14:51.86
James
So there is the Windows terminal, but there is another terminal that's basically a version of Windows terminal, but with agents built in.

14:53.58
Frank
Oh my God. Oh my.

14:58.79
James
It's called Intelligent Terminal. And what that allows you to do is use a terminal as a terminal. And if you need an agent to help you out, it pops up side-by-side or below, and you just have your agent there. Whether that's Copilot, Claude, or Gemini or something else, it's available.

15:14.75
Frank
Uh.

15:15.90
James
And it can auto-detect errors. So for example, let's say you don't know how to use grep and you grep incorrectly. It can auto-detect the error, and then it will tell you what to do. And you can select the models, and that could be running locally on your machine too. You don't have to actually use a cloud one.
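
A minimal Python sketch of the general idea described here: run a command, notice that it failed, and package the error up for an assistant. This is not how Intelligent Terminal is implemented; the function names and the prompt wording are hypothetical, and treating any non-zero exit code as "an error" is a simplifying assumption. No model is actually called.

```python
import subprocess
import sys


def run_and_capture(argv):
    """Run a command and return (exit_code, stderr_text)."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    return proc.returncode, proc.stderr


def build_agent_prompt(argv, exit_code, stderr):
    """Return a prompt for an assistant, or None if the command succeeded."""
    if exit_code == 0:
        return None
    return (
        f"The command {' '.join(argv)!r} exited with code {exit_code}.\n"
        f"stderr:\n{stderr.strip()}\n"
        "Suggest a corrected command."
    )


# Stand-in for a mistyped command: a child Python process that writes an
# error message and exits with a non-zero status, so the demo runs anywhere.
failing = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('bad pattern\\n'); sys.exit(2)",
]
code, err = run_and_capture(failing)
prompt = build_agent_prompt(failing, code, err)
```

The resulting `prompt` string is what would be handed to whichever model is selected, local or cloud.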

15:31.16
James
You just have a local model running, which is cool. And if you want to get all of this stuff, they have a new developer config, which installs all of the things that I just told you about, but it installs so much more.

15:37.79
Frank
Mm-hmm.

15:41.94
James
So it gives you all the PowerShell stuff. It does stuff. But what it also does is it removes all the noise, I would say. So they call it like quiet mode, almost. So it turns off notifications. It gets rid of all the news feeds, all the widget stuff. It gives you like a beautiful background. It installs all the things for you. It sets up all your environment and installs the things that you need. And it's just a script that you run, and it gives you all of the dev environment out of the box. And the beautiful part about that is that it is the default experience on the new

16:14.02
James
RTX Spark devices in partnership with NVIDIA, including a device that I have personally, which is the Surface Laptop Ultra with 128 gigs of unified, beautiful, unified.

16:16.87
Frank
oh okay

16:28.77
James
Satya said it a thousand times, a unified memory.

16:31.55
Frank
you know

16:32.65
James
Unified, which means that this puppy can run like the NVIDIA superstar 120 billion parameter model Kayla was running on stage without a hiccup, which is fantastic. So you can run those models locally on your machine, baby.

16:49.83
Frank
Don't make me jealous. Okay. You've got to say unified a lot because RAM is so expensive. It better be unified if I'm not buying two sets of RAM.

16:56.50
James
Unified, baby, it's unified.

16:58.88
Frank
I'm not buying GRAM and normie RAM. You can't have both of those things. It's too much. Man, that Spark.

17:06.55
James
Yeah. Mm-hmm.

17:07.26
Frank
Those new machines look nice. I know I'm not ever going to afford them because I spend all my money on overpriced Macs, but it is very tempting to spend money on an overpriced NVIDIA machine, because I have definitely gotten the local model disease.

17:23.75
Frank
It's a disease, a plague. It's going around the whole internet right now. We did a whole show on it. I'm enjoying all my local models, and new local models keep coming out left and right. And I want to try all of them, but then my Python environments get really weird and I refuse to write. Anyway, going off on a tangent. It'd be nice to have more RAM.

17:43.77
Frank
Be nice to have unified RAM.

17:44.03
James
Yeah.

17:45.98
Frank
Be nice to have unified RAM on a cute little laptop, though.

17:46.50
James
Unified.

17:49.26
Frank
I hope you put like a refrigerator underneath it.

17:52.44
James
It stays pretty cool, actually. I don't know how they did it, but it stays pretty quiet.

17:54.74
Frank
Does it?

17:59.20
James
They did announce a new dev box, the RTX Spark dev box, and that looks like a squished-down Xbox Series X that kind of looks like a PlayStation 2 in a way. I really, really, really, really want it. That is for sure. And that is what Kayla was demoing on stage. It's beautiful. And it's coming out this fall or something like that. So I want that.

18:25.21
Frank
Can you make sure to put that in the show notes so I can check it out too? Because I don't think I saw that one.

18:28.45
James
Yeah.

18:30.63
James
It's beautiful. It's a great thing. But talking about models, let's talk about the brand new MAI, the Microsoft AI models.

18:37.18
Frank
Wow.

18:39.49
James
Mustafa Suleyman got up on stage and announced seven homegrown, non-distilled...

18:45.02
Frank
Seven.

18:48.85
James
Non-distilled from the ground up.

18:50.43
Frank
Non-distilled.

18:51.56
James
Hill climbed from zero to the top.

18:53.21
Frank
Free range.

18:55.54
James
Free range, organic models, Frank Krueger.

18:58.94
Frank
Oh yeah, organic.

19:00.37
James
Why?

19:01.70
Frank
Hey, I'm really excited about these, James. Yes, the "my." I'm pronouncing it "my." How did everyone else pronounce them? I didn't actually see them speak.

19:11.48
James
Well, I asked Craig, because he works in MAI.

19:12.76
Frank
May? My.

19:17.46
James
So we say M-A-I, Microsoft AI, but "my" is also acceptable.

19:22.39
Frank
Way too many syllables. Yeah.

19:23.93
James
My voice, my transcribe, my code.

19:24.23
Frank
My is better. They're my models.

19:29.48
James
I do like "my." "My" is fine.

19:31.03
Frank
Yeah. So these are the MAI models, and it's like, Microsoft, welcome to the chat. Everyone's been critiquing Microsoft because they've obviously been early investors in OpenAI. They made partnerships. They serve a billion different models. They're playing the "we'll serve anyone's model" kind of game with Foundry and all that kind of stuff. But it's nice to see Microsoft putting some money into training their own models, and these are decent, decent models that they're working on. Phi (fee, fie, fo, fum) was a smaller, decent model, but smaller, kind of in the Apple Intelligence realm, which is the non-intelligent realm. So these are some actual good models posting some good numbers. I'm not a big benchmark person. I don't really believe in these benchmarks, but we need something, some way to compare these models. So I look at SWE-bench Verified, just because we have the most models there, and it's scoring like a 70%. A 70% on a fundamentally pretty small model. It's 31-ish billion active parameters, 2 trillion potential parameters, because it's a mixture-of-experts model, which means it turns off a big chunk of itself. So it trains 512 versions of itself (this is all hand-wavy, ignore all this, because it's not how it actually works) and then picks eight of the 500 to actually execute in real time. That's why it's

21:05.17
Frank
30-ish billion active parameters, 2 trillion possible parameters. That is not large. So like,
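
Frank flags his description as hand-wavy, so the following is only a toy Python sketch of top-k expert routing using the numbers mentioned in the conversation (512 experts, 8 chosen). The "experts" here are trivial scaling functions and the router scores are random; in a real model both are learned, they operate on vectors, and routing happens inside each layer. None of this comes from the technical report.

```python
import math
import random

NUM_EXPERTS = 512  # figure quoted in the conversation
TOP_K = 8          # experts actually run per token


def top_k_route(scores, k):
    """Return [(expert_index, weight)] for the k highest router scores.

    Weights are a softmax over just the selected scores, so they sum to 1.
    """
    chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    peak = max(scores[i] for i in chosen)  # subtract for numerical stability
    exps = [math.exp(scores[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


def moe_forward(x, experts, scores, k):
    """Weighted sum of the outputs of only the selected experts."""
    return sum(w * experts[i](x) for i, w in top_k_route(scores, k))


rng = random.Random(0)
# Toy experts: each one just scales its input by a fixed factor.
factors = [rng.uniform(-1.0, 1.0) for _ in range(NUM_EXPERTS)]
experts = [lambda x, f=f: f * x for f in factors]
# Stand-in for the scores a learned router would produce for one token.
scores = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

routing = top_k_route(scores, TOP_K)
output = moe_forward(2.0, experts, scores, TOP_K)

# Back-of-envelope check of the quoted sizes, ignoring shared layers
# (attention, embeddings) that are always active.
total_params = 2e12
active_params = total_params * TOP_K / NUM_EXPERTS
```

Only 8 of the 512 expert functions are ever called for this input, which is the sense in which the model "turns off a big chunk of itself." The last two lines work out to 31.25 billion, in line with the "31-ish billion active" figure, although a real parameter count would not split this cleanly.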

21:14.46
James
This is the MAI Thinking One model, I assume, that you're thinking about here, right?

21:17.94
Frank
Yes, this is Thinking One. Sorry, yeah, there's also the coding Flash, the image one, and there's an audio one also.

21:19.83
James
Thinking one.

21:24.18
Frank
That's why there are so many of these models: because they are smaller, they fine-tuned them to specific scenarios. But it's the thinking one that's kind of the most interesting, because it's your general-purpose one.

21:36.01
Frank
And even though they call it thinking, if you read the technical report, James, did you read all 109 pages of the technical report?

21:43.16
James
I did not. The one thing that they went deep into, specifically, is they say all of these models share the same infrastructure, the same commitment to clean, enterprise-grade data lineage.

21:51.56
Frank
Mm-hmm.

21:54.28
Frank
Mm-hmm.

21:54.65
James
We do not distill from other labs and we do not rely on opaque data. Our data sets are clean, traceable and enterprise grade. They're designed to work together and to integrate directly into the products people use every day, but the models themselves are only part of the story.

22:07.10
James
So this is the ground-up approach, and yeah, Mustafa was talking specifically about this, and they put out a big report, and I did not read said report, but I believe my company blindly. Yeah.

22:21.28
Frank
I read every page of it. Maybe not every word on every page, but I read every page of it.

22:26.58
James
Yeah.

22:28.47
Frank
Because it's actually a really fascinating report, because they actually go into their training details. And a big part of training is what data you provide when, for how long, that kind of stuff.

22:41.53
Frank
Yeah, they call it a hill-climbing model because they're trying to figure out scaling figures. Like, no one can train a giant model 30 times in a year, because they're giant. They take a lot of money and they take a lot of time to train these things.

22:54.28
Frank
So they keep using this hill climbing analogy, which is quite simple that they're trying to figure out what patterns they can test out on smaller models that they hope when you make the model bigger, still reflect on those bigger models.

23:06.68
Frank
So you can figure out what works and what doesn't work on the smaller models. And then when it comes time to train the bigger ones, you can just do what works, and you're not doing so much experimentation with the bigger training.
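
One common way to turn small-model results into a guess about a bigger model is to fit a power law and extrapolate. The Python sketch below shows that generic technique only; it is not the method from the technical report, and the data points are synthetic, generated inside the code from a made-up curve so the fit can be checked.

```python
import math


def fit_power_law(sizes, losses):
    """Least-squares fit of log(loss) = log(a) - b * log(size).

    Returns (a, b) for the model loss = a * size ** (-b).
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(v) for v in losses]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def predict(a, b, size):
    """Extrapolate the fitted curve to a new model size."""
    return a * size ** (-b)


# Synthetic "small model" results from a made-up curve (not real measurements).
TRUE_A, TRUE_B = 20.0, 0.1
small_sizes = [1e7, 3e7, 1e8, 3e8, 1e9]
small_losses = [TRUE_A * n ** (-TRUE_B) for n in small_sizes]

a, b = fit_power_law(small_sizes, small_losses)
predicted_big = predict(a, b, 1e11)  # a model 100x larger than any in the fit
```

Because the synthetic points lie exactly on a power law, the fit recovers the curve. Real training runs are noisy, and whether a trend seen on small models still holds at large scale is exactly the bet being described.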

23:14.55
James
Yep.

23:17.30
Frank
And honestly, these are basically industry secrets. The training method is becoming kind of the most important method, because if you have a bunch of money, you can make a big network.

23:31.48
Frank
I today could go out there and make a trillion parameter network, no problem. I could even get it to train, but I do not have the time or money to actually train it on good data. Neither do I actually have the good data to give such a model. So the number of parameters in a model does not reflect its intelligence. I could take a 1 trillion parameter model today, create it myself, and it would be the dumbest thing you'd ever seen. It will not perform. It will do garbage, because I don't have the knowledge or time to invest in its training.

24:04.76
Frank
Whereas if you read this 109-page technical report, they go through basically all their training setup. And a big part of that is how they mined all the data.

24:16.88
Frank
They were very careful to be very clear about, you know, we obey robots.txt. We only access public GitHub repos. They have huge deduping things.

24:28.24
Frank
They have things where, like, they have the problem that a lot of the code out there now is AI generated. And they were trying to remove AI-generated content from their training system.
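
Two of the data-hygiene steps mentioned here, honoring robots.txt and deduplicating, can be sketched with the Python standard library. This is an illustration under simple assumptions, not the pipeline from the report: the robots.txt content, bot name, and URLs are made up, and the dedup is exact matching after light whitespace normalization, whereas large pipelines typically also need near-duplicate detection.

```python
import hashlib
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt, user_agent, url):
    """Check a URL against robots.txt rules supplied as text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def normalize(code):
    """Drop trailing whitespace and blank lines so trivial copies hash alike."""
    lines = [line.rstrip() for line in code.splitlines()]
    return "\n".join(line for line in lines if line)


def dedupe(documents):
    """Keep the first occurrence of each document, compared by content hash."""
    seen = set()
    unique = []
    for doc in documents:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


# Made-up example inputs.
ROBOTS = "User-agent: *\nDisallow: /private/\n"
private_ok = allowed_by_robots(ROBOTS, "ExampleBot", "https://example.com/private/a.py")
public_ok = allowed_by_robots(ROBOTS, "ExampleBot", "https://example.com/public/a.py")
corpus = dedupe(["x = 1\n", "x = 1   \n\n", "y = 2\n"])
```

Here the crawler would skip the /private/ URL, fetch the public one, and the two copies of `x = 1` that differ only in whitespace collapse to a single entry.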

24:38.19
James
Yeah.

24:40.80
Frank
You kept saying it's not distilled. And what that means is you can use other LLMs out there to generate training data so that you can train your network, but then you inherit all the little foibles and biases of those networks.

24:57.83
Frank
So them saying they use non-distilled data is them basically saying they use human-only data, which is scary. Yeah. You know, I think a few years ago I would have said, yeah, that's the only way to go.

25:10.95
Frank
These days I'm starting to wonder, like, human data is so messy. You know, you go to Stack Overflow these days, which has non-AI code on it, and you're like, wow, humans are mediocre programmers at best.

25:24.99
Frank
Maybe it would be better if there was some AI generated stuff on here.

25:25.08
James
Yeah.

25:29.72
Frank
But it's laudable. It's a good thing that they actually cared that much about the data, because it's honestly a pain in the butt. The amount of work that they had to go through to get you your free-range model was a lot. And the fact that they got it to work well without using distilled data is quite an achievement, because, you know, tool calls alone, thinking models, it's so hard to bootstrap these things. Just think about the training scenario you have to go through to, sorry to keep using the word, train these things.

26:06.17
Frank
It's tough. It's really hard. It's much easier to just, you know, pay a few thousand dollars, generate a million tokens out of Opus and use that as training data. That's easier.

26:18.20
Frank
They didn't go that route.

26:19.97
James
Well, the interesting part about all the models, all up, is that one of the things is really all about efficiency, and sort of being X amount more efficient than other comparable models. So the Thinking One model is similar to, at least on SWE-bench Pro, like an Opus 4.6,

26:39.99
James
but with less token usage and more efficient. They do say here, from user ratings, I guess there's a company called Surge that does blind LLM testing, which is crazy to think about.

26:44.81
Frank
Yeah.

26:53.18
James
But basically people always seem to prefer the quality of MAI Thinking One compared to Sonnet 4.6 across single and multi-turns. And you know when you look at these things, you look at all the bench scores, all these things, it does,

27:09.72
James
look, you know, really fantastic, and especially the token usage, which I think is really important overall, because, you know, we're thinking about, like, what does it cost?

27:16.26
Frank
Yeah.

27:19.92
James
Now, we can't actually get access to the Thinking One models. That's going to be in Foundry. It's in private preview. But there is another coding model, Frank, the MAI Code One Flash, which is more comparable to a Haiku model, but cheaper and faster and more efficient.

27:36.74
James
So it's not just that it is faster: if it's 10 times more efficient, that means, actually, even if it's the same cost per token, it's less token usage all up. But it's been optimized for GitHub Copilot CLI and VS Code, so there's some homegrown-ness going on here. And yeah, it also ranks very well on all the benchmarks: SWE-bench Pro, which is 51%; AIME, 92%, which is math performance; and instruction following of 75%.

27:52.34
Frank
Mm-hmm.

28:07.86
James
And it's in the box. So it was the last thing that was there. They talked about some other models too. And we'll talk about some other cool things that they're doing. But I love this one because I'm really excited that we have models in the box. I think with GitHub Copilot, one of the key differentiators is the model choice, obviously, and bring your own key as well.

28:31.21
James
But you have Gemini models, which are now available in the CLI too. You also have OpenAI models. You have xAI models that are in there.

28:41.66
James
You have Anthropic models, DeepSeek.

28:42.90
Frank
DeepSeek. Oh yeah, sorry, in the box, not in the box, fine.

28:44.64
James
Well, you got to connect to it, I guess. I guess, ah yeah. In the box, yes.

28:49.69
Frank
Yes, right. Yes, yes, gotcha.

28:50.38
James
Well, you can connect to anything in there. I've actually connected to OpenRouter, and OpenRouter has that same 120 billion parameter model for free right now.

28:54.30
Frank
Yes.

28:59.35
Frank
Sure.

29:01.77
James
So I can just use it for free, which is crazy.

29:02.89
Frank
Oh my god.

29:03.86
James
They have a bunch of free models you can just use. But I really love it because I'm a big fan of stuff in the box that's available. And I put out this tweet, and you can quote me on it, but what I said is: not every agentic coding task that you do requires Opus 4.8 or GPT 5.5.

29:16.60
Frank
hmm

29:24.28
James
They definitely probably do not need a million-token context window, and you definitely don't need anything besides the default reasoning, and probably even that's too much. So stop wasting all of your precious tokens.

29:36.87
James
I've been sitting down and I've been coding all day with MAI Code One Flash. I'm not saying it is a perfect model by any standard, but I've been doing planning and integration on it, creating PRDs. I created...

29:49.34
James
a PRD for that pet application that I do, and I put it side by side with other models, with the same exact command and the same tools. I'm talking two AI credits.

30:01.07
James
That is two cents compared to somewhere around 20 to 40 to 50 with other models because of their deep thinking and reasoning on it.

30:05.40
Frank
Yeah. Yeah.

30:08.54
James
And it's really good. So yeah, if you're used to Haiku and you have certain tasks, it is really fantastic for that. But I've been using it a whole bunch. It's really great. I'm really enjoying it so far. And I have, of course, other models in the mix. But if I can, you know, use this model, which is really fast, Flash is in the name there, then, you know, I can get a lot of work done for a lot of the things, including planning. So it's pretty nice to have.

30:37.21
Frank
You know, we used to have a joke that performance is a feature, because people are always like, oh, I'll get to performance. Like, no, performance is a feature. Like, the speed of your app is a feature of your app. The fact that people don't have to sit there and wait for something. Well, James, I have a new one.

30:53.07
Frank
Um, price is a feature, especially in this new world of all these models just jacking up all their prices.

30:55.26
James
Yeah. Yeah.

31:01.02
Frank
And I'm not talking about just Copilot. Anthropic has gotten expensive too. Everything's getting more expensive. The fact that, like, it may not be as smart as an Opus 4.8 or a Mythos or anything like that, yeah, but I don't have that.

31:15.83
Frank
The benefit is I don't have that mental barrier of like, oh, I better word this perfectly because when I hit enter here, it's going to be a dollar gone. And if I do a bad job, then I have to spend another dollar or $2 or $3.

31:28.28
Frank
Now, with, you know, I forgot the exact numbers, but I think it was like 75 cents for a million tokens or something like that, I can go back to saying good morning to the model like I used to.

31:40.50
James
Yeah. Yeah.

31:42.95
Frank
I'm willing to spend a cent to be polite. And so, you know, price is a feature to remove that mental burden, at least for cheap people like myself that don't want to spend a billion dollars but are totally addicted to the agentic style of development.

31:59.31
Frank
It's a big deal to be able to use a cheap, fast model. Personally, I'm looking forward to, I'm on a trip right now, so I can't do it, but I want to compare it kind of head-to-head against my local models, because these Flash models are smaller. But is it better than my beloved Qwen 3.6?

32:20.82
Frank
Who knows? I don't know. I've got to test it out. I've got to do a little bit of head-to-head competition. I might do a little blog post on it, just because I'm curious myself. And the only way to really know is, you know, give one of them the same prompt five times, give another one that prompt five times, and see what 10 results you get and judge them. You were mentioning that one of the only benchmarks out there that works with these models is the human evals, where it's a blind A/B test of which one you like better.

32:51.60
James
Yeah.

32:56.61
Frank
That is the only legitimate way to actually test these things at this point, because all the other benchmarks can be gamed. There's a lot of leaderboards that do this. Go find one.
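A rough sketch of the blind A/B idea Frank describes, in Python. This is only an illustration: the function names, the "A"/"B" labels, and the vote format are made up for the example, and it assumes you have already collected the same number of outputs from each model for the same prompt.

```python
import random
from collections import Counter


def make_blind_pairs(outputs_a, outputs_b, rng):
    """Pair outputs from two models and randomize which side each lands on.

    The judge only ever sees "left" and "right"; the model labels stay
    hidden until the votes are tallied.
    """
    pairs = []
    for out_a, out_b in zip(outputs_a, outputs_b):
        if rng.random() < 0.5:
            pairs.append({"left": out_a, "right": out_b,
                          "left_model": "A", "right_model": "B"})
        else:
            pairs.append({"left": out_b, "right": out_a,
                          "left_model": "B", "right_model": "A"})
    return pairs


def tally(pairs, votes):
    """Count wins per model from a list of "left"/"right" votes."""
    counts = Counter()
    for pair, vote in zip(pairs, votes):
        counts[pair[vote + "_model"]] += 1
    return counts
```

To use it, show a judge only the `left` and `right` text of each pair, record "left" or "right" for each one, and call `tally` at the end. Five runs per model is what Frank suggests; that is a small sample, so treat the counts as a hint rather than a verdict.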

33:07.10
James
Yeah.

33:07.26
Frank
And I want to do that myself.

33:07.30
James
Yeah, and it's super fascinating because they do all the benchmarking and all this stuff. There are a few things I think are really interesting about this model. One is that they do compare it to Haiku 4.5, so you can set your expectations accordingly.

33:22.33
James
I mean, I think you've got to set expectations. Like, that's the correct way to do it.

33:22.42
Frank
Okay.

33:26.40
Frank
Mm-hmm.

33:27.22
James
But they say, you know, it's doing it with fewer parameters, only five billion active parameters. And it's doing it with better price-to-performance across the benchmarks, with 60% fewer tokens.

33:39.10
James
So price is one thing, but the token savings, if you use fewer tokens, are there too.
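A back-of-the-envelope sketch of that point, in Python: the bill depends on both the price per token and how many tokens a task burns. The numbers are placeholders, not published pricing. The 0.75 dollars per million tokens is only Frank's rough recollection from earlier in the episode, the 60% figure is the claim James is reading, and the one-million-token baseline is invented for the example.

```python
def cost_usd(tokens, usd_per_million_tokens):
    """Cost of a task given its token count and a flat per-million price."""
    return tokens / 1_000_000 * usd_per_million_tokens


def tokens_after_savings(baseline_tokens, fraction_fewer):
    """Token count after using `fraction_fewer` (e.g. 0.6) fewer tokens."""
    return round(baseline_tokens * (1 - fraction_fewer))


# Hypothetical task that would take 1,000,000 tokens on a baseline model.
baseline_tokens = 1_000_000
flash_tokens = tokens_after_savings(baseline_tokens, 0.6)

# Same assumed price for both, so any difference comes from token count alone.
price = 0.75
print(cost_usd(baseline_tokens, price), cost_usd(flash_tokens, price))
```

Holding the price fixed, 60% fewer tokens means the task costs 40% of the baseline, before any difference in the per-token price is even considered.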

33:40.48
Frank
That's, yeah.

33:42.98
James
So this is what's interesting about this. It says they built Code One Flash with production workflows at the center rather than optimizing only for benchmarks. It was trained directly with the GitHub Copilot harnesses used in production, which allows it to learn how to interact with surrounding tools and systems in agentic coding tasks, making it uniquely well suited for Copilot workflows compared to other available models. So it is really grounded in the GitHub Copilot usage.

34:12.32
Frank
Yeah.

34:15.00
James
So it's that finely tuned optimization, you know? And you see other companies doing this, like Cursor with Composer, for example, right?

34:19.03
Frank
Yeah.

34:21.22
James
That's their own fine-tuned model. Now, that is just on top of Qwen, obviously. This one is non-distilled, ground up. But, you know, it says they see Code One Flash solving hard problems

34:32.18
James
with 60% fewer tokens, which helps reduce latency, lower costs, improve return on tokens, and make interactive workflows feel smoother. And that stuff is important at the end of the day. And I've been really surprised, to be honest with you. I did a new feature for the My Cadence app where I wanted to add a Dynamic Island feature, so the Live Activity feature.

34:56.69
James
So I did something scary, which is I went into the CLI and I did slash research, which is a terrifying feature that does deep research on a task.

35:06.10
Frank
Love deep research.

35:08.89
James
And I did it all with MAI Code One Flash, and it spun, doing research, for several minutes.

35:17.87
Frank
Okay.

35:18.04
James
um so like

35:18.84
Frank
There goes all your tokens.

35:20.31
James
It was $1. It must have spun for like 10 to 15 minutes.

35:23.57
Frank
Nice. Okay.

35:25.98
James
And then I had it plan the feature based on the research.

35:26.08
Frank
Okay.

35:28.34
James
So in the context window, I don't think it even compacted, because it was pretty small. I don't know exactly what the harness is doing. It's something very efficient with that research. And then I had it implement said feature. And from start to finish, it was about, I think, $2.50. And that was a pretty long-running operation. Now, that said, I did then take 5.3 Codex out. Well, I didn't actually review it right away.

35:54.61
James
I pushed the code and I did a GitHub Copilot code review, which uses Actions minutes.

35:58.18
Frank
All right.

35:59.38
James
And it had a few comments on it. And then I had 5.3 do an analysis. So from start to finish, I used about $4 worth of credits. And that was a pretty long-running session of me spending hands-on time, because I knew what I wanted.

36:15.09
James
I was kind of designing the feature, reviewing the feature, probably like an hour's worth of time.

36:17.49
Frank
Mm-hmm.

36:20.99
James
But that was between the two models, going back and forth and iterating on it. I was really happy with the end result for something that would definitely take me a lot longer than one hour to implement and test in my code.

36:34.68
Frank
Well, to be blunt, all that Apple Widgets stuff, it feels like a lot of these models do a terrible job at it, partly because there's just not a lot of code.

36:44.21
James
So I researched. So I researched.

36:46.28
Frank
That's why you did the research here. Smart man, deep research.

36:48.34
James
Because I knew. I did that research. I went through all the Apple documentation.

36:51.59
Frank
They're terrible at modern Swift APIs.

36:52.95
James
yeah

36:55.64
Frank
Let me ask you. It's hard to judge these things. But did you do Codex or GPT? Sorry, I already forgot. You had to do the code review. Was it worth it?

37:08.47
Frank
Did it need to in the end?

37:08.73
James
Code review is worth it. Yeah. Code review found a few optimizations that it didn't think about.

37:11.32
Frank
Yeah.

37:14.12
James
One thing, I mean, the code review is really cool.

37:14.42
Frank
Okay.

37:16.03
James
It said, hey, it seems like you're updating every two seconds, but you can update every 15 seconds. And then if you do that, you can drop this other policy. And then also you could like reduce this thing. So it actually, it was more of a performance analysis type of thing on it, which was cool.

37:25.71
Frank
Hmm.

37:28.74
Frank
Yeah, okay.

37:29.45
James
And I don't know what model the review used. I just assigned it to Copilot. And then I could have done it with Code One Flash, but I just decided to say, okay, hey, we have the code review. We have some trickier details here.

37:45.94
James
Let me, you know, go off, pull down all the comments, review the comments. I had it give me an analysis of the comments, and then I had it implement the comments, push the code, and then I also had it update

37:56.68
Frank
Okay.

38:01.53
James
the issues and resolve the issues automatically. So I did a lot more than just coding. I could have just resolved the issues, or the comments, myself, but I was getting cocky.

38:08.41
Frank
Yeah.

38:10.33
James
You know what I mean? I was like, ah, just do it.

38:11.61
Frank
yeah

38:11.85
James
Just do the thing, right?

38:13.43
Frank
hey

38:13.40
James
But, you know, a lot of people, for example, are like, oh, why don't you use 5.4? And I said, why don't I use 5.4? Because it's almost twice as expensive as 5.3 Codex. And I think 5.3 Codex does just as good of a job.

38:24.04
James
So why burn extra tokens on things that don't need a million-token context, which is more expensive anyway? Just let it cook.

38:24.91
Frank
Okay.

38:27.31
Frank
Yeah.

38:30.59
Frank
Yeah.

38:32.03
James
So I think that's something of interest.

38:33.97
Frank
Yeah, fantastic. I just want to go back to the token thing. So it was funny reading, because I did read the paper. They are actually using the 200K vocabulary that has kind of become a standard with OpenAI.

38:42.68
James
Hmm.

38:48.78
Frank
So it's hard to compare these models from a token perspective, because the tokenization method they use can change between all the models. But you can actually compare the GPT and, I think, Codex models against MAI,

38:57.91
James
Mm-hmm.

39:06.20
Frank
because it does use the same tokenizer. So it is kind of a one-to-one mapping between the different tokens. And the fact that it's using 60% fewer tokens, I'm excited for that, because I even see it with my beloved Qwen. It's amazing how many tokens you waste on, like, failed merges of code edits.

39:27.12
Frank
And I think that's the real benefit where they say they trained with the actual harness of VS Code or Copilot. Ideally, it would have fewer of those kinds of stupid mistakes that are just burning tokens.

39:43.04
Frank
You know, now that we're all paying for tokens, it's the worst thing to see: failed to merge. So I'm going to output a file, and then, oh, something got corrupted. I'm going to delete that file and rewrite it from scratch.

39:52.98
James
Yeah.

39:53.56
Frank
You're like, no, don't rewrite the file from scratch.

39:54.55
James
No.

39:56.08
Frank
That just cost me $10. Yeah. So I think seeing these models tuned to their harnesses is beneficial for all of us. And I'm excited to see the kind of savings that we get from that kind of stuff.

40:12.44
James
Yeah, and on top of that, there were a few other models. There's Image 2.5, which is basically at the same level as Nano Banana Pro out there. Two that I'm very interested in: one is Transcribe 1.5, which is the world's best transcription, SOTA.

40:24.56
Frank
Crazy. OK.

40:27.96
James
Out there, you can do transcription in 43 languages with domain-specific terminology. I know that this is very good. I've seen it, because it's used throughout a bunch of Microsoft tech already. And then there's also Voice, which is very fascinating, which is natural-sounding speech across 15 languages.

40:46.62
James
And they also have a Flash variant of that too. What I'm really fascinated about, and I was talking to Hanselman about this, is that we have all these models, and I'm really interested in how I could compose these to streamline our podcast operation even more, because all these are in Foundry.

41:01.01
Frank
Mm-hmm.

41:03.01
James
I am doing a bunch of work today with some OpenAI models for some transcription, to take a transcript and do some things, but it is manual. I just would love to have an MP3 right after, but I don't even want to do anything.

41:14.29
James
I want to create some sort of workflow that auto-gens it, does a thing, bingo bango, creates all the beautiful artwork, does the thing, and puts it all together. The cool thing about these models, Frank, is they're not only readily available on Foundry, they're also on OpenRouter, Fireworks, and Baseten, which means that you can actually tune the weights of the models themselves if you so desire, if you want to get them spicier.

41:34.48
Frank
Yeah.

41:35.79
James
But I would also say this, the other cool part, since we talked about hill climbing: one of my favorite demos was with Land O'Lakes, the butter company, the dairy company, where they took the Thinking One model and they did frontier tuning. This is a new service that allows you to fine-tune the MAI models and make them your own for your own business needs. So you're talking about all the data. If you're an enterprise, you can now feed all of that data in within your own secure environment. And you basically create your own model on top of it,

42:09.07
James
finely tuned and you hill climb with it. So they were talking about the hill climb of the butter.

42:12.30
Frank
um

42:13.82
James
I was in part of the keynote review and I was making lots of butter jokes. I wanted some butter Easter eggs, but Kayla refused to put a butter profile in there. But this is really cool. So you define your tasks and what success looks like. You feed in your data, workflows, and maybe M365 data. And you basically improve performance through training and iterative optimizations. And then you can deploy that model in Foundry or Copilot, and you can continuously improve based on real usage, which is really cool. It's one of my favorite demos.

42:44.79
Frank
Yeah, fun. It's definitely the future of these models. Like, we're all still figuring out how to deal with them from the model and harness perspective, but fine-tuning is definitely the future.

42:56.50
Frank
There's only so much you can do with the context windows at the moment and system prompts.

42:56.95
James
Yeah.

43:02.25
Frank
Fine-tuning is the future, so I'd like to see all progress in those areas. um Am I crazy? Did they say they were going to open any of these models? Are any of them going to be released like so you can run them locally?

43:17.56
Frank
I thought I saw something like that, but I don't want to put words in your mouth. So can you tell me?

43:23.03
James
Great question. There are two other models specifically that were released, called Aion. A-I-O-N 1.0.

43:34.54
James
There were two, and these run locally on device. Aion1 Instruct brings smaller local Aion models.

43:39.75
Frank
OK.

43:42.56
James
So that's a compact on-device language model for lightweight AI tasks. And there's Aion1 Plan, which is a larger 14 billion parameter reasoning and tool-calling model.

43:55.72
James
And those are there.

43:56.52
Frank
OK. OK.

43:58.44
James
I don't actually know how those work, but they're, like, in the box or something. These other ones, I don't think you're running locally; they're just deployed on Foundry. But the Aion Windows models, I think, are newer models that you can, like, grab and put into things.

44:11.91
James
I'm not positive though.

44:12.19
Frank
Okay, thanks for clarifying because I thought I saw something about local.

44:12.87
James
Yeah. Yeah.

44:15.24
Frank
Again, I'm drinking the local juice right now, so I just had to know.

44:20.60
James
Yeah. So. so

44:22.52
Frank
Also, I have to say, we've been talking a lot about Microsoft, but Google's not sitting back doing nothing. So they just released a Gemma 12B that is also multimodal.

44:34.52
Frank
A big thing I don't think people talk about enough is, like, you can give images and text and PDFs and all sorts of stuff to these models now. We're all programmers, so we're just feeding them text and having them read files.

44:45.48
Frank
But it's cool when they actually take images also. Because then you can do the closed loop thing of like it generates some UI and it can actually see that UI, stuff like that.

44:57.28
James
Yeah.

44:58.62
Frank
So I believe all these models are multimodal. And then I just want to give a shout-out that Gemma just released a 12 billion parameter multimodal model also.

45:10.60
James
I did see that and they released like some AI edge app that I think you need to install to get it or something, but I definitely want to try it. Yeah, that'd be cool.

45:18.23
Frank
They released it in all the normal places, but yeah, they released yet another app, because it's not Google if they haven't written a new app this month that they'll cancel next month. So get it while it's available.

45:28.92
James
Yeah. Yeah.

45:33.72
James
Yeah. All right. Well, tons of good stuff. That was only scratching the surface. There was tons to go through. We'll continue to unravel stuff from Microsoft Build and also WWDC, which is happening.

45:43.77
James
I think it already happened, Frank.

45:44.77
Frank
Oh my gosh.

45:45.41
James
It was like already happened.

45:46.42
Frank
Oh, Jesus.

45:46.45
James
Well, I mean, it is happening between then and now, but I think it's, like, today or tomorrow from when the podcast comes out. So we'll do some WWDC breakdowns. That'd be great. Yeah.

45:55.37
Frank
Yeah.

45:56.01
James
Yeah.

45:56.09
Frank
I think it's going to be a rainy day. I'll spend a rainy day inside WWDCing.

46:01.16
James
Yeah. Awesome. All right, everyone. Well, thanks for tuning in. Let us know if you're trying out the new models or any other exciting announcements that you saw at Microsoft Build 2026, or anything you're looking forward to at WWDC or even Google for that matter. Let us know at mergeconflict.fm. That is the best way. Or leave a sweet comment on our YouTube. We appreciate it. That's going to do it for this week's Merge Conflict. So until next time, I'm James Montemagno.

46:27.29
Frank
And I'm Frank Krueger. Thanks for watching and listening.

46:30.49
James
Peace.