00:00.60 James Welcome back, everyone, to Merge Conflict, your weekly developer podcast, breaking down the latest and greatest in the world of software development and AI and models and conferences and all those goodies. Frank "on the road" Krueger, how's it going, buddy? 00:14.71 Frank James, coming to you live from, I actually don't know where I am, so from a motel. Hi, James, it's wonderful to be here. I didn't think I was going to make it. We have this thing where, no matter where we are in the world, we try to record the show, and this is one of those episodes. I don't know where I am in the world, but here I am recording. 00:35.67 James Frank is live from Willits, California. 00:38.51 Frank Oh, hi. Hi, California. That would explain all the California license plates I've seen lately. 00:44.59 James They're everywhere. You can't get away from them. Like, oh, what's going on? Yeah, Frank is doing this big cross-country tour up and down, north to south and back up again. 00:57.56 James And Frank stayed with us, which was great, which was fantastic. And now he's on the road. Maybe we'll see him again, we'll see. But it's been pretty cool to follow you, because, like I was telling you, every five or six hours or so Heather's like, you know, and I'm like, where's Frank? Where's Frank? Is he going to make it? Is he here? Where's he at? Was he here? And then we're like, is he stopping? Is he going? Why is he going through Death Valley? We told him not to go through Death Valley. What's he doing? Is he okay? Is he alive? So it's been a joy to watch you from a distance. 01:28.89 Frank Yeah, it's funny. I never understood what the Find My feature is for on iPhones, because every time I lose my iPhone, it's, like, in an ocean or something. 01:39.21 Frank So it's not good for it. 01:39.70 James Bye. 01:40.66 Frank The Find My device thing doesn't work.
But I do love tracking people and living vicariously through other people. And I don't want to be snoopy or anything, but I love it when other people do it to me too. They're like, Hey, you're in this place. 01:52.10 Frank And i'm like, I am in that place. How did you know? oh yeah, it's 2026 and all these features are there. But honestly, it's also comforting knowing that people are watching me because You know, if my dot stopped in Death Valley for, you know, 12 hours, maybe you should call someone, James, and i will hold you to that. 02:11.74 Frank That's your job. If you see my dot stop for too long. 02:14.62 James I may have been checking it minimum twice a day just to make sure the dot was moving. I'm not going to lie about it. That is factual. I'm not going to lie. Well, because it is scary. It's a big. 02:25.10 James You're out there by yourself doing your own thing. And i don't know. It's a long distance. So, no, it's it's funny. my I have some coworkers that are like Gen Z, and they were telling me that that is... 02:30.24 Frank Yeah. 02:35.16 James Find My is like a it's like a social norm. Pretty much ah the friend group, the neighborhood, everybody is on it. 02:40.18 Frank think 02:43.51 James like I had some my coworkers show me their Find My and they would have like 70 people, like everybody, every single person. And it was not just because they're friends, but they're like checking on each other. They're doing stuff. There's kind of like this subculture to it. 03:00.18 James but it was almost like their i you know, their eye mess I, guess they're all on WhatsApp, but it's like their I message, you know what I mean? it's like, it's very, like that's how they communicate. 03:04.98 Frank Yeah. That's funny. I just got the FOMO thing of like, well, I've only got like 20 friends on Find Maya. I got to collect more. i got find more people that will let me snoop on them constantly. 
03:16.82 James ah yeah i got four, I got you, Craig, Heather, I think I got Scott now, cause he accidentally did, were, you know, 03:19.17 Frank You've got four. ah 03:24.70 Frank You're like, you can' you can't have it back, Scott. I can know where you are forever now. 03:27.99 James Well, because, you know, when I go into Find because I do the thing when I'm at, like, I was just at Microsoft Build 2026, we'll talk about today. um But, you know, when you go into Find My, do have one, two, three, four, five five people. 03:41.36 James um I have more devices than people. But it does say, Scott Hanselman, this person shares their location with you. 03:44.32 Frank We'll get you more followers, James. 03:48.64 James do you want to share back? You know what i mean? They kind of guilt you into it. 03:51.29 Frank And you're like, nope, nope. 03:52.28 James Uh, no. 03:53.02 Frank Scott can't know where I am. 03:53.04 James And I, Scott is in his, uh, oh, Scott said a movie theater. and That's what he's doing. 03:59.37 Frank Don't be creepy. 03:59.60 James Uh, I'll do it. I know that's the, well, now can. Well, so when we go to conferences, I always do the share until the end of the day, you know, for my coworkers, because it would be, 04:06.05 Frank Yeah. Right. But the buttons are right next to each other. You never know which one you're going to click. 04:12.63 James You got to audit. You got to audit. That is for sure. Well, I was at Microsoft Build 2026 and it was fantastic. It was in the same place where GitHub Universe was at last year and the years before that at Fort Mason, which is over in the Marina area on the far west side, right by Alcatraz area. 04:19.52 Frank Oh, 04:31.56 James It's an old Fort, Fort Mason. 04:31.80 Frank OK. 04:33.12 James It's in the name. And it's kind of like a peer for thing. It's very fascinating. It's very, very cool. You should probably register for Universe in the end of October. It's like 600 bucks or whatever. 
You get a coupon code. It's pretty good deal. Yeah, you could crash on my couch. 04:46.52 Frank That's funny. We're playing like a little California dance. It seems like wherever you go, I come the next day. we're We're just kind of doing a do-si-do around the West Coast here. i love visiting San Francisco. In fact, I was going to try to do it on this trip, but um the fantastic app Waze by Google totally led me astray. And I'm 100% blaming them and not the user, not me. It's the app's fault that it did not take me to San Francisco when I asked it to, but it's a fun city to visit. 05:15.49 James Yeah, it was good. 05:15.69 Frank It's always fun visiting. 05:17.18 James My sister lives there. I forgot. So then I remembered and then I was like, we should probably get breakfast or something. And then i I told her I told her that I forgot then she'll never let that down. 05:23.48 Frank Make me choke. 05:26.74 James But, you know, she moves around. She moves around. So it's like, you know, what ours i yeah oh she's like in and around San Francisco. 05:30.17 Frank It's her fault. Yeah. 05:32.97 James You know, it's like it's a big, you know, it's around the area. So I was like, oh, I don't know. 05:36.06 Frank Who doesn't live in San Francisco? 05:37.09 James You've. yeah I mean, you moved around so much, I don't even remember. I was like, also, if it's like five miles in San Francisco, that might as well be two hours away, you know? 05:45.94 Frank Yeah. 05:46.00 James So, um yeah, it was good overall. 05:48.37 Frank That's cool. 05:48.77 James But I figured what we could do today is there's, you know, there's tons of window news that I really want to talk about, but I'm im i'm really fascinated about the the models. But let me really quickly shout out a few things. 06:01.41 Frank Oh, yeah. Okay. 
06:02.01 James For Windows, really quick. So Windows was first, and Kayla on my team got to do the big demo after Satya, so she was first up. There's a few really cool things that they showed off. First and foremost, there is new built-in container support on Windows called WSL containers, which means that for folks that maybe don't want Docker or Podman or some other containerization or virtualization software, you can now just use WSL, which is just right there. And it's the same commands, and you've got container images and, boom, you know, or images list, or whatever the command is. And then boom, here's all of your images and your containers running. And it all just works brilliantly out of the box, which is really nice. You know, Apple did this not too long ago. 06:52.89 Frank Yeah, so these are Linux containers, right? Because it's WSL, not Windows containers. 06:56.60 James That is correct. WSL. 06:58.52 Frank Is that right? 06:58.86 James Yes. Yep. Yep. 06:59.79 Frank Yeah. 07:00.21 James That's correct. 07:00.41 Frank Okay. I mean, that makes 100% sense. So much so that I guess I just kind of assumed they already had that. But that's good, because I did assume that they already had that. 07:11.58 Frank So that's good that they got that going. And yeah, something a lot of people, like... Docker containers are just namespace limiters. Most of the operating systems support that feature. 07:23.71 Frank You don't even need virtualization to make containers work. So it 100% makes sense that Microsoft would get on this bandwagon, especially with all the agent coding stuff they seem to be hooking it up to, but I'm sure you'll get to that. 07:39.57 James Well, I'm glad you talked about that, because Microsoft also announced something different, which is actually something cross-platform. We're big fans of cross-platform software here, Frank Krueger. But it is something called MXC, the Microsoft Execution Container.
07:49.46 Frank Wow. 07:55.48 James And this is a sandboxed code execution system for running untrusted code, so model output, plugins, tools, on Windows, Linux, and macOS. So it's JSON config, of course, and it's policy driven. So you can give access to file system, network, UI policy, things like that. So do you have access to the clipboard, do you have access to the network, XYZ. 08:18.68 James And it is basically built on different containment backends, which are all sort of microVM-based. So you can either use 08:29.75 James process container, Windows Sandbox, LXC, Bubblewrap, Seatbelt, microVM, Hyperlight, isolation session, or WSLC as well, which was just announced too, which is cool. And Scott Hanselman and Sam showed this off, which was the OpenClaw support, which is running the tool calls inside of MXC. So it has the policy. The app itself isn't running inside, but it is... 08:54.90 James doing the tool calling, right? 08:57.05 Frank Interesting. 08:57.22 James So this is familiar if you're inside of VS Code or something and you see a sandbox. This would be something that now would be running on Windows, which is very exciting for me. 08:57.69 Frank Yeah. I like this one because, honestly, I missed the cross-platform part of the announcement. 09:16.73 Frank I just saw it was sandboxing for code execution. And I somehow totally missed the fact that it can run on Macs. And sorry, you just listed all those technologies that it can use. So I'm like, well, yeah, that just brings home my point that these container things are just namespace limiters. It's actually kind of easy to do this in most modern operating systems. So this thing is just kind of joining it all together. 09:42.97 Frank I do have... mixed feelings. Definitely for running just random code that the agent is generating, 100%. Let's run that in a constrained environment.
09:54.17 Frank For OpenClaw and things, I'm a little bit confused, to be honest, because I thought the power of OpenClaw was you are just logged into all your accounts and everything, and it just impersonates you and pretends to be you and all that stuff. 10:07.61 Frank But maybe Claw usage has become more advanced, and I'm behind the times here. But I thought it was a real YOLO environment of: I'm logged into everything, go have fun, AI. So I guess there's different uses of Claw out there. 10:23.58 James Yeah, I think they are thinking about, like, enterprise Claw, right? Hey, I want to run this on my machine, and it's not going to delete everything or get access to my internal network. For example, you can sandbox those calls specifically, which is great. 10:36.09 Frank Yeah, I'm curious to see what the configuration language looks like. I mean, you said it's JSON, but I'm curious to see what all the dials actually are. Because again, I have mixed feelings. I set up a development environment. I want all the tools to be able to use that development environment without having to mess around too much. So I'm curious how easy they made it to access built-in stuff you already have installed. 11:02.10 James Yeah, totally. And I would say, I haven't really tried out a lot of it, but there's now, like, a new Windows native app, basically. So I'm interested in that and actually giving it a spin finally. 11:14.20 James But I will say, do you know about, like, coreutils, or like grep? Have you heard of these things, you know, different commands available on Unix? 11:21.49 Frank Yeah, I've heard of grep, mostly because I've seen agents be like, hey, you haven't installed rg, rg is the better grep. I'm just like, just use grep. 11:32.04 Frank That's how I know I've used grep. Coreutils, is that a GNU thing?
I mean, so a part of POSIX is a set of utilities that are a part of any POSIX operating system. POSIX is what Unix and Linux both implement. Coreutils, I thought, were just the GNU versions of the POSIX tools. Am I right or wrong anywhere? 11:56.22 Frank Am I close to home? 11:57.90 James I think you're, so, I mean, it's Unix style. So it's Unix commands, basically, that are there. 12:03.21 Frank Yeah. 12:03.58 James But uutils is the org or whatever that does it. 12:04.37 Frank So 12:07.50 James But you've got things like the rm, the mv, the dirs, the cats, the greps, things like that. 12:12.67 Frank Yeah. 12:17.94 James You know what I mean? You've got all the commands that are there. The POSIX, all those things. But basically, yeah. 12:24.30 Frank That's what I mean. Yeah. There's usually a suite of tools to become a POSIX operating system. You basically just have to have these tools installed. It actually has nothing to do with the kernel, or very little to do with the kernel. It's more about what your threading model is, what your process model is, and what tools you have available. But I interrupted you. I think Microsoft released something written in a funny language called Rusty or something like that. 12:49.27 James Probably Rust. Yes. Coreutils for Windows is now available. So if you're just inside of PowerShell, you can grep away all you would like, which is fantastic. 12:55.54 Frank specific 12:59.50 Frank It's ls. And I know PowerShell implements it now, but it's still one of my favorite things every time I switch to Windows, because I'm still a cmd.exe person. 13:04.67 James Yeah. 13:09.78 Frank Judge me. Don't judge me. You can judge me. 13:11.77 James Yeah. 13:12.34 Frank I still type ls. 13:13.97 James It's there. Yes. Well, the interesting part is the other thing that they announced was a new profile for WSL that they call Comfy Shell, I want to say.
13:26.04 James But for all intents and purposes, they've ported over like Starship and Homebrew and all the keyboard command shortcuts. 13:26.61 Frank OK. Oh. 13:29.43 Frank Huh? 13:34.43 James If you're coming from Mac OS, for example, I don't think they said this on stage, but I'll say it because it's our podcast, damn it, is that if you're coming from macOS, you now have a shell that is pretty much exactly the same. You have all the same things that are available to you. And obviously, you can't just install like a homebrew macOS application or something like that, or Linux application. That's not going to work. But if you have a Linux app, 14:03.03 James you know command line application that's gonna work. So for example, like, um you know, she showed off like running like btop, for example, and it's like, here it is, like, here's all of btop or whatever, and here's all of your homebrew commands and everything just works. 14:16.18 Frank Well, I'm sorry, Windows users, that you're now going to be having access to Homebrew. 14:16.70 James You know your tops, your things. 14:23.65 Frank I'm terrible. mom I hate Homebrew. I think it's one the worst package managers. It's all we've got on Mac. It's what we all use. Thank you, Homebrew, all everyone who works on it. Thank you. 14:33.90 Frank We need you. i love your packages. But my god, Homebrew is the worst package manager on the whole planet. 14:38.99 James It's there. I like Winget. I like Winget personally on Windows. It's fantastic. So I'm just saying. Well, two more things really quick. One is that there is a ah new experimental terminal. 14:51.86 James So there is the Windows terminal, but there is another terminal that's basically a version of Windows terminal, but with agents built in. 14:53.58 Frank Oh my God. Oh my. 14:58.79 James It's called Intelligent Terminal. um And what that allows you to do is use a terminal as a terminal. 
And if you need an agent to help you out, it pops up side-by-side or below, and you just have your agent there. Whether that's Copilot, Claude, or Gemini or something else, it's available. 15:14.75 Frank Uh. 15:15.90 James And it can auto-detect errors. So for example, let's say you don't know how to use grep and you grep incorrectly, it can auto-detect the error, and then it will tell you what to do. And you can select the models, and that could be running locally on your machine too. You don't have to actually use a cloud one. 15:31.16 James You just have a local model running, which is cool. And if you want to get all of this stuff, they have a new developer config, which installs all of the things that I just told you, but it installs so much more. 15:37.79 Frank Mm-hmm. 15:41.94 James So it gives you all the PowerShell stuff. It does stuff. But what it also does is it removes all the noise, I would say. So they call it, like, quiet mode, almost. So it turns off notifications. It gets rid of all the news feeds, all the widget stuff. It gives you a beautiful background. It sets up all your environment and installs the things that you need. And it's just a script that you run, and it gives you all of the dev environment out of the box. And the beautiful part about that is that it is the default experience on the new 16:14.02 James RTX Spark devices in partnership with NVIDIA, including a device that I have personally, which is the Surface Laptop Ultra with 128 gigs of unified, beautiful, unified. 16:16.87 Frank Oh, okay. 16:28.77 James Satya said it a thousand times: unified memory. 16:31.55 Frank You know 16:32.65 James Unified, which means that this puppy can run, like, the NVIDIA superstar 120 billion parameter model Kayla was running on stage without a hiccup, which is fantastic. So you can run those models locally on your machine, baby. 16:49.83 Frank Don't make me jealous. Okay.
ah You gotta say unified a lot because RAM is so expensive. It better be unified if I'm not buying two sets of RAM. 16:56.50 James Unified, baby, it's unified. 16:58.88 Frank I'm not buying GRAM and Normie RAM. You can't have both of those things. It's too much. Man, that spark. 17:06.55 James Yeah. Mm-hmm. 17:07.26 Frank those Those new machines look nice. I know I'm not ever going to afford them because I spend all my money on overpriced Macs, but It is very tempting to spend money on an overpriced NVIDIA machine because i have definitely gotten the local model disease. 17:23.75 Frank It's a disease, a plague. It's going around the whole internet right now. we've talked We did a whole show on it. I'm enjoying all my local models and new local models keep coming out left and right. And I want to try all of them, but then my Python environments get really weird and I refuse to write. Anyway, going off on a tangent. It'd be nice to have more RAM. 17:43.77 Frank Be nice to have unified RAM. 17:44.03 James Yeah. 17:45.98 Frank Be nice to have unified RAM on a cute little laptop, though. 17:46.50 James Unified. 17:49.26 Frank I hope you put like a refrigerator underneath it. 17:52.44 James it um It stays pretty cool, actually. um i don't know what the dynamic I don't know how they did it, but it stays pretty pretty pretty quiet. 17:54.74 Frank Does it? 17:59.20 James They did announce a new dev box, of the RTX Spark dev box, and that looks like a squished down Xbox Series X that kind of looks like a PlayStation 2 in a way. i really, really, really, really want it. That is for sure. And that is what Kayla was demoing on stage. It's beautiful. And it's like coming out this fall or something like that. So I want that. 18:25.21 Frank Can you make sure to put that in the show notes so I can check it out too? Because I don't think I saw that one. 18:28.45 James Yeah. 18:30.63 James it's It's beautiful. It's a great thing. 
But talking about models, let's talk about the brand new MAI, the Microsoft AI models. 18:37.18 Frank Wow. 18:39.49 James Mustafa Suleyman got up on stage and announced seven homegrown, non-distilled. 18:45.02 Frank Seven. 18:48.85 James Non-distilled, from the ground up. 18:50.43 Frank Non-distilled. 18:51.56 James Hill climbed from zero to the top. 18:53.21 Frank Free range. 18:55.54 James Free range, organic models, Frank Krueger. 18:58.94 Frank Oh yeah, organic. 19:00.37 James Why? 19:01.70 Frank Hey, I'm really excited about these, James. Yes, the "my." I'm pronouncing it "my." How did everyone else pronounce them? I didn't actually see them speak. 19:11.48 James Well, I asked Craig, because he works in MAI. 19:12.76 Frank May? My. 19:17.46 James So we say MAI, Microsoft AI, but "my" is also acceptable. 19:22.39 Frank Way too many syllables. Yeah. 19:23.93 James My voice, my transcribe, my code. 19:24.23 Frank "My" is better. They're my models. 19:29.48 James I'll do, like, "my." "My" is fine. 19:31.03 Frank Yeah. So these are the MAI models, and it's like, Microsoft, welcome to the chat. Everyone's been critiquing Microsoft because they've obviously been early investors in OpenAI. They made partnerships. They serve a billion different models. They're playing the we-will-serve-anyone's-model kind of game with Foundry and all that kind of stuff. But it's nice to see Microsoft putting some money into training their own models, to be quite, you know, and these are decent, decent models that they're working on. Phi, fee, fo, fum, was a smaller, decent model, but smaller, kind of in the Apple Intelligence realm, which is the non-intelligent realm. So these are some actual good models posting some good numbers. I always pay attention to, well, I'm not a big benchmark person. I don't really believe in these benchmarks, but we need something, some way to compare these models.
So I look at SWE-bench Verified, just because we have the most models there, and it's scoring like a 70%. A 70% on a fundamentally pretty small model. It's 31-ish billion active parameters, 2 trillion potential parameters, because it's a mixture-of-experts model, which means it turns off a big chunk of itself. So it trains 512 versions of itself. This is all hand-wavy, ignore all this, because it's not how it actually works. And then it picks eight of the 500 to actually execute in real time. That's why it's 21:05.17 Frank 30-ish billion active parameters, 2 trillion possible parameters. That is not large. So like, 21:14.46 James This is the MAI Thinking One model, I assume, that you're thinking about here, right? 21:17.94 Frank Yes, this is Thinking One. Sorry, yeah, there's also the coding flash, the image one, and there's an audio one also. 21:19.83 James Thinking One. 21:24.18 Frank That's why there's so many of these models: because they are smaller, they fine-tuned them to specific scenarios. But it's the thinking one that's kind of the most interesting, because it's your general purpose one. 21:36.01 Frank And even though they call it thinking, if you read the technical report... James, did you read all 109 pages of the technical report? 21:43.16 James I did not. The one thing that they were going deep into, specifically, is they say all of these models share the same infrastructure, the same commitment to clean, enterprise-grade data lineage. 21:51.56 Frank Mm-hmm. 21:54.28 Frank Mm-hmm. 21:54.65 James We do not distill from other labs and we do not rely on opaque data. Our data sets are clean, traceable and enterprise grade. They're designed to work together and to integrate directly into the products people use every day, but the models themselves are only part of the story.
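A minimal Python sketch of the mixture-of-experts idea Frank waves at above: a router scores every expert, only the top few run, and their outputs are blended. The 512 and 8 figures come from the conversation; everything else (the toy experts, the random router scores, the function names) is a hypothetical stand-in, not how the MAI models are actually built. The last lines check the rough arithmetic: 8 of 512 experts is 1/64 of the expert pool, and 1/64 of 2 trillion is about 31 billion, which lines up with the "30-ish billion active" figure if you ignore shared, non-expert parameters.

```python
import math
import random

NUM_EXPERTS = 512   # figure quoted in the conversation
TOP_K = 8           # experts actually run per token, per the conversation


def top_k_route(router_scores, k=TOP_K):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are a softmax over the selected scores only, so they sum to 1.
    """
    chosen = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i], reverse=True)[:k]
    peak = max(router_scores[i] for i in chosen)  # subtract for numeric stability
    exps = [math.exp(router_scores[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


def moe_forward(x, experts, router_scores, k=TOP_K):
    """Blend the outputs of only the routed experts; the rest never execute."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_scores, k))


# Toy experts (hypothetical): each one just scales its input by a constant.
experts = [lambda x, s=i: x * (s + 1) for i in range(NUM_EXPERTS)]

# Stand-in router scores; a real router computes these from the token itself.
rng = random.Random(0)
scores = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

routes = top_k_route(scores)
output = moe_forward(1.0, experts, scores)

# Rough consistency check on the numbers quoted in the episode.
active_fraction = TOP_K / NUM_EXPERTS        # 8 / 512 = 1/64
TOTAL_PARAMS = 2e12                          # "2 trillion possible parameters"
approx_active = TOTAL_PARAMS * active_fraction
```

The design point is the one Frank makes: the total parameter count describes what is stored, while the active count describes what is computed per token.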
22:07.10 James So this is the ground up approach and yeah, Mustafa, was talking specifically about about this, and they were put out a big report, and I did not read said report, but I believe my company blindly. Yeah. 22:21.28 Frank I read every page of it. Maybe not every word on every page, but I read every page of it. 22:26.58 James Yeah. 22:28.47 Frank um Because it's actually a really fascinating report because they actually go into their training details. And a big part of training is what data you provide when, for how long, that kind of stuff. 22:41.53 Frank um Yeah, they call it a hill climbing model because they're trying to figure out scaling figures like, no one can train a giant model 30 times in a year because they're giant. They take a lot of money and they take a lot of time to train these things. 22:54.28 Frank So they keep using this hill climbing analogy, which is quite simple that they're trying to figure out what patterns they can test out on smaller models that they hope when you make the model bigger, still reflect on those bigger models. 23:06.68 Frank So you can, figure out what works and what doesn't work on the smaller models. And then when it comes time to train the bigger ones, you can just do what works and you're not doing so much experimentation with the bigger training. 23:14.55 James Yep. 23:17.30 Frank um And honestly, these are basically industry secrets. um the The training method is becoming kind of the most important method because if you have a bunch of money, you can make a big network. 23:31.48 Frank I today could go out there and make a trillion parameter network, no problem. I could even get it to train, but I do not have the time or money to actually train it on good data. Neither do I actually have the good data to give such model. So the number of parameters in a model does not reflect its intelligence. 
I could take a 1 trillion parameter model today, create it myself, and it would be the dumbest thing you'd ever seen. It will not perform. It will do garbage, because I don't have the knowledge or time to invest in its training. 24:04.76 Frank Whereas if you read this 109-page technical report, they go through basically all their training setup. And a big part of that is how they mined all the data. 24:16.88 Frank They were very careful to be very clear about, you know, we obey robots.txt. We only access public GitHub repos. They have huge deduping things. 24:28.24 Frank They have the problem that a lot of the code out there now is AI generated. And they were trying to remove AI-generated content from their training system. 24:38.19 James Yeah. 24:40.80 Frank You kept saying it's not distilled. And what that means is you can use other LLMs out there to generate training data so that you can train your network, but then you inherit all the little foibles and biases of those networks. 24:57.83 Frank So them saying they use non-distilled data is them basically saying they use human-only data, which is scary. Yeah. You know, I think a few years ago I would have said, yeah, that's the only way to go. 25:10.95 Frank These days I'm starting to wonder, like, human data is so messy. You know, you go to Stack Overflow these days, which has non-AI code on it, and you're like, wow, humans are mediocre programmers at best. 25:24.99 Frank Maybe it would be better if there was some AI-generated stuff on here. 25:25.08 James Yeah. 25:29.72 Frank But it's laudable. It's a good thing that they actually cared that much about the data, because it's honestly a pain in the butt. The amount of work that they had to go through to get you your free range model was a lot.
And the fact that they got it to work well without using distilled data is quite an achievement because, you know, tool calls alone, thinking models, it's so hard to bootstrap these things. Just think about the training scenario you have to go through to, sorry to keep using the word, train these things. 26:06.17 Frank It's tough. It's really hard. It's much easier to just, you know, pay a few thousand dollars, generate a million tokens out of Opus and use that as training data. That's easier. 26:18.20 Frank They didn't go that route. 26:19.97 James Well, the interesting part about all the models all up is that one of the things is really all about efficiency, being X amount more efficient than other comparable models. So the Thinking One model is similar to, at least on SWE-bench Pro, like an Opus 4.6, 26:39.99 James but with less token usage and more efficient. They do say here, from user ratings, I guess there's a company called Surge that does blind LLM testing, which is crazy to think about. 26:44.81 Frank Yeah. 26:53.18 James But basically people always seem to prefer the quality of MAI Thinking 1 compared to Sonnet 4.6 across single and multi-turn. And you know, when you look at these things, you look at all the bench scores, all these things, it does 27:09.72 James look really fantastic, especially the token usage, which I think is really important overall, because we're thinking about, like, what does it cost? 27:16.26 Frank Yeah. 27:19.92 James Now, we can't actually get access to the Thinking One models. That's going to be in Foundry. It's in private preview. But there is another coding model, Frank, the MAI Code One Flash, which is more comparable to a Haiku model, but cheaper and faster and more efficient.
27:36.74 James So it's not just that it is faster. If it's 10 times more efficient, that means, even if it's the same cost, it's less token usage all up. But it's been optimized for GitHub Copilot, CLI and VS Code, so there's some homegrownness going on here. And yeah, it also ranks very well on all the benchmarks: SWE-bench Pro, which is 51%, AIME 92% for math performance, and instruction following of 75%. 27:52.34 Frank Mm-hmm. 28:07.86 James And it's in the box. So it was the last thing that was there. They talked about some other models too. And we'll talk about some other cool things that they're doing. But I love this one because I'm really excited that we have models in the box. With GitHub Copilot, one of the key differentiators is the model choice, obviously, and bring your own key as well. 28:31.21 James But you have Gemini models, which are now available in the CLI too. You also have OpenAI models. You have xAI models that are in there. 28:41.66 James You have Anthropic models, DeepSeek. 28:42.90 Frank DeepSeek. Oh yeah, sorry, in the box, not in the box, fine. 28:44.64 James Well, you've got to connect to it, I guess. Yeah. In the box, yes. 28:49.69 Frank Yes, right. Yes, yes, gotcha. 28:50.38 James Well, you can connect to anything in there. I've actually connected to OpenRouter, and OpenRouter has that same 120 billion parameter model for free right now. 28:54.30 Frank Yes. 28:59.35 Frank Sure. 29:01.77 James So I can just use it for free, which is crazy. 29:02.89 Frank Oh my god. 29:03.86 James They have a bunch of free models you can just use. But I really love it because I'm a big fan of stuff in the box that's available. And I put out this tweet, and you can quote me on it, but what I said is: not every agentic coding task that you do requires Opus 4.8 or GPT 5.5.
29:16.60 Frank Hmm. 29:24.28 James They definitely, probably, do not need a million-token context window. And you definitely don't need anything, probably, besides the default reasoning, and probably even that's too much. So stop wasting all of your precious tokens. 29:36.87 James I've been sitting down and I've been coding all day with MAI Code One Flash. I'm not saying it is a perfect model by any standard, but I've been doing planning, integration on it, creating PRDs. I created... 29:49.34 James a PRD for that pet application that I do, and I put it side by side with other models, the same exact command with the same tools. I'm talking two AI credits. 30:01.07 James That is two cents, compared to somewhere around 20 to 40 to 50 with other models because of their deep thinking and reasoning on it. 30:05.40 Frank Yeah. Yeah. 30:08.54 James And it's really good. So yeah, if you're used to Haiku and you have certain tasks, it is really fantastic for that. But I've been using it a whole bunch. It's really great. I'm really enjoying it so far. And I am, of course, keeping other models in the mix. But if I can, you know, use this model, which is really fast, and Flash is in the name there, then, you know, I can get a lot of work done for a lot of the things, including planning. So it's pretty nice to have. 30:37.21 Frank You know, we used to have a joke that performance is a feature, because people are always like, oh, but I'll get to performance. Like, no, performance is a feature. Like, the speed of your app is a feature of your app. The fact that people don't have to sit there and wait for something. Well, James, I have a new one. 30:53.07 Frank Price is a feature, especially in this new world of all these models just jacking up all their prices. 30:55.26 James Yeah. Yeah. 31:01.02 Frank And I'm not talking about just Copilot. Anthropic has gotten expensive too, and everything. Everything's getting more expensive.
The fact that, like, it may not be as smart as an Opus 4.8 or a Mythos or anything like that, yeah, but I don't have that. 31:15.83 Frank The benefit is I don't have that mental barrier of, like, oh, I better word this perfectly, because when I hit enter here, it's going to be a dollar gone. And if I do a bad job, then I have to spend another dollar or $2 or $3. 31:28.28 Frank Now, with, you know, I forgot the exact numbers, but I think it was like 75 cents for a million tokens or something like that. Like, I can go back to saying good morning to the model like I used to. 31:40.50 James Yeah. Yeah. 31:42.95 Frank I'm willing to spend a cent to be polite. And so, you know, price is a feature to remove that mental burden, at least for cheap people like myself that don't want to spend a billion dollars but are totally addicted to the agentic style of development. 31:59.31 Frank It's a big deal to be able to use a cheap, fast model. Personally, I'm looking forward to, I'm on a trip right now, so I can't do it, but I want to compare it kind of head to head with my local models, because these flash models are smaller. But is it better than my beloved Qwen 3.6? 32:20.82 Frank Who knows? I don't know. I've got to test it out. I've got to do a little bit of head-to-head competition. I might do a little blog post on it just because I'm curious myself. And the only way to really know is, you know, give one of them the same prompt five times, give another one that prompt five times, and see what 10 results you get and judge them. You were mentioning that one of the only benchmarks out there that works with these models are the human evals, where it's a blind A/B test of which one do you like better. 32:51.60 James Yeah. 32:56.61 Frank That is the only legitimate way to actually test these things at this point, because all the other benchmarks can be gamed. There's a lot of leaderboards that do this. Go find one. 33:07.10 James Yeah.
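The head-to-head Frank describes (same prompt five times per model, then judging the results blind) can be sketched roughly as follows. This is a minimal illustration, not anything from the show: the function names, the trial dictionary layout, and the placeholder outputs are all hypothetical, and real model outputs would be collected separately before judging.

```python
import random
from collections import Counter


def make_blind_trials(outputs_a, outputs_b, seed=0):
    """Pair outputs from two models and randomize which side each appears on.

    The judge only ever sees "left" and "right"; "left_model" stays hidden
    until the picks are tallied.
    """
    rng = random.Random(seed)
    trials = []
    for out_a, out_b in zip(outputs_a, outputs_b):
        if rng.random() < 0.5:
            trials.append({"left": out_a, "right": out_b, "left_model": "A"})
        else:
            trials.append({"left": out_b, "right": out_a, "left_model": "B"})
    return trials


def tally(trials, picks):
    """Map each blind pick ("left" or "right") back to the model behind it."""
    wins = Counter({"A": 0, "B": 0})
    for trial, pick in zip(trials, picks):
        left = trial["left_model"]
        right = "B" if left == "A" else "A"
        wins[left if pick == "left" else right] += 1
    return dict(wins)


if __name__ == "__main__":
    # Placeholder strings stand in for five real runs of each model.
    runs_a = [f"model A, run {i}" for i in range(5)]
    runs_b = [f"model B, run {i}" for i in range(5)]
    trials = make_blind_trials(runs_a, runs_b)
    # A human judge would supply these picks after reading each pair.
    picks = ["left"] * len(trials)
    print(tally(trials, picks))
```

Five pairs is a very small sample, so a result like 3 to 2 says little; the point of the sketch is only that the judge never sees which model produced which side.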
33:07.26 Frank And I want to do that myself. 33:07.30 James Yeah, and it's super fascinating, because they do all the benchmarking and all this stuff. One of the things I think is really interesting about this model is that they do compare it to a Haiku 4.5, so you can set your expectations accordingly. 33:22.33 James So, I mean, I think you've got to set expectations. Like, that's the correct way to do it. 33:22.42 Frank Okay. 33:26.40 Frank Mm-hmm. 33:27.22 James But they say, you know, it's doing it with fewer parameters, only five billion active parameters. And it's doing it with better price to performance across the benchmarks, with 60% fewer tokens. 33:39.10 James So price is one thing, but also the tokens, if you use fewer tokens, the savings are there. 33:40.48 Frank That's, yeah. 33:42.98 James So this is what's interesting about this. It says that they built Code One Flash with production workflows at the center rather than optimizing only for benchmarks. It was trained directly with the GitHub Copilot harnesses used in production, which allows it to learn how to interact with surrounding tools and systems in agentic coding tasks, making it uniquely well suited for Copilot workflows compared to other available models. So it is really grounded in the GitHub Copilot usage. 34:12.32 Frank Yeah. 34:15.00 James So it's that finely tuned optimization, you know? And you see other companies doing this, like Cursor with Composer, for example, right? 34:19.03 Frank Yeah. 34:21.22 James That's their own finely tuned model. Now, that is just on top of Qwen, obviously. Now, this is a non-distilled, ground-up model, but, you know, it says they see Code One Flash solving hard problems 34:32.18 James with 60% fewer tokens, and that helps reduce latency, lower costs, improve return on tokens, and make interactive workflows feel smoother. And, like, that stuff is important at the end of the day.
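To put the "60% fewer tokens" point in concrete terms, here is a small arithmetic sketch. The baseline token count is made up for illustration, and the price is Frank's half-remembered "75 cents for a million tokens", so treat both as placeholders rather than published figures; the helper names are hypothetical too.

```python
def tokens_after_reduction(baseline_tokens, reduction):
    """Tokens used when a model needs `reduction` (0.0 to 1.0) fewer tokens."""
    return round(baseline_tokens * (1 - reduction))


def run_cost(tokens, price_per_million):
    """Dollar cost of a run at a flat price per million tokens."""
    return tokens / 1_000_000 * price_per_million


if __name__ == "__main__":
    baseline = 200_000   # hypothetical tokens a comparison model spends on a task
    price = 0.75         # assumed dollars per million tokens (Frank's rough recollection)
    reduced = tokens_after_reduction(baseline, 0.60)
    print(f"tokens: {baseline} -> {reduced}")
    print(f"cost:   ${run_cost(baseline, price):.3f} -> ${run_cost(reduced, price):.3f}")
```

At the same per-token price, 60% fewer tokens means roughly 40% of the spend, before counting any difference in the price itself.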
And I've been really surprised, to be honest with you. I did a new feature for the My Cadence app where I wanted to add in a Dynamic Island feature, so the Live Activity feature. 34:56.69 James So I did something scary, which is I went into the CLI and I did slash research, which is a terrifying feature that does deep research on a task. 35:06.10 Frank Love deep research. 35:08.89 James And I did it all with the MAI Code One Flash, and it spun doing research for several minutes. 35:17.87 Frank Okay. 35:18.04 James So, like... 35:18.84 Frank There goes all your tokens. 35:20.31 James It was $1. It must have spun for like 10 to 15 minutes. 35:23.57 Frank Nice. Okay. 35:25.98 James And then I had it plan the feature based on the research. 35:26.08 Frank Okay. 35:28.34 James So in the context window, I don't think it even compacted, because it was pretty small. I don't know exactly what the harness is doing. It's doing something very efficient with that research. And then I had it implement said feature. And from start to finish, it was about, I think, $2.50. And that was a pretty long-running operation. Now, that said, I did then take 5.3 Codex out. And I did, well, I didn't actually review it right away. 35:54.61 James I pushed the code and I did a GitHub Copilot code review, which uses Actions minutes. 35:58.18 Frank All right. 35:59.38 James And it had a few comments on it. And then I had 5.3 do an analysis. So from start to finish, I used about $4 worth of credits. And that was a pretty long-running session with me spending hands-on time, because I knew what I wanted. 36:15.09 James I was kind of designing the feature, reviewing the feature, probably like an hour's worth of time. 36:17.49 Frank Mm-hmm. 36:20.99 James But that was between the two models, going back and forth and iterating on it. I was really happy with the end result for something that would definitely take me a lot longer than one hour to implement and test in my code.
36:34.68 Frank Well, to be blunt, all that Apple widgets stuff, it feels like a lot of these models do a terrible job at it, partly because there's just not a lot of code. 36:44.21 James So I researched. So I researched. 36:46.28 Frank That's why you did the research here. Smart man, deep research. 36:48.34 James Because I knew. I knew. I did that research. I went through all the Apple documentation. 36:51.59 Frank They're terrible at modern Swift APIs. 36:52.95 James Yeah. 36:55.64 Frank Let me ask you. It's hard to judge these things. But was the... well, did you do Codex or GPT? Sorry, I already forgot. You had it do the code review. Was it worth it? 37:08.47 Frank Did it need to in the end? 37:08.73 James Code review is worth it. Yeah. Code review found a few optimizations that it didn't think about. 37:11.32 Frank Yeah. 37:14.12 James Well, one thing, I mean, the code review is really cool. 37:14.42 Frank Okay. 37:16.03 James It said, hey, it seems like you're updating every two seconds, but you can update every 15 seconds. And then if you do that, you can drop this other policy. And then also you could, like, reduce this thing. So it was actually more of a performance analysis type of thing, which was cool. 37:25.71 Frank Hmm. 37:28.74 Frank Yeah, okay. 37:29.45 James And then, I don't know what model the review used. I just assigned it to Copilot. And then I could have done it with Code One Flash, but I just decided to say, okay, hey, we have the code review. We have some trickier details here. 37:45.94 James Let me, you know, go off, pull down all the comments, review the comments. I had it give me an analysis of the comments, and then I had it implement the comments, push the code, and then I also had it update 37:56.68 Frank Okay. 38:01.53 James the issues and resolve the issues automatically. So I did a lot more than just coding.
I could have just resolved the issues, or the comments, myself, but I was getting cocky. 38:08.41 Frank Yeah. 38:10.33 James You know what I mean? I was like, ah, just do it. 38:11.61 Frank Yeah. 38:11.85 James Just do the thing, right? 38:13.43 Frank Hey. 38:13.40 James But, you know, a lot of people, for example, are like, oh, why don't you use 5.4? And I said, why don't I use 5.4? Because it's almost twice as expensive as 5.3 Codex. And I think 5.3 Codex does just as good of a job. 38:24.04 James So why burn extra tokens on things that don't need a million context, which is more expensive anyways? Just let it cook. 38:24.91 Frank Okay. 38:27.31 Frank Yeah. 38:30.59 Frank Yeah. 38:32.03 James So I think that's something of interest. 38:33.97 Frank Yeah, fantastic. I just want to go back to the token thing. So it was funny reading, because I did read the paper. They are actually using the 200K vocabulary that has kind of become a standard with OpenAI. 38:42.68 James Hmm. 38:48.78 Frank So it's hard to compare these models from a token perspective, because the tokenization method they use can change between all the models. But you can actually compare the GPT and, I think, Codex models against MAI, 38:57.91 James Bye. Mm-hmm. 39:06.20 Frank because it does use the same tokenizer. So it is kind of a one-to-one mapping between the different tokens. And the fact that it's using 60% fewer tokens, I'm excited for that, because I even see it with my beloved Qwen. It's amazing how many tokens you waste on, like, failed merges of code edits. 39:27.12 Frank And I think that's the real benefit where they say they trained with the actual harness, VS Code or Copilot. Ideally, it would have fewer of those kinds of stupid mistakes that are just burning tokens. 39:43.04 Frank You know, now that we're all paying for tokens, it's the worst thing to see: failed to merge.
So I'm going to output a file, and then, oh, something got corrupted. I'm going to delete that file and rewrite it from scratch. 39:52.98 James Yeah. 39:53.56 Frank You're like, no, don't rewrite the file from scratch. 39:54.55 James No. 39:56.08 Frank That just cost me $10. Yeah. So I think seeing these models tuned to their harnesses is beneficial for all of us. And I'm excited to see the kind of savings that we get from that kind of stuff. 40:12.44 James Yeah, and on top of that, there were a few other models. There's Image 2.5, which is basically at the same level as Nano Banana Pro out there. Two that I'm very interested in: Transcribe 1.5, which is the world's best transcription, SOTA, 40:24.56 Frank Crazy. OK. 40:27.96 James out there. You can do transcription in 43 languages with domain-specific terminology. I know that this is very good. I've seen it, because it's used throughout a bunch of Microsoft tech already. And then there's also Voice, which is very fascinating, which is natural-sounding speech across 15 languages. 40:46.62 James And they also have a flash variant of that too. What I'm really fascinated about, and I was talking to Hanselman about this, is we have all these models, and I'm really interested in how I could compose these to streamline our podcast operation even more, because all of these are in Foundry. 41:01.01 Frank Mm-hmm. 41:03.01 James I am doing a bunch of work today with some OpenAI models for some transcription, to take a transcript and do some things, but it is manual. I just would love to have an MP3 right after, but I don't even want to do anything. 41:14.29 James I want to create some sort of workflow that auto-gens it, does a thing, bingo bango, creates all the beautiful artwork, does the thing, and puts it all together.
The cool thing about these models, Frank, is they're not only readily available on Foundry, they're also on OpenRouter, Fireworks, and Baseten, which means that you can actually tune the weights of the models themselves if you so desire, if you want to get them spicier. 41:34.48 Frank Yeah. 41:35.79 James But I would also say this, the other cool part, since we talked about hill climbing: one of my favorite demos was with Land O'Lakes, the butter company, the dairy company, where they took the Thinking One model and they did frontier tuning. This is a new service that allows you to fine-tune the MAI models and make them your own for your own business needs. So you're talking about all the data. If you're an enterprise, you can now feed all of that data in within your own secure environment, and you basically create your own model on top of it, 42:09.07 James finely tuned, and you hill climb with it. So they were talking about the hill climb of the butter. 42:12.30 Frank Um. 42:13.82 James I was in part of the keynote review and I was making lots of butter jokes. I wanted some butter Easter eggs, but Kayla refused to put a butter profile in there. But this is really cool. So you define your tasks and what success looks like. You feed in your data, workflows, and maybe M365 data. And you basically improve performance through training and iterative optimizations. And then you can deploy that model in Foundry or Copilot, and you can continuously improve based on real usage, which is really cool. It's one of my favorite demos. 42:44.79 Frank Yeah, fun. It's definitely the future of these models. Like, we're all still figuring out how to deal with them from the model and harness perspective, but fine-tuning is definitely the future. 42:56.50 Frank There's only so much you can do with the context windows at the moment and system prompts. 42:56.95 James Yeah.
43:02.25 Frank Fine-tuning is the future, so I'd like to see all the progress in those areas. Am I crazy? Did they say they were going to open any of these models? Are any of them going to be released so you can run them locally? 43:17.56 Frank I thought I saw something like that, but I don't want to put words in your mouth. So can you tell me? 43:23.03 James Great question. There are two other models specifically that were released, called Aion. A-I-O-N 1.0. 43:34.54 James There were two, and these run locally on device. Aion1 Instruct brings smaller local Aion models. 43:39.75 Frank OK. 43:42.56 James So that's a compact on-device language model for lightweight AI tasks. And there's Aion1 Plan, which is a larger 14 billion parameter reasoning and tool calling model. 43:55.72 James And those are there. 43:56.52 Frank OK. OK. 43:58.44 James I don't actually know how those work, but they're, like, in the box or something. These ones, I don't think you're running locally, they're just deployed on Foundry, but the Aion Windows models, I think, are newer models that you can, like, grab and put into things. 44:11.91 James I'm not positive though. 44:12.19 Frank Okay, thanks for clarifying, because I thought I saw something about local. 44:12.87 James Yeah. Yeah. 44:15.24 Frank Again, I'm drinking the local juice right now, so I just had to know. 44:20.60 James Yeah. So. 44:22.52 Frank Also, I have to say, we've been talking about Microsoft, but Google's not sitting back doing nothing. So they just released a Gemma 12B that is also multi-modal. 44:34.52 Frank A big thing I don't think people talk about enough is, like, you can give images and text and PDFs and all sorts of stuff to these models now. We're all programmers, so we're just feeding them text and having them read files. 44:45.48 Frank But it's cool when they actually take images also.
Because then you can do the closed-loop thing where it generates some UI and it can actually see that UI, stuff like that. 44:57.28 James Yeah. 44:58.62 Frank So I believe all these models are multi-modal. And then I just want to give a shout-out that Gemma just released a 12 billion parameter multi-modal model also. 45:10.60 James I did see that, and they released, like, some AI edge app that I think you need to install to get it or something, but I definitely want to try it. Yeah, that'd be cool. 45:18.23 Frank They released it in all the normal places, but yeah, they released yet another app, because it's not Google if they haven't written a new app this month that they'll cancel next month. So get it while it's available. 45:28.92 James Yeah. Yeah. 45:33.72 James Yeah. All right. Well, tons of good stuff. That was only scratching the surface. There was tons to go through. We'll continue to unravel stuff from Microsoft Build and also WWDC, which is happening. 45:43.77 James I think it already happened, Frank. 45:44.77 Frank Oh my gosh. 45:45.41 James It, like, already happened. 45:46.42 Frank Oh, Jesus. 45:46.45 James Well, I mean, it is happening between then and now, but I think it's, like, today or tomorrow from when the podcast comes out. So we'll do some WWDC breakdowns. That'd be great. Yeah. 45:55.37 Frank Yeah. 45:56.01 James Yeah. 45:56.09 Frank I think it's going to be a rainy day. I'll spend a rainy day inside WWDCing. 46:01.16 James Yeah. Awesome. All right, everyone. Well, thanks for tuning in. Let us know if you're trying out the new models, or any other exciting announcements that you saw at Microsoft Build 2026, or anything you're looking forward to at WWDC, or even Google for that matter. Let us know at forward slash at mergeconflict.fm. That is the best way. Leave a sweet comment on our YouTube. We appreciate it. That's going to do it for this week's Merge Conflict. So until next time, I'm James Montemagno.
46:27.29 Frank And I'm Frank Krueger. Thanks for watching and listening. 46:30.49 James Peace.