[00:00:00] James: Well, welcome back everyone to Merge Conflict, your weekly developer podcast, where we're diving deep into the world of augmented, mixed, and virtual reality stuff with visionOS. Yes, visionOS has been out for us to start developing against for the last several weeks, and we had a podcast on it where we talked a little bit about exploring the spaces and what you need to know, and Frank has been diving deeper into the world of 3D objects for visionOS. If you're brand new to the podcast, I'm James Montemagno, and yes, that is Frank Krueger on the other side of the microphone. How's it going, man?

[00:00:33] Frank: Hello, sir. I can't believe we're gonna do another Vision Pro episode. This is my opportunity to say, again, that I still haven't decided whether I'm ever gonna buy one of these things, but gosh darn it, I'm gonna play with the SDK, and play hard, and see what I can do with the thing and see what crazy new APIs they're adding.

[00:00:51] James: Well, I've always been a fan of attempting to build 3D things myself. Like I said, I started my career in video game programming, doing 3D things on a 2D surface, but not necessarily in the mixed reality, virtual reality, augmented reality space. And for me, the space has always been very intriguing, because the very first time I got my Nintendo 3DS, a few cool things happened. One, it was like three-dimensional, coming out of the screen, which was really neat, mind blowing. But it also had the ability to do augmented reality. They had these really cool cards, little question marks, and you'd put them down and 3D figures would pop out of them, and you're seeing them in 3D and they're doing stuff. So to me there's a lot of coolness that goes into that. But I am not a graphic designer, I'm not a 3D modeler. I took a Maya class or whatever in college, so I made a teapot, but beyond that I really don't know what I'm doing. And so the thought of moving into the 3D space is very scary for me, because I understand the 2D level system, right? I've been building desktop apps, web apps, mobile apps, tablet apps, TV apps, whatever, for a long time, and I feel comfortable in 2D. That Z axis? I do not. I got away from it after game development: Z axis, get outta here, boom. I like to live in a 2D world, Frank Krueger, but they're forcing us to move into this 3D world that's also kind of 2D, but also 3D.

[00:02:36] Frank: Well, it's 3D. We're just helping the 2D world, pulling the 2D world into the 3D world. That's what we're doing. And yeah, God, you hit the nail on the head there. I love 3D. It's what got me into programming, it's what I love to do with it. But at the same time, I'm not a good 3D artist. I know how to use every CAD program a little bit. I can usually get a square, and I can put a hole in the square if I'm feeling lucky. I write CAD software, and yet I'm still not a good artist. The designs I come up with are functional. They're simple and they work and they're basic shapes and all that kind of stuff, and I love it. It's fine. I'm not an artist, and so, my goodness.
Like, I would almost say my whole career has been me finding ways to have computers generate art for me. I'm really into neural networks, we talk about 'em all the time, and the neural networks I'm really into are the generative ones, like your DALL-Es and things. But I like to run those in 3D also, making 3D shapes. But yeah, we're in this world of augmented reality where you wanna bring in 3D shapes and augment the person's 3D world with new 3D shapes, and you gotta get 'em from somewhere, James. And you're not an artist, I'm not an artist. Even the Fiverr artists are usually more like $50 artists at minimum. What is one to do?

[00:04:15] James: That's a good question, because we've talked before on the podcast about iCircuit 3D, which is one of the applications you make, which is the app that I'm pretty sure should be on the Vision Pro. We talked about your 3D holograms, and we've also talked about you scanning a shoe and turning that into a 3D object as well. We've talked about multiple 3D things, and that's probably just some of the 585 episodes we've done at this point of building 3D applications in general. So I don't know if any of those mechanisms are gonna work. But what did you do for iCircuit 3D? Were those things generated out of primitives, basically, or no?

[00:04:55] Frank: Okay, so I'm gonna toot my own horn a little bit, please excuse me here, sir. That's the best artistic CAD modeling I've ever done. Each one of those parts is me, myself, generating them in a various assortment of ways. I wrote sub-programs in programming languages so I could write the shapes of the parts. And it was complicated, probably a little overly complicated, but, you know, I wanted my resistors to look a very specific way, and I wanted them to be resizable dynamically, so the shape generator had to be a dynamic shape generator. And that is a way to do it, everyone: you can just write code to generate 3D shapes. It takes a lot of math, you gotta type a lot of numbers in and recompile a million times and see how those numbers affect things. But that's one way. I don't recommend it. Side tangent: in the 3D printer world, people are actually starting to use programming languages to generate CAD files. One popular one is called OpenSCAD, a whole programming language devoted to generating 3D objects, 3D solid objects. Super cool. There's another one that's a little more tame, which is OpenJSCAD: it's a JavaScript one, you write JavaScript to create 3D objects. Back in the day I wrote one of these for C#, so you could write C# code to generate 3D objects. So there is a place for that, but it's tedious, and your designs usually come out simple because you're writing literally every bit of that object in code.

[00:06:44] James: Yeah, that makes a lot of sense. Well, and like you said, you also need to know math, right? I think when we were building the game engine out, it's a lot of math, a lot of computation. I mean, luckily at that time we were at least running DirectX, so if you got a thing into a .X model format, then things would magically work. But, you know, the designers, that was their part, right? We just dropped it in.
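For listeners who want to see what "writing code to generate 3D shapes" can look like on Apple's stack, here is a minimal sketch using RealityKit's MeshDescriptor. This is not Frank's iCircuit generator; the function name, dimensions, and usage are made up for illustration.

```swift
import RealityKit

// Hypothetical parametric shape generator: a flat, two-triangle quad whose
// size is driven entirely by code, the way a dynamic part generator would.
func makeQuadMesh(width: Float, height: Float) throws -> MeshResource {
    let w = width / 2
    let h = height / 2
    var descriptor = MeshDescriptor(name: "quad")
    // Four corners in the XY plane.
    descriptor.positions = MeshBuffers.Positions([
        [-w, -h, 0], [w, -h, 0], [w, h, 0], [-w, h, 0]
    ])
    // Two counter-clockwise triangles referencing those corners by index.
    descriptor.primitives = .triangles([0, 1, 2, 0, 2, 3])
    return try MeshResource.generate(from: [descriptor])
}

// Usage (e.g. inside an ARView or RealityView setup):
// let entity = ModelEntity(mesh: try makeQuadMesh(width: 0.2, height: 0.1),
//                          materials: [SimpleMaterial()])
```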
I was more on the rotation of the ship when things got hit. Mm-hmm. The shields, the bullets, the shaders for the different atmospheric things that were happening in space, and then just sheer collision, right? I mean, there's 2D collision and then there's 3D collision. You wanna make sure your ships aren't just floating off into the ether magically or whatever, and making sure that when things hit, things happen accordingly. Just like I keep hitting my microphone for some reason on this podcast. But yeah, I think that's the problem I've always had: it seems like that takes a lot of time, and I'm not an artist, so it really scares me in general. But do I have to do that for the Vision Pro, for visionOS? That's an option, I assume, right? You should just be able to take everything you did for iCircuit 3D, and that should work for 3D modeling inside the Vision Pro. Do they have backs? Do they have undersides? Can you rotate them? Because in video games, often it's not even the full thing, right? Like if it's a house, they just control the camera. That's why the funny thing is, when you go through a wall accidentally, there's nothing on the other side, because they're not gonna render the whole geometry. It's literally half a house, you know? So that's why it's always kind of funny.

[00:08:33] Frank: That's a fun example too, clipping through walls. Clipping works because walls aren't actually solid. Walls are two-dimensional triangles oriented in three-dimensional space to create the illusion of a wall; it's not actually a solid object, so you can clip right on through them. Fun. But no, you know what, I actually want to go back to the shoe, the shoe that you mentioned, because that is a way to create some geometry. And the shoe, if you don't remember the episode, who knows which one we were talking about, Apple built in a sophisticated little algorithm. Well, in general the concept is called photogrammetry, which is a terrible name; the algorithm is called Structure from Motion. What it actually does, though, is you take a bunch of pictures of an object and it generates the 3D object itself, in a nice little tight format. And James, yes, it has both sides to it, which is actually a very impressive thing. I want to talk about that, but we can save it for a little later. But it is actually an impressive thing, because think about it: what if you wanna scan, let's say, a shoe, and it's just sitting on the floor. You walk around it and you take a bunch of pictures of the shoe and you're like, generate me a shoe. Well, what in the world is the algorithm supposed to put at the bottom of the shoe? It only has the one side, it doesn't know. So a very sophisticated thing in the Apple algorithm is that it can do both sides of the shoe. You can flip the shoe and scan both sides, so it won't be your cheating gamer thing, it's gonna be both sides. And for the record, iCircuit 3D is both sides also, because I didn't cut any corners. Those are full, solid 3D objects.

[00:10:23] James: Of course they are.
Well, you know, I think the idea here, if we roll this back even before that, of taking a bunch of photos and turning those into three-dimensional things is not necessarily a new thing. But I think the technology, and this is what I wanna distinguish, 'cause some people might think, well hey, I've been on Redfin, right? I've seen this thing where I can get a 3D tour. Or take Google, for example: when you go on Google Maps and you tap on a restaurant, you can go inside the restaurant, because enough people have photographed the inside of it, or they've gone in and done something. Or another good example: on my Android phone, there's a way to take a 3D spherical photo, right? But the idea there is that it's not necessarily 3D, right? The thing with 3D is texture and depth, right, the depth aspect of it. So I'm not sure how those applications work, the ones that say, oh, just take a bunch of photos of this room and I'll figure it out. I'm assuming it's mapping walls to figure out meshing, basically, but I don't think it has the depth sensor. Is that what sets apart this type of technology, the depth data information? Is that correct, or is that meaningless in this case?

[00:11:48] Frank: No, you're right, you're right. I want to take it from a slightly different tack, though. I would say, when you're talking about 3D and visualizing 3D, there are two things to think about. There is: in what way are we storing the 3D information? And then: in what way are we visualizing that storage of the 3D information? And there are different ways of storage, and particular algorithms for each. The most common storage, the one that you want in visionOS, is a mesh. Your happy old mesh, your triangle mesh. Just a bunch of triangles that are texture mapped. That is the format we've been using for 30 years for games and everything. It's old, but it works, we like it, and graphics cards can render triangles very quickly. And so a lot of the things that you were talking about, like the Redfin one, that's actually different. They're doing the trick from Photosynth. Do you remember Microsoft Photosynth? Yeah. They're doing the cool thing of: if you can figure out how a bunch of pictures that you took are positioned and oriented in 3D space, then your representation of the 3D world is just a bunch of pictures oriented and positioned in 3D space. And then you have to write a sophisticated little renderer that figures out which picture you are facing: don't draw the ones behind you, draw the ones in front of you in depth-sorted order, and then you can do a very simple visualization. So does it have depth? That's a philosophical question, because as you move around it gives the illusion of depth, and what is depth, everything's an illusion, whatever. It's not giving depth in the way that a 3D mesh has depth, where there are hard-coded numbers in that 3D mesh file that tell you exactly what the Z axis is, what the X axis is, and all that. And so the neat thing that photogrammetry gives you, from a whole bunch of pictures, is that it's not just a visualizer. It gives you a mesh. It gives you a bunch of triangles.
It can give you other things too, point clouds, stuff like that. But really, in this world of creating 3D objects, you want a mesh, it's gonna give you a mesh, and that's what's so cool about it.

[00:14:12] James: Now, when you did the shoe, right, that was an app that you had on your Mac, and you had to photograph it with your Mac camera. How did you do it back in the day, a year and a half ago, maybe a year ago?

[00:14:27] Frank: A year and a half ago, two years, maybe even two years ago. Yeah. So I did my first experiment when Apple introduced photogrammetry support on Mac only, which was weird. It was still part of RealityKit, and you're like, well, there's nothing stopping that code from running on iOS. But it had to be some kind of, it's gonna burn the battery, or it's gonna use all the RAM, I don't know what, but it wasn't on iOS. So the way I wrote it was, I wrote a Mac app, I think I even wrote it in Swift. I spent some time on it, I polished it a little too much, and I even started writing a transport between it and an iOS app. So you'd go take the pictures with your iOS app, and then it would transport them on over to the Mac app, and you'd run the algorithm on the Mac app. I never finished it, because I hate clunky solutions like that. I hate an app split in two halves that have to communicate, and I'm like, yeah, this is not sparking joy in my heart. As much as I wanted some photogrammetry in my life, it was not sparking joy, I should say. Just one little step back: my previous experience with all of this was, I downloaded an app for my drone where I could fly my drone around and take a bunch of pictures, and then it would turn it into a 3D mesh, Google Earth style. So it's like a Google Earth maker, but it was expensive, like a hundred dollars a month to have access to that algorithm. And so then I looked into open source ones, and the open source one was so complicated to use that I had trouble: I got it to work once and then a day later completely forgot how to use it ever again. And then leading up to all this, there's another open source one called COLMAP, C-O-L-M-A-P, which is really good. I even went through the effort of creating a .NET binding for it so I could put it into my app and everything. It just had one problem: it used all the RAM on your computer. You would ask it to do a task and it just chewed through all your system resources, all the CPUs at a hundred percent, RAM at a hundred percent, and it's just burning away. And so a real kick I got was when I then ran the Apple algorithm and it finished quicker and used, you know, one tenth of the resources. So here's to a very large vendor with lots of money optimizing their libraries.

[00:17:09] James: Yeah, that's the best way to do it. I mean, of course, if any of those other ones worked, fantastic, but if it's in the box, it's much, much better.

[00:17:18] Frank: And I'm completely blanking on it, but there's a commercial offering that graphic designers actually use when they're doing professional photo scanning, and everyone basically uses that package. So it's a little bit Apple coming out with their guns loaded, releasing what I think is a really good algorithm. Can we go back to discussing flipping objects over? Because I think it's...

[00:17:44] James: Yeah.
'Cause you said, and I wanna get it straight, that basically that exact same app logic is just now available on iOS 17. Like, hey, they're like, oh, now we'll no longer throw an exception, basically, at this point. So your Mac app, you could just take that logic and shove it into an iOS app, and then you're good.

[00:18:02] Frank: Yeah, they actually did two things to help out. A: they enabled it on iOS. The API didn't even exist before, so it's not even like it threw an exception, it just didn't exist before. Wow. Yeah, so that's actually the real reason I wanted to talk about it today, because I'm just so giddy that we can actually do photogrammetry. And you know what's really making me annoyed, though? The documentation says it's supported on visionOS, and yet, on visionOS beta one, I cannot get it to compile. The library's just not there, the header files just aren't there. But it's on iOS, it's on Mac Catalyst, it's on Mac, it's been on Mac. So that's the big announcement, that's why I'm so excited and wanted to talk about it today. But I also wanted to talk about it because it's cool what they added. It's very sophisticated and very powerful, and we should talk through some of those bits.

[00:18:58] James: Now, they didn't necessarily release a complementary app onto the App Store that just lets you do it, did they? Because, I mean, ideally what would be cool is, give me an official image generator thing from Apple that's like, hey, are you building visionOS apps? Download this tool that will help you create 3D models, right? And it'll be great.

[00:19:23] Frank: So I wanna answer that question in multiple parts, because it's complicated. It's complicated, James. A: no app so far. But like, if it's gonna be supported on visionOS, doesn't it make sense that when you're in an AR environment, you can just be like, I need to scan that object? So I'm hoping that they have some visionOS app in the works, because it's pretty obvious, and I'd be writing that app right now if I could get the stupid thing to work in the SDK, because it's so obvious, like it should be there. Yeah. So no, no announcement of an app. Just the API, and they started documenting it a little bit better. Documenting it in a way, James, I think you would appreciate, because Apple doesn't write docs. What do they write? Samples.

[00:20:17] James: Yeah, samples. They love samples.

[00:20:20] Frank: And what they released is the most over-engineered sample you've ever seen in your life. It just has abstractions, additional UI that does this, UI that does that, UI that does all these other things. So they have three total examples. You can go get them, and they give you a decent feel for how these APIs work and how you can build your own. But I want to talk about their over-engineered one, because it's hilarious. It's basically the app they should be releasing. It's essentially an entire app; all it's missing is a project navigator, and that's it, good to go. So what they did, not only did they enable the APIs, but they enabled some user interface stuff. Before, you were kind of left as the developer to go collect a bunch of photos from the user: you could pop up a dialog and say, please select a bunch of photos. But what you really want, because they're holding a phone with a camera on it, is to just go take some photos. Yeah.
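For reference, the capture UI Frank is describing here corresponds to the iOS 17 ObjectCaptureSession and ObjectCaptureView pair in RealityKit. Below is a minimal sketch with the sample app's extra layers stripped away; the images directory, the single button, and the overall flow are illustrative, and exact signatures can shift between SDK releases.

```swift
import SwiftUI
import RealityKit

// Minimal sketch of the built-in object-capture UI (RealityKit, iOS 17+),
// without the sample app's animation and abstraction layers.
struct ScanView: View {
    @State private var session: ObjectCaptureSession?
    // Hypothetical location for the captured photos.
    private let imagesDirectory = FileManager.default.temporaryDirectory
        .appendingPathComponent("ScanImages", isDirectory: true)

    var body: some View {
        ZStack {
            if let session {
                // System-provided camera preview, reticle, and feedback UI.
                ObjectCaptureView(session: session)
                VStack {
                    Spacer()
                    Button("Continue") {
                        // Walk the capture state machine: detect the object,
                        // then start taking photos while the user orbits it.
                        if case .ready = session.state {
                            _ = session.startDetecting()
                        } else if case .detecting = session.state {
                            session.startCapturing()
                        }
                    }
                }
            }
        }
        .task {
            try? FileManager.default.createDirectory(
                at: imagesDirectory, withIntermediateDirectories: true)
            let newSession = ObjectCaptureSession()
            newSession.start(imagesDirectory: imagesDirectory)
            session = newSession
        }
    }
}
```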
And honestly, it's not rocket science; it's probably a hundred lines of code to get a camera preview up on the screen, put a little button on there, and whenever the button clicks, grab the frame somehow and write it down to disk somewhere. But it does have its tedious bits and all that kind of stuff. Well, there's a lot of, yeah, when you're scanning something, here's the pause that's happening in my head: are you scanning an environment, a scene, or are you scanning an object? So what they've done, their API, the photogrammetry API, supports both. You can go take pictures of your house and your lawn and all that kind of stuff, and it'll stitch 'em all together and give you a nice little model of it. The UI they built helps the user take photographs of an object, and it's very object focused, so you're not supposed to be walking around your house taking pictures like that. It's very much: select the thing in the center of the screen, now walk around it, and it plays rotation music, a little dance jig, the better you're doing. And if you get too far away, it beeps at you and you're like, oh, I'm sorry, oh my gosh, I'm sorry, little guy. It has animations on it; they have like four whole Swift files devoted to animating this one little icon on the screen. Oh my gosh. And it's a weird mixture where the OS provides you 50% of that, and then this very over-engineered app provides you the other 50%, so you really have to dig through and figure out what's all there. In the end, I decided that was all very complicated and I'm just gonna write the hundred lines of code it takes to put a camera preview on the screen and click a button. But I applaud Apple for putting a UI there. At the same time, I'm like, the UI is obviously only 50% done, because the sample app is way too huge. The sample app should be like four lines of code: here's how to bring up the UI, the end, that's it. Yeah.

[00:23:31] James: I mean, that's the problem, right? We often talk about this: what are we teaching the person in this sample, in this module, in this documentation? We start with that all the time. Even when I do the workshop, it's like, am I teaching people data binding right now? Am I teaching them a list? Am I teaching them Azure? Am I just teaching 'em feature flags? It's so easy to conflate multiple subjects together because they're complementary in a way. It's like, oh, of course, why wouldn't you wanna have this thing and then do that thing, and this is how you would do it. But then really, it's like you just want to get a bunch of photos and then shove them in here. Right?

[00:24:12] Frank: Can I just complain for a minute? Sure. I like to complain. You let me complain. Oh gosh. APIs used to be so simple. There'd be an object that did something, and then maybe you'd attach a delegate to it and it would send you events when it's doing something. No problem. This is the same kind of problem: they have a little photogrammetry object and it needs to send you events. But James, it sends you events in the most ridiculous way imaginable. A: there's one channel for control events, another channel for data events.
In the sample app, they wrap up the data-event async queue into their own infinitely looping async function, which signals an async actor, which then launches a message on the global message queue, because in the end, that's all you really want to happen anyway. And I just find it crazy. This API, in the old days, would be so simple. The .NET version of it would be an object with a bunch of little event guys on it. The Objective-C way is a delegate object. The Swift way is three different concurrency abstractions all signaling into the UI, and it's a little bit mind blowing. The good news is, look at their example and say, that's a nice UI, and then throw it out and start over from the raw API, because the raw API isn't as bad as they make it look.

[00:25:50] James: So you obviously did that, and now you're scanning your shoe again. And now let's get back to, you sidetracked a little bit, in good spirits though, scanning the underside of the shoe. Because if I remember the shoe, the most important shoe, I don't remember how much of it you got. What would you say the percentage of the shoe was that you got with the Mac app before? Was it like 80% of the shoe, 90% of the shoe, a hundred percent of the shoe? I forget, did you get the underside of the shoe?

[00:26:16] Frank: If we're talking surface area, it's probably less than all that, probably like 60%. I only took like 15 photos, and I took them very poorly, because I had no idea what I was doing in the beginning. It was my first real experience with photogrammetry. So I missed important things that you learn, like I missed the inside of the shoe. Turns out shoes have a place where a foot goes, and it's deep and dark in there, and you need lots of light. Yeah. Part of Apple's thing, with their built-in UI, is that it sends you events whenever it wants to warn the user of something, and the event they love to send you is: more light, we need more light. You just flood the room with light if you're 3D scanning an object.

[00:26:57] James: Well, because ideally, if you're 3D scanning an object, this would be almost like a photo shoot. You know, when you see a family photo shoot or a dog photo shoot, they have that big white screen behind them, and it's all nice. That's the type of setting you want: get rid of all of the extra things around it, and it needs to almost be dangling in 3D space in some way, and then there's light just flooding into the room to make this happen. That's what ideally you should do. But I'm assuming you don't have that setup at home, Frank Krueger.

[00:27:31] Frank: Oh no, I just took pictures on the carpet that has a little stain on it, a little bit right there, so, you know, it's fine. Yeah, actually I think they recommend a medium gray, so like the photo thing, but not all white. You want a medium gray, and you just want enough light sources, enough bounces, enough diffusers that you have minimal shadows. Shadows are your enemy when 3D scanning. Reflections are your enemy, mirrors are your enemy, and shadows are your enemy. But other than that, you can just throw it on the floor, stains and all. Their algorithm is really good at finding a horizontal plane. It wants a horizontal plane, so you can't 3D scan things on a wall, even though you totally can; their algorithm's powerful enough.
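Going back to the raw API Frank suggests starting from: a minimal sketch of PhotogrammetrySession driven from a folder of photos, consuming its single async output stream. The paths here are hypothetical, and only the more interesting output cases are handled.

```swift
import RealityKit

// Sketch of the raw PhotogrammetrySession API (macOS 12+, iOS 17+), assuming
// the photos are already sitting in a folder on disk.
func reconstructModel() async throws {
    let imagesDirectory = URL(fileURLWithPath: "/tmp/ShoeScan/Images", isDirectory: true)
    let outputURL = URL(fileURLWithPath: "/tmp/ShoeScan/Shoe.usdz")

    let session = try PhotogrammetrySession(input: imagesDirectory)

    // Ask for a USDZ model file at medium detail...
    try session.process(requests: [.modelFile(url: outputURL, detail: .medium)])

    // ...then consume the one async stream of progress, results, and errors.
    for try await output in session.outputs {
        switch output {
        case .requestProgress(_, let fractionComplete):
            print("Progress: \(Int(fractionComplete * 100))%")
        case .requestComplete(_, .modelFile(let url)):
            print("Wrote model to \(url.path)")
        case .requestError(_, let error):
            print("Reconstruction failed: \(error)")
        case .processingComplete:
            return
        default:
            break // skipped or invalid samples, downsampling notices, etc.
        }
    }
}
```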
I've gotten away with it, things just hanging on a hook on the wall. As long as it can see a floor to kind of get its bearings a little bit at first, you can start scanning other objects right away. Once you have all of that... I totally forgot what we were talking about now, James. What do we wanna do after we've gotten it level? We wanna walk around it. You wanna walk 360 degrees? You can't do that. You're gonna walk 180 degrees, get the whole hemisphere, and then do the best possible mode ever, which is: engage flip mode. Finally. Ooh. You can flip.

[00:29:00] James: The output, is that like, you have to tell it? That's like a bool, basically?

[00:29:04] Frank: Wow. Yeah. You have to signal to the API: I have instructed the user to flip the object. Wow. And then the user has to confirm. It's a dead man's switch. No, I'm just kidding. It means you have to make sure you don't re-engage the camera until the user has finished flipping the object. You don't want that transitional flip, you just want a nice concrete flip. And the reason you have to do that is, when you're doing that first pass, it has to align all those cameras. For each photograph, it has to figure out what the orientation and position of the camera was, and it does that by playing a little optimization game, looking for similar patches on the object. A little red dot: it will focus in on that red dot, trying to find that red dot in all the pictures, and run an optimization algorithm to figure out the positions of all the cameras. But if you flip the object, you just ruined all of that; that whole optimization is out the door. So it's, you know, obviously a required feature, but it's funny how many photogrammetry things don't do this. So when you say "I'm flipping the object," what you're signaling to it is: treat that first group of cameras, or photographs, as one set of orientations for the object, and treat the others as another. It actually has to run two passes and then intelligently optimize all of them together and try to piece together the object from that. So it's a clever algorithm, because it has to distinguish the object from the background, from the floor, from the plants, from the other things. It's not being simple, it's being a little aggressively complicated sometimes, actually, but that's what allows it, after you've flipped it and taken a new set of photos, to integrate them all together.

[00:30:58] James: Nice. And then you run it through the algorithm and it gives you a mesh back, or, yeah. Is there a way for it to know the percentage of accuracy? Because, you know, let's say I'm doing a 3D sphere image or whatever, it's like, hey, you're missing this thing. I'm assuming it's just gonna do its best educated guess with whatever you give it. It can't be like, oh, we think this is 80% complete, please take a photo of this. I'm assuming that's not the case.

[00:31:26] Frank: Well, actually, a lot of that is built into the UI that they have, slash the over-engineered sample app. So they have a cute little animation, like when you do the Face ID thing and it does the circular progress bar.

[00:31:43] James: It does that? Yeah. Oh, interesting.

[00:31:44] Frank: So that's baked in now.
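The flip signal Frank describes maps to the scan-pass methods on ObjectCaptureSession. A small sketch, assuming the session from the earlier capture example; exactly when an app calls these is up to its own flow, so treat the trigger here as illustrative.

```swift
import RealityKit

// Sketch of the multi-pass "flip" flow on ObjectCaptureSession (iOS 17+).
// This would typically be called once the user has finished orbiting the
// object (for example, when session.userCompletedScanPass becomes true).
func finishPass(on session: ObjectCaptureSession, objectWasFlipped: Bool) {
    if objectWasFlipped {
        // Tell the capture that the object's orientation changed, so the next
        // batch of photos is aligned as its own group and merged in later.
        session.beginNewScanPassAfterFlip()
    } else {
        // Same orientation, just another orbit (e.g. a different camera height).
        session.beginNewScanPass()
    }
}
```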
[00:31:48] James: Do you think that this technology is the same as the Face ID thing? Because that's doing a 3D mesh. No? Okay.

[00:31:57] Frank: I wish. Well, okay, with caveats: it is not using the depth sensor, so this can all work with just photography. That's the fancy, what's called Structure from Motion, SfM, algorithm. With no depth information, just by comparing images, it's able to figure out 3D information.

[00:32:22] James: Wouldn't it be better with the depth information? Or does it not matter?

[00:32:28] Frank: Certainly, sir. In fact, you can feed it the depth information, but all it uses it for is to scale the world. If you think of it just going from pictures, there are ways to orient cameras to make any object look any size, you know what I'm saying? The size is relative in a photograph. So to give it an absolute size, you can feed it depth information, and it can integrate that to scale things, but no, it doesn't actually use it for generating the mesh. That said, I'm working on an app that does. The big problem there is that the depth map is often lower resolution than the image, and so you can actually get higher quality data if you take a billion little images. No one ever wants to take enough images; we all take like 10 and we're like, that should be enough. But really you wanna take like 30 or 40, and then the algorithm can do a really good job at pulling things out.

[00:33:34] James: Gotcha, nice. So then with these mesh files inside of RealityKit, you can just render them and display them easily enough? Or how does that work?

[00:33:47] Frank: Yeah, actually, I should enumerate what you can get from it. And I wanted to go back to, you were talking about, does it give you error information? Oh, it's Apple. No, it gives you nothing, nothing. It's frustrating how little the API gives you. All they wanna do is plop this mesh into your hands. That's what they wanna do. I want more information. So, although the API is at least two years old, they finally added one more bit of information that you can get out of it. The things you were able to select in the past were: the boundaries of the object. Yay, boring. But you know, there could be simple scenarios where you're just trying to roughly measure something. You can get a point cloud, where all the points it used to figure out the 3D information in the scene come back as 3D points with colors. And honestly, that's just a quick and cheap way to render it, and point clouds are often just fine. People get used to 'em. They kind of look like holograms, in my opinion; I think they're kind of fun and interesting. Then the mesh can come in two flavors. The one that they supported originally was, it just wrote it to disk. It would write it to disk in USD, Universal Scene Description format. It's a Pixar format. Talk about over-engineered, it's a super over-engineered file format, but USDZ is Apple's preferred format on iOS for the last few years, and on visionOS for sure, for shipping around different meshes. They want you to use USDZ. The Z is just a zip file; inside is a USD file, and a USD file can come in a binary format or an ASCII format.
Again, all super over-engineered, all complicated. Nice. And then lastly, you can just get the mesh back as an in-memory RealityKit object, and you can just plop it into your visionOS world, and boom, now you have a virtual object of your once physical object. You may now destroy the physical object; it's useless, because you have the virtual version of it.

[00:36:04] James: Now, you know, if they do enable it on visionOS, which you think they're going to, do you think it'll be possible then to, like, walk around? There are external cameras, obviously. Do you think they'll enable it so you could just use the Vision Pro headset to be like, oh, here's a can, I'm holding up a can, and scan it right from the headset?

[00:36:31] Frank: That's why I was so excited when I saw that all this was supported, according to the docs, on visionOS. I was literally gonna write that app. Apple should release it, but Apple doesn't have a great track record of releasing apps they should release. Yeah. So I was gonna just have a day one app to do it, but so far I haven't gotten it to build. But you're absolutely right, the thing is covered in cameras, it would do a fantastic job. Maybe rotating it in your hand would be one way to do it. It'd be nice if you could, like, clip out your hand or something.

[00:37:02] James: Yeah, like if you could do that, then you could almost build like The Sims, right? It's like, here's my house, here's the things, and oh, now this thing's over here, now here's my setup. And what would be really cool, right, is you have your desk set up, and then you could have virtual things that are important or whatever, and then you do some stuff. I dunno, it'd be kind of cute.

[00:37:23] Frank: Or just silly things, like if you're about to redecorate your living room, go do a quick scan of your living room so that you have a before and after. I was thinking, when I throw out an object or get rid of something that I've had for a long time, I'm probably just gonna do a quick 3D scan of it. A friend once told me, if you have an object that you're attached to but you still want to get rid of, just take a picture of it; at least you have the picture. Oh, now we'll do a 3D scan of it and have a little 3D object.

[00:37:48] James: Oh, why do you think my OneDrive is 98% full all the time? 'Cause I'm always storing as many things as humanly possible. Just take a photo of this, take a video of that. In case I delete that file locally, might as well upload a copy. What's on that USB thumb drive? I don't know, now it's on OneDrive. Because, you know, you have a bunch of SD cards and thumb drives just sitting around. You don't know what's on them, you don't know if you need them. After you plug them in, you're like, hmm, these files, I don't know what this is. Boot disk? Probably.

[00:38:19] Frank: Yeah, a Linux install from four years ago is usually what mine is.

[00:38:22] James: Mine's like, okay, zip it up, zip it up, put it on the cloud. I don't know. And then it's just random, and I'll be like, "thumb drive found on March 23rd, 2020," and you're like, okay, cool, great, nailed it. So, yeah.
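To recap the outputs Frank enumerated a moment ago (bounds, point cloud, a USDZ model file on disk, and an in-memory RealityKit entity), here is a sketch of the corresponding request kinds and of dropping the in-memory result into a scene. The file path and detail levels are illustrative.

```swift
import RealityKit

// Sketch of the different things a PhotogrammetrySession can be asked for,
// matching the outputs discussed above.
func requestEverything(from session: PhotogrammetrySession) throws {
    try session.process(requests: [
        .bounds,                       // rough bounding box of the object
        .poses,                        // recovered camera positions/orientations
        .pointCloud,                   // colored 3D points (the newer addition Frank mentions)
        .modelFile(url: URL(fileURLWithPath: "/tmp/Scan.usdz"), detail: .full),
        .modelEntity(detail: .medium)  // in-memory RealityKit entity
    ])
}

// When the .modelEntity result arrives on session.outputs, it can be placed
// straight into a scene; a USDZ written to disk can be loaded later instead.
func place(_ scanned: ModelEntity, under anchor: AnchorEntity) {
    anchor.addChild(scanned)
    // Or, from the file-based result (hypothetical path):
    // let loaded = try ModelEntity.loadModel(contentsOf: URL(fileURLWithPath: "/tmp/Scan.usdz"))
}
```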
[00:38:39] Frank: I'm really curious to see what happens. There's already a lot of 3D scanning apps on the App Store. They all just found some open source library, shoved it into an iOS app, and got it on there. Or the terrible ones make you upload all your data to a server and then they just run it on the server. Only a few of 'em, I think, actually run on device. So now there's a pretty easy way, and definitely a big example you can just go copy and paste and rebrand for yourself and say, hey look, I have a 3D scanning app now. So I'm curious if there's gonna be a bunch of those, or if people aren't gonna bother, because like I said, the market's already pretty saturated. But then you have weirdos like me who still think I have a few games to play. I have a few little tricks up my sleeve that will introduce some new things the other ones aren't doing. So I'm still gonna give it a try, but I'm curious whether I'm entering a space that's gonna be flooded with copies of the Apple sample app, or whether there'll be visionOS ones. I would hope there'd be visionOS ones. Yeah. Anyway, now that it's a simple API, baked into the OS, I'm just curious to see if it gets any traction, and I really hope it does, because I would love a crazy 3D scanning future.

[00:39:59] James: Yeah, I think I'm gonna go download that sample, so send me a link to it. We'll put it in the show notes, and I'll definitely get that up and running in Xcode, that's for sure, and get it on my phone. 'Cause I feel like that'd be a fun sample to see, where this sample is a fully fledged app. Right? You know, it's, okay, cool, thanks, we documented one API with 85 hours of engineering. Thanks. Thanks, Apple.

[00:40:19] Frank: Perfect. It took me 15 minutes to find the API that sent me to that app. Nailed it. Nailed it. I got lost in Swift abstractions.

[00:40:29] James: Classic. Oh my goodness. Alright, well, I'm gonna go give it a try, that's for sure. Let us know what you think. Are you gonna be building cool 3D things? Do you want to be scanning 3D things? Write into the show at mergeconflict.fm, there's a contact button you can hit us up on. Uh, I don't know what the name of the social media platform we use is anymore. Is it Twitter? X?

[00:40:53] Frank: X? Are we off Threads already? Twitter, X, I try to ignore all branding, so I'm just gonna call it the Twitter.

[00:41:03] James: Twitter. You can search Twitter, and then you can tweet, but then it's very strange. Yeah. Am I still on Threads? No, I'm not still on Threads, I just still have an account on Threads. You know, I'm not a big social media person, and I do not, mostly just Instagram, that's pretty much it, and not even really that much. I don't even post anything, I just browse cute dog photos. I think that's what the internet is for, right? Just joyous animal photos.

[00:41:36] Frank: Not everyone needs to contribute. We just need a few cats a day; it's enough for the entire world to get by on.

[00:41:43] James: Exactly. Well, hit us up on the Twitters, at mergeconflictfm or James or praeclarum, we're somewhere on some social media account somewhere. Yeah. Excellent. That'll do it for this week's podcast. If you have ideas, what are we coming up on? We're coming up on 7,000. No, we're coming up on, oh yeah, 370, which will be next week.
If you have questions that you want us to answer, that's a lightning topic: write into the show, tweet at us, do anything, give us your topic. Five-minute topics, we'll be covering them next week. But until then, I'm James Montemagno.

[00:42:23] Frank: And I'm Frank Krueger. Thanks for watching and listening.

[00:42:26] James: Peace.