HH64-14-01-2022_mixdown.mp3 Harpreet: [00:00:06] What's up, everybody? Welcome, welcome to The Artists of Data Science happy hour. It is Friday, January 14th, 2022, the second Friday of the year, and we're going to be here for another 50 more Fridays after this. I hope you guys stick around for that. At some point this year we're probably going to hit a hundred happy hours. That's kind of wild to think about, man. So thank you all for being here. A couple of announcements. Hopefully you get a chance to tune in to the episode that was released today with the one and only Data Whisperer, Scott Taylor, his second time on the podcast. It was a great, great conversation. We talked about him, how he grew up, his origin story — it's a great conversation, so do tune into that. Hope you get a chance to join me for the Comet office hours. We've been revamping that thing, working real hard to make it as amazing as it possibly can be. Earlier this week, on Wednesday, we talked about how to translate a business problem into a machine learning problem. This coming Wednesday we're going to talk about baselines and the importance of baselines. We've also got a community member who's going to be joining us for an interview and a panel-type discussion, just us chatting. Hope you guys can make it there; all of your questions will be welcome, so I'm looking forward to that. What else? In other news, not much, man. Just here, happy to be here, happy to make it happen. Shout out to everybody in the building: Russell, Vin, Eric S. Harpreet: [00:01:43] Eric L., Bellamy is in the building, Albert, and of course, Serge. Serge, thank you so much for sending me a copy of the book. I've got it sitting right there. I'm excited to dig into it, and we'll bring you onto the podcast at some point, maybe early springtime, to chat. [00:02:00] I'm taking a break from recording live episodes for a while. Also coming up this week: I'll be presenting at oDesk this Tuesday, January 18th, 2022, talking about ML system design for continuous experimentation, so definitely check that out if you can — I'm going live all over the place. Then the following week, the 26th and 27th: the 26th is Wednesday, so that'll be the office hours session, and later in the day we're doing a webinar with Pachyderm, so definitely tune into that. And then the following day, the 27th, I'm doing another presentation, lessons from the field for building your MLOps strategy. That one's going to be hosted by Kong Analytica, so check that out. I'll be posting about those on LinkedIn throughout the week so you guys can register and join us. Shout out to everybody joining in on LinkedIn. If you've got questions, please do let me know, and if you're just enjoying what you're seeing on the screen, go ahead and smash a reaction — give me a like, a heart, a clap or something. Let's kick it off with any questions. I didn't come prepared with a question this time, so if anybody wants to kick something off, please do let me know. I'm all ears, because I'm super unprepared today. I'll throw something out there.
Yeah, please do so. Speaker2: [00:03:26] Yeah, so I was curious to know if any of you have done anything with media mix modeling. I've kind of heard a little bit about it, listened to a podcast or two, and I'm just wondering what your experience is or how you would explain it. I'm starting to get a better idea of what it is, and I usually hear it in the context of media mix modeling versus, or maybe in collaboration with, multi-touch attribution. So I'm just curious to hear your thoughts and past experiences around [00:04:00] the topic. Harpreet: [00:04:01] Yeah, definitely. If anybody has any experience with that, please do let us know. I don't know too much about it, but I do know that Cam Lee — I'm not sure if you're familiar with Cam Lee, look him up on LinkedIn or on Google — has a bunch of talks around that; it's kind of his area of specialty. I know he writes about it — I've seen headlines for a couple of blog posts about it — and he's been on a couple of podcasts talking about it. So that's one resource that could potentially help you out. Anybody else here worked with media mix modeling? Media mix modeling, anyone? Speaker3: [00:04:46] I had to Google it, because I'd never heard that term before. But you'd expect, yeah, it's like a branch of marketing and attribution, right? I mean, where you're going is you're trying to figure out which channel, which ad is most effective and which one to attribute any sort of customer behavior towards. Is that it? Speaker2: [00:05:09] Yeah, yeah. And the thing that I think is kind of interesting to me is the isolating — that's the challenging part. You're trying to isolate one channel's impact at any given time when they're all moving parts. So just trying to think about how to pull all the different data together over a sufficiently diverse and interesting period of time sounds pretty crazy. But being able to look at, you know, the diminishing returns curves for different channels, being able to turn down budget on one channel recognizing that you're going to make it up in another channel — theoretically, it sounds awesome. I'm just curious, for anybody who's actually done it: results, experiences, [00:06:00] battle scars, et cetera. Speaker3: [00:06:01] So it's way more complicated than that, you know. I looked at it — like I said, I just did a quick Google search on it — and I was like, oh yeah, that doesn't work. It's the chain of decisions that a customer goes through, the whole behavioral process they go through — it's never the same twice. So unless they're starting from exactly the same point and they're from exactly the same segment — there's just so much more to it that trying to build something that one-dimensional and that granular typically doesn't work.
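As an aside for anyone following along: below is a rough, illustrative sketch of the kind of model Mark and Vin are describing — channel spend transformed with an adstock (carryover) effect and a saturation curve for diminishing returns, then regressed against revenue. All data, channel names, and parameters here are synthetic placeholders, not anyone's production approach.

```python
# Minimal, illustrative media mix model: adstock + saturation + linear regression.
# All data here is synthetic; column names and parameters are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
weeks = 104
spend = {  # weekly spend per channel (synthetic)
    "search": rng.uniform(10, 50, weeks),
    "social": rng.uniform(5, 30, weeks),
    "tv": rng.uniform(0, 80, weeks),
}

def adstock(x, decay=0.5):
    """Carryover effect: part of this week's impact spills into later weeks."""
    out = np.zeros_like(x)
    carry = 0.0
    for t, v in enumerate(x):
        carry = v + decay * carry
        out[t] = carry
    return out

def saturate(x, half_sat=40.0):
    """Diminishing returns: a simple Hill-style saturation curve."""
    return x / (x + half_sat)

# Transform each channel, then fit ordinary least squares against revenue.
X = np.column_stack([saturate(adstock(s)) for s in spend.values()])
X = np.column_stack([np.ones(weeks), X])              # intercept = baseline sales
true_effects = np.array([100.0, 80.0, 40.0, 25.0])    # pretend ground truth
revenue = X @ true_effects + rng.normal(0, 5, weeks)  # synthetic revenue

coef, *_ = np.linalg.lstsq(X, revenue, rcond=None)
for name, c in zip(["baseline"] + list(spend), coef):
    print(f"{name:>8}: estimated contribution {c:6.1f}")
```

Real media mix work layers on priors, seasonality, geo splits, and the experimentation Vin gets into next; the sketch only shows why "turn budget down on one channel and make it up on another" becomes a fitted-curve question rather than a gut call.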
Speaker3: It's a much, much bigger model in the end if you want to do something like that accurately, because when you start doing the mix analysis, what you end up with is a whole lot of trial and error, and you end up going down the A/B testing route, and that will lead you to actually starting to do the data science that'll help you understand your customer better — and you wander straight into behavioral modeling at that point. But yeah, I think I know where you're coming from. It really is interesting, but it starts you down a road where a year from now you'll be going, why did I ever start there? And I'm not ridiculing — maybe that was me seven years ago or something like that, when we called it something different. It's a way, way more complicated problem, but it's really interesting, and if you start working down this path and build an experimental framework, like an A/B testing framework, you can find some really interesting stuff out. Harpreet: [00:07:40] Serge, you're about to drop some knowledge as well. Speaker4: [00:07:45] Not really about this field. I don't know a lot about how to apply it in marketing. I mean, the most I've done for marketing was a long time ago — A/B testing things, a lot [00:08:00] less driven by machine learning, fortunately. The space of digital analytics is one I'm acquainted with, but not applying those techniques. Harpreet: [00:08:15] Shout out to both Mark and Mikiko in the building, good to see y'all, as usual — always a pleasure. Please bless us with some of that knowledge. Speaker5: [00:08:24] Yeah. So this was work that I was very intimately involved in leading and planning when I was working as a sales data scientist and finance analyst for WalkMe. A couple of things. Sales analytics and marketing analytics are really intimately tied together — there's this huge connection, especially when you start getting into planning for budgeting, planning for resource allocation. So a question for us — and this was a big one, given that we were an enterprise product servicing both SMB and enterprise, not so much consumer — was: how much money should we basically be giving marketing? How much money should we be giving our inbound sales account executives? And how much money should we be giving our cars? And one thing that was really kind of tricky — and everything Vin said is correct, which means that nothing has changed; nothing has changed except the tooling and some amount of sophistication from sales analysts, who are a lot more data driven nowadays — is that, first off, depending on the product, there is a huge difference in lifetime value and cycle. So, for example, [00:10:00] for whatever reason, our North America sales just came in a lot faster — they came in and they churned out a lot faster — than, let's say, a different region. And this was true also across different company sizes: typically, at big enterprise companies, deals take a lot longer. But that's also where you might want a BDR — a business development rep.
So they go out and do what's called whale hunting — they go out and get whales. The folks who handle the inbound leads are typically more junior. And there are also different quality leads. So essentially what ends up happening is that you have to do a couple of things. One, you have to understand the marketing-to-sales lifecycle — you have to understand every single touchpoint that someone could potentially come through. Secondly, you need to understand whether there are distinct differences in the attributes associated with certain segments, because that will massively impact things. A big realization at that time — I was working with the VP of revenue operations, and he really struggled to get this across to our finance teams — was that they were doing this very traditional approach of modeling every single opportunity that came in as if it all had the same attributes, which of course is where data science and machine learning can help, right, figuring out some of these things. But that was just not true. So, for example: if we want to produce $10 million in revenue at the end of the year, you don't need $10 million of investment. Speaker5: [00:11:38] You need, depending on your ratio, anywhere from like 20 to a hundred million dollars in marketing investment. But the fun part is that each of those ages out differently. You're not going to get all your leads coming in at the beginning of the year — they come in rolling. And more importantly, some types [00:12:00] of deals come in at different times of year. So what this ends up looking like from a modeling perspective — if you're doing this in Excel and Google Sheets, and I built a bunch of these models, God bless you, because it took us like six Google Sheets linked together, because we maxed out on the data in each one. And this was way before I'd gone into engineering, right? And then you also have to factor in the key stakeholders — you know, what do they think is going to happen? So, short answer: try not to do that in Excel. Use Excel or Google Sheets as the illustrative tool to get people on board, but as much as possible try to get it into scripts and all that. But more importantly, you want to have the conversation of: how is this going to be used? So, for example, if they look at it and they go, okay, well, historically we've grown $10 million in revenue every year and we need $5 million in revenue every year — that actually might not be the case, right? They might need something else. Speaker5: [00:13:07] I could go on about this, but it's less of a technical exercise and it becomes more about how well they understand their business, and about key stakeholder management. Because that analysis came at a point where they were like, well, it's very expensive, the industry is getting very saturated, therefore we should turn down the investment. But we had also made IRR targets to our investors. And so it was one of these things where — yeah, we can ramp up our outbound or inbound sales guys, but they're very expensive. And guess what?
They have a quota ramping period, which means that we actually cannot be turning down our marketing spend, [00:14:00] because of the lifecycle of that product. If you turn down the marketing spend at the beginning of the year, you could literally come in 10 million below the IRR you promised, and it would be too late to realize it — because if it's a product that takes a year to convert people, and six months in you realize, oh shoot, we're not going to reach that, you need to let them know. But those are the kinds of insights, honestly, that get you promoted and elevated: this whole "if we don't spend this money now, and we don't spend it intelligently, our company is going to be in trouble." So — sorry, that's a lot of words. I'm very excited about this stuff. Harpreet: [00:14:44] Any follow-up questions or comments on that? Shout out to Joe Reis in the building. This will be immediately available right there on the YouTube channel, so feel free to go in and run that back. All right, so if you guys have questions, let me know. Mark's got a question, so Mark is next up in the queue, and then after Mark, if there are no questions, I wanted to ask Eric about this post he made earlier this week — I really enjoyed that post; we're talking about getting a little vulnerable there, so you know I'll hit you up about that. But Mark, go for it. Anyone tuning in on LinkedIn or on YouTube or Twitch, let me know if you have any questions as well. Speaker2: [00:15:32] Definitely. So this may be a very basic question, but I'm trying to wrap my head around — so I can explain it to our engineering team better — the difference between a scheduler, like a cron job, and an orchestration tool, like Airflow, and what are kind of the pros and cons of both approaches. The way I understand it right now: cron jobs are great because they're easy to do and easy to implement, but [00:16:00] as you scale up, as with the startup that I'm in, that easiness becomes a nightmare of trying to manage everything. And then for orchestration tools, my understanding is that if I have a lot of moving pieces, orchestration can smartly determine the best path to optimize those actions, and you have this infrastructure as code — but at the same time, there's a lot upfront to get it set up. That's my current understanding, but I want to know what I'm missing and what I should be reading, just to be prepared when I present this to the engineering team. Harpreet: [00:16:38] Joe, you want to take this one? Speaker6: [00:16:41] Sure. I mean, those are great when you need to do exactly what it describes, right, which is scheduling things — scheduling is very time bound. I think somebody actually put it in the chat: cron jobs say when to run; orchestrations say what and when. I agree with that. That's in the Zoom chat, by the way, if you're on LinkedIn. So schedulers are good if you have things that need to be on a schedule, right? Obviously this is a time-bound thing: say, at midnight a job's going to kick off. That's great. The issue, the challenge with schedulers: so you start one at midnight, and then you have another that starts at 1:00 a.m. Right? What happens when the one at midnight keeps running until like 1:15, but the job that starts at one depends on the job at midnight being done? That's where the challenges of scheduling come in.
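To make the cron-versus-orchestration distinction concrete, here's a minimal sketch of the dependency idea Joe is describing, assuming Airflow 2.x; the task names and scripts are hypothetical. A cron entry only says when to start; the DAG below also says what has to finish first.

```python
# Minimal Airflow 2.x DAG (illustrative; task names and scripts are hypothetical).
# The schedule still says *when* to start, but the >> edges say *what depends on what*,
# so `transform` starts only when `extract` succeeds, not at a fixed clock time.
from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="nightly_pipeline",
    start_date=datetime(2022, 1, 1),
    schedule_interval="0 0 * * *",  # same syntax a crontab entry would use
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="python extract.py")
    transform = BashOperator(task_id="transform", bash_command="python transform.py")
    load = BashOperator(task_id="load", bash_command="python load.py")

    extract >> transform >> load  # downstream tasks wait on upstream success
```

The equivalent crontab would be three independent lines along the lines of `0 0 * * * python extract.py`, each firing on the clock with no knowledge of the others — which is exactly the whack-a-mole Joe describes next, once runtimes start to grow.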
Speaker6: You know, and I've seen this where jobs start off taking an hour, and hey, they grow — things suddenly start taking six hours, eight hours. Now you have to do this whack-a-mole thing with scheduling your jobs, and it just ends up being a gigantic headache, and you can't really get ahead of it if you're successful. Whereas with orchestration, you can obviously set things on a timer, but you can also set dependencies. So when this [00:18:00] task finishes, that triggers this other job to start, right? It's only when one dependency is finished that the other one starts, and so you avoid these overlapping time windows between jobs. Because in a lot of cases, if your job is producing data that another job depends upon, and that data is not done, and the downstream job starts anyway — you can see the problem with that, right? And it happens all the time. Speaker2: [00:18:27] No, it definitely does, and that kind of aligns with what I was thinking. I guess I'm also just trying to understand what industry best practice is — which maybe there isn't one, because every situation is different — but an organization running everything on cron jobs: is that normal? Or is that just technical debt that's accumulated? Speaker6: [00:18:51] I mean, it's normal and it's also technical debt. But I would say technical debt is also normal. It depends on how much interest you want to pay on your debt. I see cron jobs all the time and, to be frank, I think it's fine if you don't have systems that depend on each other. It's only when you have systems that depend on each other that you start running into problems. Speaker2: [00:19:10] Yeah, so it's not necessarily a problem now, but I see it on the horizon and I'm trying to get ahead of it. So I guess, when you've seen orgs working with these cron jobs and then transitioning to an orchestration tool, what are the typical pain points — beyond just things not lining up correctly — that motivate them to say, all right, we need to actually invest in infrastructure for this? Speaker6: [00:19:37] It's so hard to say — it's totally situation dependent — but let's say you set up something like Airflow, or DAGs through Prefect, and just start small and go from there. Your challenges and roadblocks will become apparent very soon as you do this. So your mileage may vary, but certain things — obviously package dependencies are a big one, right? And then [00:20:00] it could be any number of things, frankly. Harpreet: [00:20:03] Serge in the chat says DAGs — directed acyclic graphs, right — so they're awesome. Talk to us about why they're awesome, because Airflow is a DAG. Well, Joe — Speaker4: [00:20:15] Joe just spoke to that: the fact that you have dependencies, and DAGs have that property that if one thing isn't finished — if a task requires something else — then it has to wait. That's part of the scheduling aspect of it. Things are all tied together, and the DAGs make sure that they follow an order. So I don't need to elaborate further than what Joe Reis just said, but that's basically how Airflow was built, with DAGs. Harpreet: [00:20:52] So Costa's been commenting on LinkedIn.
He says cron jobs get complicated when there are artifacts that you're dependent on. And Rodney wants to know: where would GitHub Actions fit in? Vin, go for it. Speaker3: [00:21:10] Yeah, I just wanted to add real quick before we get too far: every time you implement a new system, especially if you go to orchestration, you're going to pick a platform, you're going to pick a direction to build your infrastructure out — just remember you're going to break a ton of stuff as you go. So be really careful, and advertise the fact that a whole bunch of stuff breaks along the way, because once you start using it, everyone thinks it's awesome, and then everyone starts using it and everything breaks. Because, kind of like I said in the comments, it's crazy how much runs on cron. And then the person who built it all is gone, and so no one really understands it, and when they start touching it, everything breaks. So just one huge red flag: it's great until you start breaking stuff. Speaker2: [00:22:00] That's [00:22:00] super helpful. I probably wouldn't be the one to implement it — that would probably be on the engineering side — but I'm trying to get the facts and plant the seed now, so a year from now they're like, oh yeah, Mark wrote that document; let's actually fix this today. Speaker3: [00:22:14] Yeah, but I mean, you're going to get blamed for it. No matter who develops it, it's your idea, and after they're done blaming the person that broke it, it's going to be you. So I just really want to warn you about the side effects of architecture. Speaker2: [00:22:31] I'll tell him that you told me to build it, so I'll go with that. Harpreet: [00:22:37] There's a question coming in from LinkedIn: where would GitHub Actions fit in? Anybody have any tips for that? Speaker5: [00:22:44] So GitHub Actions — that's the basic one, but I think they're trying to generalize it. Basically, with GitHub Actions you use YAML — actually, yeah, I think it is a YAML file — you use YAML to specify a bunch of parameters, and then GitHub kind of hosts the thing for you. So it's the most immediate one. But I kind of feel like CI/CD is a sub-part of what they're trying to do; they're basically trying to create their version of Zapier or what have you — some kind of hosted automation, pretty much. Harpreet: [00:23:35] Then in the chat on LinkedIn, Costa says: I recently found that GitHub Actions work better for build artifacts. Complicated dependencies within the same repository are easier to manage with GitHub Actions, and you can set them up for more complicated requirements than just something that's time dependent. Yeah — Zapier is awesome, by the way. I love that stuff. It's [00:24:00] been quite helpful. I need to get on the paid tier because I've used up all my free Zaps, and we're not even in the middle of the month yet. Any other questions coming in from LinkedIn or here in the chat? If not — Eric, talk about this post you made on LinkedIn earlier this week. How are you feeling today, man? How are you feeling after that? Speaker2: [00:24:22] Yeah, so I have tried, over the past couple of years or however long I've been posting on LinkedIn, to lean into the feeling that if I act like a real human, people will respond like real humans, and that will be refreshing because it's real. And I can confirm it.
And so yesterday — yeah, yesterday was a terrible day, and I got to the end of the day and I was sitting there feeling crappy, and I thought, you know, I might as well just write about this now and write what I'm feeling, because I know it's helpful to talk through, or get out of my head, what's in there, so that I don't — to use Mark Freeman's word — catastrophize it. And so I started writing things down. It was helpful because it gave me a little bit of perspective, like, well, that's not actually what I was thinking, and rewriting it and getting it out there. And then I thought it would be nice to share it now, while I'm feeling what I'm feeling — hopefully after having written something and read it over to make sure I'm actually saying what I'm feeling, instead of just throwing out feelings that end up not being very coherent — instead of waiting until I had a success story I could post and have everybody love it and see that I overcame imposter syndrome, right? Which — I don't know if anybody really... okay, some people will say that they've overcome imposter syndrome, but [00:26:00] I think that to some extent pretty much everybody experiences it at some point, and probably repeatedly, judging by a lot of the comments and things that I've gotten. And it's been really interesting, really cool to see what other people have to say — and a lot of good ice cream flavor recommendations, if anybody's interested in that as well. I don't know, is there anything in particular that you want to call out or talk about? Harpreet: [00:26:26] I just want to see how you're feeling after that, because you mentioned you'd write your feelings down while you still felt lousy and then try to come up with the story of how you overcame it in the morning. So have you overcome it? Do you feel any better? Speaker2: [00:26:43] Yeah, so I spent the day working on one of the things that I screwed up — you know, my missed deadline for my analysis — and it ended up taking forever, because the quick data pull turns into: crap, this data isn't even in this database, and now I've got to join these two databases, and so on — you know the story. But I got it done by the end of the day. So, you know, I think I'm feeling better, but I'm glad it's the weekend. I'll take that too. Harpreet: [00:27:14] Yeah, yeah, definitely great. I've been feeling that same way — even just trying to learn something new, it's like, oh man, I feel like this is something I should already know, but I don't, and I'm learning it, and I'm like, okay, this is tougher than I thought it would be, right? It definitely is a disheartening feeling, but then it's also fun at the same time. Like all this week I've had a consistent — I won't say headache, but it's just felt like there's a knot in my brain, because I've just been studying data structures and algorithms. Because why not — got to do it, right? Got to know this stuff. By the way, I recommend this quick read, Cory Althoff's The Self-Taught Computer Scientist. Good overview of [00:28:00] topics, just to get the vocabulary — just to have the vocabulary, because I think that's super important when you're learning something, to know the names of things so you can go dig deeper. Eric, thank you so much for sharing that.
Marc mentioned Phish Food ice cream. I don't know what that is — what flavor is that? Speaker2: [00:28:19] Imagine they took every single sweet, chocolatey thing and puréed it into a Ben and Jerry's tub of ice cream. It's amazing. Harpreet: [00:28:29] Nice — definitely have to try that. Speaker6: [00:28:31] The band, though, not the aquatic creature. Harpreet: [00:28:34] Oh yes, it was named after the band. Speaker2: [00:28:37] I never realized that, but it makes complete sense. Speaker6: [00:28:40] Yeah, yeah, it's Phish. And they also have Cherry Garcia, from the Grateful Dead — flavors like that. Ben and Jerry's is hippie ice cream, right? So. Speaker2: [00:28:50] Yeah, 100 percent. There are chocolate fish in there. It's good stuff. Speaker6: [00:28:56] It's good after, you know, you have other things too — it's a warm-up, so yeah. Harpreet: [00:29:04] All right, any questions coming in from anyone here? Oh, there's one coming in from Costa on LinkedIn — actually, I don't know why he's not in here. He said: I've seen GitHub Actions used to promote model experimentation, especially with DVC and CML. Have you guys tried it? I've tried Comet ML — so instead of trying those things, you should try Comet ML. It is a great solution, Costa; definitely check it out. I've got some tutorials up that you can peek at. Questions, comments, or anything of that nature, let me know. Shameless plug — hey, man, I need to pay the bills. But no, it's a great product. I feel like we're solving a lot of problems, and I'm very proud of the work we're putting in as a team. And I'm excited to be doing all these presentations over the next couple of weeks — talking about imposter syndrome, going out there and doing presentations at these huge conferences [00:30:00] when I have to convince myself that I know what I'm talking about. It's tough, man. It's really, really difficult being on these — I won't say huge stages, but just wide audiences. I don't know, it's a positive thing to lean into, for sure. All right, let's keep it moving. Questions, comments — Serge, what's going on? Any questions? Speaker4: [00:30:32] Yeah, not really. I mean, I came unprepared too. I was hoping — I just like coming to these things, hearing what everybody has to say, but I don't often have too much to say. It's been a slow week; it's my first week back at work after a vacation. So yeah, I'm still not completely in the work zone I should be in. Harpreet: [00:31:06] Mikiko, how are you doing? Speaker5: [00:31:09] I'm doing better. It was nice seeing Eric's post, because it was a little bit of a rough week — it's been rough the last month or two. A lot of changes in my work area, and trying new things, and some of those new things haven't worked out. But I think this is cool. It's funny, I'm wondering who the traditional candidate, the traditional engineer, even is anymore, frankly. But yeah, I feel like I spend most of my time feeling really stupid — and every time I read more books, I feel more stupid. Harpreet: [00:31:59] You know, [00:32:00] the more I get exposed to, the more I'm like, shit. There's this quote —
Marcelo Gleiser has this book called The Island of Knowledge, and he says that as the island of knowledge grows, so do the shores of our ignorance. I thought that's just a powerful quote: you've got this island of knowledge, and even though the island gets bigger, at the same time what you don't know starts getting bigger and bigger as well. Powerful, powerful. Speaker6: [00:32:26] Just don't read anything. Harpreet: [00:32:28] Just don't read, and don't— Speaker6: [00:32:32] Yeah, I know. Some of the happiest people I know are the ones who don't fill their head with a lot. They're smart, but they're just like, yeah, I don't need that. Speaker5: [00:32:42] How happy are you, Joe? Are you one of those happy ones? Speaker2: [00:32:44] No. Speaker6: [00:32:46] On that scale I'm like hyper-depressed, because I just read way too much and learn too much. That's a character flaw, I suppose. Harpreet: [00:32:58] What I've been doing over the last month or so — usually I listen to a lot of audiobooks, in the shower, at the gym, when I'm going for walks, whatever, and they're usually nonfiction, you know, Nassim Taleb, shit like that. But recently I've just been listening to science fiction instead. It's kind of nice not having that constant barrage of information, just trying to unwind, and that's been really fun. Speaker6: [00:33:29] Are you reading Snow Crash right now? Harpreet: [00:33:31] Yeah, yeah, Snow Crash is cool. I'm still about halfway into it, but it's dope. And that's the book that coined the term "the metaverse." Apparently Neal Stephenson was on Lex Fridman's podcast, so I'll definitely have to check that out as well. But before that, earlier in the month — or late last year, in December — I was re-listening to Kurt Vonnegut. I remember reading that book back when I was in high school: Galápagos. That book was great. But yeah, Snow Crash is awesome, [00:34:00] man. It's a good book; I'm enjoying it. Speaker6: [00:34:06] That's the thing about that book, too. It's funny reading science fiction, especially near-term science fiction — something that's supposed to happen on the near horizon. Cyberpunk is a really good example. Neuromancer does this, and a lot of other books in that era do it, where everyone's using tapes — cassette tapes, videotapes, some sort of tape thing — and you're just like, they got some of it right, but it's the medium: at the time you think, well, of course everyone will always use tape, obviously, and then of course they don't. It's kind of interesting. Harpreet: [00:34:43] Snow Crash came out in, what, 1992, really? Oh, snap. Yeah. Speaker6: [00:34:50] I bought a copy when it first came out. Harpreet: [00:34:52] And the other book you're mentioning is called— Speaker6: [00:34:54] Neuromancer, by William Gibson. I think it came out in 1987. Harpreet: [00:34:58] Yeah, I'll have to check that one out as well. Speaker6: [00:35:01] Yeah. Cyberspace came from that — that's the book where "cyberspace" was coined. Harpreet: [00:35:04] Nice. Definitely check that out.
Yeah, it's been nice listening to science fiction — it's been a good kind of de-stresser, a way to disconnect. By the way, there's a question coming in from Costa: what's your ratio of one-off models versus long-term running models? That's a good question. I was listening to a presentation by, I believe, the head of data science at Slack — I can't remember his name; I want to say Josh Wills, but I don't know if that's right or not. His argument, which I thought was really good, was that if you're building one-off models for something, then the problem you're working on wasn't even that important to begin with — that when you're building machine learning models, you should be constantly deploying, constantly doing releases and builds and things like that. So I'd love to hear how other folks think about it. Let's go to Vin and then Serge. What's the ratio of one-off models versus [00:36:00] long-term running models? Speaker3: [00:36:02] I mean, I get why you do one-off models, because it's almost where you start if the business is at zero. It's — yeah, like Mikiko was saying; I'm not sure who said it first, it was kind of a jinx moment, possibly her in the text and me saying it — but yeah, everything starts off as a one-off, especially at low-maturity companies, because there's no place to put it. If you don't have a production environment that's stable and you don't have a release process that's repeatable, everything's a one-off. So I kind of get that; there's a necessity piece of it. But the faster you can get away from that, the better, because every time you do a one-off, it's going to end up being way more work in the long term. For every five one-off models that just kind of go someplace and die, one of them doesn't — it gets deployed in the hackiest, worst fashion, and then you're on the hook for supporting it and eventually replacing it. One-offs are horrible. If you can get to one hundred percent long-running, that's way, way better; your sanity will increase significantly. But there's so much maturity that has to happen first that, from a practical standpoint, what I'm saying and what's possible are sometimes two different things, and if you're just not there, you're stuck with it — just realize it's painful. The more time it takes you to get from one-offs to everything being fairly good practice — I'm not going to say best practice, but at least fairly good practice — and a standard release process, the more it hurts, and it takes way more time than it should. Harpreet: [00:37:50] And Costa has a great follow-up question, because I'm thinking the same thing: give an example of a one-off model. So Costa, if you can clarify that a little bit — by one-off model, [00:38:00] what do you mean, one-offs versus long-term running models? My understanding was, a one-off model is like: I'm just going to build a model, get some predictions or solutions or whatever, pump that out, and be done with it. Or does it mean building one model and then not even accounting for drift or anything like that — just letting the model degrade and having that thing serve out predictions? Let's go to Mark — I see your hand up. Actually, sorry, I said Serge first. Let's go to Serge, then Mark. Go for it. Speaker4: [00:38:33] Yeah.
At least where I work, basically everything starts, like Mikiko said, as a one-off. You have a proof of concept and you test it against historical data. But it doesn't stop there — you still have to deploy it to test it live. And in agriculture, everything is seasonal, everything is a year, right? So you pretty much have to validate it for a year before it becomes a cyclical thing. And by cyclical I mean you have to retrain it for the next year, and then you have to do continuous training throughout the season if you want in-season predictions. So yeah, it is a very long cycle. In a different industry it might not have that same pattern, so it's odd in that sense. But I do think that most companies have some kind of process where they build a model once, and they still have to prove it before they put in the effort of actually turning it into something with continuous integration, continuous training, checking model drift and all that. Eventually, hopefully, that's something taken care of by a system, and there are already [00:40:00] systems that do that — that take care of observability and so forth. And it really depends what the needs of the company are to make that effort. But it is certainly kind of a drag, as Vin said, having those two kinds of approaches and then reconciling between them. Harpreet: [00:40:26] Yes. I just want to reference the talk I was talking about — I'll drop a link to it here — quoting the head of data science at Slack. He's saying: at Slack we try to publish a new search ranking model once a day. Roughly speaking, that's our goal. We are iterating just as fast as we can: trying new features, new algorithms, new parameters. We're always trying to bring new models into production just as fast as humanly possible — in fact, as a design goal. Building an assembly line for building models, building as many models as you can, has all kinds of dividends and advantages. And then he says: don't ever do one model in production; do thousands of models or zero models. If you're working on a problem that needs to be deployed into production, but you're never actually going to rebuild the model, that's a strong signal that this problem is not actually worth your time. I thought that was super, super powerful. Speaker4: [00:41:18] If I may say something about that — Slack, obviously, I can understand why it's something that's constant, because it's, after all, not social media, but kind of in that sense: it's always on, and it doesn't necessarily follow a yearly cycle like it does where I work. But I do agree with making many models, and I do find it kind of counterproductive to make a model for one country, for one crop, if I can do the same thing assembly-style for a bunch of different countries and validate it all at once — especially if I'm waiting a year; might as well [00:42:00] have results for everything rather than do it once. So I do follow that assembly approach where I work, in that sense. Unfortunately, as far as the timing goes, it's still a year-long wait to actually see conclusive results. Harpreet: [00:42:18] Mm hmm. So Costa is in the building. Costa, go ahead and clarify your question, and then we'll go to Mark and then Mikiko.
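As an aside on what "long-running" can mean in practice: here's a minimal, hedged sketch of the kind of recurring retraining job that both the Slack quote and Serge's seasonal retraining point at — load whatever data has arrived, refit, compare against the model currently serving, and only promote on improvement. The helper function, paths, model type, and metric are hypothetical placeholders, not anyone's actual stack.

```python
# Illustrative retraining job meant to run on every cycle (daily, or per season).
# load_latest_data() and PROD_MODEL_PATH are hypothetical placeholders.
import joblib
from pathlib import Path
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

PROD_MODEL_PATH = Path("models/prod_model.joblib")

def retrain_and_maybe_promote(load_latest_data):
    X, y = load_latest_data()  # placeholder: pull whatever data arrived since the last run
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

    candidate = GradientBoostingRegressor().fit(X_train, y_train)
    candidate_mae = mean_absolute_error(y_val, candidate.predict(X_val))

    if PROD_MODEL_PATH.exists():
        current = joblib.load(PROD_MODEL_PATH)
        current_mae = mean_absolute_error(y_val, current.predict(X_val))
    else:
        current_mae = float("inf")  # nothing in production yet

    # Promote only if the candidate beats whatever is already serving predictions.
    if candidate_mae < current_mae:
        PROD_MODEL_PATH.parent.mkdir(parents=True, exist_ok=True)
        joblib.dump(candidate, PROD_MODEL_PATH)
    return {"candidate_mae": candidate_mae, "current_mae": current_mae}
```

The specifics matter less than the fact that the loop exists and runs on a schedule, so "one model in production forever" never becomes the default.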
Speaker2: [00:42:26] What I was trying to ascertain is: in my world, which is more robotics and computer vision, a lot of the problems are at a consumer level. You've got consumers trying to identify things through their iPad camera out in the field. Versus, a lot of the time in conversation with other data scientists, I find that it's more about answering questions for business leadership — so the models are used by business leadership, they're kind of the end user; it's an internal end use as opposed to an external end user. So I was just trying to get a feel for — is there a difference? Do you guys perceive a difference in terms of managing model uptime, managing the quality of the end product, the usability of the end product? I just don't have much experience in how you guys close that gap. For us, we've got to have a really tight front end, a really tight user interface — or, within the robotics world, a really tight interface with the rest of the robot and the rest of the robotic system, right? How does that look from a business data science perspective? It sounds like there's less of an impact on the lifespan of the model — there's still the preference to go for long-living models that continuously run on incoming data. But how do you guys glue that together? Is that different for a product, or is that the same? [00:43:55] Mark, go for it. Speaker2: [00:43:58] So I can't really speak [00:44:00] to having models in production continuously, because I'm at a startup that's just getting its data wings going. But something this reminds me of: in The Pragmatic Programmer, one of my favorite sections was the difference between prototypes and tracer bullets. With prototypes, you build it and show, hey, this is something we can do, this is awesome — but your goal is not to use it in production; you scrap it, right? Whereas with tracer bullets, you're trying to get something end to end as quickly as possible and then iterate on it consistently — so, like the Slack kind of thing going on. That's what this conversation reminds me of. And many times I feel like data scientists fall into the trap of: well, I did this quick thing, I have so much on my plate, this works just great, so I'm just going to throw it in and we'll update it later. And then you're in the situation that's been described, where it breaks and you're on the hook for it. So I think it's really dependent on your use case — once you described yours more, I was like, wow, that seems like a completely different set of criteria that has to be considered for robotics. And to provide a real-world example of this happening, especially on the startup side: we're trying to find great new products, to find product-market fit. So for me, we'll have hackathons on our data science team — hey, here's a potential product direction we want to go in — and I'll create hacky prototypes with the goal of doing something in two days. The goal isn't necessarily to build something to put in production; the goal is to build something that'll get the product team and the company excited about a potential direction. And then once it gets picked up and it's like, we want to prioritize that, then I do the full end-to-end integration.
Harpreet: [00:45:49] Mark, I'm curious — what is it that you have that's automatically zooming in and zooming out on your camera? I need that. Speaker2: [00:45:56] It's the Insta360, right? It [00:46:00] has a webcam feature, but it's like a GoPro. The only challenge is that I use my hands a lot and it follows my hands, so I'm still learning how to use it properly. I might turn it off. Harpreet: [00:46:09] So, speaking of The Pragmatic Programmer, I did do an interview with Andy Hunt, coauthor of the book. There's a link right there in the chat — go ahead and check that out. It was a great conversation; I really enjoyed chatting with him. Mikiko, go for it. Speaker5: [00:46:24] And Joe just jumped right off, so I was going to ask a question — oh man. I guess I didn't quite understand the question, but probably because, in my head, what I feel like we're struggling with, and I haven't seen a clear-cut answer, is: how do we allow the innovation and the experimentation in the modeling to happen? There's a balance between making sure things are ready for production and not killing the experimentation, and that's something we really, really struggle with, and I've seen a couple of different perspectives. So — she had this blog post where she's like, data scientists should not have to learn Kubernetes, and she mentioned how there's kind of this leaky abstraction between dev and prod. And it's something that we're legitimately struggling with, because I think the way some companies have treated it is a little like: okay, we'll take these Jupyter notebooks or whatever, we'll go ahead and wrap them, and then you can run it as a production instance. And my personal opinion is we don't want that, actually, because there's just a lot lost — no, we don't want that, especially when it comes to handling — [00:48:00] sorry, not big data, but large volumes, high latency. Speaker5: [00:48:05] We don't want that. And also, we need to have checks in place. For example, we do email marketing — or we enable small and medium-sized businesses to do email marketing. We don't want to be participating in misinformation or bias of some kind, right? So we do need to have certain checks, we do need to have monitoring, we do need to have online performance tracking. And it's something we've personally struggled with — we've had to draw a line, understanding that it's a line that might not work for every company. But I am really curious how people have managed that dev-to-prod bridging. I feel like that's the golden question, frankly, in — I won't say normal ops, but in literally operationalizing ML. How do you bridge that? Honestly, because we look at it like every model is a one-off until it gets into the CI/CD — you know, until it gets orchestrated. Until then, everything is a one-off model. But still, I'm curious how people have managed that transition, or if they have opinions about it. Harpreet: [00:49:23] So the talk I'm doing at oDesk on Tuesday is centered around that theme — ML system design for continuous experimentation — and we talk about how there's an artificial split between development and production environments.
And we're looking at case studies from Tesla, a dating app, and Slack as well. So I don't want to spoil it — just come and listen; it'll be a short half hour. Vin, go for it. Speaker3: [00:49:53] When you said that's the golden question — yeah, you nailed it. It's one hundred percent what a lot of companies [00:50:00] are trying to figure out, and the answer you end up with comes from a number of different perspectives. Because there's the age-old problem that data science doesn't always result in something, and you've got to manage that. And that's tied into what you're saying, because sometimes you don't end up building something that gets productized, and you have to do some work and go back and make it better before you can actually put it in production. And then when you start looking at actual ML research, that doesn't always result in a product either. So from an engineering standpoint, you guys can't be spending time as an MLOps team or an ML engineering team trying to productize something that's going to die on the vine because the research ends up getting killed at some point. And then there's the dichotomy between different types of data scientists. You have data scientists who are really research focused — yeah, they can write code, but nobody would want to put what they write in production, ever; it literally is sometimes death to a production environment, because the code is that bad — but they're really smart, and they create models you can't really duplicate if you come from a hard-core engineering background. And in some cases you have that ML engineer data scientist whose code is beautiful — it makes you want to cry when you look at it. Speaker3: [00:51:14] And what ends up happening is you break these two apart: you have a model development lifecycle and you have a research lifecycle. The research lifecycle happens before you ever even put anything on a product roadmap, because that is the only way it works — because that's not something you as a team have to look at and say, how would I even operationalize that? You don't have to worry about it until there's something there to operationalize. And so that's where you allow innovation to just go crazy, over there. And then from a business standpoint, there's all this stuff you have to do [00:52:00] to manage and oversee research so it doesn't go crazy and actually makes the company money. But that's a totally different thing. If you break the two apart, you have such a happier team, because the artifacts of research then get published — and it's not a paper; it's a model, it's a data set, or it's some combination. Usually it's a lot of models and data sets. But that gets published, it gets reviewed by that research team, and then it gets handed off. And that's something you can actually deploy. Speaker3: [00:52:30] That's something you can now plan for. You can figure out where to put best practices around it, and you don't stop innovation from happening, because that happens in a totally different lifecycle that you don't care about. And it's only once the publication happens that you look at the artifacts and you say, okay, now I've got to care about this. Now there are products that we're going to be building around this.
Now there's revenue booked against it — all of that business stuff shows up. And that's how you keep best practices segmented on the other side of the fence, so the business doesn't go out of business and you don't end up doing all the worst things possible, but at the same time you still get that cool, innovative, leading-edge type of work — and you can still do all the regulatory compliance pieces. Really, one of the biggest parts of regulatory happens at that experimental, research phase, because — well, most people don't realize that most research has to go through an approval process. You can't even do the experiment until somebody on a board signs off, until you get an ethical review that you're not going to mess with somebody's head. So much of the marketing experimentation that gets done — you can't do that without approval — Speaker2: [00:53:53] You can't do that — Speaker3: [00:53:53] — without reviews. And so when you talk about regulatory and keeping the world safe from AI, half of that is the experimental review, the research review process. And on your side of the world, you shouldn't have to worry about that. You shouldn't get a model that's finished and ready for production, monitor it for a while, and go, you know what, I think this is really going to do some horrible things to people. You shouldn't — and the ops team should not be the one responsible for that. Harpreet: [00:54:23] That's not— Speaker3: [00:54:24] It's not okay. And that's really, in business, what happens when you don't separate these things out: now your ops team is your ethics team, responsible for stuff it can't be responsible for — there's no way of doing it. So that's the five-and-a-half-minute answer for how you manage innovation without killing it. Harpreet: [00:54:47] Costa, go for it. Speaker2: [00:54:50] Man, that last bit about the MLOps and ML engineers being responsible for the ethical aspects of it — it gets really complicated, particularly in niche areas. Anyone who's in the autonomous vehicle space is probably battling that entire conversation right now. Personally, I've seen it in the defense robotics space, especially because there's, what, 20 people in Australia doing ML for defense robotics right now, right? And it gets really complicated — it gets extremely complicated — to figure out: should we do something, and how do we do something? Two very different questions, right? And while there needs to be crosstalk between those communities, I think it's quite hard to establish when your typical robotics team is quite small — they're very small, they're very, very specialized — so it's very hard to find a diverse group of people to even fit into that kind of ecosystem. And particularly talking about defense, you then have the complication of not being able to hire people internationally, so that brings a lack of variance and perspective, and [00:56:00] I think that can have an impact on the ethical outcomes you end up agreeing to, right? But back to the point — how do you split that dev-versus-prod thing you were talking about?
What I've kind of seen is that it goes through a bit of a natural cycle — and you have to let it go through a natural cycle — where initially it's just data scientists experimenting with something. And then you start a pilot process, which is frankly where a prototype turns into a tracer bullet. Like, we hope we designed that tracer bullet correctly, but nine times out of ten, let's be honest, it's a prototype that is in way too deep, right? It was never designed for it. The code wasn't designed to handle the kind of scale that your typical infra team would even enjoy, because they're sitting there going, why are you rebuilding this massive container every time you change one line of code? There's no container optimization, there's no CI/CD built into it, a lot of the time. And then you go through that cycle of saying, okay, now how do we operationalize this, make it more efficient, and tidy it up so that production is more reliable? So you're going to go through that kind of cyclical development cycle, and the complication we're all answering along the way is that most other engineering and technical fields in the past have had a very clear line between designing the product, building the product, and the product is finished, right? You see that less in software, because you're seeing that cycle go through more and more. But even in software, we've managed to frame software engineering as this design phase and deployment phase — two separate, linear, logical deliverable entities. Whereas the data science field is just different, because the situation in which you use that model is going to be very different, right? The way we version data — I think we've started to answer that [00:58:00] question, how you manage data versioning, how you manage model versioning; I think we've got a much stronger grasp on that now than we did, say, four or five years ago. That's improved a long way. Speaker2: [00:58:10] But there are still these other questions about how you maintain the cycle of experimentation. You see some more mature companies, in the way they do data science, end up going through that cycle and then figuring out that some of their people are just constantly in that research space, and then they end up splitting those teams naturally. So in a sense, I think any reasonably high-performing team naturally organizes into that kind of style. And none of them seem to do it the same way, which is the bit that really intrigues me, right? Whether you look at Nearmap in Australia, or the way Google's AI team sets itself up, or the way DeepMind sets itself up — they all have very different flavors of how to get it done. So yeah, it's still that human organizational piece that we're really talking about: how do you separate dev versus prod, who's researching, who's operationalizing, who's deciding on the ethics of it, who's deciding on the market value of it. And it's complicated, particularly in the product space, I feel, because there's a deeper investment in the product space before you get that return. Harpreet: [00:59:23] Is it like an artificial
...split that's carried over from the software engineering world into machine learning, this split between development and production? Because typically in software, you deploy something and the button works and it does what it does. But it's not like that with machine learning, because you deploy something that's serving predictions, recommendations, something that's influencing end-user behavior. You're changing what you modeled, right, with your actual model. So you have to continuously be in that cycle, [01:00:00] you know, continuously experimenting. I don't know, any thoughts?
Speaker5: [01:00:06] I mean, it's interesting, because I'm on a software engineering team that's trying to help people get ML models into production. So it's kind of like we're the place between the hard rock and the... what's that phrase? That place between a rock and a hard place? Yeah. I mean, essentially, if you look at the Google MLOps whitepaper, they're like, what's level three, or level whatever maturity, right? They're like, this is experimentation, this is, like, experiment, dev, prod. But even then, I'm like, hypothetically you could argue that there's a difference between experimentation and dev, right? Because in a way, dev is "does the code work," as opposed to experimentation, where you're trying to figure out what code should I be creating? Well, yeah, I don't know. And I think that's something we're trying to figure out, because at some point we know it needs to have, like, a unit test... sorry, tests in general, it needs to have tests in general. It needs to have data validation, there needs to be logging, the code needs to be well organized. So then it becomes a question of: do we put that burden on the data scientists? Do we create this new ML ops engineer who does it? Do we create a platform? So I think we're trying to figure out what that question is, and it's interesting because...
Speaker5: [01:01:46] Like, as a team, we've done a ton of research. We've looked at the ML ops community, and people have different ways of going about it, right? Like, you can do a cookie-cutter templating thing, but guess what, there's problems with that, because now you're maintaining an internal developer tool. [01:02:00] You could put the burden on data scientists, but guess what, you're then asking for unicorns, and those are expensive. They exist, but they're expensive. Or do we do, like, the assembly-line thing where each person has a function? But then, guess what, you get silos. So it's almost like that Edith Wharton quote she has, like, all happy families are the same, all unhappy families are different. So I think everyone kind of comes up with, I guess, something that works, so maybe the answer is we just try to come up with something that works and then just try not to worry about... about whatever.
Speaker2: [01:02:47] So, yeah.
Speaker5: [01:02:49] But it's something that I feel is a challenge, because once again, I don't know if taking a Jupyter notebook and just putting a wrapper around it, like, great, we'll create a REST endpoint... I mean, just because you can do it, should you do it, and...
You know, yeah, right, like everyone's like, no. It's funny, when you start asking, like, in the ML ops community, there's a few very strongly opinionated people who are like, everything should be engineered from the get-go. There's a few of those, and it's like, well, you've also been in the field for, like, 20, 30 years, so you could probably engineer your way out of the bottom of a tunnel. Not everyone can do that. So yeah, it's something that we think about a lot. It seems like the production part, that's good. It seems like the experimentation part, people got their pandas, they got their scikit-learn, they got their TensorFlow, Keras and PyTorch, you know, they got that stuff. But it's when you start bridging, then that's where it's like, oof, [01:04:00] that's where people got feelings. Very strong feelings.
Harpreet: [01:04:05] I see a hand up. Go for it.
Speaker2: [01:04:08] Yeah, I guess, like, in my recent experience it's been more about productionizing robotics, and then in the last six months, more about productionizing AI in cloud endpoints. Now, what I found, weirdly, is that in the robotics space we don't really use, like, containerized models that are deployed, right, because they're not as used to that process, right? So the idea of engineering your way through that... had I gone back with what I know now, I would have containerized the model and then deployed it slightly differently, even on board the robot, right: host, like, a local Docker image with a containerized model there, and then call that from your, you know, ROS middleware or whatever that is. So that whole thing changes based on, you're absolutely right, how experienced you are as an ML engineer or an ML ops engineer, or even as an infra engineer; a lot of infrastructure engineers bring a lot of that knowledge in-house, and how do you engineer the solution, right? But what I saw at that company, and what I'm seeing now where I'm at, is that these things are all kind of layers on top, right? So you always start with that experimentation thing, because if you're starting from the pure "I'm going to engineer the full solution here"...
Speaker2: [01:05:18] You're going to get it so clogged up in trying to engineer the perfect solution that you forget that edict of, like, working is better than perfect, right? So being able to layer that on top, I think that's what makes a successful company in this case: being able to experiment first and then say, okay, now let's wrap that around with a layer of code maintainability, and unit tests come in just for the peripheral functions, to make sure that, you know, as we develop, it's going to continue serving the purpose of the larger code base, and then going through the model deployment process and all of that. So these are all layers that you wrap on, one on top of the other, and when you think about it that way, it's kind of that evolutionary [01:06:00] cycle of the code base, right? So should you engineer it from the get-go? Probably not. Should you add layers of engineering as early as possible, given what you know? Yeah. It depends on whether you're at a proof-of-concept point or whether you're at a "hey, we need to start deploying this to production," right? So knowing when to make that call, I think, is the most powerful part of it: to say, okay, let's start improving it.
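For a concrete picture of the "wrap the notebook model in a REST endpoint" pattern being debated here, a minimal sketch might look like the following, assuming FastAPI and joblib are installed and that the notebook exported a scikit-learn-style regressor as model.joblib; the route, feature count, and file name are illustrative assumptions, not anyone's actual setup:

```python
# Minimal "model behind a REST endpoint" sketch; the kind of wrapper being debated above.
# Assumes FastAPI, pydantic, and joblib are installed and that model.joblib exists on disk.
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import joblib

app = FastAPI()
model = joblib.load("model.joblib")  # hypothetical artifact exported from the notebook

EXPECTED_FEATURES = 4  # illustrative; whatever the notebook model was actually trained on


class PredictRequest(BaseModel):
    features: list[float]  # the schema check doubles as lightweight input validation


@app.post("/predict")
def predict(req: PredictRequest):
    if len(req.features) != EXPECTED_FEATURES:
        raise HTTPException(status_code=422, detail=f"expected {EXPECTED_FEATURES} features")
    prediction = model.predict([req.features])[0]  # assumes a scikit-learn-style regressor
    return {"prediction": float(prediction)}
```

If that file were saved as app.py, something like `uvicorn app:app` would serve it, which is exactly why the pattern is so tempting: it is easy to stand up, and all the harder questions about testing, monitoring, and maintaining it arrive afterwards.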
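And the "unit tests just for the peripheral functions" layer described above can start as small as something like this, assuming pandas and pytest are available; the helper and the column names are invented purely for illustration:

```python
# Hypothetical peripheral helper plus the kind of small test you might layer on early.
# pandas and pytest are assumed; the feature names are made up for illustration.
import pandas as pd


def prepare_features(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only the columns the model expects and drop rows with missing values."""
    expected = ["speed", "depth", "battery"]
    return df.dropna(subset=expected)[expected]


def test_prepare_features_drops_incomplete_rows():
    raw = pd.DataFrame(
        {
            "speed": [1.0, None],
            "depth": [10.0, 12.0],
            "battery": [0.9, 0.8],
            "extra": [1, 2],  # stray column the model never saw
        }
    )
    out = prepare_features(raw)
    assert list(out.columns) == ["speed", "depth", "battery"]
    assert len(out) == 1  # the row with the missing speed reading is gone
```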
Speaker2: [01:06:25] But one of the big deterrents in all of this is the unit testing and the testing side of it. Now, it's very easy to unit test a single function that does something, right? But then at what point do you say, okay, we've got to keep strong, like, cyclic testing with the data in prod? Like, for example, if I'm designing something for the robotics field, how do you even get the labeled data for that, right? Just to do that at scale is such a mammoth task that if you get caught up in trying to do it, you're never going to have the proof-of-concept robot in the first place. You're never going to have the, you know, running, working prototype in the first place. So you have to kind of wrap that around as you go. Each pass that you make improves things, right? Yeah. Like, I used to work in a motorsport team at university, and our principal engineer was always like, hey guys, remember, evolution, not revolution, right? So if you're constantly evolving what you designed before, it's eventually going to become, you know, Mercedes-AMG, or in last year's case, Red Bull.
Speaker5: [01:07:31] Interesting. So actually, I guess, like, it kind of makes sense... sort of a toy, random question: do robotics companies, like, have microservices? All right, that sounds like a really stupid question, and it probably is, but I've actually only worked in consumer SaaS companies, where it's like, yeah, we do have microservices, actually, partially because we are kind of spinning off different models in different applications. But I'm curious, do you?
Speaker2: [01:07:59] Depends on the... [01:08:00] I guess it depends where the company comes from, right? Like, if you look at, say, well, anything that Google is doing in robotics, I can almost guarantee that they've got some element of microservices in there. They've got code served through containers, you know, things like that. Whereas a lot of robotics shops come from a pure robotics background, where most of them aren't actually trained software engineers; they're geared towards embedded programming, towards robotic design, which is a very different style of programming, and you're thinking mostly of... it's very similar to using pub/sub systems all the time, right? So they're very keen on embedding the model. Like, I've made this mistake before, and I don't think it's a mistake, I think it's almost a development step, where I embed the model directly into my node. It's kind of like a cloud function, right? But it runs on the robot. So I embed the model straight into the code. So for me, deploying the model was updating the file, updating the same model weights, and then just re-initializing the node. Sorry. So that's why for me it was slightly different. So then, when I started working with more cloud-based systems and more deployment-based systems, it turned into this thing where I'm like, you know what, if I containerize that as a Docker container, that gives me my dependency management, and all my node has to do is just ping that one locally; like, instead of pinging an endpoint that's hosted by GCP, I'd ping an endpoint that's in a local Docker container, right? So that's how you kind of keep it on a robot without needing to host it online. Well, they can't get online anyway.
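A rough sketch of what that "ping a local Docker container instead of a cloud endpoint" call might look like from the node's side, assuming the containerized model exposes an HTTP /predict route on localhost port 8080; the URL, route, and payload shape are assumptions for illustration, not any particular stack's API:

```python
# Sketch of a node-side call to a model served from a Docker container on the robot itself.
# The URL, route, and payload shape are assumptions, not a specific framework's contract.
import requests

LOCAL_MODEL_URL = "http://localhost:8080/predict"  # container published locally, no cloud hop


def get_prediction(sensor_features, timeout_s=0.5):
    """Call the locally hosted model; fail fast so the control loop never blocks on it."""
    try:
        resp = requests.post(LOCAL_MODEL_URL, json={"features": sensor_features}, timeout=timeout_s)
        resp.raise_for_status()
        return resp.json().get("prediction")
    except requests.RequestException:
        # The cloud is unreachable by design, but the local service can still be down:
        # return None and let the caller fall back to a safe default behaviour.
        return None


if __name__ == "__main__":
    print(get_prediction([0.2, 14.0, 0.87]))  # illustrative sensor feature vector
```

Swapping a cloud-hosted URL for the localhost one is, in effect, the only change the calling code sees, which is part of what makes the containerized approach attractive on a platform that cannot rely on connectivity.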
Speaker2: [01:09:39] I've said this a number of times in this chat: robots don't have internet when they're underwater, for example. Right? So, plain and simple, you have to come up with slightly different solutions. So that microservices experience, that traditional software delivery process, is kind of different, because you're so much more mentally [01:10:00] closer to the hardware. And it's just this perspective shift, right? It took me a little while to get my head out of that bit of a hole, but I think we can take a lot of these microservice deployment practices back to the robotics field and actually make a huge impact on how well it's done. And I think some of the major software companies that are exploring robotics have that natural in-house knowledge, right? Your startup robotics companies that are, you know, a couple of guys like me that did robotics at university and want to build something cool, we don't have that knowledge in-house. We kind of have to pick it up as we go. So this is where I've seen those layers of information and knowledge grow on top of it.
Harpreet: [01:10:37] Yeah. I'm wondering, for the robotics case, like you mentioned, deploying right on the robot itself: is that any different than, you know, whatever model is deployed on my watch that's able to pick up what exercise I'm doing, for example rowing, or the model that's on the phone? It's the same kind of concept of deployment.
Speaker2: [01:10:59] Well, yes and no. Yes, in the sense that you could do it that way. There are a lot of things on your phone that rely on external endpoints, where you know you're relying on an internet connection. But yes, there are absolutely models that are stored on your phone and that run local to your phone. There are things that train on you, like federated learning models that train on your individual device and so on, right? So you can actually apply some of those principles to the robotics field. Like I said, what we've seen is that that kind of thinking comes from companies that have experience deploying on end-user wearables and things like that, mobile devices, and then transitioning into investigating robotics. Now, I'm sure if you talk to Apple's robotics research team, they're going to do stuff like that. They're going to naturally have those skills, where they can say, hey, our wearables team probably knows how to deploy this because they've deployed models in the wild, right? Let's do that in robotics. Now, if you talk to one of the smaller robotics startups, or you talk to, you know, any of those kinds of companies, that knowledge has [01:12:00] to mature, and you have to recognize that that's a problem in the first place in order for that knowledge to mature.
Speaker2: [01:12:05] So that recognition that, hey, actually, there's an idea out there that can serve us in the robotics world: yeah, that's kind of what I'm trying to bring together with the communities around me. We need to talk more between the data science world, the machine learning world, and the robotics world, because there's just this lack of understanding of what's possible out there to deploy, right? And I think you can bring a lot of benefit. Yeah, you could probably do it in very much the same way. You will have some differences, because the kinds of platforms that you're building robots on, the complication is that they're very often unique.
They're very often very custom hardware. Like, NVIDIA's designed their own custom hardware just for running their autonomous vehicles, right, the whole Drive Pegasus systems and all that stuff; they've had to specialize for that space. You see similar stuff with robotics. The only difference is there's a much higher upfront capital investment in robotics, right, and we haven't seen companies at that scale yet because they're not able to, like, bring it to consumers quite yet. So those are the challenges, right? The sheer cost of it is not letting people scale enough that they can say, hey, we have the bandwidth and the maturity to take on some traditional software engineers who can help us build this product out. They're kind of forced into that corner where they're like, yes, but shit, we need, like, two more robotics engineers that can, you know, do specifically robotics stuff that a lot of software engineers may not have any understanding of.
Speaker2: [01:13:40] Right? And that space, programming for robots, is very different and very complicated. And frankly, even though I'm from that space, there's a lot that I don't know about it, right? So it's one of those things where eventually, as the autonomous vehicle industry, as that industry [01:14:00] matures, you're going to see more of that transfer of knowledge from mature industries come into it. And I think the autonomous vehicle industry is the closest to the robotics space. There's a lot of learning to be done between those areas.
Mark, go for it.
It's kind of hard to follow up the awesome robot talk, that was really cool. Well, I want to learn more about this. I want to kind of go back to the earlier question and talk about, like, notebooks, wrappers and stuff like that. And it reminded me of a conversation I had with Zach Wilson, who's over at Airbnb now, but before that he worked at Netflix, and that was when I was talking to him.
Speaker2: [01:14:40] I was just learning about microservices and, like, what that's all about. And what he was describing to me is that microservices make sense for Netflix because of their culture. Their culture is very deeply rooted in being like little mini startups all around. And so they have to use microservices to keep everything together, because talking to other teams is extremely hard. See this? My hand is moving. So, like, talking to other teams is extremely difficult, and microservices, like, facilitate that. But if you saw the whole system, I think there's a YouTube video describing microservices at Netflix, it looks crazy. It's like a swarm of information moving across, right? And so for me, seeing that, I'm just brainstorming: making wrappers around notebooks makes sense because their culture is, like, move fast, they're startups, it's hard to talk to other people. It's way easier to wrap this than to talk to another team across the hall, I guess. And so for me, what's best practice, I guess, is similar to what a data scientist is: it's completely dependent on the culture. There's my hands. I need to turn this off. It's even distracting me.
Speaker5: [01:15:50] Because it's like that zooming in. Like, I don't do sports, so I'm assuming, but in sports, you know, they've got that penalty kick thing [01:16:00] or whatever, it's like...
Speaker2: [01:16:04] I have mixed feelings. It looks cool, but I lose my train of thought. I have the attention span of a gnat.
But just thinking back to it, like, I don't know that there will necessarily be a best practice. I mean, there may be best practices for certain aspects, but I feel like it's your company's culture. And again, I don't have any definitive proof around this, or, like, experience around this, just brainstorming from experiences and stories I've talked to other people about. Oh, I'm going to stop here. Awesome.
Harpreet: [01:16:42] Thoughts and comments on that, anyone?
Speaker2: [01:16:47] Yeah, look, I completely agree on the "what's best practice" thing. Like, you know, I was recently listening to the audiobook of Software Engineering at Google, the O'Reilly one. I learned a lot from listening to that book, but the biggest takeaway was the number of times they're like, now, this might not work for you; we do this because we've got, you know, a bajillion software engineers trying to search a single giant monster repository at the same time, so we need to optimize for that. Like, the entire thing was "this might not work for you," and I think really understanding that is pretty critical. So it's very tempting as engineers to want to know the answer to how to do something. Like, we're expected to know how to build something from scratch, right? We're expected to follow design patterns, we're expected to know this is the right solution. To be honest, let's be frank, most of us, if we're trying to do that, we're bullshit artists, right? We're really just trying to figure out what works best in that situation. And if you're working in a cutting-edge enough field, which frankly most of us are, you're going to find that everyone's going to come up with a slightly different solution to it. It's exactly why you have at least half a dozen data lineage solutions. You've got at least four or five different decent machine learning packages out there, right? And you've seen those evolutions: Keras, totally different to Theano, completely different to PyTorch and TensorFlow, like they're just totally different takes on what is best practice for machine learning. I mean, Darknet is another example, right? So what's the best practice? It's going to depend on what you're trying to solve. At the end of the day, it's kind of a cop-out answer, but it's honest.
Harpreet: [01:18:36] Awesome. Well, thank you so much. Great conversation, great talk. Let's go ahead and wrap it up. Any parting thoughts or words from anyone? Do let me know. Russell's got a lot of good feedback here in the chat: the real trick is convincing people to allow us to fail enough times to create something incredible. And on that, let's go ahead and wrap this happy hour session up. Thank you guys so much for joining. Be sure to check out the episode released today: Scott Taylor, the one and only Data Whisperer, who's getting loud on the podcast, so check that out. And a few events next week: speaking at that ODSC thing on Tuesday, check that out if you can; Wednesday, the... ah, not happy hours, the Comet office hours, and then Data Science Happy Hour. I'll be on camera a lot this month. It is tiring, but hey, I got a nice haircut. So guys, thank you so much for joining. Take care for the rest of the weekend. Remember, you've got one life on this planet, so why not try to do something big?