HH86-17-06-22_mixdown.mp3 Harpreet: [00:00:09] What's up everyone? Welcome, welcome to the Data Science Happy Hour. It is Friday, June 17, 2022. Super excited to have you all here, thank you so much for joining in. Shout out to Ben for taking over the hosting for me last week on such short notice. Ben, I appreciate you taking over for me. I haven't got a chance to tune into the conversation yet, but no doubt it was a good one. Huge shout out to the sponsor for today's episode. This episode of the podcast is brought to you by Z by HP. Get rapid results for the most demanding data sets, train data models and create data visualizations with Z by HP data science laptops and desktop workstations. The Data Science Stack Manager provides convenient access to popular tools and updates them automatically to help you customize your environment on Windows or Linux. Find out more by going to hp.com/datascience. Thank you so much for sponsoring the podcast, Z by HP. One of these days I will turn into Ken Jee and become a global ambassador. Hopefully that happens. What's up, everybody in the building? Shout out to Gina in the building, Murillo, Ben, Russell, Ricardo, Maia, Mikiko, and everyone else. Super excited to have you out here. Thank you so much for taking time and joining. If you're watching on LinkedIn, YouTube, Twitch, wherever it is that you're watching the stream, and you have a question, feel free to drop the question right there in the chat section. Harpreet: [00:01:39] Or if you want to join us, shoot me a message and I'll give you a link to the Zoom room. So let's go ahead and kick this discussion off. I'm curious to hear about why there's so much stuff going on in tech right now, from layoffs and companies kind of rolling back valuations, and then this Google employee who got put on leave because he thought LaMDA [00:02:00] was sentient. I don't know where to start, where to pick up. Let's go with the most controversial topic, though, and that is chatbot sentience. I got a chance to scroll through some of that conversation transcript. I thought it was kind of creepy, a little bit, and I'm curious if you have done any research into that, or looked into it, and where your thoughts are on that. And while we're keeping this conversation going, we'll get to Gina's question right afterwards. Vin, I'd love to hear from you, man. What's your take on... well, how about this? Let's not debate the nature of sentience and whether chatbots can actually be sentient. I'm more interested in the move that Google took, like putting the engineer on leave. Why do you think they did that, of all things? What do you think the rationale is behind that? How would you handle that situation? Speaker2: [00:02:56] You know, I think as far as what happened — I mean, if you look in the clouds and you see a face, you did the same thing that guy did. It's exactly the same mental mechanism. And so I think if you're Google and you have someone who, instead of running it up the ladder, just goes public and says, you know, hey, let me give some proprietary information out and kind of make an assertion with the word Google behind my name, lending its validity to something that I'm saying — I think you have to suspend someone like that. You have to have some questions about whether your intellectual property is safe. You have to have questions about whether that person is, you know, whether the person's having some sort of a crisis, which, you know, could happen.
That's what a lot of us go through before we have one of these wild moments. And I think everybody's had that, where they've been a little bit too far into the code and you pull your head out of it and the real world has kind of edged by for six months. And so I think it's [00:04:00] the right thing to do to put them on leave. I don't know if I'd have talked about it publicly or not, but you'd have to be worried about your intellectual property at that point. You'd have to be worried about what other leaks are coming from this individual. So I think, like I said, he saw a face in the clouds, so it's not like he did anything wrong. But at the same time, the reaction was, I think, what Google had to take action over. Harpreet: [00:04:31] Vin, thanks so much. Yeah, that's an interesting point about kind of guarding IP, because if this person does come out and make a claim like that, chances are they're gonna be hit up by reporters, and who knows what they might say out there in the press. Curious if Murillo or Russell or anybody in the room has anything they'd like to contribute to this. Feel free to let me know if you've got questions of your own. Also, let me know if you're watching on LinkedIn, on YouTube, or wherever it is that you are — feel free to ask any questions. We'd love to hear from you on this. Speaker3: [00:05:06] Sure. Honestly — well, first of all, I completely agree with everything that Ben said. I think it's perfectly within the company's rights to protect their intellectual property. But to be completely honest, I didn't read much about it, because when I saw it in the news, I kind of was like, oh, this seems like fake news, and I didn't want to spend too much of my time on this. So I don't know a lot about it. But, yeah, I completely agree with what Vin said, and I guess I'd just reiterate that we're not close to having, you know, general artificial intelligence, or artificial general intelligence. Harpreet: [00:05:51] So, yeah, definitely, definitely a long way away. Thanks so much. Mikiko just dropped something in the chat here — it was a [00:06:00] link to a tweet. I'll go ahead and share the link out on YouTube and LinkedIn as well, and also in the show notes of the show. Gina, if you're back, we can go ahead and jump to your question. Mikiko is here — you got anything to add? Love to hear from you as well. Speaker4: [00:06:19] So, generally speaking, if you don't think you did something wrong, you usually won't say in a tweet, "I know or think that this thing I'm doing is wrong." And generally speaking, people who believe what they're doing is truly innocent won't necessarily say that. And he kind of said that in the tweet, where he's like, this might be called sharing Google's proprietary property. So, yeah. Harpreet: [00:06:49] Awesome. Well, Gina, if you're back around, let's go ahead and jump into your question. So go ahead, go for it. Take the floor. Speaker4: [00:06:59] Well, first I wanted to just say, regarding last week's discussion — it was really good. I listened to part of it; I wasn't able to attend. But the question about learning and fear and how people deal with that — there were so many good comments.
But one thing that I'm not sure I heard, that I wanted to add, is that when you start working on something and you don't get it right away — at least, I think, often in American culture — we feel like, well, I guess I'm just not cut out for that. I mean, that's really, I think, more ingrained sometimes than we realize. The notion that some stuff just takes a lot of work and you struggle through it doesn't always come through, especially in the stories we hear of supposedly brilliant founders who launch some great company that just takes off. You don't hear about all the [00:08:00] days, weeks, months, years of hard work and struggle and doubt and all the rest. So I think it's important — I just wanted to add that for anyone who might remember (obviously Ben does from last week) — that discussion, I think, is really worth it. And once I realized that — I think the researcher at Stanford, Carol Dweck, the growth mindset, she talks a lot about that — that really helped me. I realized that I have a growth mindset, but at the same time I grew up in a family, you know, where if somebody tries something and it doesn't work out, it's like, why were you so stupid to try that? Why did you think that would work? Not so much directed at me, but I saw it directed at other people. Speaker4: [00:08:52] That doesn't exactly give you warm fuzzies about trying something risky. So I just wanted to point that out: even people who are really good at things — of course, Malcolm Gladwell talks about this in Outliers and such — even people who are really good at things, you know, they work at it consistently over time. And that's what you need to do. It isn't always just like, oh yeah, it was like falling off a log, I totally got it right off the bat. So that was my comment on last week's discussion. And then my question for this week: I've been working on a really big project, and I'm reminded — through my combination of working very hard over the years on my writing abilities and my obsessiveness about writing, about editing, about punctuation, about formatting, about clear expression, about synthesizing information into something coherent, as opposed to giving a client a bunch of stuff (and this is a unique [00:10:00] case, so this isn't like a regular consulting engagement), giving somebody a bunch of stuff and saying, in essence, you figure it out, you go through the pieces — I'm reminded that this is a skill that I don't think a lot of people have or have developed. And I'm really curious to hear all of your thoughts. You know, you all having worked in data science and data-science-adjacent fields, how does this manifest, whether it's in companies or with teams? I mean, we hear so much about how important it is to be able to communicate, and how important it is to be able to communicate to decision makers who may not be technical. But I'm curious to hear how this manifests in the work that you all do. Harpreet: [00:10:59] Yeah. So I guess, if you were to distill down the question, it would be: what writing skills are most important from a data science perspective? Speaker4: [00:11:11] Writing, yeah. Yeah. I mean, how important is good writing in the jobs that you all have had? Harpreet: [00:11:21] Yeah. Speaker4: [00:11:22] What impact does it have, and what are the most important skills? Right. I mean, yeah, storytelling, etc.
But I mean, literally, if you're putting together a report and you're laying out the work that's been done, or a PowerPoint, what's most important? What are the skills or the things that matter most in those deliverables? Harpreet: [00:11:47] People are always talking about storytelling in data science, and I always wonder, like, who the hell has time to listen to your stories? Right? The most important thing in any communication, whether it's data science or not, is just to be [00:12:00] clear, concise and confident. I think those are the three things you need. The people reading your stuff are busy, right? It manifests itself in a number of ways, so I guess I'll tackle that part first. We do everything on Slack now, right? So Slack messages — having to write clearly in Slack in such a way that you don't come off the wrong way, I would include that as well. Me personally, my job is now developer relations, developer advocacy, so a huge part of my job is writing — whether it's writing blog posts, writing documentation, or writing with community members, it's a huge piece. And I imagine for any data scientist that has to communicate with other humans, it's definitely a core skill to have. So pick up a business writing book, people, or take a business writing course. One of the most impactful things I ever did, that just completely changed the way I write, was, I think, a 30-minute course from Scott Adams, the creator of Dilbert. There was a blog post called The Day You Became a Better Writer, and then he had a half-hour YouTube video that just broke down how to write effectively, and I think it might be called the same thing, The Day You Became a Better Writer. But we've got some writers here in the audience — like Jim's got this amazing Substack that you should check out. I'll be sure to drop a link in there as well. Speaker4: [00:13:31] Yeah, let me, if I could, just add one thing. So yeah, I mean, I feel quite confident — because I've gotten a lot of very positive feedback along these lines — that my writing skills are very good. And one thing that sometimes frustrates me is I wonder how much that's really valued, right? Sometimes I get the feeling, I don't know, that things like punctuation and clear expression — I get a sense that a lot of people don't [00:14:00] think that's very important. I personally think it's important, probably partly because I'm a little bit OCD about it. So if punctuation isn't in the right place or whatever, it just drives me crazy; it distracts me. So there's a clarity of thought that comes from this. But is it valuable, or are people just like, you know, "I get the idea"? I mean, seriously, that could be an answer, and I'm curious about that, as well as what we've already kind of put out there as a topic. Harpreet: [00:14:32] Yeah, that's interesting. I think if you write with bad grammar, bad punctuation, it does kind of reflect on you to your audience. Like, my God, this person doesn't know the basic rules of grammar — how am I supposed to take their analysis seriously if they don't know when to use a period versus a comma or a semicolon? So I think that is definitely important.
But it's not something that I personally would spend too much time on, combing over a piece of written communication just to check for grammar — unless it's a piece that's published, right, that's on the internet to live forever; then let's send it to a copy editor or someone on your team and get it out there. But I think with just interpersonal communications, just enough to get the job done. That's my personal take. I'd love to hear from Mikiko, and anybody else that wants to chime in, please do let me know. Writing, I think, is super important still. So, Mikiko, what's your perspective? Speaker4: [00:15:30] So before I went into data science — right in high school, before I went to college — I actually got scholarships in both journalism and creative writing. And that was actually a big part of what I was thinking about doing in college. My parents wanted me to go to the Naval Academy, actually, but I wanted to have a little bit more freedom to do writing and journalism and all that. You [00:16:00] know, it is important, but at some point... So, as a data scientist, typically your main output is not writing, and your main deliverable value is not in writing. As a data scientist, your goal is to train models and get them pushed into production — ideally; although Dave Langer might disagree with me on this, you know — but it's to get them out into production. Communication is a part of being successful in that process, but that's not the thing they're essentially paying you for as a data scientist, right? As a dev advocate, or in developer relations, as Harpreet pointed out, it is a huge, huge part of that role, because a lot of what you're doing is taking what the company is doing, the product, evangelizing it out, and also trying to get that feedback in and translate it back to the product team, right? But that's a different role from a data scientist. So I would say that most people might say they like long-form writing, but they actually don't really like long-form writing when it comes to reports. Speaker4: [00:17:13] I would say the assets that get passed around all the time are PowerPoints, because for one thing, they're heavily limited in words, they're usually picture-based, people like to share them on Google Drive, and they'll print them out as a PDF. I would say that is the main asset that typically gets passed around. And even then, if you're reporting out on a data science project, you're literally just answering the who, what, when, where, why, how — that's all you're really getting at. When the company eventually wants you to do more, like white papers or blog posts, then they want more details, but that's because you're communicating it out to an audience that doesn't have the context of that project. But I would also say pictures are a very [00:18:00] underutilized, underappreciated form of communication. Most of the data scientists that I work with are like, if you can essentially put it into a step-by-step list or a set of bullet points or pictures, they will be very, very happy, because they already spend a lot of time reading long-form technical papers — and the same with business partners. And apparently millennial and Gen Z attention spans are getting even shorter thanks to TikTok. So I think the trend is going to be towards short-form writing, not long-form. That's just my take on it. Harpreet: [00:18:37] Mikiko, thanks so much.
Vin, let's hear from you. By the way, we're talking about the importance of writing skills for data scientists, or data professionals at large — if you've got any points to add on that, please do let me know. Shout out to everybody else in the room, what's going on. Good to see you, Harold, as well. And we also got Mark, and we'll go to Mark's question after this. Speaker2: [00:19:00] It really depends on what you're doing as the data scientist and at what level, because Mikiko's experience is 100% spot on, especially when you're engineering focused. It is the ability to explain quickly to an executive audience — just one picture, five or ten bullet points. I mean, if you try to go past five or six communication objectives, you're going to lose your audience. And so the ability to write isn't as important as the ability to reduce. And this is one of those really critical writing skills: you go from five pages to a half page. When you can do that, that's actually a strong writing capability. So it's that ability to reduce, the ability to synthesize, and to really just hammer home your main communication objectives — and to do that in both oral and written communication, especially in emails. When you get to leadership, [00:20:00] then the email all of a sudden becomes something different, because you have to respond to a broader audience. You may have to push back in an email. And so writing, and the ability to convey your thoughts very intelligently and back up what you're saying with evidence, with points, in an email — because sometimes you have to do it in email; you can't just all of a sudden call a meeting, but you have five people asking you to do five different things and you need to very diplomatically write up a "so here's the problem," or you need to take a position through email and say, look, this is how we propose to solve this problem and here's why. Speaker2: [00:20:39] And so writing evolves when you get into leadership, or when you get into that principal role — staff, senior staff, distinguished. All of a sudden, writing comes around and it's something different now. You have to be able to articulate your thoughts and clearly defend the concepts and the ideas and the suggestions and the recommendations. And then you get to the executive level and writing becomes an art form again. You all of a sudden have to be able to write up these long-form papers. You have to be able to contribute to patents. You have to be able to concisely, really distill down exactly what it is about this particular initiative that's going to become a competitive advantage, both now and in the long term. And these are all written documents you eventually present, but they start out as technical documents, as strategic documents. You end up building out some very complex plans; those are all written. And then it goes back to being an art form, because you have to tell a story. You have to incorporate data as part of it. But you are truly storytelling, because what you're really doing for senior executives at that high level is you are not only giving them data points, but you're also giving them [00:22:00] context. Speaker2: [00:22:00] You're also giving them the background that they need to incorporate those data points into their heuristic decision-making processes. And so the power of the analysis, the written analysis — I make my living doing that. So it is the ability for me to create emails that I'll send out —
I've got 12 on my list right now — where I break down a complex topic and explain what's going on, what it means, and what you need to do about it. Just in that format, those three communication objectives, but they end up being five, sometimes ten pages long. And so it's that ability to deliver the story so that senior executives can take that back. They have talking points, they have knowledge that they can now communicate to the rest of the organization, and they have things that are actionable. But it's a storytelling process, and it's very much like investor analysis and investor notes and that sort of thing. So there are phases, and when you're early- to mid-career, it's exactly what Mikiko is telling you — sometimes being loquacious can get you in trouble; it's actually a bad thing — but then it transforms. Harpreet: [00:23:24] Thank you very much. Also, there's another writer here, Joe Reis. What's going on, Joe? Massive, massive congratulations on your book launch. We're just talking about the importance of writing skills as a data professional, and having written a book, I think you'd be a good person to get a perspective from. Speaker3: [00:23:48] What do you want to know? Yeah, I mean, I would say there's writing for [00:24:00] work and there's writing for, I think, diving into your knowledge, which is more of what a book would be. Both are important, I would say, and both are almost the exact opposite of each other. In the sense that with work, you need to be very... I would say I prefer short and succinct things. Like, tell me what you need and I'll get it to you. I don't really need an essay; if I ask for an essay, certainly provide one. We're fans of kind of the six-pager methodology here at Ternary, where if you have something important to say, you should at least try and write it out. But yeah, I would say writing is important. It's also one of those things where it's hard — it's really hard. I think the hardest part of writing is that when you start to write something, you realize either you do know the subject and you know what you want to write, or, I think usually more to the point, you stare at your screen for a long time and don't know what to write. And this happens a lot — I'm sure everyone's been there. So, I don't know, my favorite technique has been: I carry a legal pad with me and I always jot down my ideas on that. But yeah, it's tricky. And then writing a book is a different thing. I don't think I would suggest that to most people. I think it's a really bad idea. Harpreet: [00:25:21] Joe, thanks so much. And again, congrats on the book launch. It's Fundamentals of Data Engineering, now available right there on Amazon. And where else? Do you have a preferred place for people to buy this on the internet, or just wherever you find it? Speaker3: [00:25:40] I mean, you can also — if you have an O'Reilly.com subscription, I found out — if you have one, you can go there. A lot of libraries actually have access to O'Reilly as well, so you can just get it online and read it there. So either way. Harpreet: [00:25:54] Well, thank you. Thank you so much. Let's go to Mark Clemens' question. Mark, go [00:26:00] for it. Are you still here? Yes, I'm here. Yeah, go for it. Speaker3: [00:26:05] My question is related to data science —
just a personal project I've been working on for the past nine months. I thought it was going to last two, and here I am. So I'm working with — I came here once and we talked about the Instagram comments and the bots that are still there, commenting and scamming people, because those bots, at one point in the scam, make you pay for fake sexual content. I already collected a lot of comments, a little more than 100,000, only public data. Then I cleaned the data, which was already pretty clean, and I started labeling. And I came here because I was wondering how I can label the users — the legit users and the bots — and what labeling techniques I can find for that. So I worked a lot on how to label bots, and I came up with many different techniques to label them, because there's always one feature of one bot that is similar to another bot. And so I managed to label a lot of bots. But now I'm at the point where I have about 25% of my dataset labeled, which has some bots and some legit users, but most of the rest of the dataset — which is maybe, I don't know, 50,000 or 60,000 rows of data — is just legit users, like you and I. And we're all different from each other, and there's really no technique I can use where I can just grab you all and say, all right, you are legit users — which I could do with the bots, because I could look at the domains they were using, I could get profile pictures and many other things they were using. So I just don't know how to label all the users, and I don't want to label them one by one, because it's [00:28:00] going to take me forever. So I'm a little bit stuck here. Harpreet: [00:28:06] Do you have funds to do, like, a Mechanical Turk type of thing to get human labelers? Sorry — do you have any funds to get human labelers from, like, an Amazon Mechanical Turk type of service? Like, can you pay for people to label? Speaker3: [00:28:30] Well, this is a personal project, so I would like to not pay. Harpreet: [00:28:33] To keep it as free as possible, okay. Well, if anybody in the community wants to help Mark out and help him with this stuff, do reach out. Anybody have any labeling solutions? I'd love to hear about that. I mean, I know there's Label Studio that you could use, but then again, that is manual — you have to do it one at a time. You might be able to use some machine learning to, you know, help label. Speaker3: [00:29:02] And then there's one thing. So the first step of the labeling was manual labeling, and I had a background model that was always training as more data was fed into it, and it was outputting a prediction in the background. I always kept it on the side — not really to directly label users, but just to compare what I'm about to label with what the model thinks. And now I'm at the point where the model is at 95-plus on pretty much every binary classification score — accuracy, F1, Matthews correlation, everything. So I'm wondering if I can also do something with that model, even though the distribution of the users is not the same as in real life, because I've labeled way more bots than legit users right now. Harpreet: [00:29:50] It's a good question. Anybody have any ideas off the top of your head? We can bounce ideas off each other here for sure,
but if anybody has any ideas, [00:30:00] please do let me know. Let's see what people are saying on this as well. Have you thought about — you said you trained a model that's good at recognizing bots from humans with 95% accuracy, but then again, you also have an imbalanced dataset where most of the labeled things are bots anyway. Okay — oh, you're on mute. Speaker3: [00:30:30] I'm sorry. I was saying that the overall population is, I think, about 16% bots to 84% legit users, and I think I'm at about 60 bots to 40 legit users in the data I've labeled for the model I developed. So it's not really the same at all. Harpreet: [00:30:47] Yeah, good question. I think I get the idea here. I'd love to hear from anybody on YouTube or here in the room. I mean, assuming you've built a model that can generalize to these unseen examples, you could try to classify the remaining ones as bot or human, and then look at the ones the model has a lower probability on — the ones where there's a bit of confusion — and double-check those. That should reduce your labeling space significantly. Speaker3: [00:31:31] Is there a threshold on the probability which I should maybe look at, or is there a way of determining that threshold? For example, if it's more than, I don't know, 90% probability, I'm sure these are legit users; otherwise, if it's less, I'm going to look at those users and determine it myself, or with all the other techniques. Harpreet: [00:31:50] I think that's a point to experiment with, because if you lower the threshold enough, then maybe you get a whole bunch you have to check manually, and you find that sweet spot. [00:32:00] Mark, you've got your hand up in here — definitely, definitely help us out. I'm gonna look some stuff up, Mark; let me see what I can get to you, and if anybody in the room has any insight... So this isn't necessarily — I can't solve exactly how to fix this, but I have an idea for how to potentially get resources. Since this is a personal project, not really related to work, I think there's a really cool opportunity here where you can turn it into content for a company, for a very interesting kind of project. What if you found a vendor who does data labeling, and you used your project to show off how cool their product is? That way you don't have to pay for it, and they get some type of advertising, some blog-type exposure. I've done that before and it's really fun; you just need to have your project ready and pitch it to people and see what they can do, and people are willing to help out with that. Maybe Harpreet Sahota, developer relations — I feel like he might have some insight on that process. Yeah, I think that's a good idea. I would definitely help someone out if they came to me with something like that. That's a good idea, absolutely. Yeah. So Label Studio is one organization we could reach out to. Who else? I think Oracle might do labeling; there might be other companies like that. If anybody has any insight on this, please let me know — send me an email, I'll be sure to pass it on to Mark if you're listening and have ideas. Harpreet: [00:33:34] And Mark, I'll do some research as well. It's an interesting project. But I see Eric has his hand up — Eric, go ahead. Yeah, I just had a quick question, Mark. Sorry, I joined during your explanation, so I'm sure I missed important information. How many —
I guess, like, Mechanical Turk — don't they call them judgments or something like that? I can't remember. Anyway, whatever they call the tasks that you're doing — how many tasks [00:34:00] are you looking at? And the reason I'm asking is because I'm trying to think through, if you wanted to go the Mechanical Turk route — whether it's Amazon Mechanical Turk or just the general idea of having a human look things over — a thing that gets talked about, even just with a quick Google, is whether or not people are being paid fairly for their time. So I was trying to work out how many rows or how much effort there is, because I wanted to think through: all right, if you were going to pay somebody minimum wage, $15 an hour, a fair wage for where you live or something like that, how many tasks would they have to do, or be able to accomplish, in an hour? And even if you don't want to spend the money, I'm just curious to understand what it is, because, you know, we talk about it where we say, oh, well, what about Mechanical Turk — and it obviously shakes out to be a pittance compared to a living wage. But that's something I'm just trying to understand from you. Speaker3: [00:35:03] So the first labeling technique I used was manual labeling, because I couldn't really use anything else. What I did is I made a summary PNG for each of the users I have, with the photos I collected about them — the last 12 and their profile photo — and then I display some information: what's their comment, what's their website, a screenshot of the website they have in their bio. So I have a full summary of each user, and I can just press right, left, right, left to say this is a bot, this is not a bot. And even with that process, I managed to get to a little less than 1/2 per user, and I still have about 50 to 70 or 80,000 users to label, and most of them are legit users. So that would be a lot of time. And also, doing the manual labeling, I know I make errors, so I [00:36:00] label them twice to minimize the risk of errors on the same user. So you would also have to do it twice if you used the same technique, I guess. Harpreet: [00:36:12] I would try and contract my little brother — it's like unskilled labor: hey, if you don't have anything better to do and it's summer vacation... Good question, Mark. It looks like you've got a lot of us stumped here, so good job. There's a link right there in the chat — a couple of links. One of them is something from CloudFactory, another one is from Technology Review. So go ahead, check those out. Let's go to Mark Freeman, who had a question. By the way, if you guys have a question — oh wait, Mikiko has some thoughts. I'd love to hear them, go for it. Speaker4: [00:36:54] Yeah, so I just want to toss another vote for Mark's idea of going to a company and seeing if they'll provide the resources. Snorkel, I think, is one of the best ones. A huge part of it also is, people seem to forget that with contractors or labelers, you still have to train them, you know? So there's that component, there's cost, all that other good stuff. But I think that would be the best way to go, for sure — Mark's idea. Harpreet: [00:37:34] Yeah, it's a good idea. Mark, let's go to your question — that's Mark Freeman. Yeah.
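Before moving on, to make the thresholding idea from Mark's labeling question concrete: train on the rows already labeled by hand, score the unlabeled rows, auto-accept only the confident predictions, and send the uncertain middle band back through manual review. A minimal sketch, assuming a scikit-learn setup; the file names, the is_bot column, and the 0.90 cutoff are hypothetical placeholders, not Mark's actual data:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical files: features engineered from the collected profiles and comments.
labeled = pd.read_csv("labeled_users.csv")      # includes an "is_bot" column (1 = bot)
unlabeled = pd.read_csv("unlabeled_users.csv")  # same feature columns, no label yet
features = [c for c in labeled.columns if c != "is_bot"]

# class_weight="balanced" offsets the fact that bots are over-represented in the labeled set.
model = RandomForestClassifier(n_estimators=300, class_weight="balanced", random_state=0)
model.fit(labeled[features], labeled["is_bot"])

proba_bot = model.predict_proba(unlabeled[features])[:, 1]  # estimated P(bot) per user

CUTOFF = 0.90  # the threshold to experiment with, as discussed above
confident_bot = unlabeled[proba_bot >= CUTOFF]
confident_legit = unlabeled[proba_bot <= 1 - CUTOFF]
needs_manual_review = unlabeled[(proba_bot > 1 - CUTOFF) & (proba_bot < CUTOFF)]

print(len(confident_bot), len(confident_legit), len(needs_manual_review))
```

Only the middle band would go back through the manual PNG-summary workflow, which is where the time savings come from; spot-checking a random sample of the auto-accepted rows keeps the shortcut honest.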
So this actually aligns with the hustle thing, because the reason I brought it up is that I'm currently trying to figure out how to get someone to pay for my current project ideas. I'm trying to up my data content, I'm trying to do some really cool stuff, and I have some really fun ideas that I'm brainstorming through. One of them is — I'm very interested in kinetic art. [00:38:00] One of my favorite kinetic art groups is Breakfast in NYC, where they essentially combine AI with art that moves. And I saw that and I instantly said, I want to do that, I'm going to make this happen. I don't know how, but I'm obsessed with it; I have not been able to let it go for two months. My question — and I think Kosta might be a great person for it since he works with robotics — is: I can do data science on the computer, but going to data science on a physical object, that interaction between the mechanical and the data, seems like a completely different beast. Any tips on making that jump to doing data science on physical objects? Kosta, go for it. My guess is you'd probably have to pick up some reinforcement learning, but... Speaker5: [00:38:55] I'm a little curious off the bat: what exactly do you mean by mechanical objects, and what kind of data science are you trying to do with that? Harpreet: [00:39:05] That's a great question. I'm out of my depth here, and I'm still in the early phase of figuring out how to explain this. But to describe what I'm trying to do: you know the old train signs, the departure signs that flip across? The idea I have is I want to take a flip board of a certain dimension, where each square has four colors: black, white and two shades of gray. And essentially, I want to use speech-to-text to say an image prompt, have that go to DALL-E to generate an AI-generated image, turn that into a grid based on the dimensions of the flip board, and then from there turn it to black [00:40:00] and white, so you can basically create a grid of colors and send that back to the flip board to generate the image. So the idea is using AI to interact with a physical piece of art, to do some fun things — I think it's just a fun project. It also sounds like an expensive project, from when I looked at flip boards. So that's where I'm at. I'm just in the early stages of exploring how I can make this happen, figuring out what the budget is, and how to break this large project into individual steps, because then I'm going to start pitching it to companies so they can pay for it. Speaker5: [00:40:39] Gotcha. Okay. I'm not 100% sure I followed everything along that route, but without wanting to pry too much into your multibillion-dollar idea — which I'm sure I'll want to see more of — I guess the first question is, when you're talking about hardware, it's trying to figure out what exactly belongs in the data science part of it and what's already solved with other techniques. You might not need ML or data science techniques for everything along that way; there's a lot that can be done outside of that as well. So it's just taking stock of what exactly the goal is of what you're trying to achieve, right, and what hardware already exists out there. What's the limitation on your cost? How expensive is that hardware going to be?
And then you kind of go, okay, is this going to take a Jetson board to run the kinds of data science and networks that I need on the edge, you know, or is this something where I just need a little Raspberry Pi that's going to grab stuff and chuck it out to some cloud endpoint that then does all of this? Harpreet: [00:41:56] The latter. I'm hoping for the latter. And I have [00:42:00] a little Raspberry Pi here, so I've started the process. Speaker5: [00:42:04] I pretty much don't travel anywhere without a Raspberry Pi at this point. Questionable, I know — I just don't put it in my hand carry, let's put it that way. Right. So basically, yeah, just take stock of what exactly you're trying to achieve, and then take some time to figure out the different ways of solving that in terms of hardware you actually need, because it gets expensive really fast. It's a very easy way to sink a bucketload of money when you probably didn't need to, or when there are other solutions around it. So think about that, and then think about what kind of compute power you'd actually need to achieve your task, and then you distribute your hardware based on that. And if you're putting stuff on the edge, it's actually not that different; it's just understanding how that Linux system differs, how having edge connectivity differs, how having different kinds of latency differs. And potentially, if you're talking about multiple physical objects, multiple physical things collecting data for you, how they interact and how the data flow interacts — because it's almost like multiple separate nodes collecting data, right? It's actually no different to having multiple separate data collectors on the web or in the cloud. Now, again, because I'm not very clear on what you were saying, a lot of this might not be strictly relevant — it might be equally unclear. Harpreet: [00:43:37] This is still helpful, because I'm still in the early discovery phase where I know my explanation probably makes zero sense, or very little, but I need to put it out there and talk to people so that I can get closer to a clearer description, and closer to a clear set of instructions to actually build it. Speaker5: [00:43:56] My direct feedback, based off that description you gave me, [00:44:00] is that it seems like you've put a lot of thought into a process that might solve a problem, but not as much time into articulating exactly what the problem is that you need to fix. Harpreet: [00:44:11] Right? Yeah. To be clear, there is no problem. I'm just trying to create cool stuff. Just cool art. Speaker5: [00:44:20] Love it. Cool. Yeah. Feel free to hit me up any time if you've got hardware questions and want to approach stuff from different directions. Speaker6: [00:44:29] Mark, wait a minute — Kosta is trying to help you, with incredible wisdom, to get there more successfully and with less effort. He was actually echoing the very things that I was thinking as well. You need to flesh out an architecture and think of the different components you would try in that path, or that architectural path, and, yeah, spend more time at the drawing board before just diving in. And investing in hardware, from what you're saying, should be minimal — like off-the-shelf stuff. Speaker5: [00:45:09] I hope so. Harpreet: [00:45:09] I hope so.
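Setting the hardware question aside for a second, the image-to-grid step Mark described earlier (generated image in, four-tone grid out, sized to the board) is cheap to prototype in software before any money goes into a flip board. A minimal sketch, assuming Pillow is installed; the board dimensions, file name, and the four gray levels are hypothetical placeholders:

```python
from PIL import Image

BOARD_COLS, BOARD_ROWS = 28, 14  # hypothetical flip-board size in tiles
TONES = [0, 85, 170, 255]        # black, dark gray, light gray, white

def image_to_board(path):
    # Downsample the generated image to one pixel per tile, in grayscale.
    img = Image.open(path).convert("L").resize((BOARD_COLS, BOARD_ROWS))
    grid = []
    for y in range(BOARD_ROWS):
        row = []
        for x in range(BOARD_COLS):
            value = img.getpixel((x, y))                          # 0-255 brightness
            row.append(min(TONES, key=lambda t: abs(t - value)))  # snap to nearest tone
        grid.append(row)
    return grid  # each row could then be sent to the board's controller

if __name__ == "__main__":
    for row in image_to_board("generated_image.png"):
        print(row)
```

Everything upstream (speech-to-text, the image generation call) and downstream (driving the tiles from the Raspberry Pi) stays as a separate component, which is exactly the kind of decomposition Kosta and Tom are pushing for here.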
So I think that's great. You both gave me a good next step: right now it's in my head and I have it written down, and my next step will be to draw it out. I think that will provide a lot more clarity on what I'm trying to do. And I just got my iPad with the pencil and I've been trying to find something to draw, and I think this is it. I'm going to have something and I'll come back with these cool, sick drawings, and all y'all are going to be like, wow, how do I get involved in this project? Speaker6: [00:45:41] Well, Mark, the real test — let me just explain something before I give you this anecdote. Grandmothers are really smart, but they don't necessarily keep up with technology. So when you can explain it to your grandmother and [00:46:00] she doesn't have any questions, you're ready to start working on it.
So you can get different kinds of imaging. And then that's where I start seeing this is where the simplest form of machine learning comes in is that I can take. Speaker5: [00:48:51] Sorry. Not true. The simplest form of deep learning and computer vision comes in is basically taking images and being able to classify it into three or four different recycling types, or at least being able to triage it to an extent. Right. So that's one space where I see it kind of coming in. In robotics, you've got two different key areas of how I'd apply deep learning and computer vision to robotics, right? Area number one is in essentially putting a layer on top of sensing, right? And area number two is in in optimizing articulation and mechanical movement. Right. So the second one and I focus less on that because it's not as much my specialization, but it's basically, okay, how can I move this robot, which I've designed and is funky, it's got seven legs, it's got different kinds of motion. How can I teach it to move in the most efficient way possible to do a particular task? And this is where you see your reinforcement learning come in your deep. Q Learning for how do you [00:50:00] teach a robot a task, a complex task and a complex set of movements, right? That's where that starts to slip in path planning, optimization, all of that stuff, to be honest, yeah, I'm seeing a lot of things on Kaggle that's talking about how essentially how different ML techniques are overtaking things like Kalman filters for sensor fusion and things like that. Speaker5: [00:50:27] Right. And they may well do, but the fact is those problems are kind of solved, right? We have common filters that help us in navigation and in sensor fusion. We've got navigation algorithms that work reasonably well. Right, that work reasonably well to what we need. So the high value right now that I see in robotics is that applied sensing kind of thing applied to sensing a layer where you're able to say, hey, this is not just object, object, object, you're able to say, hey, that is a plastic that I can recycle, right? And make decisions that no longer need that human guidance in that thought. Right. And it may be collecting the data in a slightly different way. I did a bunch of work in in underwater robotics, so I've probably mentioned this before to a point. Is looking for objects underwater, right? You're using sonar imaging that it's something that a human couldn't probably do as easily. So you need a different kind of imager. And and more than just knowing, oh, these are my depths. And there is a point here that is higher than the rest. It's understanding those points that my sonar image is picking up. What exactly is that in my environment? Teaching the robot to understand what that is helps feed in navigation. Now let's say doing a drone that can land itself, right? How many times is someone flown a DJI off like a ferry? And then I've heard the story at least four times in my personal circle where the ferry has moved on and then the drone gets out of reach from the controller. [00:52:00] Speaker5: [00:52:00] And then it it does it back to GPS pattern, which is back to the old GPS location where the ferry is no longer there and it's gone back down and it's basically gone down over water. And they're just standing there helplessly watching it sink. Right. So now. This doesn't solve that entirely. But what would happen if I could point a camera downwards and find out is that water as opposed to is this just a flat surface to land on? Right. 
Speaker5: [00:48:51] Sorry, not true — the simplest form of deep learning and computer vision comes in, which is basically taking images and being able to classify them into three or four different recycling types, or at least being able to triage them to an extent. So that's one space where I see it coming in. In robotics, you've got two key areas where I'd apply deep learning and computer vision, right? Area number one is essentially putting a layer on top of sensing, and area number two is in optimizing articulation and mechanical movement. The second one — and I focus less on that because it's not as much my specialization — is basically: okay, how can I move this robot, which I've designed and is funky, it's got seven legs, it's got different kinds of motion — how can I teach it to move in the most efficient way possible to do a particular task? And this is where you see your reinforcement learning come in, your deep [00:50:00] Q-learning: how do you teach a robot a complex task and a complex set of movements? That's where path planning, optimization, all of that stuff starts to slip in. To be honest, yeah, I'm seeing a lot of things on Kaggle talking about how different ML techniques are overtaking things like Kalman filters for sensor fusion and things like that. Speaker5: [00:50:27] And they may well do, but the fact is those problems are kind of solved, right? We have Kalman filters that help us in navigation and in sensor fusion. We've got navigation algorithms that work reasonably well — reasonably well for what we need. So the high value right now that I see in robotics is that applied sensing kind of thing: applying a layer on top of sensing where you're able to say, hey, this is not just object, object, object — you're able to say, hey, that is a plastic that I can recycle, right, and make decisions that no longer need that human guidance in the loop. And it may mean collecting the data in a slightly different way. I did a bunch of work in underwater robotics, so I've probably mentioned this before. The point is, looking for objects underwater, you're using sonar imaging — something a human probably couldn't do as easily, so you need a different kind of imager. And more than just knowing, oh, these are my depths and there is a point here that is higher than the rest, it's understanding the points that my sonar image is picking up: what exactly is that in my environment? Teaching the robot to understand what that is helps feed into navigation. Now let's say you're doing a drone that can land itself, right? How many times has someone flown a DJI off, like, a ferry? I've heard the story at least four times in my personal circle, where the ferry has moved on and then the drone gets out of reach of the controller, [00:52:00] Speaker5: [00:52:00] and then it does its return-to-GPS pattern, back to the old GPS location where the ferry is no longer there, and it's gone back down and basically come down over water, and they're just standing there helplessly watching it sink. Right. So now, this doesn't solve that entirely, but what would happen if I could point a camera downwards and find out: is that water, as opposed to just a flat surface to land on? That's where machine learning can come in and help — deep learning can come in and help — because you can start creating networks that can tell you, I'm landing over water, or I'm landing over really flat concrete, because flat concrete, or really smooth tarmac, and water are reasonably similar in reflectivity. When you're talking about, you know, a lidar or something like that, you're still just going to get this flat plane. So how does a robot tell the difference between a flat plane and a flat plane? That's where deep learning comes in, adding that semantic level of understanding to what a robot can do. Right, so in the pick-and-place scenario, the other thing is gripping. And this is very similar to the motion path question of, oh, how do I navigate this area — it's how do I grip this weird object? I don't know. Speaker5: [00:53:22] My Raspberry Pi is sitting there — actually, that's relatively simple. How do I grip this mouse, right, without having ever seen this object before, with a gripper that I haven't specifically programmed to pick up this mouse, that's not specifically designed to pick up this mouse? How do I go about learning how to pick up an object that I haven't seen before? So there's a lot of effort going into reinforcement learning to teach robots how to grip weird and wonderful things, and NVIDIA's got a number of papers out on this in the last couple of years, looking at simulation-to-real, sim-to-real kind of [00:54:00] work, right, where they teach it on millions and millions of simulated scenarios with different-shaped objects and particular grippers, and you just see it fail over and over again. And eventually, over all those millions and millions of iterations, it starts to learn how to grip things differently. And then suddenly you've got this intelligence built into your software that can go, hey, this is roughly the shape of that item — and I can do that with a lidar scan and/or a camera combined — and then go, okay, this is how I would approach that object and go pick it up, right? But those are some of the ways that we're using deep learning within the robotics space, different parts of the robotics space. Harpreet: [00:54:41] Let's go back to the earlier thing that you were talking about — the simple use case of deep learning, just classifying a thing as recyclable or not. If we were to design that system, what would it look like? Because I imagine there's trash or whatever coming along some type of conveyor belt, moving fast; you need something scanning that image within milliseconds, and some kind of arm to punch it and kick it off, right? So let's forget the part about how to develop a model to tell whether something is recyclable or not — you've got a model deployed, essentially, on a sensor. What does that system look like? I guess the infrastructure interplay between actual physical stuff and your code. Speaker5: [00:55:30] Yeah. So the thing that you hit the nail on the head on was the speed, right? How fast do you need to be able to infer? Are you doing one object coming along at a time on a conveyor belt, versus, hey, I probably can't even be taking photos — I need to be doing this on a live video stream, so I need to be able to infer at 30 frames a second, right, and detect and classify at 30 frames a second? That's a different problem.
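To put a number on that 30-frames-a-second point: the whole pipeline (grab frame, preprocess, infer, trigger the actuator) gets roughly 33 ms per frame. A minimal timing sketch, assuming PyTorch and torchvision are available; the MobileNet backbone and the 224x224 input are stand-in assumptions, not a recommendation for the recycling use case:

```python
import time
import torch
import torchvision

# Untrained small backbone, used here only to measure raw inference latency.
model = torchvision.models.mobilenet_v3_small(weights=None).eval()
frame = torch.rand(1, 3, 224, 224)  # placeholder for one preprocessed camera frame

with torch.no_grad():
    model(frame)                      # warm-up pass
    start = time.perf_counter()
    for _ in range(100):
        model(frame)
    per_frame_ms = (time.perf_counter() - start) / 100 * 1000

budget_ms = 1000 / 30  # ~33 ms per frame at 30 fps
print(f"inference: {per_frame_ms:.1f} ms per frame, budget: {budget_ms:.1f} ms")
```

If the measured number doesn't fit the budget on the target hardware, that's when the pruning, quantization, and "lean out your network" options Kosta describes next come into play.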
You might use a totally different model for that than you would on something where you're happy [00:56:00] to take some time — oh yeah, I've got a two-second or a half-second model response time from my endpoint. You might not have time to make a call to a cloud endpoint. You might, if you've got hardwired gigabit or whatever, but you probably won't. Maybe not. "Chuck a stinking great GPU on it" has been the solution for a lot of people, right? But it also comes down to how efficiently you can train and prune your model. There are a lot of techniques right now looking at: how can I hyper-optimize my model for the specific task that I'm doing? Most off-the-shelf models these days are reasonably general in nature, right? That's the big challenge: oh, how can we train a detector that can do 20, 30, 80, 90, a thousand classes at whatever crazy kinds of accuracy. Speaker5: [00:56:56] But these guys probably just want to do three classes really, really well. So can you lean out your network? Can you drop out areas of your model that you don't need, that are just taking up time and compute? How much can you optimize your model from an "I've got to run it on this hardware" kind of problem? So you take out the latency — anything that adds latency, right? I know there's a company out here in Sydney called Zeeland, and they look at basically on-edge computing and delivering models that run extremely fast. There's another company called DroneShield that's been looking a lot at defense technologies, essentially combating drones. It's not exactly the same thing, but they've been focusing on getting their models really tightly bound to their hardware, so that they can run on really specific hardware. There's even a lot of work going on in how you select what kind of processors you're using — maybe it's more efficient to run parts of this on an FPGA, because you can preprogram certain actions that are significantly faster on an FPGA, right? So there's a lot of embedded programming knowledge that can really help there. Combine that with machine learning, and that's a bit of a rare [00:58:00] kind of skill set, I guess, that some companies are trying to solve for. Speaker5: [00:58:22] So it really becomes a question of: how do you resource it, what kind of model are you selecting, how do you minimize the latency of the model, the inference time? And then you can select, okay, this is what our hardware is going to have to look like to achieve 30 frames a second, or 15 frames a second. Do I subsample, because I don't care about 30 frames since it doesn't move that fast? Or do I need a much faster camera? Like, if you're looking at the bottle caps on Coke bottles — we've all seen that footage, right, that was like 20 years ago — that thing is blazingly fast. It's not doing anything more than saying: there is a red thing, there is not a red thing. It's a color detector, and that can go crazy fast. That's why it's looking at bottle cap, bottle cap, no bottle cap — it's there or it's not there. That doesn't require the degree of compute that you would need to be able to say recyclable, not recyclable. You're processing different information. I mean, the "there or not there" question is quite simple, to be honest, right? From a programming standpoint, it's like, do I get a reflection back of this color?
Harpreet: [00:59:33] Yes. Speaker5: [00:59:33] Or no. Cool. Right — totally different ballgame. So once you figure that out, it's then a question of, okay, does it actually fit in my area? It's like the hot dog, not hot dog thing — we've moved a little bit beyond that in a lot of these robotics applications, especially this garbage-sorting thing. It seems simple, and it's great if you can get it working, but if it's not working fast enough, it's pointless, because [01:00:00] your entire conveyor belt slows to a stop. But let's think about it — it's not just about this. It's: how do you get it to a point where you can actually see it? You need to essentially be able to distribute all of those items in a relatively flat manner so they're not piled up on top of each other; otherwise you're in all sorts of trash, right? Literally. But basically, that's all mechanical work and robotics work, where you're going, okay, how do I distribute objects across a conveyor belt in a reasonable manner? That's not something I'm going to do with deep learning — that's just good old-fashioned process engineering and robotics engineering. And then the next question, essentially, is: what kinds of things am I looking for? Am I looking for color differences? Am I looking for material differences? Do I need some kind of X-ray or some other non-visual-spectrum imaging to help me identify materials specifically? Can that be done remotely? At what kind of distances? How close do I need to be to the material to actually detect what that material is? Speaker5: [01:01:08] We see a lot of this technology in security — airport security, right, where they're able to detect specific materials going through a conveyor belt. Exactly the same thing. It's complex, it's expensive. Can they afford that kind of imaging? And are the deep learning networks that we've got actually suited to that kind of imaging? There's a bit of experimentation to be done there to understand whether that will actually help, because depending on what you're learning, and the features and the way they're represented, it might not necessarily be useful. Most of the time, I'm sure there's a way to transform that problem into a space where it will actually work. But yeah, essentially, you need to understand the cost of all of this — the cost of the imager, the cost of the hardware that might [01:02:00] be going into doing this, the physical limitations. How close can you actually get to that conveyor belt? Like, I've seen situations where we would be able to do a particular job, but we needed to be able to get right underneath the object and look at it. Here's an example: I was doing defect detection on medical devices — this is a few years ago now, four or five years ago — and basically the only way we could actually do it was pass in each device one by one, and get a robot that knows how to pick it up and turn it around under a series of weird and wonderful lights. Speaker5: [01:02:36] Right, and we were putting structured projections onto it and stuff like that to find defects in the object. Now, when you're doing that, suddenly you're slowed down to: each and every part has to go through — pick up, turn, one, two, three, four, five, take ten images, different kinds of lighting, all triggered. Right?
How do you program that in, and what is the effort you go through to figure out the exact motions you need to detect all of the issues with that one object, before you have to do it all over again and do a different setup for a different object? If you have millions of the same object coming through, you can program it to do the same thing and you're great. It's a lot trickier when you're talking about something like trash collection, where you don't know what's going to come through: billions and billions of objects coming through. Right. So how do you assess the cost: the installation cost, the maintenance cost, the ongoing upkeep? It might work well for your proof of concept. How do you scale that to make sure it'll work in practice and actually improve their efficiency and their effectiveness? That's a tricky question, right? It takes a lot of capital investment to try and iron out that problem. And it might fail. Right. These are quite expensive and tricky processes. Yeah. [01:04:00] It's not a solved problem. Right. And we need to put that effort forward, otherwise there's so much recycling that just. Harpreet: [01:04:06] Gets. Speaker5: [01:04:07] Dumped. Right. And there are people doing it. But yeah, those are some of the considerations that go into it. So, cost. Sweet. Harpreet: [01:04:16] It's a lot different than just trying to train a model, that's for sure. It's not something I'm super close to, but I appreciate that. Mark says, I wonder if that guy ever solved that engine sound machine learning problem from a few months ago. I vaguely remember that guy coming here and asking that question. It was a good one. For context for other people: this individual basically had a really hard audio machine learning problem where they had a clip of audio, like 5 seconds, and from that 5 seconds they had to classify, is the engine broken or not? A super fascinating problem. So your discussion about all these different classification challenges reminded me of that individual. So if that individual is listening, please come back, keep us updated. I think Ben Taylor was supposed to help them with that, so it'd be good to see if there's any resolution. Speaker5: [01:05:22] I remember that conversation distinctly. That was a very interesting one. And again, it's riddled with all of these problems: what's the sensing modality you're trying to use to solve that particular problem? Is audio the right way to go? Is that the only way to approach it? Maybe, maybe not. Is there other information you need? Potentially. How do you experiment with that? That's an expensive experiment to run, right? I can't just take a cut of a database and run it in a notebook for a day. You need to do the data collection around it. Yeah. It's tricky. I hope he did well, though. Harpreet: [01:06:00] Yeah. [01:06:00] If anybody has any questions, do let me know. Looks like it's slowed down a bit, a few people watching. Anybody here in the room with a question? Do let me know. Tom, good to see you, by the way. Speaker6: [01:06:17] I confess, I got wrapped up in the YouTube link that Russell sent on a Rube Goldberg machine. I'm very distracted today. Harpreet: [01:06:31] Sorry. I've been there before, trust me. So it does not look like there's any questions coming through. Mark, did you get a chance to go to ODSC, or.
I was supposed to, but the way those plane tickets work, they were absurdly expensive. It was like 1,800 dollars for plane tickets and I did not account for that. I don't travel much, let alone internationally, so I graciously asked for a refund for that conference. I was super bummed out about that. Yeah, they're getting more and more expensive, that's for sure. He said Toronto tickets cost 600 bucks. Yeah, that's quite expensive as well. That was fun hanging out with Kiko last week, a quick cruise around Toronto. Going thrifting with Kiko is fun. I posted about it. Speaker5: [01:07:27] So I have a question, and it might be something we can even do as a flash around the room, to see how. I'm just trying to get a gauge of where I'm at, and the challenges I'm trying to solve, compared to where other groups are at. Right. Model promotion, and essentially quality testing of models. How clear are the processes there within the companies that you guys have worked at? How mature is that [01:08:00] process? In an ideal world, this would be something where there's a very clear process, and it's not even a process, it's a designed-out system, where you train the model and then it goes through certain testing, and then it goes through some kind of user acceptance test or whatever, and then gets promoted into production, right? How real is that dream on average? Because what I'm working hard on right now is developing processes around how we QC our models and how we promote our models. We have a process. We have ways of getting things into production. And I'm trying to improve that process so that you take the expertise requirement out of it. It should be as safe as releasing a back-end API update or releasing a front-end update. I see Vin laughing. Yeah, right. It's not that safe. But when you're talking about production scale, that should be the goal, that should be the gold standard. It should be that simple and that risk-free, so that you can't accidentally promote things to prod unless it's actually ready for it. So I'm just curious, maybe we whip around the room and say: how often, as a percentage, do you actually see that dream held up? Harpreet: [01:09:23] So, Tom, I saw Tom had his hand up, and then Vin, and then maybe Michiko. Speaker6: [01:09:30] Yes. Best I go first, so Vin can then say the things I either said wrong or forgot to say. But Costa, I was with a very gifted, intelligent young man recently during my trip to Chicago, and I could tell he kind of wanted some guidance. So we went out to dinner the last night, just he and I, and I said, tell me your fears. And he launched right in. And one thing I've discovered frequently, Costa, I'm wondering if you've seen [01:10:00] it too, and I'm sure Ben's seen it: people are taught algorithms in school. Basically, they're taught how to be Karl Benz, the inventor of the automobile. But when they come out, their companies want them to be Henry Ford. They want to mass-produce those models, so the process is the thing. You look at these intelligent young students and they're like, what the hell? What do I do? I'm not Henry Ford, I'm Karl Benz. And so it's, yeah, you know that guy that tells you to learn Python really well so you can clean the data, or learn pandas really well. Unfortunately, I try to be tools-agnostic, but it is about the tools, isn't it?
It's like rolling up your sleeves: I had to clean this data, I've got to automate this ingestion. If I hear ETL or ELT again, I'm like, oh God, would you stop that? We just have to automate the flow of the data, right? We've got to get it clean. Speaker6: [01:11:05] And then a lot of times you could be sitting with one of these folks: now, why do you want to get rid of linearity? Because really, it kind of helps the model be more accurate. Yes, it does. But there's this thing called parsimony. And we want to understand, I know I've said this a million times on the show, the Pareto of feature importance. So 80% of the value of what you can get as you go through a machine learning process is in that 80% of the work we do, and not seeing the gold that's in that part of it is almost criminal. So, Costa, what I'd say is: for a given domain, for a given set of tasks you're doing, for a given company, if you get all the data geeks in the room and figure out what does [01:12:00] our data pipeline need to look like before we get to the prediction stage, and get everybody on the same page with that, then they're going to light on fire and see all these other things that could present, all these other stones they could turn over. But in my experience, a lot of people don't get to that point easily. So I'm going to be quiet now and listen to others. Harpreet: [01:12:31] Spoken like a Zen master. Michiko? Speaker2: [01:12:38] You've asked the Holy Grail question: how do you make sure your model doesn't suck before it gets into production? That's the Holy Grail question. Because, I've said this before, software works because you program it to do certain things. And at least when I coded, it worked in every case without fail, because I had a great specification for every instance. That's what software does. The problem is that models don't do that. Models handle intelligent processes, and so they're a lot like people. And when a person is given a really hard task, sometimes they succeed, sometimes they fail. That's the model. And so when you get to the question of what is model quality, that in and of itself is step one. Before you even start this project, do you have a definition of functional? Because it's not going to work every time, but do you have a threshold where it will handle a certain number of classes with a certain level of accuracy? And can you connect that to business value? At what point does it become no longer advantageous to continue to improve this, because it's good enough and dropping 4% or 5% is no big deal? Or is this a hospital monitor, where dropping 4% or 5% might be a big deal? And that's really [01:14:00] where it begins. And so when you ask, how many times have I seen this? Zero. I've had to put something in at every client that I've worked with to handle quality, and kind of like Tom was talking about, it's part of the pipeline. It's something that you have to automate, because the level of effort otherwise is insanity. You can't create test cases for a model. It's so much harder than that, because if you could create enough test cases, you would have a model. That's literally how hard it is: it would be easier to create a model than to manually create the test cases to validate the model.
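A minimal sketch of the kind of automated quality gate this points at, in Python: the "good enough" thresholds are agreed with the business up front, the candidate is scored on a held-out suite, and promotion is blocked automatically instead of relying on hand-written test cases. The metric names, threshold values, and the compare-to-incumbent step are illustrative assumptions, not anyone's actual production process.

from dataclasses import dataclass
from sklearn.metrics import accuracy_score, recall_score

@dataclass
class QualityGate:
    # Thresholds agreed with the business before the project starts (placeholder values).
    min_accuracy: float = 0.92
    min_recall: float = 0.85        # the "hospital monitor" case cares a lot about misses
    max_regression: float = 0.01    # candidate may not drop more than a point vs. the incumbent

def evaluate(model, X, y):
    preds = model.predict(X)
    return {"accuracy": accuracy_score(y, preds),
            "recall": recall_score(y, preds, average="macro")}

def can_promote(candidate, incumbent, X_holdout, y_holdout, gate=QualityGate()):
    cand = evaluate(candidate, X_holdout, y_holdout)
    inc = evaluate(incumbent, X_holdout, y_holdout)
    checks = {
        "meets_accuracy": cand["accuracy"] >= gate.min_accuracy,
        "meets_recall": cand["recall"] >= gate.min_recall,
        "no_regression": cand["accuracy"] >= inc["accuracy"] - gate.max_regression,
    }
    # Promote only if every check passes; the full breakdown goes into the release log.
    return all(checks.values()), checks

In practice a gate like this would run in CI next to the data checks, so a model cannot reach prod without a recorded pass.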
And so you have to automate everything. And you've got multiple lifecycles. You've got a data lifecycle, and so you have to have quality built into that side of it. You have to validate things like your ontology and the metadata that's attached to the data. So you've got this intense quality process, and that's just data. Then you've got a research lifecycle, and now you have to have manual reviews for experiments. Some experiments, like the machine learning definition of experimentation, are kind of an automated thing; you don't need a review board to sign off on them. But when you get to actually doing rigorous scientific experiments, you need a review board, and then you have that same quality step. Speaker2: [01:15:22] When you present results, you have someone that has to validate: did your experiment actually do what you thought it was going to do, and what everybody who reviewed it thought it was going to do? And so you're hearing, this isn't just at the end when I ship the model. Quality is at every single one of the phases. And then if you work where Michiko works, you are doing not only model quality assurance, you're doing the old-school quality assurance too. So she's in the worst possible situation, where she's got to please both sides. The code has to work. It has to be able to scale. It has to be able to handle all the stuff that nobody [01:16:00] told you about in the specification. And you also have to make a model that works and that runs in production, and then you have to be able to maintain it, you have to be able to continuously improve it. And so quality at launch could be awesome, and then three months from now it might go to trash. And that might be because you caught bugs in data. You may have done subsequent experiments and found problems with what you did originally. And so now you need to release another version. I mean, you're talking about the question of questions. Speaker2: [01:16:31] So you're talking about quality assurance happening at four different phases, and no one in any of those phases is an expert in quality. So you're just kind of jamming this in. No one's ever done model quality end to end before, outside of Microsoft and Google and those companies. So you're taking something that no one's ever been taught how to do, that everyone's figuring out on the fly, and you're putting it in the hands of people who are also responsible for building the stuff. So basically I am the fox guarding my own hen house. And we're not even talking about the ethical considerations and all of the other legal and regulatory types of compliance. It's just one of those: you've just asked me to write five books. It's not even one book. And so if you're asking, is this functioning anywhere? No. The military, DARPA, has a multimillion-dollar contract to figure this out. So you're asking a huge question. And even companies that are aware that this is necessary, who have that end-to-end understanding of the workflow and understand where to put quality at each place, are still trying to figure it out as they go, because this isn't like traditional software. No one's qualified to answer that question. Speaker5: [01:17:54] So just to add some clarity: in the case that I've essentially been focusing [01:18:00] on over the last year or so, the good thing is that there's a strong understanding of what quality is good enough for the model, right?
So it's moved a little bit past that experimentation phase, and now it's about the scaling and replication phase. Now we've got to apply the same model in a very similar context, tweaked to adjust to the specificity of certain situations. And so we have an understanding of how good is good enough and what the leeway on that is. By the way, it's not very big leeway; these are very tightly bound models. So it's quite a challenge. The challenge to me is how do I turn it from, okay, we managed to do that for ten of them or 20 of them or 30 of them, into we now need to do 100 of these a week, a hundred, two hundred of these a week. And the funny thing is, you're absolutely right, you hit the nail on the head: it is this weird nexus of quality assurance thinking, regulatory thinking, process engineering and. Speaker5: [01:19:02] ML knowledge, right? And it's really weird. When I was at a robotics company, I was doing mostly ML and data science kind of work. And now that I'm at an ML company, I'm actually doing more of the process engineering work that I learned back in robotics. So it's a bit of a weird nexus. So this is what I'm trying to understand: how well known is that kind of nexus, how well traversed is that nexus of process engineering meets ML? How do you scale from, yeah, you're going to need to manually QC every single model that you're about to put out there, because writing test cases is insanity? How do you scale that? Because I'm noticing there's a gap in the industry overall. The majority of people that have moved towards machine learning and towards artificial intelligence are people who have primarily focused on, exactly like Tom said, trying to be the [01:20:00] Benzes of the world as opposed to the Fords of the world. Where are you going to find an ML quality control specialist, or even someone to work ML quality control, whatever you're paying? Speaker2: [01:20:16] Robotics is even harder, because you have safety guarantees. You have that one other piece that most models don't have. You have to have safety guarantees: your robot can't kill somebody. So you go one step further, because models normally don't control robotic arms that could slap someone to death. Normally you're talking about a recommender engine. So you have one more layer, which is the safety guarantee on top of all that. And when you talk about tight tolerances, yeah, your tolerances are such that you could drive an ant through them if you're lucky. So that's what you're talking about: you're looking at industrial controls, you're looking at almost a Lean Sigma or Six Sigma approach, you're looking at the process and controls, you're looking at automation, and then you're looking at all of the machine learning that sits on top of that. And so when you say, is there an expert in the space? I mean, you hear how much I know, and I'm nowhere near an expert. I don't know enough that I would say I could implement something like this when people's safety was in jeopardy. You know what I'm saying? I wouldn't trust myself, and I know a lot. Which is, I think, the danger: you're going to have a lot of people who will say, yeah, I could do this, but they can't. And then they get deep into it and don't really understand how hard it is.
So as far as expertise, I think you're cobbling together a team that's more like a think tank, and these are kind of end-of-the-world type people. This is one of those things where think tanks have been trying to drag government into setting standards and setting, [01:22:00] you know, these types of frameworks that we need in order to be successful here. And it doesn't exist. Speaker5: [01:22:10] So just to add some flavor to that, I guess here's the other rub. If you're talking about the robotics aspect of that kind of thing, and now I'm generalizing outside of what I'm working on specifically, this is a bit more general: the beauty of that is that it's typically a slower development process. It takes longer for them to develop the mechanics and the electronics behind it. So that slows things down. But you still want to be improving your model and not slowing down the software. Listen to Uncle Bob: software is supposed to remain soft. You can't pin it down to the hard nature of hardware, because then it's not software anymore, it's hardware in code. Right. And you want to live up to that principle of still delivering the latest models while still being able to run that degree of tight QC on them. So I'm really curious to hear Michiko's thoughts on how you make those trains run smoothly. Harpreet: [01:23:06] So. Speaker4: [01:23:07] Yeah, so. Harpreet: [01:23:08] Let's do it. Let's hear from Akiko, then go to Mark. Speaker4: [01:23:12] Just to clarify, the original question was: are there examples of a model promotion process out in the wild, not just in a white paper? The answer is yes. It does depend on how long a company has had ML around and how many models they've had to deploy. So at MailChimp, for example, we've had logging and per-model monitoring for a long time. And that has to do with the fact that logging tools have been around in Python for a long time. But there are three areas that [01:24:00] we're trying to develop right now, and to be honest, these are three areas that a lot of companies are trying to build up in. And I agree with Vin's point: we do have a team that specifically focuses on that. It's a small team, but we're building them up. And the three areas are: one, checking data drift, right? Because most of our training is offline training. We don't do online training, which honestly lines up with most of the industry; it's probably like 5% of people that do online training, maybe, and everyone else is offline. So that's an area that we're investing in. The second one is human-in-the-loop. Speaker4: [01:24:41] That's right. Which is basically getting human labelers in there, well, human labelers and also evaluators. Stitch Fix was very famous for having published blog posts about this for their styling algorithms. But that's something that we're trying to build up in, and that is partially because we're also trying to build the infrastructure, like investing in labeling tools and investing in, yes, monitoring. But it's not just using the ELK stack, where we toss the log metrics or the click streams or whatever over into Grafana and Prometheus or something. We're trying to figure out how we get more out of it, because of the silent errors that could happen.
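For the data drift piece, a common lightweight approach when all training is offline is to compare recent serving data against a training-time baseline, feature by feature, on a schedule, and alert on the gap. Below is a minimal sketch in Python using a two-sample Kolmogorov-Smirnov test; the column selection, threshold, and alerting hook are illustrative assumptions, not a description of MailChimp's actual setup.

import pandas as pd
from scipy.stats import ks_2samp

def drift_report(train_df: pd.DataFrame, serving_df: pd.DataFrame, alpha: float = 0.01) -> pd.DataFrame:
    """Compare each numeric feature's serving distribution against the training baseline."""
    rows = []
    for col in train_df.select_dtypes("number").columns:
        stat, p_value = ks_2samp(train_df[col].dropna(), serving_df[col].dropna())
        rows.append({"feature": col, "ks_stat": stat, "p_value": p_value, "drifted": p_value < alpha})
    return pd.DataFrame(rows).sort_values("ks_stat", ascending=False)

# Run daily from a scheduler (Airflow, cron, whatever) and page when anything drifts.
# report = drift_report(training_baseline_sample, yesterdays_requests)
# if report["drifted"].any():
#     notify_oncall(report[report["drifted"]])   # hypothetical alerting hook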
But in terms of the promotion process, part of it is also that Legal gets involved at the beginning. For example, they'll say, hey, these people have specifically said do not use their data in machine learning models. So they're in there at the beginning and then throughout the actual process itself, because even though we have tools, it's still a little bit hand-crafted for each model in a way. [01:26:00] We have checkpoints built in at every single point in time. Ideally, something we would love to have more of is shadow testing. Speaker4: [01:26:10] We do have that for some models, but once again, a lot of this is, we're sort of building it, we're kind of building the plane as we're flying it. But I know a lot of companies do have some kind of promotion process; it's just independent, individual to the company. And I think that's where it gets kind of tricky: yes, we have more tools to do a lot of that stuff, but the kinds of business metrics, or even when legal needs to get involved, vary. For example, when I was working at the medical company, Legal was always involved. Whereas here, we're an email marketing company, so legal needs to get involved essentially at the beginning, but for the most part, as long as we have GDPR and CCPA being followed in our data stores and data warehouses, legal only comes back in when there is a problem. But we definitely do have a small promotion process. One thing, though: it's a team. It's composed of both the practices, the infrastructure and tooling, and also the business buy-in that this is an important thing we need to have, along with our data scientists. Speaker5: [01:27:23] So could I kind of ask: this is obviously a cross-functional team with multiple disciplines of expertise, right? And then when you get back down to the engineering level of the problem, you've got ML engineers. You can't really take the ML knowledge out of the evaluation, because it's very difficult to test these things. You still need a human to come in there and massage them: okay, you're doing good, or, shut up, you're wrong, you stupid model, retrain. Right? So to me that comes down to tooling then, because unless and until we can build up in the industry a number of people that are focused on QC with a little bit of ML knowledge, because right now we've only got, you know, the early-stage designers and design engineers for ML, your manufacturing-QC ML engineers essentially don't exist as an entity in any massive way, shape or form. So failing that, until that industry warms up and those people are trained up to do that, the only real solution is: how do you develop tooling that makes it super light touch for me to either set up some kind of automation to reject models up front and triage them, so that I'm only QC-ing the ones that actually need it, and then, for the QC process itself, to deliver the information to me in almost a one-click, one-touch kind of process? So I'm not spending hours and hours digging into, oh, are all the parameters and metrics we're looking at still okay. Right. Okay. It comes down to the tooling to bring that efficiency for the moment, doesn't it?
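One light-touch mechanism mentioned above is shadow testing: the candidate model sees the same live traffic as the incumbent, only the incumbent's answer is served, and both predictions are logged for offline comparison. A minimal sketch of that pattern follows; the Model interface, the logger, and the thread-pool wiring are hypothetical stand-ins rather than any particular company's stack.

import logging
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Protocol

class Model(Protocol):
    def predict(self, features: dict) -> Any: ...

log = logging.getLogger("shadow")
executor = ThreadPoolExecutor(max_workers=4)

def handle_request(features: dict, incumbent: Model, candidate: Model) -> Any:
    served = incumbent.predict(features)            # only this result reaches the user
    def run_shadow():
        try:
            shadowed = candidate.predict(features)  # candidate scores the same input
            log.info("shadow_pair incumbent=%s candidate=%s features=%s",
                     served, shadowed, features)
        except Exception:
            log.exception("candidate failed in shadow")  # shadow failures never hit users
    executor.submit(run_shadow)                     # run off the request path
    return served

An offline job can then read those paired logs and feed the same kind of promotion gate sketched earlier, which is what makes a one-click review plausible.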
Speaker4: [01:29:13] So I'm going to say my piece, and then I think Mark should jump in on this, because I know he's had a lot of experience in handling the engineering and data science side, the translating and all that, because that's. Harpreet: [01:29:29] Exactly what he's going to talk about. Speaker4: [01:29:31] Right? Because I would argue that it's not, so let me rephrase this: it's not that it's not tooling, but trying to isolate it to tooling, and to decouple it as a tooling problem away from a domain knowledge, understanding, and empathy problem, I [01:30:00] think is a mistake, because in general there is so much tooling out there. But what I do sometimes see missing, if you think about my team, for example, the ML Ops team: in a way, the reason we exist, yes, is that we build infrastructure and platforms. But when you start getting into what the purpose of a platform is, you could go the technical route, but some people could argue that internal tooling or a platform or a wrapper, a big part of that is actually just to make people's lives easier. Right? So it's not just a tooling problem. At what point are you making someone's life easier? Some people are okay with VMs, other people are like, yo, just give me a virtual environment to code in. So what is the point of easy, versus the return on investment of making someone's life easier? But yeah, I'll turn it over to Mark, because I think I. Speaker5: [01:31:05] Let me refine that question based off what you said, to try and pick Mark's brain in a deeper manner. Let's refine this at each step. So it sounds like it's not just a tooling problem, it's also understanding where along your process, first of all, your entire end-to-end process: data coming in, the model getting experimented on, developed, QC'd, released, monitored, fed back, relabeled, retrained. It's understanding, along that whole process, where are all the people whose lives you're making easier? And what do you provide as tooling, what do you provide as expertise, what do you provide as process improvements? So my question to you, Mark, is: when you're looking at a system like this, what's your percentage split between the actual delivery [01:32:00] software or tool at the end of the day, for the actual model, versus the internal tooling? Because right now it feels like we're spending a solid 90 to 95% of our effort on all of that internal tooling and infrastructure, and just breaking down bottlenecks that mean I'm spending two hours of an expert's time on a certain task where, if I could make their life easier, it'd be 20 minutes for them. In your eyes and your experience, what has that distribution been of internal versus external-facing tools? Harpreet: [01:32:38] Yeah, I would actually say neither. I think the majority of my time, I would say like 95% of my time, is actually spent trying to build influence within the org for this. And the reason I bring this up: for context, at the startup I'm at, I joined when it was about three years old, and now it's five years old, so I've been there about two years. My role was the first labeled data scientist. I say labeled because there were data scientists before, or people doing the data science work.
I was the first labeled data scientist, and I think that's important because that was a shift in the company being like, we want to take data more seriously. And my manager, who's been with the company since the beginning, she essentially does data science work, but she has a psychology background. I work in HR tech, so applying ML to HR. That's a very tricky space and a very serious space, especially because we do behavior change and really try to push people towards better behaviors in the workplace. So top of mind for us, a lot, is how we create ethical machine learning and things like that. And what I'm trying to get at here is that my role, when I was hired, was to bridge the gap between the researchers and the engineers. So I do a lot of translating between the data science and psychology team and the engineering team. And [01:34:00] what's really hard about getting this data infrastructure is that engineers think about the now: how can we reduce risk now in the code base and scale the code base now? For data science, I would argue that we're very future-focused, and it's hard for us to be reactionary, because, for example, I'm trying to build data infrastructure and data quality. Harpreet: [01:34:26] If that's crashing and burning, that's a major problem for data; for engineering, less so, because they have a whole system set up in place to handle bugs and make things work. And so I would argue that for the data science side, it's hard to translate those pain points between engineering and data science. And so this goes really far back, even before the ML component, going to Tom's point: it really comes down to the data infrastructure. How can you get clean data and quality data that's aligned across the company, and then use those data assets to build these tools for machine learning, and also machine learning observability and all these different things? If you don't have the data available to monitor your models, you're in for a bad time. And so I bring this all up because to get that infrastructure and to build these ML systems, data has to be a product rather than just a resource, something that spans the entire company. And I feel like I'm putting my Vin hat on here, since I've been learning from his content all the time, but that becomes a strategic initiative for the company, and therefore you need to convince the entire company that this is worthwhile for machine learning and for these data processes. It's easy to say, oh yeah, machine learning opens up a lot, right? People are really bought into that. But then you need to say, well, we need a quality data warehouse to have machine [01:36:00] learning. Now that's a much trickier thing, because now they're like, well, do we need a data warehouse, or do we need more features to actually sell the product? And, you know, you're talking about needing this data warehouse for something that's a couple of years off before you cash in on it. So it's this long-term transformational project, and during that time people are leaving and coming in. You've built up your champions. You're really like a salesperson within your company trying to build this. So, one, you can't do this by yourself; you need a team of people and champions. And what I found is that the bottleneck is not actually the tools and building things.
The bottleneck is getting buy-in from people outside the data team to be like, yeah, we want to invest in this and follow this process with you. And they all have different timelines and priorities, and getting that aligned across the entire organization is so freaking hard. I think it's just learning how to influence across the organization. I'm learning from my manager, who's just exceptional at it. And I think what it comes down to is you have to be patient, and you really have to identify who your champions are and serve their needs and, more importantly, their why. I actually just talked to our founder, because I'm trying to build a use case for data infrastructure, and I was kind of boiling the ocean. I was like, we need to have this optimal data modeling for our data warehouse, and it was falling on deaf ears. Harpreet: [01:37:26] I wasn't getting any traction. So I talked to the founder, who has a good high-level strategic view of the company, and it became very clear I was focused on tactics and not strategy. I was focused on trying to build a data model, and that was going to be the thing. But the strategy was actually: I want data accessibility throughout the whole entire org, so we're more data-driven, ask more questions, and therefore find more use cases for ML. That was the thing I was really after. And so from there he was like, great, now we have [01:38:00] this use case, this is what you're trying to strategically do, and I agree with it. How do you find the first use case that would allow you to get that buy-in from one champion? That's really important. Do it end to end, and then from there you replicate over and over again, and that's how you get this infrastructure. So to go from, we need model observability or QA, that's a very tactical thing. You need to go back and ask, why does the business need this type of QA? And then from there, go back to which stakeholders all need to be aligned for that, and then from there, what individual projects can get those stakeholders aligned with this vision. And then you realize, oh, this is going to take years, and that's where I find myself two years into this company. Speaker4: [01:38:47] And to support Mark's point further, about how it's not, like different people are saying, it's not a tooling problem, right? Literally just look up the model observability and monitoring tooling chart, or whatever, the ML Ops tooling landscape chart. There are literally like 300 tools on that chart in like 20 different boxes. It's not a tooling problem. And this is something that I do feel is a little bit of a blind spot, especially with engineers, and this is a struggle I've had with very senior staff engineers: especially as the data science industry has been changing in terms of skills and talents, most data scientists are not engineers. They just know that they can train a model, build it, and if they ship it, they get a pat on the back, a gold star, maybe a spot bonus or whatever, you know. And so some of the engineers really had a hard time with that, where they're like, no, we need to make sure they know how to use Docker and Kubernetes more effectively. It's like, okay, great, we can keep doing that and fighting an uphill [01:40:00] battle, or we can kind of meet our business partners where they are.
Speaker4: [01:40:07] And sometimes our business partners are the executive folks, who are like, okay, if you need the tooling, make the argument for it. But it has to be worth the time our engineers are going to spend on it. It has to be worth the friction and the pain of switching to that tool if an existing one already exists, and all that. And at the end of the day, any tool you bring in, you're still going to have to evangelize anyway. I don't know about engineers in your area, but the engineers I know, if they don't want to use it, they're not going to use it. So if we don't do a demo, if we don't do a walkthrough for them, it's going to be, yeah, screw that monitoring, we'll just go around it. But the thing is, no one, for example, wants to have a data breach. No one wants to be known as a racist company. No one wants to get hit by the FCC. So these are real risks when you don't implement monitoring and observability. Or, for example, things break because you have an Airflow job that's constantly sending you bad messages. Speaker4: [01:41:11] And then in the meantime, you don't realize that 50 of your models stopped working in production. Right? That's a huge, huge risk. Or that we're still using data from people who have opted out under GDPR. And the thing is, a lot of these are measurable risks. You can look at another company of the same size that had a data breach and go, huh, it just cost them like $60 million in lawsuits. Yeah, that's pretty expensive. So from a tooling perspective, it's almost like we're in this magical space of ML Ops where you can just pick and choose your own adventure in tooling. But being able to bring it in with the right team, with the right reasoning, getting that buy-in for it, and also having the empathy [01:42:00] to understand what it is you really need versus what it is that you want, because a lot of times engineers bring in stuff they want, but they don't bring in what their business partners actually need. Those, I think, are really the hard nitty-gritty questions for tooling. I can give you ten examples of labeling tools or ten examples of observability tools, right? But that's not the challenging part. Speaker5: [01:42:24] I guess my big takeaway from what you guys have said so far, and I know Vin's got something else to add to this that I'm curious to hear, but so far my big takeaway is: tooling is a very small part of the problem. The tooling exists. It's choosing the tooling that's appropriate for the pain points we're trying to solve. And to be clear, we're not dealing with ethical questions in the use case I'm on here; we're not really talking about ethical issues around this, or significant human risk and things like that. So it's a relatively safe area, but there's still a degree of quality that's required, and how do you deliver that at scale and at speed? So the problem is more around how we keep achieving the quality that we've been achieving. We've got certain tooling for that and certain processes in place. But now we're finding that it's more than just that: we need the right tooling with the right resourcing, with the right people, with the right talents and the right mentalities, to come in and bring specific business value.
Like, what I'm hearing from this is that I should take a step back and reassess, from a high-level business perspective, what the real pain point around quality at scale is, and then consider what my levers are, in terms of resourcing, in terms of process, in terms of tooling: what are the different levers I can pull that actually make sense to tackle the bigger problem at hand? So I think, take a step back. You do get kind of bogged down in figuring [01:44:00] out this problem, and you've got to step back a little bit from time to time. Cool. Thank you so much. Vin, I think you just wanted to. Harpreet: [01:44:08] Add something. Yeah, go ahead. Speaker2: [01:44:11] Yeah. The only time I think I've ever disagreed with Michiko: this is a tooling problem. If the tooling that was out there was sufficient, the largest companies would not be building their own stuff, and they all are. And the reason why is because the ML Ops community has a whole lot of tools that don't work. You're going to get to a point where you go from a level two or level three maturity to a level four maturity, when you start doing human-machine teaming, when you begin to deal with reliability and explainability, and when you start building reliability requirements. That's when you begin to realize that you need to change basically the entire lifecycle in order to meet the actual requirements for reliability. There are definitely simple use cases, and that's where pretty much everybody starts, and that's what the majority of ML Ops tools are aimed at: that 80% of data science and machine learning use cases. But as you begin to really examine reliability, and the road down quality is the one that takes you there, you begin to realize you have to revisit everything you've been doing from a data science perspective, and you have to change the methodologies and the workflow. You have to introduce more rigorous methodologies to handle high-end use cases. Speaker2: [01:45:38] And for some products you never get there. But for the majority of high-value use cases in machine learning, you eventually get to the point where you have to actually meet reliability requirements at a higher level than people are used to. And when you go down the ML Ops road, the monitoring and all of the tooling we have right now, like I said, is sufficient for about a level two, level three maturity. Then you hit reliability, and there's another level on the other side of it. And that's why there are so many companies now who are rebuilding their ML Ops and building custom tools. Companies say, yeah, I've got a feature store that works, and LinkedIn goes, no, we'll build our own, and there's just case after case after case of companies who say, yeah, we've got the solution, and you look at more mature machine learning shops and they go, well. So it's an evolutionary process. You'll get from level one to level two to level three. But realize that if you get to reliability requirements quickly, if for whatever reason you end up getting accelerated, expect everything to change, and the tools that are out there will become a problem. Speaker5: [01:46:57] I'm seeing this, and I kind of see both sides of it. Right. And it's not so much a question of getting tooling to replace domain expertise.
It's more about how I leverage tooling for my specific use case, for my specific requirements, to amplify domain expertise, to speed up delivery time and to maximize our quality. Right. And you're right, there are 300 billion different platforms and feature stores and tools out there in this space. So I guess I'm going to bring all of this back to a question for Harpreet. You're the resident expert in tooling and ML Ops tooling here at this point, and I'm curious to know if you had to put a number on it. What I'm seeing as the difference between tooling in ML versus tooling in other technology spaces is market consolidation, right? We've seen that in [01:48:00] cloud computing; there's been significant market consolidation, and there are three major cloud providers out there. How many years out are we, do you reckon, finger in the air? I'm not going to hold you to it. How many years out are we before we start seeing some serious market consolidation in the ML tooling space? Harpreet: [01:48:22] So, most of these companies have maybe 20, 30 million in funding. And the number of customers they have, you'd be surprised: it's not like 100, it's in the tens, mid-teens, for a lot of the companies I've talked to. And the contracts haven't been huge, right. So I don't know, man. Some of these companies have 60, 70 people; are they making enough money to cover salaries? How much burn are you going to have? Right. It's going to be three years, I think. There's just been a ton of money being pumped in, and a lot less of it year to year now. And with this downturn that's happening, we may start seeing some of these companies having trouble raising the next round, or having to go for a down round or something. I don't know if that answers the question or not. Speaker5: [01:49:23] Maybe. Maybe that's actually, in the most Machiavellian possible way, a really good silver lining of this, right? If all the companies that aren't able to make a convincing case start to have trouble, there are going to be people looking for roles in that space, leveraging their expertise. So you're going to get a natural kind of economic driver towards, you know, a more collective effort on this front. Like, I know we're working with a particular labeling platform, and I'm constantly giving them ideas back: hey guys, we need these features in there, this [01:50:00] is what we look at when we're talking about scale. And they're learning and developing their software based on feedback from us plus a handful of other high-volume customers right now. The interesting thing is that in the labeling space as well, and labeling is just one part of this ML Ops tooling thing, you're seeing the same kind of thing. So I wonder if there are going to be economic factors that drive this, with the recession coming up ahead, call it belling the cat, if you will. But maybe that's a positive driving factor, in that there are companies that can't make the next round of funding and other companies that can, and then the consolidation of investment goes into one more secure push. And then suddenly you've got, you know, a company of 100 absorbing two or three other companies of 20 each, and suddenly their capabilities increase. Suddenly their contribution. Harpreet: [01:50:53] To the space has increased.
Speaker5: [01:50:54] And you get this natural market consolidation, and it'll come out of popularity of platform use at the end of the day. So it's almost. Harpreet: [01:51:00] Like, what is the ease of the platform? Some of these products have a horrible experience. I go to a lot of tools and, at step three, step four, I'm getting an error even though I'm following everything to a tee, right? None of it works for me. I'm like, what is the alternative? Because this is not going to get things done, right? So I immediately start looking for the next alternative. So it's going to come down to developer experience. I've got my DevRel glasses on, but I think the companies that can do that part right are the ones that are going to make it. You have to build whatever product you're building with the end user in mind. You have to make it easy for people to use. You have to show them the time to value as quickly as possible during their onboarding experience. Otherwise, they just move on to the next tool. Like, MLflow is great as an open source tool, it's amazing, gets the job [01:52:00] done, but there's a lot of overhead that goes into setting it up. The same could be said for any other tool in the ML Ops space, whether it's a feature store or data versioning or an orchestration tool. Harpreet: [01:52:17] Kind of my $0.02 there. And from Mark: yeah, as I say, another thing to check out as kind of a starting point, and to check periodically, is, in the US, the FDA and how they're thinking about operationalizing AI within health care. The reason why is that they are, as a government agency, connecting with researchers, other government agencies, and industry to figure out what the best practices are, to create regulation to protect individuals for this exact use case you're talking about. And they're going to be setting the guidelines eventually that everyone else is going to follow, at least in the health care industry. So the FDA is one of them, but what are some other government institutions throughout the world that are thinking about this problem for high-risk areas? Because they are going to have a lot of great documentation and starting sources. Right now the FDA has a general guideline, and they've more or less called on industry to be like, look, this is a problem, we don't have it figured out, but you all need to help us. And so there's great material available; they typically come out with white papers every year or so talking about that progress and what they're thinking. Speaker5: [01:53:31] Yeah, totally. There's this immense pressure on essentially that kind of initiative, right? We're quite across the FDA regulations and, in Australia, the TGA, that's our equivalent of the FDA. Both the TGA and the FDA have device regulations and software development lifecycle regulations that come out of them. There are ISO standards on everything except for ML. Right. And the regulations [01:54:00] are very much, guys, we're curious: hey, how do we do this? We've got some fundamental concerns, and make sure you address those concerns, I guess. But, you know, there's a pressure on the industry to consolidate our understanding, and I think we're going to see that more in the next 12 months than we have in the last five. That's what my gut is telling me. And it's got me very scared and very excited at the same time.
Harpreet: [01:54:32] Great discussion, Costa. Thank you so much for kicking it off. Any final words on that topic from anyone? Anything? Anyone? Going once, twice, three times, does not look like it. Thanks so much for joining. Be sure to tune in to the podcast; we dropped an episode today with Dr. Laura Pence. She is one of the chief people at Spartan Race, the actual Spartan Race company, chief wellness officer, something like that. We had a great conversation on resilience. I'm about to run out of brand new episodes. It's been six months since I've recorded a podcast episode, six and a half, seven months, so the well is going to dry up and we'll release some goodies from the backlog. There are, I don't know, 230 episodes at this point, so a lot of content out there. But we'll be back in my actual studio soon, I hope, and I hope you guys will be back too. Y'all take care and enjoy the rest of the evening. I'm not trying to do a Spartan Race, I am not trying to do that. That sounds like... that sounds fun, though. One of these days. You all take care, enjoy the rest of the evening. Remember, you've got one life on this planet. Why not try to do something big?