HH65-21-01-2022_mixdown.mp3 Harpreet: [00:00:09] What's up, everybody? Welcome, welcome to the Artists of Data Science happy hour. It is Friday, January 21st, and it is Data Science Happy Hour number sixty-five. I feel like every few weeks now we just hit another milestone, milestone after milestone after milestone. Super excited to have all these guys here. Hopefully you got a chance to tune in to the podcast episode that was released today with Dr. Joe Perez. We talked about a lot of stuff, man, so hopefully you tune in and check it out. We talked about how to bring your story to life, your data story to life, and make it actionable, so definitely check that out. Hopefully you also got a chance to tune in to the Comet office hours that we did on Wednesday. This week we covered the importance of baseline models, did a quick presentation about that as well as the accompanying blog post, and had an awesome community member come in and share her experience, you know, what it's like for her to build models in her workflow and her process. It's always fun to see how different folks do different things, so hopefully you get a chance to tune in to that. It's available on my YouTube, and also on the Comet YouTube as well, so definitely check that out. Shout out to everybody in the building. Dave Langer is back to Earth, Darth Langer. It's good to see you again, Dave, it's been quite some time. Dave, what's going on? Russell, what's going on? Kenji is in the building. Eric Sims, Matt Damon, and of course, last but not least, the legendary Vin Vashishta. Super excited to have all you guys here, man. It feels like we got the old gang back together; it gives me flashbacks to, you know, well over a year ago, man, getting to see all you guys here. So, yes, let me open with this question, man. I've been trying to find places [00:02:00] to get more information about data science and machine learning. I know all of us tend to hang out on LinkedIn. That's how we all know each other, and that's where we usually go for knowledge and information. But where else can we go to ask questions? Maybe we want to ask questions pseudonymously, or ask questions, you know, in a forum that's not as public, so to speak. What are some other awesome communities, you know... I mean, just other places, whether it's different communities or different websites: Stack Overflow, Reddit, Quora, things like that. Where do you guys go for awesome information and help when you need it? Dave Langer, let's go with you, man. Also, you have no idea how happy I am to see you again. Speaker3: [00:02:53] This probably won't surprise anybody: I'm kind of an antisocial person. You know, I like computers more than humans, generally speaking. So the Google, man, that's where I go. I really don't know of any sort of community like you're describing where it's not public, right? You can go to Stack Overflow and the associated websites and ask questions there. But I usually just Google, and I usually find what I need.
Generally speaking, I'm a bad person to ask. Harpreet: [00:03:24] Yeah, well, that's the mark of a very expert Googler, because sometimes it's very hard for me to find the answers that I'm looking for. But yeah, I was spending some time on Stack Exchange. There are so many different parts to Stack Exchange, right? There's Stack Overflow, that's the overarching site, but there's one called Cross Validated, and there's the Data Science Stack Exchange. And it's just crickets, man. People are asking questions regularly, but these questions go unanswered. And I'm just like, oh man, I wonder if that's someplace where I could step in, maybe help people and provide [00:04:00] some value, give advice. You know, I tend to like answering more technical questions, and I'm getting kind of tired of the "how do I break into data science" type of questions, but that's another story for a different day. Eric Riddick, good to see you here, my friend. Nice setup, by the way. Where do you go, man? Where do you go for information, to get help, and where else do you hang out on the web when it comes to data science and machine learning stuff? By the way, if you guys do have questions, feel free to drop them in the chat or in the comment section on LinkedIn, on YouTube, wherever it is that you're watching. Speaker4: [00:04:37] I mean, since you asked, I should say I haven't figured it out. But also, guys, look, you like this microphone and arm? It's Harpreet Sahota. Yeah, I don't know. I ask around at work. We have a band; there are actually some pretty legit, really friendly guys and women who are pretty helpful. Speaker3: [00:04:58] But sometimes I just get stuck on systems questions, and that's when I feel like I really don't know where to go. Like, for example, you do stuff with Comet, and you'd probably be the guy to ask about experiment tracking and managing model metadata and stuff. But that stuff about actually putting models in production is still something only a very small sliver of the people in my network are talking about and thinking about. And so I feel like when it comes time to build systems, which is my job, I don't really have a lot of resources. Harpreet: [00:05:31] Awesome, thanks for sharing that. I'm curious, what is a typical systems type of question, if that question makes sense? Speaker3: [00:05:43] Yeah, good question. Like, I could ask you guys, what's your feature store? You know, how are you guys working on a feature store? Some people use Elasticsearch, which is nice because it has a built-in vector similarity search, and that's the [00:06:00] way a lot of people do recommendations, like recommender systems. Or you could use a data lake. Another question is, how do you guys get your inferences available to the rest of your company? Like, you might develop models and make recommendations about something, but does that end up in other software? Do you guys have, like, an API behind a gateway? I don't know.
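To make the vector-similarity idea Eric mentions concrete, here is a minimal sketch of how a recommender can rank catalog items against a query embedding. It is deliberately not tied to Elasticsearch or any particular feature store; the item IDs, embedding size, and random vectors are made up purely for illustration.

```python
# Toy vector-similarity recommender: rank catalog items by cosine similarity
# to a query embedding. In practice this lookup might live in Elasticsearch,
# a vector database, or a feature store; NumPy keeps the mechanics visible.
import numpy as np

rng = np.random.default_rng(42)
item_ids = [f"item_{i}" for i in range(1000)]   # hypothetical catalog
item_vecs = rng.normal(size=(1000, 64))         # hypothetical item embeddings

def recommend(query_vec: np.ndarray, k: int = 5) -> list[str]:
    """Return the k items most similar to query_vec by cosine similarity."""
    q = query_vec / np.linalg.norm(query_vec)
    m = item_vecs / np.linalg.norm(item_vecs, axis=1, keepdims=True)
    scores = m @ q                              # cosine similarity per item
    top = np.argsort(scores)[::-1][:k]
    return [item_ids[i] for i in top]

print(recommend(rng.normal(size=64)))
```

In a production setup, the other half of Eric's question is exactly where this lives: typically behind a service or API gateway so the rest of the company can consume the recommendations.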
So things like that? Harpreet: [00:06:22] Awesome. Thank you. Yeah, thanks for providing that additional context. There is the trinity of Eric's in the building, so I hear we've got the three Eric's Eric Gitonga, Eric and Eric Sims, Eric Sims. Speaker2: [00:06:36] Where do you hang out for Harpreet: [00:06:38] For other questions besides like LinkedIn? And then let's go to Ken after that, Kenji, then then Vinicius. Then then I'm just waiting for questions to come in. So if you guys got questions, please do let me know. So go for it, Eric. Speaker5: [00:06:54] Yeah, so like Dave, I start with Google and then just go from there and I found myself recently, I found some really helpful answers on Quora because I was looking for some, like I was looking for recommender system project ideas, something that would help me connect with like an interesting Data or something, just that I hadn't considered before. Speaker2: [00:07:16] And it was a really Speaker5: [00:07:17] Smart guy there who, like, listed out some data sets I'd never heard of. And so it was just like a really helpful, well thought out answer Speaker2: [00:07:24] That I wouldn't ever really see a Speaker5: [00:07:26] Post on anything like that. So I don't go to Quora first, but I end up there once in a while. The other place that I sometimes go to is ResearchGate. When I Google Things and ResearchGate is like consistently way over my head. I wish, you know, one of these days I will have better math chops and I will get it. But usually I just get on there. It's like I came to the wrong party and I try to back out slowly. But it's definitely I mean, it's a wealth of knowledge as long as I can actually consume it. So there are probably a couple that come to mind. Speaker2: [00:07:59] Just curious about [00:08:00] the recommender system, like, what's your Harpreet: [00:08:03] Project kind of looking like? What's your ideation for that project? Speaker5: [00:08:08] Yeah, so I would I found like from that from that project, I found like like like a dating website Data set. So I want to make a silly recommender engine for Valentine's Day. I don't know if I'll have something ready in time, but it was like, you know, a fun, like, collaborative filtering thing that I could use or practice with. But what I what I really want to do is create one that leverages diversity and serendipity to help us to help avoid polarization in like search search results, because that's what happens so readily now. And so how can we do something? The way I think of it is like, can we make a recommender system that makes this better instead of just making giving us more of what we want? Harpreet: [00:08:50] Yeah, no. You share that you shared a research paper along the the lines of that, that injecting serendipity into the recommendation systems. I thought that was super fascinating by any chance might if you want to collaborate on that. Hit me up, man. Let's let's let's do something. I think they'll be a lot of fun. Kenji, go for it. Speaker4: [00:09:14] So I have a lot of thoughts on this. This is something I am constantly asked and and I also think I think about. So the first, obviously for me is Google. Like everyone else, I Speaker2: [00:09:28] Google everything immediately. Speaker3: [00:09:30] But I've seen a lot of people Speaker2: [00:09:32] Have success with YouTube Speaker4: [00:09:34] As well. So YouTube is like a a little bit more intimate question asking platform. Not everyone's going to read it. 
It's not like a Reddit style thread where things get get. Super well-publicized, Speaker3: [00:09:49] If you have a personal Speaker4: [00:09:50] Question, you ask it and the the creator is engaged in their community, a lot of the times you'll get a pretty good and detailed answer there. Also [00:10:00] Discord servers or Slack channels. [00:10:04] Have been there Speaker4: [00:10:08] More recently, I think, may be. And he's also it sounds like that community is really strong and a great place to ask relevant questions. Speaker2: [00:10:18] I also Speaker4: [00:10:20] Like Kaggle and GitHub, Speaker2: [00:10:22] So if you're Speaker4: [00:10:23] Having an issue with a certain library, there's literally a systems in place for you to ask questions about it. And there's also places where other people are having problems with that same library. You'll be able to to evaluate it and understand it better. You might find exactly what you're looking for, right? And then Kaggle, it's literally a place for a data scientist. They have a forum type architecture as well. Speaker2: [00:10:46] It's perfect for Speaker4: [00:10:48] For asking the right types of questions there. So I don't know if that's exactly exhaustive, but hopefully that's a little bit more outside of the box Speaker2: [00:10:56] And people might find Speaker4: [00:10:58] Something that fits their specific question style across one of those categories. Harpreet: [00:11:05] Awesome, Ken. Thank you so much. Yeah, shout out to Dimitrios with the ops community. That thing is this bump, and it's huge. Lot of stuff going on there. The thing with with like Slack Communities and Discord communities, at least for me, I I just feel like I can't keep up, man. Like it's, you know, it's extremely difficult to keep up on things. I guess that, Speaker2: [00:11:29] You know, we got that, that Charlie Harpreet: [00:11:31] Data thing, which I tried to be a part of, but then just got left in the dust because there's the movement so quick, right? But yeah, yeah, that's the challenging Speaker2: [00:11:39] Part to Harpreet: [00:11:40] Me, at least about Slack Communities, and Discord is just the pace at which it moves. I find myself being more of a fan of forums Speaker2: [00:11:49] Where the questions are more searchable and Harpreet: [00:11:54] Indexed a little bit better, and they they show up on search engines and stuff as well. Then what about you? And then after Venn will go to? [00:12:00] We'll go to Eric's question about sequel, and then if anybody Speaker2: [00:12:05] Else watching on Harpreet: [00:12:06] Linkedin or here in the Speaker2: [00:12:07] Chat has a Harpreet: [00:12:08] Question, please let me know I'll add you to the queue. Also, just quick shout out to some of the people that joined Speaker2: [00:12:13] Since we started Harpreet: [00:12:15] Jennifer Nardin. What's going on? Marina is in the building to Becky Goes here as well. Kristen, haven't seen you in a long while, Kristen. I could see you and Matt Diamond then go for it. Speaker4: [00:12:26] I'm going to give the old man answer, because I think everybody's covered the really good communities as you go in your Speaker2: [00:12:32] Career and you meet people Speaker4: [00:12:33] That are way smarter than you. Stay in touch with them and start building communities like little private communities because the smartest people on Earth will spend, you know, a couple of minutes here and there answering questions. 
But I found and stumbled my way into these little groups Speaker2: [00:12:52] Where there's, Speaker4: [00:12:53] You know, it's between five and 10. Some of them are a little bigger than that. You'll meet up about once a month, but everybody is and a resource to everybody else. And the questions that get Speaker2: [00:13:04] Asked, Speaker4: [00:13:06] Like I've learned more from watching other people ask really smart questions and seeing the answers than I ever have from my own questions. I mean, it's a broader approach when you have a little, you know, a little group like that. And as long as everybody is pulling their own weight and able to contribute and help in one way shape or form answer questions those groups. It takes a while to build them up. But those groups are invaluable because everyone actually kind of jumps in and jumps on questions. It's almost like one of these, but you have access to it by email or text or slack or whatever. So build these up. Speaker2: [00:13:43] And if Speaker4: [00:13:45] You can curate one, if you can be like the person that Speaker2: [00:13:48] Brings everybody Speaker4: [00:13:49] Together and starts introducing new people to the group and everything like that, you become such an invaluable person because when you get to a certain level of question, it's [00:14:00] like, where are you going to go to ask that? Mm hmm. You know, and so you have to have a different type of group if you want to get any sort of answer that's comprehensive and not, you know, oh, that's a stupid question. Or the the default five answers that you already knew, tried and didn't work. So build up these groups for yourselves. They're they're valuable and try to join one. You know, if you see people meeting on a regular basis, say, Hey, can I get it on that? Harpreet: [00:14:27] It's like we got going on here at the Data Science happy hours. I think that's one thing. I feel like that done a great job that was was kind of building this this space every Friday for us to get together. But I haven't had that success translate over Speaker2: [00:14:40] Into my Harpreet: [00:14:42] My own Slack channel for for the for the @ArtistsOfData science that it's crickets. I'm wondering what I could do to make that a better Speaker2: [00:14:50] Place, a place where Harpreet: [00:14:51] People can come with with more questions. You know, tons of people join every week, but nobody's I feel like utilizing that resource can go for it. Speaker4: [00:14:59] So that's something that I'm always thinking about is you create these communities and you sort of need a catalyst a lot of the time you need people that are going to be engaging consistently. That person can be yourself. But it might be some self-selection. But I think that empowering people who are active or early active to be more engaged, to be administrators or to to have more involvement and to think creatively Speaker2: [00:15:26] About Speaker4: [00:15:28] About how to improve Speaker2: [00:15:29] The quality Speaker4: [00:15:30] Of the forum is is awesome. I mean, that's something I've seen with the sixty six days of Data discord is that there are a couple of admin there that take it really seriously. They're super involved and I try my best to like, give them opportunities to put their own spin on things like a couple of them have made like bots for the server that like, removes spam. They do really incredible stuff that I have absolutely no clue how to do, and every one of those is incredibly cool, right? 
So the idea is that the [00:16:00] more autonomy you give them, the more control, the more they feel like it's theirs, the more engagement they're going to drum up, and other people are going to see that and want to be more involved as well. There are also some really cool ways on Discord to gamify it, so if people comment, they raise their level and things like that, which I think is so cool. For some reason I'm not even the highest level in my own server, which is wild, right? But again, I don't know as much about the Slack platform, but that is one of the reasons why something like a Discord, which is very developer friendly, could be a good option as well. Yeah. Harpreet: [00:16:43] I didn't even know you had a Discord for 66 Days of Data. Ken, go ahead and drop a link to that so people can join; I'll be happy to join it. Like I mentioned, one thing I want to do is, I feel like I've got a lot of knowledge about machine learning and stats that I do not get to share as often as I would like to, so I just need an outlet for that. But thank you so much. Shout out to Dylan in the house, good to see you again. Eric, let's jump to your question, and then anybody joining in on LinkedIn, if you guys have questions, drop your question right there in the chat. Or if you're watching, smash that like wherever you are. Speaker5: [00:17:19] So this afternoon my boss messaged me and said that they'd like to update our interview process, and one of the things they want to include is a SQL assessment. When I was interviewing, I had a take-home project, but I didn't have a SQL assessment as part of it. And so for the setup, we have a platform for it; I'd never heard of it, but it's just one of those where, you know, you can live-code on it. The idea for the interview is that it would be a 30-minute call with the candidate, so I would actually be talking to them, and then having them work through any number of questions for roughly twenty-five of those minutes and assessing whatever it is that we feel should be assessed. And so I wanted to hear your thoughts: what would you do if you were trying to put that together? Harpreet: [00:18:17] I want to go to Dave Langer for this one. He's laughing; he knew I was going to come to him. Speaker3: [00:18:23] I'm not laughing because I thought you were going to come to me, I'm laughing because I hoped you wouldn't. It's going to be hard for me not to go into a diatribe about the ridiculousness of interview processes these days, so I'm going to try and keep that to the side, Eric, as much as I can. So first up, what kind of role are you talking about? Because the kind of SQL that you want to test is going to be highly determined by the role. So, for example. Speaker5: [00:18:53] Great question. Speaker3: [00:18:54] Don't ask data analysts about indexing. I mean, come on, really, that's a DBA kind of job first and foremost, right? Maybe ask them about how they might use an index to make a query faster, but not what a clustered index is or something like that. That just isn't making any sense. So that would be the first thing I would do: what's the role? So fill me in on that.
Speaker5: [00:19:14] My job, essentially. So, analyst stuff, not DBA. I don't even know what a clustered index is, so not going there. Speaker3: [00:19:21] Okay, sweet. See, you prove my point, right? If you guys are familiar with me, you know I'm a big fan of the Pareto principle, the 80/20 rule, right? There is a 20 percent slice of SQL that data analysts and data-focused people use about 80 percent of the time. So it's going to be the normal kind of stuff, right? Can you use window functions? Can you use CASE WHEN inside of GROUP BYs and that sort of thing to actually create interesting transformations of the raw tabular data into something that's a little bit more aggregated and useful for data analytics? That's [00:20:00] typically where I would start in terms of the surface area of the kinds of SQL you would do. And maybe, I don't know this platform that you're talking about, but maybe it's as simple as having a couple of different tables that you show them through screen share and then say, okay, join these up, and I want you to be able to create, like, date-based count features. That's a very, very good indication, in my experience, that somebody has enough SQL knowledge to actually work in the data analytics space. So that's kind of what I would do. I would start with simple tables, ask them an 80/20 kind of question, and see if they can work their way through it. Speaker5: [00:20:43] Maybe, if anybody has something else to add, great. But if you have examples of good or horrible assessments... I mean, I have had one online skills assessment and it was terrible. The thing that was most stressful to me about it was that it was not live. I didn't have a person that I could talk to. It was just, you either get this or you don't get this, and you're on your own. I didn't like that feeling. But if I'm going to be the one administering it, I can take it on myself to be a human being talking to a human being and try to help set them at ease, because that's what I can do. So, yeah, good or bad examples are also appreciated. Speaker3: [00:21:27] Yeah. So I resonated with Vin's comment earlier about being the old man. I think I'm probably the oldest person on the call, so I'm going to give you an old-man-answer perspective. Try to simulate as much as you can a whiteboard interview experience using virtual technology. That would be my advice, which, as you indicated, would of course be live, right? Show them a table on a screen, walk them through it. They're going to get stuck. That's okay. I don't expect people to know the syntax backwards [00:22:00] and forwards. I mean, this idea that somebody has spent four hundred hours on LeetCode studying SQL for one interview is just insane. It doesn't matter in the end, because once they land the job, they're going to Google the syntax anyway. It doesn't matter. What's more important is understanding their basic level of understanding of the core concepts, and if they screw up a little bit of the syntax, who cares, right? But you can only evaluate that live.
Harpreet: [00:22:24] I don't know if this is a good way of thinking about the question, but I guess my question is: is this a good way of thinking about it? Like, what's the most reasonably challenging question you could ask that would be indicative of other skills, indicative of lower-level, base-level knowledge, I guess. Does that kind of make sense? Speaker5: [00:22:53] Exactly. Like, one thought I was wondering about was, for example, do I really have the time, or the complexity, to set something up where it'd be reasonable for someone to need to put something like that together? It's something useful, something that I use frequently. But I don't really know that in 25 minutes I've got time for something like that. So how do I assess it otherwise, you know, through conversation or whatever? Speaker3: [00:23:23] The example that I gave... and I've interviewed a lot of people in my time, that's why there's all this gray in my beard. For SQL, count-based features using a date column is one of the single best things I've ever found to actually ascertain someone's general level of skill and knowledge for pulling data out of a relational database and transforming it into something that's useful for analytics. Because you've got to do a GROUP BY, you need a CASE WHEN, and you need to work with datetimes, all at the same time. So it's a reasonable one-problem kind of thing that [00:24:00] assesses a lot of a person's knowledge regarding how they use SQL with a relational database and then transform that into a feature set that's useful for analysis. Harpreet: [00:24:10] Vin, what are your thoughts here? Speaker4: [00:24:14] I put it in the comments, and I know this is going to sound sarcastic, but one thing that I would actually do is give a fairly hard... you know, I said ridiculously hard, but fairly hard question, and then say, hey, Google it. Google the answer, unless you know it off the top of your head. And based on the Google that you just did, can you actually implement what you googled? Because that's kind of realistic. I mean, we run into stuff we don't understand all the time, or just forgot. And if we could add that to interviews and actually include it in the process, we're like, here's a question I know you don't know, but the whole point of this is to figure out how fast you can figure it out. And can you then, once you've figured it out, implement it? Because I think across the board, as data scientists, that's one of our biggest skills, because there's so much that you can run into that there's no way you've ever run into before, or there are complexities that you run into and you'll ask the team and they'll go, I don't know, Google it. So I feel like it's more realistic. Harpreet: [00:25:21] The single most important... Speaker2: [00:25:24] Oh, yeah, go for it. Go ahead. Harpreet: [00:25:26] I was going to say, the single most important skill in data science is resourcefulness, and that would be one hell of a way to test for resourcefulness.
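For readers who want to see what Dave's suggested exercise looks like in practice, here is a minimal sketch of a date-based count-feature question: join two tables and build per-customer features with GROUP BY, CASE WHEN, and datetimes. The schema, toy data, and cutoff date are invented for illustration; sqlite3 is used only so the snippet is self-contained.

```python
# A toy version of the interview exercise Dave describes: date-based count
# features that force a JOIN, a GROUP BY, CASE WHEN, and datetime handling.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                     order_date TEXT, amount REAL);
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO orders VALUES
  (10, 1, '2022-01-05', 25.0),
  (11, 1, '2021-11-20', 40.0),
  (12, 2, '2022-01-15', 15.0);
""")

query = """
SELECT c.customer_id,
       c.name,
       COUNT(o.order_id) AS total_orders,
       SUM(CASE WHEN o.order_date >= '2022-01-01' THEN 1 ELSE 0 END) AS orders_2022,
       SUM(CASE WHEN o.order_date <  '2022-01-01' THEN 1 ELSE 0 END) AS orders_before_2022
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.customer_id
GROUP BY c.customer_id, c.name;
"""
for row in con.execute(query):
    print(row)
```

As Dave says, the point in an interview is less the exact syntax and more whether the candidate reaches for the join, the GROUP BY, and the CASE WHEN on their own.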
Speaker5: [00:25:33] Oh, certainly the first 20 minutes of this conversation talking about various variations of Google. Yeah, that's definitely fitting. Harpreet: [00:25:42] Russell, you had some thoughts here as well. As I scroll up through the chat. Speaker2: [00:25:51] Yeah. Speaker6: [00:25:51] Hello, everyone. I started in a couple of comments in the in the chat window. I'll start in reverse order, so I've just responded [00:26:00] to VINs comment about Google, and I was saying very often I'll still Google about simple questions, simple solutions, just so that I see what's out Speaker2: [00:26:11] There in Speaker6: [00:26:13] The in the database, basically see if there's better ways to do things. I mentioned anger bias or anchoring bias, you know, so that I haven't chosen my favorite solution that I stick to and keep doing the same old thing for five years. Even though a better procedure may have been discovered, you know, three years ago, you know a more simple way. So I like to try to make sure that I'm implementing best practice from a, you know, a respected peer forum, but also to see what else is out there, good and bad. So if there's bad methods out there, I like to have visibility of them so that if I see anybody else doing them, I can say, Well, yeah, I can see that that's popular. But you know, that's not the best way. Try looking at this way, you know? Speaker2: [00:27:00] So Google is Speaker6: [00:27:05] It's a it's a great resource, but it is volatile. So if anybody looks at Google for anything, you know, look at the 10 things, 20 different responses. Don't look at the first thing that comes back at the top of the page. I mean, very often those are ad generated anyway to try and look past the ads, but even then go down, you know, to the bottom of the page go two or three pages in trying take like, you know, an average of the answers that come through the ones that are getting good feedback. And are there a lot? Those are the ones to trust rather than the the odd one that's in there that, you know, may look good on the surface, but if there's no validation to it and you try to implement that, you may end up in a pickle. Harpreet: [00:27:48] Russell, thank you very much. Shout out to Greg Kilkeel in the building that you, Greg. All right. So question coming in from LinkedIn. I think he's here above him. [00:28:00] Speaker7: [00:28:01] So I'm looking for some help and guidance around how the organizations are establishing some Speaker2: [00:28:09] Ethical Speaker7: [00:28:10] Practices. We're working on some things, Speaker2: [00:28:12] But this is Speaker7: [00:28:14] A great forum to get some inputs from everybody Speaker3: [00:28:18] Here. So thank you so much for any of the guidance that you're providing. I had that one question. Harpreet: [00:28:26] So definitely. Great question. I just want to say I don't have an answer off the top my head, but I just want to let you know that on February Speaker2: [00:28:34] 5th, I believe I'm doing a Harpreet: [00:28:37] Podcast interview with Grant Fleming, who is the author of a book called Responsible Data Speaker2: [00:28:43] Science. I've yet to go Harpreet: [00:28:44] Through the book where I've gone through a little bit, but that might be a book worth checking out. I did skim Speaker2: [00:28:49] Through it, and it seemed very practical, Harpreet: [00:28:52] Practical and very applicable. You know, there's these apply those concepts, so definitely check that out. 
But in the meantime, does anybody want to jump in here and provide some insights? He's looking for trends in how organizations are establishing ethical practices [00:29:10] and maybe some good resources for guidance. I wonder, is Makiko here? I feel like we'd get a good response. Speaker8: [00:29:19] Hey, where are you? Oh, here you are. Harpreet: [00:29:24] Yeah, I see you now. Yeah. Any tips? Speaker8: [00:29:31] Yeah, I think it's kind of interesting, because ethical AI is the thing that everyone wants to do, but frankly not everyone's capable of doing it, because of a couple of things, right? One, it requires monitoring and observability sophistication, or maturity, which not a lot of companies have. And there are a couple of other things too, right? So that's, like, model [00:30:00] performance. Then the second part that you sort of need is eyes on the ground when it comes to the data. So you could argue that that starts touching data governance, provenance, lineage. And then at the end of the day, you need to understand how to connect the data science and machine learning efforts with the broader business goals. So, for example, way back when, my favorite example in the world: there was some residential real estate tech company, which may or may not have been Zillow, I also don't remember, it could have been Redfin or someone. They were trying to roll out this idea to essentially do white-glove treatment for people who were listing properties above a certain value. Now the problem with that is that property value may or may not be correlated with things like race, like your socioeconomic status, [00:31:04] all that stuff, because in the U.S. we had this policy called redlining, where essentially, you know, people of color were not provided the same interest rates and loans to purchase property. Right, so, yeah, there are a lot of huge issues there. But I would say the three major components I see, at least on the technology side, are: one, having good observability and monitoring is absolutely key, both on your machine learning models and your data. The second part is definitely having eyes on the data, because of distribution changes; irrespective of the models, you kind of need to understand that ahead of time. Third is having a good connection with legal, to be honest. At a lot of companies I feel like sometimes that relationship isn't there [00:32:00] until something bad happens, and the legal team is like, oh, why did you hopscotch around us? And then the data science leadership is like, well, because it takes forever to get through you guys, it takes forever for you all to do review. And I would say the start of the fourth aspect is having a conversation on
how the broader data and machine learning efforts fit into the company, the business, and the product, and how that is connected with the users. So, a lot of words to say that there are people who are doing ethical AI, who are involved in that: Facebook has FAIR; Microsoft, Amazon, they all have groups that are focused on it. But when you start getting away from the big tech companies, or the companies with huge resources, it becomes, frankly, more of a talking point as opposed to something that people are actively involved in. And if they are, it's a side effect of them investing in their monitoring and production efforts, or their monitoring, observability, tracing, and data governance efforts. So sorry, I should have been more useful. But yeah, you know, if you're getting started, the reality is, not everyone's there yet. Speaker2: [00:33:20] Yeah. Harpreet: [00:33:20] And there are tons of resources in the form of links to white papers and stuff in the chat, and as always, all the links will be in the show notes for this happy hour session as well, so feel free to click through at your leisure. Anybody else have any input here? I'm curious, you know, Vin or Dave, I've got a question: what's the connection between data literacy and ethical data science, if there is one? Dave, what are your thoughts on that? Speaker3: [00:33:55] Yeah. So I guess the question then becomes, what [00:34:00] is data science, first and foremost, right? And I know some of you, if not most of you, are going to roll your eyes going, oh God, we've got to go into this again. Because for a lot of folks on social media, for example, data science equals machine learning. So if that's what you think, then there isn't really a ton of overlap, actually. If you think of data science essentially as this kind of umbrella term for being able to do analytics, right, use data to make a more effective business in some way, then the relationship is maybe one of a maturity spectrum, or a path of development, where you start with data literacy, where you start with the basics, right? What is data? How do I understand data? How do I understand how the data is collected? What usefulness does this data provide me? How do I then use the data in a relatively standardized way to achieve some sort of outcome? For example, how do I make better decisions? How do I analyze what's going on with my KPIs, that sort of thing? And then along that spectrum, along that journey, you move more into the data science kind of space. So I would say they're related, depending on how you define data science. And these days I'm not quite sure what the popular definition is, because I don't pay any attention to it anymore. Harpreet?
I just think you talking Speaker2: [00:35:31] About ethical trends in Harpreet: [00:35:35] Establishing ethical practices Speaker2: [00:35:37] And perhaps Harpreet: [00:35:38] Do some good resources for guidance. Speaker4: [00:35:43] Yeah, I think the first thing to note is I'm going to sound really optimistic after saying this, but I have to lead off by Speaker2: [00:35:48] Saying every business Speaker4: [00:35:50] Is only as ethical as their Speaker2: [00:35:52] Bank account and their customers Speaker4: [00:35:53] Allow them to be. Speaker2: [00:35:54] So, you know, when you Speaker4: [00:35:57] Build out this at the very beginning, it's easy [00:36:00] to think, Oh, I've got a mandate, they're always going to stamp it. No, they're not. Now that being said, most companies are actually worried in some way shape or form about ethics when it comes to AI because they're worried about getting sued. They're also seeing regulations come down U.S., Canada, Europe, Speaker2: [00:36:18] They're Speaker4: [00:36:18] All. I mean, China's got Speaker2: [00:36:20] A robust Speaker4: [00:36:21] Set of regulations as well. I mean, they're they're coming out of a number of different countries. So the fear is we're going to get sued, we're going to have a massive fine. And for companies like Google and Facebook, they legitimately do not care. And that's the that's the barrier that you're always going to run into when you're building an ethics program Speaker2: [00:36:39] Is at some Speaker4: [00:36:40] Point, someone is going to question, well, if no one else is paying attention to these rules, why should we? And so when you start one of these campaigns, when you start one of these programs, you have to start with Speaker2: [00:36:52] A Speaker4: [00:36:52] Doctrine and it has to be adopted by the entire company. And that's really an and it's an honest doctrine. You can't you can't be aspirational when you build this. You have to go through and talk to individual business units and senior leadership and say, OK, what do we really, really most concerned with? What are our actual principles? What are the things that we're worried about? And those are going to be part of that doctrine? And everything makes more sense when you have some basic founding principles and some guidelines where you say we won't do this, we won't do that. We will go to this level of effort to make sure that we don't have these negative impacts. Here's what we define as negative and bad. Here's what we define is good, because sometimes what one side thinks is a bad impact. Data science and machine learning you as a company might not agree. And so build that out. It's not. It doesn't have to be a huge document. You're not writing a religious thesis. Speaker2: [00:37:57] You really Speaker4: [00:37:57] Just coming up with the core principles for [00:38:00] everything that you do going Speaker2: [00:38:01] Forward and you can have everyone Speaker4: [00:38:03] Sign that. So everyone throughout the organization who is talking about projects, who's building projects has now signed on to an ethical standard, and that's something you can hold them accountable for. You can put that in senior leadership skills. You can, you know, and this is why I say it has to be it has to be realistic or none of that other stuff happens or this document gets emailed and goes into the ether and someone's archives somewhere where no one can find it. So really, you know, I can we can talk about ethical frameworks because there's tons of them. 
But I think if you don't start with that foundation of just explaining what the company really cares about and what the company is actually willing to do and what the company will not do, you don't have any sort of success criteria going forward, and you can't really you can't build rules for every scenario that is ever going to come up and say, if you have a set of guidelines, at least people can look at that guideline and say, OK, I tried to stay in, you know, in the spirit of this guideline, or I have no idea, I don't think this covers anything. I need to go talk to somebody. And half of it is that second scenario where you run into something and you Speaker2: [00:39:12] Go, OK, I Speaker4: [00:39:13] Remember signing this document, and I don't know. You know, it's just that awareness piece that sometimes is enough to surface ethical concerns, and it gives a place to go someone that you can talk to if that ethical concern comes up. And so with that foundation, I think you'll be way better off no matter what direction you go in, Speaker2: [00:39:33] But also Speaker4: [00:39:34] Don't get shot. Don't don't let this be the thing that gets you fired because you are more passionate about it than the business is. You know, I'll conclude with that. Most businesses have great, great intentions, but at the end of the day, the bottom line and the shareholders are the bottom line of the shareholders. Speaker2: [00:39:50] Ben, thank you very much. Harpreet: [00:39:51] Russell, you got some great comments. Anybody else wants to provide Speaker2: [00:39:54] Some, you know, Harpreet: [00:39:56] Any any insight on this. Please do libido. Well, I just think you or [00:40:00] if you got questions on LinkedIn or questions here in the chat that you want to ask, please go ahead and let me know and I'll put you into the queue. Speaker6: [00:40:09] I was just building upon his earlier comments about, you know, bank plan analysis and bias and prejudice and that. Speaker2: [00:40:19] And I was simply Speaker6: [00:40:21] Identifying that the Data presence in that industry prior to the advent of AI could easily have allowed for better analysis and better decision making. And the reason it didn't is because the people, as Vin was mentioning, who held the purse strings and maybe the entire organization, Speaker2: [00:40:43] If there Speaker6: [00:40:43] Was institutional opinion and bias, chose not to. So just as important as the AI or the ML models themselves in making decisions is the awareness and acknowledgment of Speaker2: [00:41:02] Cognitive biases in Speaker6: [00:41:05] All decisions. So that's decisions of interpreting their Data decisions of building the model in the first place. You know, if you selectively choose a spread of a thousand, say, photographs, for example, to do some image recognition and you bias that selection so that there's 90 percent of the thing that you wanted to discover and 10 percent of everything else. And your model is going to give you the answer you want, but it's not going to be the right answer. You know, and the same thing applies to everything. So I was just really drawing attention Speaker2: [00:41:37] To the fact that Speaker6: [00:41:38] In the last, say, five to 10 years, I've seen a lot of, let's call it noise or a lot of reporting about cognitive biases. And you know, there's 40 or 50 there if you really want to drill down into them. But some of the most common like anchoring bias are confirmation bias, etc., etc. So. 
[00:42:00] Having the people that are in charge of building the models in the first place make sure that they are well versed in the biases, and aware of identifying early signs of bias, so that they can try to steer the model in the right direction before it's built wrong and you have to go back and fix it. You know, if it gets too far, it's going to be written off; it's more economical to build a whole new model than to go back and try to fix a model that's biased from the ground up. And also the people that digest and consume the output from those models: try to make sure that they are educated enough to detect the first instances of bias that may not so much organically present itself, but present itself where there was no sign of it in the first instance. Yeah, I think that's it. So just be really aware of those biases. Harpreet: [00:42:57] Thank you very much, Russell. Some comments coming in. Kenji says, I think having a person outside of the team who has no stake in the project observe for morality and ethics, and also having non-data-scientists evaluate for ethics, is valuable. Makiko said, I don't necessarily think machine learning is a great tool for identifying bias personally, unless someone is willing to do model and feature analysis. Yes, interesting point. There's nothing inherently biased about, you know, a random forest or a convolutional neural network or any of those model architectures themselves. It's the data that you train on. So just be responsible with the features that you select, and make sure that whatever samples you have, whatever data you have, is representative of everyone or everything. You know what I mean, just be as representative as possible. Greg, I see your hand up. Go for it. Speaker7: [00:43:58] I think a [00:44:00] lot of great points were made. When it comes to ethics, and you touched on this, the tool itself, and by tool I mean the machine learning models, they're not unethical. So the best way to operationalize ethics is to embed it into your processes. Build that framework first, defining what's ethical and what's not, and with that, embed the framework into your project lifecycle management. So if you think about the data science project lifecycle, from where it starts all the way through, for each of these stages you have a set of questions that you ask, and you see whether they meet or don't meet the threshold for ethical versus not ethical. Then, once you do that and you deploy, you have to have a mechanism for what you'd call auditing. So how do you measure that? How do you check from time to time, when these models are making decisions, that they're not drifting, they're not going toward the unethical barriers? So, you know, this is where you have to bring in this auditing framework to make sure you surface these areas. So lastly, just in case you missed it, I put it on the side here that I do have a copy of a great doc from Integrate.ai; they have something about responsible AI in the consumer enterprise. They have a framework that shows you how to operationalize this, and I believe you can abstract it to other industries as well.
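Greg's auditing point lends itself to a concrete check. Below is a minimal sketch, assuming you log each model decision with a timestamp and, where legally appropriate, a group attribute: compute outcome rates per group per month and flag months where the gap widens. The column names, toy data, and the 0.8 cutoff (the common "four-fifths" rule of thumb) are illustrative, not a prescription.

```python
# Periodic audit sketch: are approval rates drifting apart between groups?
import pandas as pd

decisions = pd.DataFrame({
    "month": ["2022-01", "2022-01", "2022-01", "2022-02", "2022-02", "2022-02"],
    "group": ["A", "B", "A", "A", "B", "B"],
    "approved": [1, 0, 1, 1, 0, 1],
})

# Approval rate per group per month.
rates = decisions.groupby(["month", "group"])["approved"].mean().unstack("group")

# Ratio of the lowest to the highest group rate each month; flag months below 0.8.
rates["min_to_max_ratio"] = rates.min(axis=1) / rates.max(axis=1)
rates["flag"] = rates["min_to_max_ratio"] < 0.8
print(rates)
```

A real audit would use whatever thresholds, protected attributes, and review cadence the ethics framework and legal team have agreed on; the value is in making the check routine rather than reactive.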
So if you're interested in it, then it get your name on time. The [00:46:00] best thing to do is to be on LinkedIn and I'll send you a copy directly there. It's probably better to do it directly on LinkedIn versus send you an email, so I'd rather do that. So let me know if you're interested to read this Doc.. Harpreet: [00:46:14] I think so much Russell is asking to be possible to create an independent model solely for the purpose of analyzing the output of another model to identify evidence of bias. Yeah. These generative adversarial networks. For that, there is a link to a paper that will be in the show notes so they can Speaker2: [00:46:34] Check that out. It's called generative Harpreet: [00:46:35] Adversarial networks for mitigating biases and machine learning systems. So, yeah, definitely check that out, Makiko. Go for it. Speaker8: [00:46:43] And I think these three random points, like one like a was so something that helps us don't have racist people in charge of policy that that helps a lot, right? Because with redlining. Right. And what was the early iteration of Freddie Mae and Speaker2: [00:47:05] Freddie Mac, Speaker8: [00:47:07] Fannie Mae, Speaker2: [00:47:08] Right? Speaker8: [00:47:09] They knew exactly what they were doing, right? Like, that's where the term came from was. They had maps where they literally Speaker2: [00:47:15] Said, OK, this one, this neighborhood, Speaker8: [00:47:17] Let's color and red, and this one is yellow and green and somehow corresponds to ethnic groups and all that stuff, right? So they kind of knew exactly what they were doing. So if you think about it, the system worked great for them. It worked terrible for like the end users and the people who are impacted by that right, because they've done studies to show that a lot of the generational wealth that people of color could have had in the U.S., a lot of that can. You can tie it back to essentially the kinds of interest rates and the kinds of homes and property they were able to purchase during that time. Speaker2: [00:47:53] Right. Speaker8: [00:47:55] So, you know, so I think that you can have a perfectly working system, [00:48:00] but that is still kind of bad. So that's that's one thing to point out, right? So we do have to hold our leadership or people who are in charge of all that stuff accountable. I think the second part to right is I think the term bias is a little bit overloaded. So I was talking with Serge Macias about Speaker2: [00:48:17] This, but Yann Speaker8: [00:48:19] Lecun, he gone to. Speaker2: [00:48:21] There's a little bit of a kerfuffle going on Speaker8: [00:48:23] At a while ago because he had said there's no racist models. And. But that's not to say that machine learning cannot be racist, right? What he is saying is that like it really is the Data and essentially like how you structure the model and like that will kind of impact the results you get and also the different segments of people that are impacted. And I preferred that explanation because when we say the model is racist, we're like essentially deferring accountability and ownership for making the system, for making the pipelines, for making our products better onto the tooling, as opposed to the people who design and manage the tooling and the experiments, right? So I think that's something to consider is that like it is a little bit of neural overload a term. 
But on the other hand, if you have all the right Speaker2: [00:49:13] Data models like machine Speaker8: [00:49:15] Learning models, it's like all things right. Like if you only predict off the Data, Speaker2: [00:49:19] You give them right, the only Speaker8: [00:49:21] Predict off the features that you give them. So something Speaker2: [00:49:24] That you can Speaker8: [00:49:25] Do right, for example, is you hypothetically, if you had like protected data about people you could like, like analyze them all the outputs and say, like, OK, well, this is saying, let's give a lower interest rate to people of color than non people of color, right? So but at the same time, it's like, should you be analyzing off? So it's one of these things where is like, it's such a complicated question, but I think there's a couple of things, right? Like number one, if it's if it's a question of like, who should be doing better, we should always kind of like, look to ourselves as professionals and as like teams to go, what [00:50:00] can we kind of do to contribute to that effort? Maybe it's better observability. Maybe it's being more, you know, friendly to legal, getting them in the conversations more or something like that, you know, but also to like to a certain extent, we benefit off of models being very biased, right? Like we want them to give highly targeted predictions or what have you. So I do think it comes down to like companies need to. There needs to be incentives for companies to be willing to make a change. The incentive is not there. It's not going to happen. Speaker7: [00:50:35] So, you know, let's not forget the the complexity of it too. When you when you have a company that is transnational, right? So the values that you hold due to yourself in, say, say, is the same in China, right? Do they value dogs the same way we value dogs, for example, you're training an automated vehicle to roam around in the street. Is that vehicle going to do everything it can to avoid killing a dog here versus over there? So how do you build those ethical Speaker2: [00:51:12] Policies that these Speaker7: [00:51:13] Machine learning will be governed Speaker2: [00:51:15] Under to make sure that Speaker7: [00:51:17] You respect them and you don't get Speaker2: [00:51:19] Penalized? Speaker7: [00:51:20] So I know for sure it's almost impossible to remove bias from Data. Or at least you'll be running a rat race, but I'm also pro independently audited results. So if you think about underrepresented groups that are constantly being penalized by the machine. So when you pull the historical results of that machine, you can kind of Speaker2: [00:51:47] See trends in. Speaker7: [00:51:48] Those analysis can either be done by a machine or people to make sure Speaker2: [00:51:52] That be surfaced, those Speaker7: [00:51:53] Discrepancies so. Very good point here in Mexico. Thank you. Speaker8: [00:51:57] And actually another common another problem that [00:52:00] we're kind of seeing pop up, for example, like in like the NLP space for people to chew over is pronouns. That is a big, big thing, right? So, you know, at MailChimp, I'm not going to talk about the work that we're doing right, but or in other companies where we deal with a lot of like text Data, right? Whether it's emails, whether you're a residential listing, posting or what have you you want to like, we want to make sure that, for example, if you're if you're trying to target to persons like business title or like. Marital stress or what have you? 
There's all this data that's available now that you can buy. I mean, Speaker2: [00:52:42] For years that you could buy it, enrich Speaker8: [00:52:43] Richard Speaker2: [00:52:44] Data. Speaker8: [00:52:45] But like, how do you respect like the pronouns? Do you? Do you even care about like. Like, bring that your analysis, should you bring that into your analysis, so that's like another thing that's really interesting to bring up because in certain like communities, they're like, why are we even caring? Like, why do we care about pronouns? And in others, it's like, Yeah, we should totally care, right? If we're going to care about, you know? Like race or culture or language or nationality, like we should care about pronouns, you know, and that I think comes down to the company, right, like some companies are, you know, very much like. You know, we we need to support policies to allow people to bring their whole selves to work and feel safe and all that jazz. And like you know, some companies like Coinbase, they're like, We're not going to be engaging in any progressive, I don't say progressive policies. You know, they basically said they're not going to be doing these efforts. Speaker2: [00:53:41] I think other stuff, right? Speaker8: [00:53:43] And they basically say, like, it's just not part of our mission. We don't care about. It's not like. So what is machine learning look like at Coinbase versus like what is Speaker2: [00:53:52] Machine learning look Speaker8: [00:53:53] Like at a company where they're like, No, we that's important to us like pronouns like making sure you know your [00:54:00] nationality where you live, like, you know, all other stuff. So that's a part that's like it's not a technical problem, but it's like an internal conversation that reflects the business's priorities and all that stuff. Speaker7: [00:54:16] It's interesting, even if you. Oh, sorry about that. But. No, no. Go for it. I was going to say to the point of embedding that framework into the stages of that lifecycle, you know, for example, when you get to feature engineering, for example, you have to have the right questions, right? So if you're trying to guess. Uh, explore or classify or forecast whatever it is. Salary levels for people, you have to ask yourself, what does race have to do with it? Do you have to use that as a feature in your model, right? Is it really doing something right? So if if there's no mean, you have to bring up these questions to kind of ask yourself by ingesting this, does it really affect the result that you're trying to obtain? And sometimes you'll discover that a lot of these columns can be automated to prevent Speaker2: [00:55:20] Some sort of unnecessary Speaker7: [00:55:22] Bias to be ingested Speaker2: [00:55:23] Into into the Speaker7: [00:55:24] Models. So asking the right questions, asking the ethical questions is the way to go. Harpreet: [00:55:30] Yeah, just because the feature is highly predictive doesn't mean that you should use it to make a prediction. That's an interesting point, because think about the pronouns. I'm just wondering like if you're training on on Data for NLP tasks, I'm sure there has to be a cutoff point temporarily for the Data that you use because this modern usage of pronouns probably wasn't. You know. As pronounced No. Five to seven years ago. You know what I mean. But before we get to Vince's point here, [00:56:00] there's a somebody on on LinkedIn. 
Brian had suggested Reid Blackman to follow: Reid Blackman, CEO of Virtue and an ethics adviser. On LinkedIn he posts a lot of great content on ethical data science and AI. Before we get to Vin, I just quickly want to read off this post, because it's very relevant. His most recent post said that one of the most dangerous ideas in the AI ethics community is that data scientists and engineers can, by themselves, save us from the ethical risks of AI. Take each of the big three ethical risks of AI: bias, black boxes and privacy violations. In each case, a large segment of the AI community thinks we can already solve this with technical fixes. We get, for example, mathematical definitions or metrics of fairness, LIME and SHAP for explainability, and techniques like differential privacy. But those metrics are incompatible with each other, those explanations are unintelligible to the people who can be wronged by AI, and anonymization doesn't entail that people have control over their personal data. If senior leaders don't understand that these technical approaches to ethical risks are far from sufficient, we are in big trouble. And then in the comment section of that same post, which I'll share here, he says that AI doesn't give rise to novel ethical risks; it gives rise to novel sources of ethical risks with which we are all too familiar. Speaker4: [00:57:35] Actually, talking to Greg's point: when you get into the layers of different ethical frameworks, what you're actually getting into is different ontologies, where people classify things differently, categorize things differently, and the relationships between categories change. And this is one of those big pieces where just awareness of ontologies, [00:58:00] and beginning to build out a value-connected ontology, brings awareness to how someone could use this. Every time I talk about this, I use vegetarians, vegans and carnivores. When you start talking about serving somebody an ad: do I serve this person an ad for a steakhouse? If I do that to someone who eats meat, awesome; I have connected with their ontology, my ad matches their ontology. Someone who's a vegetarian, that's a different ontology: they look at meat, and meat is not connected with food. Whereas for someone who's a carnivore, meat is food. And so when you start creating these ontologies, you inadvertently start gaining insights into some of the value systems. So implementing ontologies is one way that you can start to at least understand some of these areas. And I'm not going into the really complex, controversial ones; the food one is about as easy as it gets when I'm talking about this topic. But you begin to see more complex relationships and value systems as part of your metadata around your data catalogs, and those eventually build out ontologies.
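A toy Python version of the vegetarian, vegan and carnivore example, just to make the idea concrete: the same item lands in different categories under different ontologies, and an ad-matching rule inherits those value judgments. The dictionaries and the matching rule are purely illustrative; a real value-connected ontology would live in your metadata and data catalog, not in a hard-coded dict.

# Each persona's ontology maps an item to the categories they connect it with.
ONTOLOGIES = {
    "carnivore":  {"steak": {"food", "treat"}, "salad": {"food"}},
    "vegetarian": {"steak": {"animal product"}, "salad": {"food"}},
    "vegan":      {"steak": {"animal product"}, "salad": {"food"}, "cheese": {"animal product"}},
}

def ad_matches(persona: str, item: str) -> bool:
    """A steakhouse ad only 'connects' if this persona's ontology files the item under food."""
    return "food" in ONTOLOGIES[persona].get(item, set())

for persona in ONTOLOGIES:
    print(f"{persona}: serve the steakhouse ad? {ad_matches(persona, 'steak')}")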
The hard thing is when you have conflicting ontologies, where the overwhelming majority of people view things one way and there is a negative impact on a very small group. And that small group, overwhelmingly throughout history, has always been trampled, because in some cases you don't even know they're there. The last thing I'll say is this: there was an entire group of autonomous driving engineers working on machine learning who learned by tweet [01:00:00] that their computer vision datasets were missing trans individuals. And there had been a couple of experiments done that showed that cars were more likely to run over people who were trans. This is one of those things where, if you don't have somebody like that on your team, if you don't have somebody who falls into one of these groups, or, how do I want to say it, somebody who thinks that way... because you really do need someone like that, who just thinks of every possible group, no matter how small, and has a moment of, well, what would we check, what would we test? Because there are just so many of these, and you can have a crazy impact that just shows up out of nowhere purely because of a lack of awareness; nobody was thinking about it. Harpreet: [01:00:52] Vin, thank you very much. Any other insight here, any other points? We've actually got a question coming in from Kosta, guys. Happy to stay on this point, or we can move on to the next question. This one is not as touchy a subject, but Kosta wants to know: for data science or machine learning teams in large software products, how do you guys go about task estimation? Story points? What's different for you and what's the same? How do you factor in experimentation? That's a good one, actually. Somebody shared something the other day about data science and scrum; I'll go ahead and post that link here. I mean, there are some parts of data science, like experimentation, for example, that are kind of open ended. So I feel like you just time box it and then give that thing, like, five points, because it takes a lot of work. But [01:02:00] I'm curious. I'd love to go to Makiko, if you're available; if not, maybe Vin or Dave on this? Speaker8: [01:02:10] Yeah, it's funny, I think we're kind of struggling with this right now, or we're dealing with it. I feel like it goes to the question of what you do with one-off projects: is this a recurring task versus a build project? For build projects, I feel like you'd almost want to look at existing software frameworks for estimating story points for regular software. But if it's something you do consistently, for example if you're building a pipeline off of existing infrastructure, I would just look at how long it's taken before.
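One concrete way to "look at how long it's taken before", sketched in Python: pull the durations of similar past tickets and quote a padded upper percentile rather than the average. The task types, the 80th percentile, and the padding factor are assumptions for illustration, in the spirit of the padding described next.

import pandas as pd

# Hypothetical history of completed tickets, in elapsed working days.
history = pd.DataFrame({
    "task_type": ["pipeline", "pipeline", "pipeline", "experiment", "experiment", "model_build"],
    "days":      [3, 5, 4, 8, 15, 6],
})

def estimate_days(task_type: str, pad: float = 1.5) -> float:
    """Quote the 80th percentile of similar past tasks, padded for the unknowns."""
    similar = history.loc[history["task_type"] == task_type, "days"]
    if similar.empty:
        return float("nan")  # nothing comparable: treat it as a spike, not an estimate
    return round(similar.quantile(0.8) * pad, 1)

print("pipeline estimate:", estimate_days("pipeline"), "days")
print("experiment estimate:", estimate_days("experiment"), "days")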
Um, honestly, if it's something that's totally new, and it's... I was going to say balls to the wall, but I've got to figure out a more PC-appropriate phrase for this... I almost feel like you give it the maximum number of story points, make sure your leadership is super nice, and you also caveat a lot of stuff: make it seem like it's going to take 2x longer than what you think it's going to take, so that even if it only takes 1.75x longer, it's still within the bound. And then just try to scope down from there. What was it, The Mythical Man-Month? Wasn't that written in regards to how bad waterfall was? And part of it was about estimating. So I feel like I was inspired by that to go, you know, just give them the worst possible estimate, try to copy as much as you can from other projects and see how long those took, and then try to find optimizations, so that even if it comes in 15 percent shorter, it's, you know... [01:04:00] We also use story points with, like, planning poker cards. Those are nice. For one thing, it's very tactile, so you can just go with the cards. But also, it's a way to raise disagreements, because I think that's the hard part: sometimes people have information that will help you speed stuff up, but you need to give them the opportunity to go, no, you're wrong, because X, Y, Z. So the agile poker cards, whatever you call them, those are great. The full-on agile process, not so much, but agile poker cards, those are nice. Harpreet: [01:04:40] So there are some aspects of machine learning that maybe you could fit into the agile framework. Build the data pipeline, right? That's pretty tangible. Experimentation, probably not. Finding the right model, probably not. For those things, I guess, you just time box and move as quickly as possible, like, say, Andrew Ng, who does those one-day sprints: you can get up and running in one day; it won't be perfect, but at least it'll give you a skeleton of what to work on, what to elaborate on, I guess. Vin, what are your thoughts? Speaker4: [01:05:19] Well, I'll yield most of my time to Greg, because he's the OG here; he'll possibly have the best answer of all of us. But from my perspective, what I do is break everything up into three different workflows: you've got your data workflow, you've got your research workflow, and you've got your model development workflow. Your model development workflow is pretty traditional; by that time you know what you're deploying, you have a pretty good idea of what you're doing, and so that's probably the most predictable side of this. But when you talk about a data-gathering activity, how long is it going to take you to find the data that you need for a particular project? I don't know. I don't even know how to estimate that. You [01:06:00] may never find it; you may not be able to gather it. So there are going to be processes that you can scope in a more traditional, more accurate way, and then there are
way, way, way uglier ones, for which you have to create a different type of framework altogether, where you have gated reviews. You know, like you were talking about: that one-day cycle or one-week cycle where you just see how far somebody can get in a week, present it to the rest of the team, have somebody from product management there, maybe have somebody who's a stakeholder, who may be a voice of the user or something like that, and just say, hey, this is how far I got, what do you think, should we do another week on this? Then you can have a conversation and the business can make a decision: we're going to keep going, or not; kill it, next one. And now I yield my time to Greg, who's going to rule this answer. Harpreet: [01:06:50] Greg, go for it. Speaker7: [01:06:52] Well, my focus is going to be on how you T-shirt size post-deployment experimentation. So think about a recommendation system. There are some tests that you need to run to make sure that your models are behaving, or to collect data about whether your models are doing things as expected, and for that you need to figure out how long it will take you to collect the right data. One good approach is, when you're designing your experiment, to decide at what level you split your control and your test. For example, say [01:07:42] you're checking whether your recommendation system is giving you more sales. Do you want to put that split at the moment the person clicks buy, or do you want to split your population at [01:08:00] the moment the person enters the website? For these two stages you will need different sizes of population, so you have to really understand the traffic on your website right now, because that is what will give you that population. And obviously you can have 50,000 visits coming in when only 10 percent actually click the button to buy. So when you put that split of control and test at the click of the button, you don't need as much of a sample population to perform your experimentation, versus putting it at the entrance; at the entrance, it may require you to have 50,000 visits times two, one set for the control and one for the test. So it will take you more time. Another consideration is the model that you want to test: how much data do you need to ingest into it, and what kind of models? For a recommendation system you have this funnel [01:09:02] approach where you're filtering from the top down: you may be using a linear model, then something like XGBoost, and then down there you have a more sophisticated one, like a neural net. These models require different amounts of data, and then there's the time for training and all that stuff. So you have to understand the technical requirements of all of these and the size of the data that you need to ingest. All of these factor into how you design a good experiment that minimizes the time it takes to collect the data to confirm whether your newly launched feature is doing the job or not. Harpreet: [01:09:40] Excellent. Thank you so much, Greg. Dave, Vin, any thoughts here?
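To put rough numbers on Greg's point about where you place the split: the further up the funnel you randomize, the more the conversion you can measure per randomized user gets diluted, and the more traffic and calendar time you need. A quick power-analysis sketch with statsmodels; the 20 percent baseline among clickers, the 2 percent entrance-level baseline implied by a 10 percent click-through, and the 5 percent relative lift are made-up numbers for illustration only.

from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

def users_per_arm(baseline: float, rel_lift: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Users needed in each of control/test to detect a relative lift over the baseline rate."""
    effect = proportion_effectsize(baseline * (1 + rel_lift), baseline)
    return int(NormalIndPower().solve_power(effect_size=effect, alpha=alpha, power=power))

# Split at the "buy" click: among clickers, say 20% convert and we want a 5% relative lift.
clickers_needed = users_per_arm(baseline=0.20, rel_lift=0.05)

# Split at site entrance: only ~10% of visitors ever click, so the same downstream
# behavior shows up as roughly a 2% conversion per randomized visitor.
visitors_needed = users_per_arm(baseline=0.02, rel_lift=0.05)

print(f"per arm, split at the buy click:  {clickers_needed:,} clickers")
print(f"per arm, split at site entrance:  {visitors_needed:,} visitors")

With these made-up numbers the entrance-level split needs roughly an order of magnitude more randomized users per arm, which is exactly the extra data-collection time being described.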
Speaker3: [01:09:48] I was just typing in the chat, but I can go ahead and say it here. Yeah. So I interpreted the question from a methodological perspective, which is: how do I communicate to my product owner, my scrum [01:10:00] master, whatever the term du jour is for the person who's running the project, the particulars of the nature of the work vis-a-vis the methodology? For example, if you use classic Scrum, you need, first and foremost, a product owner who can define for you up front: hey, I want you to develop a machine learning model to enhance the product or do something that's cool. And I need that product owner to define what success is. I'm going to use accuracy; please don't shoot me, I'm just using that as a moniker for success. It could be sensitivity, specificity, F1 score, whatever. But let's say it's accuracy: I need an accuracy of at least 80 percent. And then what I need, as the data scientist, the machine learning developer, whatever the term is for what I do, is the ability within the methodology to say, OK, look, that's my goal; however, I can't guarantee that I'm going to be able to deliver it unless you allow me to do what is frequently referred to as a spike, which is a particular type of exploratory work effort within an agile project management framework. If you're doing classic Scrum, sprints were supposedly 30 days; if you're XP-ish, then you do two weeks, whatever it might be. Hopefully you've got enough time within the duration of your spike to do the initial evaluation and say: look, yep, product owner, maybe this is possible and we can do it with additional follow-on work; or, nope, it's not looking too good. So what you need to do is incorporate this idea of uncertainty into the methodological framework of how you do project delivery, because that's the reality. Just because you want to build an awesome machine learning model doesn't mean you're going to be able to do it, but you won't know until you try. So you need to have the tools in place, from a project management perspective, to allow you to do that. I've worked on teams where we had spikes as a particular type of sprint [01:12:00] deliverable, for someone doing work that wasn't necessarily expected to return actual working code; the idea was to gather information. And that's the first thing you need to start with if you're going to be building machine learning models as part of a product development methodology. Harpreet: [01:12:18] Awesome. Thanks so much, Dave. Kosta, hopefully that answered your question. It doesn't look like there are any other questions coming in on LinkedIn or YouTube or anywhere else, so we'll go ahead and wrap up. Guys, thank you so much for hanging out. Be sure to check out the podcast episode that was released just today with Dr. Joe Perez. Next, we've got a big week, a few events. I'll be doing the Comet office hours this Wednesday, and we're talking all about data:
understanding data, validating data, data versioning, data pipelines, data governance, all that stuff. It's a panel discussion; good friend of the show Matt Belleza will be there. We'll also have Jimmy Whitaker, who's a dev advocate over at Pachyderm. Then we're [01:13:09] also going to have Dr. Abe Gong, CEO of Superconductive, the team behind Great Expectations. So it'll be a great session; definitely tune in for that. Questions are welcome, and guests are encouraged to come in and join us. You know, it'll be a lot of fun. Then the following week I've got a panel discussion with three other awesome individuals, people who I truly, truly look up to, respect and admire. We've got my good friend, and I think you all know her, Jonathan Tully; she'll be on the Comet office hours, and we're going to be talking about experiment management. We've also got Susan Shu Chang; you guys know Susan, she's epic, she's awesome, follow her on LinkedIn. And then W. Ronny Huang, who used to build laser guns and now is [01:14:00] a research scientist at Google. So that's going to be an epic one too. So, yeah, I'm pumped for the remainder of the Comet office hours, with panel discussions lined up over the next few weeks. It's going to be a lot of fun, so hopefully you guys can tune in to that. As usual, a bunch of cool podcast episodes are being released in the next few weeks; you're going to have to stay tuned and see what's up with that. A lot of good stuff happening. And then, what else is happening? Like I said, presentations as well: I talked about the panel discussion, and I'm also doing a webinar with Pachyderm on Wednesday, where we're going to be talking about future-proofing your MLOps stack. And then on Thursday, doing a webinar with Cognitive, lessons from the field in building your MLOps strategy. So a lot of talk about MLOps, and the more I learn about MLOps, the less I feel I know; it's just a massive, broad field. So be sure to join in on those if you can. Speaker7: [01:15:10] Harpreet, yeah, the question is: is there something you will not be doing? Harpreet: [01:15:16] I will not be resting, that is for sure. That is for sure, man. Yeah, definitely packed for the next few weeks, and hopefully making it home to California sometime in February. I'll be in Sacramento at some point mid-February. So, Vin, you know I'll definitely holler at you. Anybody else who is in or around the Northern California zone, get at me. Sacramento is home, but I'll be spending some time in San Francisco as well; my sister lives out there, so I'll be out there too. Hoping to connect with, you know, Mickey Cohen and some other friends, maybe even Mark. Guys, take care. Have a good rest of the weekend. Remember, you've got one life on this planet, so why not try to do something big? [01:16:00] Cheers, everyone!