45HH-2021-08-13 Harpreet: [00:00:06] Yo, what's up, everybody? Welcome, welcome to The Artists of Data Science happy hour number forty-five. Who's been here since, like, happy hour number one? We've been spinning around for quite some time. Happy hour number forty-five: just a few more weeks till it's been a year straight of this stuff, so thank you guys for joining me on this journey. Shout out to everybody in the room: Spencer, Abe. I've got to give a shout out to Abe; he's been crushing it with helping with the social media stuff on the data science page on LinkedIn, driving that engagement and increasing the brand awareness. Abe, I cannot thank you enough, my friend. Shout out to Eric, Austin, Joe, Russell, Moesha and Jacob. What's going on, guys? Super happy to have all of you here. Man, it's Friday. Friday the 13th, Friday, August 13th. So, before Abe gives his shout-outs, I want you guys to think about this: what are some data science projects that revolve around Friday the 13th that you'd be interested in checking out? Like, is there a myth out there about Friday the 13th that you think there might be data available to debunk? If so, what type of Friday the 13th project would you want to do? I'm curious about that. Abe, you've got to shout out some folks, right? Speaker2: [00:01:25] Yeah, yeah. So I don't see Mark in the chat, but tell him thank you, and thank you guys for helping me out with the SQL stuff. For a project? Maybe, like, superstitions about Friday the 13th from different cultures. I don't know, that's an idea. But I actually can't stay long; I've got to go help my wife with one of the school events. Harpreet: [00:01:48] Well, thanks for stopping by, Abe. And again, thank you so much for all the help with the social media stuff this week; really appreciate that. Um, so, Friday the 13th. If you could do a data science project about Friday the 13th, or if you think there's data out there that you could use to debunk a Friday the 13th myth, what would it be? I've got a relatively simple one I'd want to do. In the city of Winnipeg, we have an open data portal, and that open data portal has a distribution of parking tickets all through the city. So I'm wondering: are cops out there enforcing parking tickets more stringently on a Friday the 13th than on any other day? I'd love to hear from you guys, man. What would you do for a Friday the 13th project? Speaker3: [00:02:38] All I could think of was Apollo 13, because I used that GIF today of Tom Hanks saying, "Houston, we have a problem." So it'd be kind of cool to do an Apollo study: were there things that led up to Apollo 13? Well, it wouldn't be about the thirteenth exactly, but that's the only thing that came to my mind: do some modeling of the space program management. Harpreet: [00:03:07] That works, even if it's a stretch; it involves the number thirteen, so I'll allow it. How about you? Speaker4: [00:03:13] I think I would be interested in knowing how many people watched a movie, that kind of thing, on Friday the 13th. Harpreet: [00:03:21] Oh, that's a good one. Yeah. Joe, how about you? Speaker5: [00:03:24] I don't know how many people actually travel to Camp Crystal Lake, which is where Friday the 13th took place, if you've watched the movie.
So maybe there's some weird fan LARPing going on there; you could look at the people who actually make the trip. Harpreet: [00:03:43] You know, I just recently found out what a LARPer is. I thought that was pretty interesting. Apparently my next-door neighbor is one. Speaker5: [00:03:52] Oh, you should watch: there's a really good movie called Darkon. It's well worth watching; I think it's a weekend movie for you and your better half. It's about medieval LARPers who battle each other in a park. It's really... Harpreet: [00:04:08] I might put it on tonight. Speaker5: [00:04:09] Yeah, you should totally watch it. Harpreet: [00:04:13] I do watch a lot of the stuff that people recommend. Speaker5: [00:04:16] It's pretty funny. Harpreet: [00:04:16] Type the name of the movie out in the chat for me so I don't forget. And I don't want to forget about you, man: what was the project that you would do on Friday the 13th? And if anybody else wants to chime in here, the question I'm opening with is: look, it's Friday the 13th. If you could do a data science project, just a regular data project that was thematically around Friday the 13th, what would it be? What type of project would you want to do? And then, by the way, if anybody has questions whatsoever, whether you're on YouTube or Twitch or LinkedIn (I'm keeping an eye on all the platforms) or even here in the chat, let me know. I will add you to the queue. Speaker6: [00:04:53] Sure, go for it. Yeah. So my first idea was kind of along the lines of your parking ticket thing. I was more thinking, though, about injuries, because, you know, car accidents, that kind of data would be fairly easy to track. I was trying to think of, you know, falls, ladder-related things, anything like that to get toward the superstitions, but I don't really know where you'd find data for that. And then the other one, kind of along the lines of what you're saying: somebody who's way better at Tableau than me could probably come up with a really interesting analysis of horror movie franchises, like comparing Friday the 13th to all the different horror movie franchises. That could be kind of interesting. So those are my two thoughts. Harpreet: [00:05:42] Yeah, I like that. I mean, if somebody works for an insurance company, they probably have access to a lot of good data that they could test these myths against, right? Speaker6: [00:05:50] You wonder, though: might you see people being safer because they're paranoid? Would you possibly see a reduction in injuries or terrible occurrences just because we're all aware of Friday the 13th and now we're being careful? It can't, like, sneak up on you anymore. Or is the fact that you're aware of it still going to get you anyway? Harpreet: [00:06:13] The awareness thing. Let's hear from Austin, a man I haven't seen in a while. Good to have you back, man.
Let's go to Austin, then Mark, and then we'll go to John's question; John's had a question in the queue. And by the way, if you guys have questions, let me know right there in the chat and I'll queue you up. Go for it. Speaker2: [00:06:32] Yeah. So I was thinking about it, and I was leaning more toward, like, Friday the 13th black cats, that kind of thing. Just curious, if you could get your hands on enough adoption data, to see: are people adopting black cats more on these days than on other days? Maybe because of Friday the 13th you might have people that are, like, horror fanatics and cat lovers, and there's a spike in adoptions on that day or those days. So that's where I was thinking; just a little different. Harpreet: [00:07:06] I mean, if it's, like, humane society type of data, it should be public, since those are public organizations, I believe. That might be interesting to look into. Mark, let's hear from you, and then after Mark we'll go right into John's question. Speaker2: [00:07:20] I had a pretty tangential thought. I was like: Friday the 13th, Jason. Jason wears a hockey mask, and in business they have the hockey stick graph for startups when they try to pitch to investors. So the idea is comparing hockey stick investor graphs and how well they match up to reality. You could probably find some decks of startups somewhere, watch some pitches, get that data, and then compare it to, like, series A, series B, or just failures of companies. Huge stretch, but that's where my mind went. Harpreet: [00:07:52] That's some creativity right there; that's connecting a lot of different layers of the neural network. I like that. Thank you very much, Mark. All right, guys, if you have questions, go ahead and let me know in the chat and I will add you to the queue; I'm keeping a good eye on everything that's coming in. But right now let's jump straight to John's question. John, good to have you back on. It's been quite some time. Speaker6: [00:08:14] Yeah, it has been. Thank you as well. I recently started a new job about four months ago as a lead data scientist, so it's been ridiculously busy. And I'm based in the UK, so my time here is ten thirty at night when this starts, after a long day of work. But yeah, I get to see all your faces. So my question is on predictive modeling, actually. I'm doing a goodwill project: I'm working right now with a charity, a UK-based charity that basically redistributes food waste from large supermarkets to those that don't have the funds to buy their next meal. The charity works by using volunteers, and what they do is they sign volunteers up to shifts. But what they often find is that they don't fill the shifts. So what they're looking for is a way to predict which shifts are not going to be filled, so that they can take action beforehand and reach out to more volunteers to fill shifts later down the line. The approach I'm going with is actually looking at it probabilistically. The idea that came to me was: what if you treated a volunteer sign-up as an event? You can get the probability distribution of volunteer sign-ups, and then, from that probability distribution, maybe fit a probability density function.
And then once you've fitted that probability density function, you could simulate volunteer sign-ups: how many volunteers are going to sign up for shifts over a given period of time. You can then compare this to the actuals and calculate errors, and then maybe use machine learning, or some kind of regression model, to predict where those errors were and adjust your original simulated volunteer sign-ups. So that's the approach I was exploring. I just wanted to put it out there and see if anybody has done a similar thing. Harpreet: [00:10:27] You know, I haven't done something similar to that, but just based on the problem statement, it sounds like you're trying to predict a count, some type of count value, right? If it's a count value, I'd be inclined to use some type of generalized linear model, specifically maybe a Poisson regression or something along those lines. That's the direction I would try to sniff in and see if I could find something that fits. I'd be happy to hear what other people have to say about this; anybody have any ideas? But that's my two cents: Poisson regression or some other type of GLM.
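[Editor's aside: a minimal sketch of the Poisson-regression route Harpreet suggests here. The DataFrame, its column names, and the values are hypothetical stand-ins, not the charity's actual schema.]

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical shift-level history; column names are made up for illustration.
df = pd.DataFrame({
    "signups": [4, 1, 6, 0, 3, 5, 2, 7, 1, 4, 3, 6],
    "weekday": ["Mon", "Wed", "Sat", "Wed", "Fri", "Sat",
                "Mon", "Sat", "Wed", "Fri", "Mon", "Sat"],
})

# Poisson regression (a GLM with a log link): model the count of
# sign-ups per shift as a function of the day of the week.
model = smf.glm(
    "signups ~ C(weekday)",
    data=df,
    family=sm.families.Poisson(),
).fit()

# Expected sign-ups for an upcoming Wednesday shift.
upcoming = pd.DataFrame({"weekday": ["Wed"]})
print(model.predict(upcoming))  # about 0.67 for this toy data
```

Seasonal terms (month, school holidays, and so on) could be added to the formula the same way.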
Speaker3: [00:11:04] If it's OK: I was really trying to listen carefully, John, and this sounds really interesting. And, like you said, food waste is a major travesty. So what is the goal of your current exploration? Speaker6: [00:11:21] So the goal is to inform the volunteer manager, who reaches out to the volunteers to get them on shifts: to inform them beforehand whether a shift is likely to be filled or not, so that they can take some action and reach out to volunteers to fill those shifts. Because, if you imagine, if you don't fill your shifts as a volunteer manager, all that food can't be delivered to where it needs to be delivered. And without going into too much detail about the shifts: they need different types of workers for each shift, but I don't think that necessarily informs the predictive modeling; you can break it out at whatever level works. The idea is you need to tell the volunteer manager: your shift Wednesday afternoon is likely not to be filled, so take some action to get more volunteers. Speaker3: [00:12:13] So if I've heard you correctly, what occurred to me is that this is essentially a process: a need comes in, and you've got to staff the right number of people. If you map out the process, then you begin to think of what data you need at each step. But it sounds like there are some features that need to be determined to predict the number of workers, and yet there's also the challenge of getting hold of the volunteers, right? Speaker6: [00:12:41] Yes, yes. So the tool is simply there to inform some decision-making later on, at the point where the volunteer manager takes action. What would happen is the volunteer manager would log on to an interface, we would have the tool up and running, and the tool would tell them: hey, your shift that is a month away, let's say on a Wednesday afternoon, you're not likely to get any volunteers for that shift, so take some action now to reach out to your volunteers. And that's based on previous data. So, going with my approach from before, what I'd probably be looking at is the distribution. Let's call it an event distribution, where an event is at the level of how many volunteers have previously signed up for, say, a Wednesday. So there will be a distribution: sometimes it might be one, it could be six. We could fit a probability density function to that distribution once we've gathered enough data, and then use that probability density function to simulate, for a given time period, how many volunteers are going to sign up. Then we'd look at the residuals between that and the actuals, and maybe use a machine learning model to predict those residuals so we could adjust our simulation. Because I have this intuition that, since the probability density function is based only on the days of data we've gathered, it's only going to be a good simulator if it actually represents the population. Speaker3: [00:14:19] And you are saying that one of the key issues in executing these processes, of getting volunteers to deliver this food, is really just planning ahead, so you have enough volunteers, right? Speaker6: [00:14:36] Yes, that's exactly it. Speaker3: [00:14:39] And forgive me for doing this, because I do want to hear from others, but I read the latest post very carefully; I'm one of those people. And Tomlinson sounds like he's learned a lot of hard lessons; that's what I love about him. I can hear him kind of saying (and by the way, John, I'm not suggesting there's no machine learning that could help with this, but being a father of nine, I'm thinking): oh, this might be a bigger human problem than a machine learning problem. Again, not to say no machine learning. It's OK for the data scientist to go, you know, maybe there's a human solution here, so we have a really solid process for always getting enough volunteers. I apologize; I know we're data scientists, but my brain kept thinking it's OK to think outside the data science realm and ask these volunteers: there are times we just need you on call, like backup jurors, so to speak; and there are times where you need to say, can someone take your place if you can't make it? I'm just saying: if you look at this, don't be afraid to call on human solutions too. It sounds like you're thinking the right way on the other side. Speaker6: [00:15:57] Yeah, I agree. And just to reiterate as well: this tool is there to help, right? We're not expecting it to solve every problem the organization has in terms of filling shifts. But right now, the current situation is they're doing it randomly; they have no idea how many volunteers are going to turn up on the day, they just have some gut feel. The tool should help them formalize that gut feel using statistics, using modeling. So yeah, I completely take what you're saying, and I think a lot of it is going to come down to the outreach at the end of the day. But if we can give them something that at least guides the outreach, that works as well.
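[Editor's aside: a minimal sketch of the simulate-and-compare loop John describes: fit a distribution to historical sign-up counts, simulate turnout, then inspect residuals against actuals. The counts, the required headcount, and the Poisson choice are all assumptions for illustration.]

```python
import numpy as np

rng = np.random.default_rng(13)

# Hypothetical history of sign-up counts for past Wednesday shifts.
history = np.array([1, 3, 2, 6, 4, 2, 1, 5, 3, 2, 4, 1])

# Counts are discrete, so a Poisson is a natural first fit;
# its single parameter is estimated by the sample mean.
lam = history.mean()

# Simulate the next 8 Wednesday shifts, 10,000 times over.
sims = rng.poisson(lam, size=(10_000, 8))

# Probability each upcoming shift falls short of a required headcount
# of 4 (the threshold is made up for this sketch).
required = 4
print((sims < required).mean(axis=0))

# Once actuals arrive, the residuals John mentions could feed a
# downstream regression that corrects the simulation.
actuals = np.array([2, 5, 3, 1, 4, 2, 6, 3])
print(actuals - sims.mean(axis=0))
```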
Harpreet: [00:16:41] Thank you very much for that, Tom. Let's go to Mark, because I feel like Mark might have some insight; this sounds like a typical problem you'd deal with in health care, like patient show-ups. So I'm thinking you might think in that direction. I'm not sure, Mark, but what do we think? Speaker2: [00:16:57] So I have two thoughts. One is the health care one, because I did take a health care modeling class that I really loved, and there are queueing models to help you figure that out. They used them for telephone lines back when operators had to transfer people; they used these queueing models to figure out staffing. I'll go pull it up and put the link in; I'll go find it for you. But my second thought is actually a simpler thing, along the same lines, and it stems from when I used to do student affairs. I was a data scientist and I worked for a service center where I would train students on how to do public social impact work. Many times there'd be a bunch of Stanford students who were super excited and wanted to do these complex things, and many times the nonprofit is like: yo, we just need you to sweep. That's been a problem for, like, three months; if you could just do this simple thing, that would be really great. So I think the first question, before going technical, is: go talk to your stakeholder. I want to know, is this going to be a generalized model that they'll use for multiple sites, or is this specifically for this one organization? Can you give that context real quick? Speaker6: [00:18:06] Yeah, absolutely. So this is the context. The charity operates across London; they have multiple sites, based in the four quadrants of the city. At those sites they get volunteers in (there'll be, for example, a driver, and there'll be a warehouse operator), and each volunteer operates on a shift. If they don't get enough volunteers, the shift is obviously not filled, and the amount of resources they can distribute is limited. For example, if a shift requires ten volunteers and they only get one, then those resources don't get distributed to where they're needed. So obviously they're trying to optimize that process, and this tool is just one of the things they're going to use, that we're helping them build, to move in that direction. So yes, there is definitely a lot of human interaction at play here, but the tool is there because right now they're doing it randomly: they just turn up on the day, realize there aren't enough volunteers, and have to scramble for more.
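[Editor's aside: the classic telephone-line queueing result Mark alludes to is usually the Erlang C formula, which gives the probability that an arriving job has to wait, given an offered load and a number of servers. A self-contained sketch with made-up numbers; this is generic queueing theory, not the specific model from Mark's class.]

```python
from math import factorial

def erlang_c(n_servers: int, offered_load: float) -> float:
    """Probability an arriving job must wait (Erlang C).

    offered_load = arrival rate * average service time, in Erlangs.
    Only meaningful when offered_load < n_servers (stable queue).
    """
    a, n = offered_load, n_servers
    top = (a ** n / factorial(n)) * (n / (n - a))
    bottom = sum(a ** k / factorial(k) for k in range(n)) + top
    return top / bottom

# Toy numbers: deliveries arrive 6 per hour, each ties up a volunteer
# for 30 minutes, so the offered load is 6 * 0.5 = 3 Erlangs.
for volunteers in range(4, 9):
    print(volunteers, round(erlang_c(volunteers, 3.0), 3))
```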
Speaker2: [00:19:18] And the reason I ask is: with multiple organizations, they're all going to have different processes, but if it's one organization, that simplifies things. So I would actually go to the stakeholder and, one, ask them: are they going to be able to maintain a machine learning model themselves? Or is something like a really killer Excel spreadsheet with some cool macros going to be the thing that really takes them over the edge? Because you can build a model that's super correct, but if they can only maintain it for a month and then throw it away, it wasn't really helpful. That's something I saw a lot with the social impact work I helped out with. In addition, I would map out the phases of engaging someone for volunteering. Like: they put out marketing for people to join, people sign up on this link, they choose their set dates. Once you know those phases, it becomes easier to model what's really happening, and you can probably reduce your scope: hey, there's a large problem, and our model can really help you in this specific area of this chain of events. And then another thing, just off the top of my head, that you should really look into is seasonality. Especially with food services. One of the nonprofits I worked with was a similar thing: they provided food services to the community, partnering with grocery stores to get the food in and then out to the need, and seasonality was a big part of it. From the US perspective, Thanksgiving was a huge influx time, so they got a lot of food in and a lot of volunteers. And I imagine in the summer, when people are on vacation (or holiday, as they call it over there), a lot more people may be out of town and can't volunteer. So that would be another thing: go talk to your stakeholder and figure out what the phases are and what the seasonality is. That'll help you come across some gotchas. And I'll go put those health care models in the comments as well. Speaker6: [00:21:24] Thank you. So, on your second point there: this project came about because that analysis had already been done. The chain of events that leads to somebody turning up to the warehouse is already pretty well understood, and they established that the weakness was that they didn't understand who was going to turn up to the warehouse, so they couldn't be proactive in taking steps. That's where the tool comes in. And just to give a bit of context, this is going to be a minimum viable product for September, so we're not expecting to solve the problem end to end. And we've considered this massively as well: whether the organization itself would be able to manage a machine learning model, because of the data drift and retraining elements of it. I'm also running this in conjunction with a software development team that's taking those things into consideration and looking at how we can best simplify that process. And the second part of the approach doesn't have to include machine learning; that part is more to improve the model's performance as much as possible. It could just be simulation based off distributions observed in the data, and that doesn't require training any model. Speaker2: [00:22:43] So I just put in the link. The book is called Modeling Public Health and Health Care Systems, and Viser wrote it; I took his class and it's really good. He makes things very simple. Two chapters in particular: there's a queue optimization chapter and a resource optimization chapter. I think both of those would really help give you a sense of how things work there. And more importantly, what I liked about the book is that it really focuses on conceptualizing and simplifying the problem first, before modeling. Speaker6: [00:23:13] Awesome, thank you. Harpreet: [00:23:14] Let's go to Russell, then the data engineer formerly known as Joe, who's now known by just his first name, and then we'll go to Eric; Eric has some great insights in the chat, so we'll go to Eric after that. Before we do that, though, John: somebody on LinkedIn, Dwayne, wants to know:
Are you from South London? Speaker6: [00:23:36] Yes, I am; I'm from South London. Speaker2: [00:23:41] Can I just ask a question? I want to know, John, about the volunteers: is it always, like, the same similar people turning up, or is it really different people every time? Speaker6: [00:23:56] That's a really interesting question, Tony. So we don't have much data on the actual volunteers; we just have them as events right now. When somebody signs up for an event, who the person behind that sign-up is, is not well tracked by the organization at the moment. That is something we could recommend doing, but then there's a cost to tracking it as well, bearing in mind it is a charity. Speaker2: [00:24:22] Thank you. Harpreet: [00:24:26] No problem; clarifying questions are good. So we'll go to Russell, then we'll go to Joe, and then we'll go to Eric. And Monica's got some great comments as well. And by the way, if you guys have questions, please let me know and I will add you to the queue; doesn't matter if you're on LinkedIn, YouTube, or Twitch. After we get through this discussion, we've got a question coming in from Mark, another question from Dwayne, and another from Muhammad, both of whom are joining us from LinkedIn. But Russell, go for it. Speaker2: [00:24:59] OK, thanks. Good evening, everybody. So, John, a couple of questions. It seems that you've got two clear sides to the model you want to produce. One is the forecasting of the requirement for volunteers (Joe's put a couple of good comments in the chat about forecasting capacity models): that's predicting when you will need people to come in and help you out. But have you also considered: if you have the people that are supposed to come in and you've got a fully subscribed event, but then people don't show, are you going to deal with that differently, or do you expect to handle it within the same model? I would call those two things kind of different. You've got your forecasting piece, to understand when you need people and whether that requirement changes seasonally or daily, weekly, et cetera. But then if people fail to turn up, that's a different challenge: you need to know if you've got people on standby that can come in when someone fails to show without prior warning. And perhaps also some modeling on the people that are volunteering, whether they're persistent no-shows. Everyone likes to contribute to charity; it's a good thing. But some people may have a little bit of vanity in it: they like to think of themselves as being the good person, but it's not really first and foremost in their priorities, and when something else turns up they drop it pretty quick. I've seen that with some charities in the past, in my experience. So, two questions. One: forecasting your requirement for volunteers, and is there any seasonality or change in your requirement, or is it pretty much every day of the week you need, say, ten people per shift? And secondly: how can you deal with the no-shows? Speaker6: [00:26:24] I mean, those are really good questions.
So on the first question: there is a set number of required volunteers for every shift, and what we're trying to predict is the number of volunteers that are going to turn up. Some shifts, as you can imagine, get filled over capacity, and some shifts are under capacity. The ones that are important to us are the ones under capacity, because those are the ones we can act on and add value to by doing volunteer outreach. But there has to be a way to predict whether a shift is going to be under capacity before the event occurs. That's on the first question. On the second question, in terms of no-shows: this is probably beyond the scope of the project. The reason I say this is, you touched briefly there on the volunteer level itself. If we wanted to identify whether there are particular volunteers that regularly don't show, we'd need data on the volunteers themselves, which we don't have. So we are limited by data there. And there are challenges in that we're doing this as a goodwill project, so there are time constraints; it's not like we actually work for the organization. We have to build a minimum viable product that can add some value, but we're not looking to boil the ocean. So we can build a tool that says: hey, here are the shifts that are likely not to be filled. The rest is for the charity: to act on those insights and try to bring volunteers onto the shifts the tool identifies. Right now we're going from a state where the charity is just guessing and randomly assigning resources to fill certain shifts, to something more directed instead of firefighting, where we're being more proactive about it. That's the goal. Harpreet: [00:28:58] Sounds like a good plan. And Monica, also shout out to Toure, good to see you again. And Brandon: Brandon and I have a topic that we were talking about in the LinkedIn messages that, Brandon, if you want to bring it up, we should do that. But let's get through John's question here by going to Joe, then Eric and Monica. And if anybody else has questions, let me know and I'll add you to the queue and we'll keep on moving. Speaker5: [00:29:26] Yeah, I think Russell did a good job of explaining what I was going to explain. But at the end of the day, it sounds a lot like a capacity planning problem; it's a classic supply chain problem, really. You've got demand, you have actuals; how do you deal with the difference? I would just go look at those types of problems and try to mimic the approaches. I wouldn't overthink it too much. How much are you expecting over time? Like: on Wednesday I'm expecting this many people; how many people showed up? OK, great, how does that affect my other forecasts? That's the approach for the demand part. As Russell also points out, you have to look at the volunteers who don't show up, and maybe, if you have enough data and you want to get fancy, do a cohort analysis of the drop-off, the churn of volunteers over time. So if there's a group of volunteers that have been with us for over a year,
how many of them have dropped off, and at what rate? And of volunteers that started with us a week ago, how many are coming back the next week, and so forth. That's another analysis you can do separately, to understand the behavior of these different buckets of volunteers. So I'd look at it that way. And then, obviously, just make a list of the people who flake and don't schedule them again, or just tell them to try harder to volunteer. Because the problem is they have no incentive to show up; this is inherently the entire problem with the situation. They're not financially compensated, so unless they believe in the mission of what you're trying to do... To Tom's point, it's a human thing as much as a technical thing. You're not paying these people; they really don't owe you anything except their time. And given how flaky the world is right now, people don't show up even when they're getting paid a lot of money, let alone volunteering. I don't know if there's a greater chance somebody shows up if they're not getting paid than if they are; I'd be interested in that analysis. But, I mean, as I pointed out in the chat: a nice chair that I was expecting next week, and now it's like, oh, we don't have any people to fulfill that, it'll be here in October, maybe. This is happening everywhere. I talk to people who have restaurants: they're paying people whatever you pay people at restaurants, and they can't find anyone to hire at this point, so they have to close the restaurant. I'm not sure what's going on in the world, but I would say this: finding people who reliably show up, in general, is hard. If you could solve that problem, I'm sure a lot of businesses would love to throw money at you. Somebody said in the chat that there might be a startup in this, an app for volunteers. Well, that assumes you want to pay for an app while you're making no money to begin with. But I could see businesses paying a lot of money for this thing; this is like the biggest problem going on. It's not so much having to let go of workers; you cannot find anybody right now. It's insane. So, soapbox done; I'll mute myself.
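[Editor's aside: a minimal sketch of the volunteer-churn cohort analysis Joe suggests, using a hypothetical sign-up log; the column names, IDs, and dates are invented.]

```python
import pandas as pd

# Hypothetical log: one row per volunteer per shift worked.
log = pd.DataFrame({
    "volunteer_id": [1, 1, 1, 2, 2, 3, 4, 4, 4, 4, 5],
    "shift_date": pd.to_datetime([
        "2021-05-03", "2021-05-10", "2021-06-07",
        "2021-05-03", "2021-05-17",
        "2021-06-14",
        "2021-05-03", "2021-05-10", "2021-05-17", "2021-06-07",
        "2021-06-14",
    ]),
})

# Cohort = month of each volunteer's first shift.
log["cohort"] = (log.groupby("volunteer_id")["shift_date"]
                    .transform("min").dt.to_period("M"))
log["period"] = log["shift_date"].dt.to_period("M")

# Volunteers from each cohort still active in each later month.
active = (log.groupby(["cohort", "period"])["volunteer_id"]
             .nunique().unstack(fill_value=0))
cohort_size = log.groupby("cohort")["volunteer_id"].nunique()
retention = active.div(cohort_size, axis=0)
print(retention.round(2))  # rows: cohorts; columns: calendar months
```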
Speaker2: [00:32:22] Thank you; great insight, Joe. Real quick, maybe just off of Joe's rant: maybe the problem isn't predicting how many people you need. Given that the volunteer side is hard (figuring out who will even show up), maybe it's predicting which events will be a failure, which shifts won't get filled at all. Shifting the perspective a little: rather than predicting how many people you need, just predict whether or not the shift will happen. I feel like that potentially simplifies it a bit, or it could make it more complicated; I'm coming at this blind. But essentially it gives you another option that still provides some of the solution: hey, this one is going to be in danger, reorganize your resources or something. Speaker6: [00:33:04] Yeah, I take both points. Joe's first, and then I'll move on to what you were saying, Mark. So, Joe, I think you quite rightly pointed out that so many things impact whether, and how many, volunteers you're going to get. That's where our approach has been: treating volunteer show-ups as a random variable, which means there's a probability distribution attached to them somehow. The approach is to use the data to understand the probability distribution of the number of volunteers that turn up for, say, a warehouse shift, across all the data we have, and then fit a probability density function that can help us simulate turnout for our shifts. Mark, on your point: we have looked at several different options for what our target variable is, and we came to the conclusion that we don't think this is a standard machine learning problem in the sense of: here's a label, here's some data, fit your model to the data based on the label. There are several reasons, but one big one is feedback. Briefly, what we mean by feedback is: if we tell our volunteer manager that, based on this data, no one is going to show up to this shift, and they act on that, then the next time that data feeds through the model, the model tells them this shift gets filled. But that's only because they acted on it. So we explored the supervised learning approach and realized that maybe it's not the best approach for this particular problem. And just to close the point: what you're talking about is kind of a gap in people with the right skills to fill positions, and I'll say now, we're not trying to solve all of the charity's problems with this tool. This tool is really just going to be a minimum viable product to solve this one issue; guiding action, really, that's it. We don't claim that once we release this tool, all the volunteer management problems are going to go away, because it's deeper than that; there's more to it. So yeah, point taken on that as well.
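[Editor's aside: for what it's worth, Mark's reframing (predict whether a shift fails rather than how many people show up) reduces to a binary classifier. A hedged sketch with invented features and labels; note John's caveat above that once the manager acts on the predictions, the observed labels become contaminated by that very intervention.]

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical shift history; "unfilled" is 1 when sign-ups came in
# under the required headcount. All values are made up.
df = pd.DataFrame({
    "weekday":     [2, 5, 2, 4, 5, 2, 4, 5, 2, 4, 5, 2],
    "required":    [4, 6, 4, 5, 6, 4, 5, 6, 4, 5, 6, 4],
    "days_notice": [30, 7, 14, 3, 21, 5, 28, 10, 2, 14, 6, 9],
    "unfilled":    [0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1],
})

X, y = df.drop(columns="unfilled"), df["unfilled"]
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=13, stratify=y)

clf = LogisticRegression().fit(X_tr, y_tr)

# Rank upcoming shifts by predicted risk of going unfilled, so the
# volunteer manager knows where to target outreach first.
print(clf.predict_proba(X_te)[:, 1])
```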
Speaker3: [00:35:28] Real quick, if I may, Harp. John, I think you're demonstrating good thinking here; we're having fun in the chat too, and there's some good stuff there. The spirit of this is an exercise: imagine you've got a matrix, with your business needs on one side and your current data assets on the other. For one modeling approach you may not have the data. Like, a lot of us were sitting around going, wouldn't it be nice to get data on each individual human's reliability in this realm? Well, you may not have that, but you can start with an inferior model that does something better than guessing; you know, plain statistics. And then you can start collecting data, with the tools just mentioned and others. Just because you don't have the data right now doesn't mean you can't start collecting it, to get a better model in the future. Speaker6: [00:36:28] Yeah, completely agree with that. And that is part of the longer-term plan: to inform the organization we're working with about what data they should consider collecting. There's a cost to collecting data as well, and because they're a charity, that sits on their side; it's a decision above our station, really. That's more for the leaders of the charity to decide, whether they want to spend on infrastructure to start recording that type of data. But yes, I completely agree. Harpreet: [00:37:06] Eric or Monica, did you want to contribute here? Speaker2: [00:37:09] Sure, I'll say something; it'll take me, like, forty-five seconds. So a couple of months ago I had to do something a little similar with predicting stuff, and yeah, machine learning would have been cool, but I just didn't have enough data for it, really. So all I did was take volume day by day (or I guess you could take it shift by shift, volume by location, whatever), and I did a two-week look-back. Say today is Friday: I looked at the last two Fridays, and weighted the most recent Friday a little heavier than the Friday before, to try and predict today, or tomorrow, or Saturday, or whatever. And then, because it was available, there was a little seasonality table, so we could say, OK, July is a little higher, April a little lower; you could have spring, summer, fall just hardcoded in. Same thing with day of week: on weekends people are twice as likely, so I just put in a Saturday/Sunday modifier and a Monday-through-Friday modifier and hardcoded everything. It worked just fine, and I got done way faster than if I'd tried to throw stuff I barely understand at it. So, anyway, you could do that in Excel if you wanted to. Speaker6: [00:38:22] That's insightful, definitely. And yeah, again, we're trying not to overcomplicate it; we've cycled through several approaches. Seasonal trends are something we're definitely looking at in the exploratory data analysis; if anything, that's going to give us the level of granularity we need to start looking at those probability distributions. Speaker4: [00:38:50] Yeah, very similar to Eric: I was analyzing safety incidents for a utility company from a historical perspective, just to see if there were any trends or spikes by day of the week or in certain months (or, in this case, not enough volunteers), and then to bring in that human aspect. In my case that was training those individuals during those times; in this case, if you don't have enough individuals during a particular month, maybe do a marketing campaign, or call past volunteers to say hey, get it out there, and see if that brings in more volunteers. Because, you know, on Fridays you won't have a lot of volunteers, that kind of thing.
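[Editor's aside: a minimal sketch of the hardcoded forecast Eric describes: a weighted average of the last two same-weekday counts, scaled by seasonality modifiers. The weights and modifier values are invented; Eric didn't give his actual numbers.]

```python
import pandas as pd

# Hypothetical daily volunteer counts (eight consecutive Fridays).
history = pd.Series(
    [5, 3, 4, 6, 2, 4, 7, 3],
    index=pd.date_range("2021-07-02", periods=8, freq="7D"),
)

MONTH_MOD = {4: 0.9, 7: 1.1}    # April a little lower, July higher
WEEKEND_MOD = {5: 2.0, 6: 2.0}  # Sat/Sun twice as likely

def forecast(target: pd.Timestamp, history: pd.Series) -> float:
    """Weight the most recent same-weekday value heavier than the one
    before it, then apply the hardcoded seasonal modifiers."""
    same_day = history[history.index.weekday == target.weekday()]
    prev, recent = same_day.sort_index().iloc[-2:]
    value = 0.7 * recent + 0.3 * prev
    value *= MONTH_MOD.get(target.month, 1.0)
    value *= WEEKEND_MOD.get(target.weekday(), 1.0)
    return value

print(forecast(pd.Timestamp("2021-08-27"), history))  # next Friday
```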
Harpreet: [00:39:35] John, you've got a lot of good advice here, like forty-five minutes' worth of good advice, so go back and listen to this. And there's a ton of stuff in the chat; let me know if you want me to read the text out of the chat, happy to do that. Let's keep it moving to Mark's question. After Mark we'll go to Dwayne, and then Muhammad has a couple of questions. But Mark, go for it. Speaker2: [00:39:58] I loved listening to that conversation, and the question was so great; I really feel like a lot of the work of data scientists is the same conversations I have with my colleagues, going back and forth on different ideas. So it's really cool. OK, my question: similar vein, machine learning, but with a caveat. I'm not trying to ship a model; I'm trying to teach others about machine learning. I have four mentees who signed up to this program, and we're learning about Python and R, and we're working on a machine learning problem together. I found this amazing researcher who collected all this data in Brazil, from public sources, on infant mortality within twenty-eight days (which is a standard health care definition) and cleaned it up. It's an amazing dataset for machine learning: it has about 30 different features, the label is mortality, zero or one, and it's over a million rows. So essentially it's an imbalanced classification problem for health care. The key thing is that we're not trying to ship it; we're not trying to show "this is what's happening" as some research thing. I'm just trying to teach people about machine learning, as one of their first projects. So with that in mind, thinking about imbalanced classification problems: what do you think are some core things or foundations that you wish you'd learned when you were first learning machine learning, so I can teach this concept better? I have some ideas, but I'd love to tap the mastermind of data scientists here to figure out how to make it even better. Harpreet: [00:41:34] So the teachable moment here is: how do I teach my mentees about the perils of imbalanced data? Is that the message you're trying to get across? Speaker2: [00:41:46] That, and also just the experience; like, their first project of "oh cool, I implemented a machine learning model, it has horrible accuracy, but I did it once," so now they've gotten past that curve. I'll post the dataset in the chat as well; some people asked, and it's a really cool, really important dataset to work with as a first-ever machine learning model. I've got a mixed bag: some of them have done machine learning before, for others it's their first time ever, so I'm trying to find something in between. Speaker2: [00:42:00] If I may: I know you want to throw them right in the water, but from what I heard from you, if you give me a really hard problem where there's no way to get a decent score (you pick something and it gives me thirty, fifty percent, whatever), I don't know how motivating that is. You know them better than I do, so maybe that's what motivates them, that kind of tough love. But also think about the domain you're teaching them in, meaning whether they're passionate about the project. I know they're all health care, and finding health care data is hard, right? [00:43:20] Because when I started, I remember my first project: the professor introduced me to the simplest dataset, the one everybody knows, and he's like, "Antonio, did you know that if you'd been on the Titanic, you'd have drowned?" I'm like, what are you talking about? How do you know this? And he's like, "Well, there's this thing called machine learning, and based on your demographics, the data predicts that you would have drowned." And to me that was like, all right, I've seen the movie; it kind of got me into it.
But it was also simple enough that I could build something and it gives me a ninety-five percent, or whatever, prediction score, and that kept me motivated. Ben Taylor, who is on this call, always tells me: even if you want to give them a hard project, also give them some wins. If you want to stick with your project, maybe it's like: OK, this is a hard project, but we're going to celebrate once the data's clean; if you create five features, we're going to celebrate. So maybe the end model is not going to be that great, but make sure you have some milestones along the way to keep them motivated, because just grinding at it and getting lost might be a little demotivating. At least, that's based on my personality. Speaker6: [00:44:27] I'd like to add to that as well. You talked about accuracy, which I think is very interesting, especially for imbalanced classification problems. What I would say is: get them to pay particular attention to evaluation, and to the confusion matrix. Especially in health care, that's really important, because false positives and false negatives in health care mean something completely different than they do in, say, what I'm doing; the impact is huge. So I'd put a lot of focus on evaluating the results of a machine learning model. The reality is that with cloud computing tools, like Amazon's, you can build models very quickly using AutoML, but not everybody understands what the model is doing and what its results mean. That's probably the most important thing, especially in the health care domain. So, for imbalanced classification, focus particularly on getting them to understand what the confusion matrix is telling them. With an imbalanced classification, if you just predict "yes" one hundred percent of the time, the accuracy is going to be quite high, right? Speaker2: [00:45:40] That's another thing that got me excited about this data; it's a great example, especially in health care. The first question I asked was: what's the cost of a false negative? How can we frame this problem so the end result is actually beneficial and not going to cause more harm? So I really love that point. And one note: in the comments people are saying to create a balanced dataset, a sample, and I'm totally going to implement that. That's a great idea.
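[Editor's aside: John's point is easy to demonstrate. A tiny sketch of the accuracy paradox on made-up labels with a 2% positive rate, roughly the shape of a rare-outcome health care dataset.]

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

rng = np.random.default_rng(0)

# Fake labels: about 2% positives, echoing a rare-mortality outcome.
y_true = (rng.random(10_000) < 0.02).astype(int)

# A "model" that always predicts the majority class.
y_pred = np.zeros_like(y_true)

print(accuracy_score(y_true, y_pred))    # ~0.98 -- looks impressive
print(confusion_matrix(y_true, y_pred))  # ...but every positive is missed
print(recall_score(y_true, y_pred))      # recall on the rare class: 0.0
```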
Harpreet: [00:46:09] Let's go to, I think that might have been Moesha's comment in the chat. Moesha, go ahead; you have some great comments in here. After Moesha we'll go to Tom, and then anybody else who wants to jump in; Brandon, I'd love to hear your thoughts on this as well, if you're still here. Speaker4: [00:46:27] Yeah. So I am mentoring someone from India, and we are working on a machine learning problem; not in health care, but a Kaggle dataset. I felt that was the easiest way to go about introducing it, just because we're from different domains. But in your case, you said everyone has to get experience. In that case, my suggestion would be to create a subset of the main dataset that is balanced, to bring them up to speed on the basic concepts, especially the confusion matrix: the true positives, sensitivity and specificity, all those trade-offs. The next thing: when you implement the imbalanced classification, one of the things to look out for is the sampling techniques. Many people do not really know how to modify the model when they use a sampling technique. If you look at even the research literature on this, people use a sampling technique but do not modify their regular (let's say) regression model, even though the equation changes when you do that kind of sampling. That's something to be aware of if you're teaching them imbalanced classification. But my suggestion would be to go with balanced classification first; make it easy on them. Speaker3: [00:48:02] Yeah, thanks. I was feeling left out because there's a chat going on with memes about the Titanic, and I've got a confession: I've never seen the movie. So thank you for letting me talk. No, seriously, you all are spot on: start with a simple balanced dataset, because we have to create a pipeline, we have to create our toolset. But I want to say something I didn't hear anyone mention. Think of each mechanism we have in the pipeline: we've got our cleaning, our scaling, our feature reduction, our feature engineering; we might add PCA if we need it. At each element in the pipeline we can try different things: different scaling methods, different options along the way. And one of the things we can dial in, especially in classification problems where we have to rebalance, is the balancing method: you've got the random sampling people have mentioned, you've got SMOTE; there are different mechanisms you can try. But I feel like (and I'm not criticizing anyone here) we need to remember to encourage people to use cross-validation as much as they can. Why? Because you're not just looking for the most accurate model across all the folds; you're looking for the tightest distribution of accuracies across the folds. That's a big signal that you're generalizing well with your model. And by dialing in different methods of balancing the data, you can see which one's going to work best, if you use that kind of hard metric. I hope that made sense. Speaker2: [00:49:58] It definitely did, and I appreciate it, because, again, I'm writing Google Colab notebook tutorials for them and we go through them together in our sessions. When I started on this one, I hit a block; I was like, whoa, this is a little too intense, let me go get some advice and figure out how to scale it back for them. So I'm so happy I talked to you all, because I think I'm going to do something really good for them. Speaker3: [00:50:21] Now, if you start with that full intensity of pipeline, you will overwhelm them. But if you do it in little steps, they'll go: oh yeah, that makes sense. It's like telling them to jump over the canyon right away, versus walking down into it.
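[Editor's aside: Tom's fold-spread advice and the resampling options (random oversampling, SMOTE) can be compared in a few lines. A hedged sketch using scikit-learn and imbalanced-learn on synthetic data standing in for the real dataset; note the imblearn pipeline, which resamples inside each training fold only, so the validation folds stay untouched.]

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from imblearn.over_sampling import SMOTE, RandomOverSampler
from imblearn.pipeline import Pipeline

# Synthetic imbalanced problem: roughly 5% positives.
X, y = make_classification(n_samples=5_000, n_features=20,
                           weights=[0.95], random_state=13)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=13)

# Judge each rebalancing strategy by the mean AND the spread of the
# fold scores, per Tom's point about tight distributions.
samplers = {"none": None,
            "random": RandomOverSampler(random_state=13),
            "smote": SMOTE(random_state=13)}
for name, sampler in samplers.items():
    steps = [("resample", sampler)] if sampler else []
    steps.append(("model", RandomForestClassifier(random_state=13)))
    scores = cross_val_score(Pipeline(steps), X, y, cv=cv,
                             scoring="balanced_accuracy")
    print(f"{name:>6}: mean={scores.mean():.3f}  std={scores.std():.3f}")
```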
Speaker2: [00:50:33] Yeah, I also want to get some feedback on one idea I had. Part of teaching is getting over that hurdle of "oh, machine learning is so advanced, there's no way I can do it." I want to get them past that hurdle, and the idea was: hey, we're going to implement a machine learning model on this really hard data; the model is going to be horrible, but you're just going to implement something, and now you've done machine learning, right? Would that be throwing them too much into the deep end, or do you think that would be OK? Speaker5: [00:51:03] I think so. I mean, to echo what others are saying, I would take that approach, but start with a balanced dataset, like a super easy, dead simple set. And then mess the dataset up: make it really imbalanced, but use the same dataset, so it's just their second time doing machine learning. Now they're going to see: oh jeez, that approach I took the first time doesn't seem to work the second time; gee, what happened? Well, do some stats on your dataset, figure out the proportions of the labels and so forth; look at a confusion matrix, which was brought up a lot too. OK, look how imbalanced the predictions suddenly are. And then, to Tom's point: how would we go about solving this problem, where the number of representative samples has declined by a significant amount? Ask the student: what do you think would be some ways we could solve this? How would you think about it? When I teach, I always try to leave it open to the student to figure out. It's like: I'm going to give you some hints, give you some approaches, but at the end of the day, come up with a way to either resample correctly or arrive at a better proportion of labels, such that you're not constantly running into this accuracy paradox. And I mean that term very literally, in the classical sense of prediction. Speaker2: [00:52:25] Definitely. Oh my gosh, thank you so much. Again, going from just doing data science to teaching others is a whole other ballgame, so I appreciate this advice so much. Harpreet: [00:52:36] Great tips. Brandon, what do you think? Speaker5: [00:52:39] Yeah, I was just hoping we could build on this, because I saw in the comments some people saying you could even inject fake data, mess up the data in some way, because you're trying to figure out how generalizable a model is. You can build that idea all the way up to dropout. Isn't that what dropout does in neural networks? You're trying to get the model to generalize better: you're dropping units, setting ones to zeros randomly, and that has the effect of, like, ensembling a bunch of neural nets in one training run. So you can go all the way to that kind of concept just by starting with the basics.
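[Editor's aside: a small sketch of the progression Joe and Brandon describe: start from a clean, balanced synthetic dataset, then deliberately imbalance it and inject label noise, and watch the same model's confusion matrix degrade. Everything here is fabricated teaching data, not the Brazilian dataset.]

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

def fit_and_report(X, y, label):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    print(label, confusion_matrix(y_te, model.predict(X_te)), sep="\n")

# Lesson 1: a clean, balanced dataset.
X, y = make_classification(n_samples=2_000, n_features=10, random_state=0)
fit_and_report(X, y, "balanced:")

# Lesson 2: the same generator, now 95/5 imbalanced, with flip_y
# randomly flipping a fraction of the labels to add noise.
X2, y2 = make_classification(n_samples=2_000, n_features=10,
                             weights=[0.95], flip_y=0.05, random_state=0)
fit_and_report(X2, y2, "imbalanced + noisy:")
```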
Speaker3: [00:53:09] And a quick add here: if you guys haven't followed my fake-data posts, let me know and I'll send you the notebooks. It's like Joe synthesizing data: if you start with a basic set where you know how to create fake noise and fake data problems, the world's your oyster in machine learning. You don't have to wait until you find data problems in real-world datasets to harden yourself; you have another advantage, because you created the problems, so you already know in advance what answers your model should give you. You have, like, the solution manual (which we never have in real-world problems) to tell you when your pipeline is not really doing the best thing, and then you can fix it against the known answers and work backwards. Now throw the real data at it: you have a better clue about how to modify the data as you go, to get to more accurate predictions. Harpreet: [00:54:10] Thank you very much, Tom. Ben, what about you; any thoughts on this topic? Speaker2: [00:54:14] Sorry, I was looking at your stuff. Harpreet: [00:54:17] That's also an option, the deep learning with TensorFlow and Keras stuff. Speaker2: [00:54:23] What's the quick summary of the question that was asked? Speaker2: A quick summary: essentially, I have a few mentees that did this program, and I'm teaching them how to code and do machine learning, just simple stuff. I found a dataset that's pretty good for learning; it's pretty cleaned up already, but it's an imbalanced classification problem, real-world data. So I was asking: what's the best way to bring someone up to speed, not necessarily to build the best solution ever, but to teach them so they get excited and feel that spark for data science? Speaker2: OK, so I heard the question. So, Mark, one of the things I like to do with new students is really try to light a fire under them with personal passion projects, because what you want is for them to go off and come back the next week saying: hey, Mark, guess what I tried on the weekend. A lot of times that's what really wakes them up. And you ask, how did it go? And they tell you: oh, it kicked my ass; Saturday I had all these issues, I tried this, the website didn't work. You always want them to run into the weeds; the passion will help them muscle through it. So I'm a huge believer in that, because that's the issue with a lot of these Kaggle datasets: they're too clean, they're not real. So throw them into the real world and give them a reason to swim. Speaker2: Thankfully, all of them are from a health care background, and we found a worthwhile health care dataset, which is pretty cool. Speaker2: And that's awesome, too, because health care in itself is inspiring, especially if you get access to patient-level data, where if you do this, it'll actually make a difference at the patient level. It's a wild public dataset from Brazil; I don't know how it's even public. Speaker6: [00:56:10] So, is AutoML also a bad word here? Speaker2: [00:56:16] I would love to teach them that, but I think it's outside the scope, because we're doing Google Colab notebooks and I think it would just be a lot going on to switch gears to AutoML. But what's your thought on that? Maybe I'm missing something, and I would love to hear your perspective.
So really, what we're trying to do is leverage automation where we can, to solve some of the common machine learning use cases very quickly. So if you're teaching a group of newbies to machine learning and data science, why not leverage AutoML? It will build a model for them, and then you can focus on the outputs of the model: how well does this model actually perform? What is a confusion matrix? Sometimes you can get overwhelmed learning how to actually code a model in Python, which takes bandwidth from your brain, and then on top of that you're trying to digest the actual concepts behind machine learning, which exist outside of coding anyway; that's all the math and stats. Speaker5: [00:57:36] AutoML, I suppose, is kind of like taking the bus instead of learning how to drive a car. Taking the bus will get you most of the way there, depending on the route. That's what AutoML is like: it'll get you heading in a direction. But it depends on what your goals are. Speaker2: [00:57:58] I think the main skill [00:58:00] I want to give them is a skill set where they know how to set up their own technical project and can repeat that process over and over again in the future, so that when they have an idea, they have the toolbox to set it up themselves. Speaker3: [00:58:14] Just very briefly, guys: more than 20 years ago, engineering managers who needed strength-of-materials analysis or failure analysis done were all excited that the finite element packages were getting easier to use. And they thought, oh, we don't need to hire an expensive engineer; we can just get someone to run the software for us. That was a mistake. It takes knowledge to clean data, and AutoML doesn't do everything. The sexiest thing we do is clean data, and get up out of our chairs and go talk to the data management and entry system programmers to say: please don't let that field be nullable, I need that data. Please do a type check on this data; I don't want to have to keep maintaining my code to convert the strings. And then what happens when the AutoML fails and the person using it doesn't have a clue why? Yes, you don't have to be an automobile designer to drive a car. But what we're doing is not that. And if the AutoML breaks down, it's sure better to have a data scientist at the helm who knows what the heck's going on under the hood. Speaker5: [00:59:30] I worked for one of the startups building AutoML systems, and Ben and I have a common relationship in this regard. I will say that the main problem with AutoML, which has not disappeared since I was working on this almost 10 years ago, is the data set you provide to the AutoML. Especially if it's a structured data set, that's the complete linchpin of whether the AutoML models are going to work. It's the input. If your data set is great, [01:00:00] you might get something. But at the end of the day, it comes down to domain expertise. I've worked on more AutoML projects with clients than I care to count, and more machine learning projects than I care to describe at this point. One hundred percent of the time, when you're working with structured data, success comes down to the data set that's been provided.
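For the "leverage automation, study the outputs" approach, here is a rough stand-in using plain scikit-learn rather than any particular AutoML product (which this sketch does not attempt to reproduce): the search is automated, and the student's work starts at the outputs.

```python
# A stand-in for the AutoML idea in plain scikit-learn: automate the model
# search, then have students study only the outputs. Real AutoML services
# go much further; this just illustrates the division of labor.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=13)

pipe = Pipeline([("scale", StandardScaler()),
                 ("clf", LogisticRegression(max_iter=1000))])
# Swap whole estimators and their settings inside one automated search.
search = GridSearchCV(pipe, [
    {"clf": [LogisticRegression(max_iter=1000)], "clf__C": [0.1, 1, 10]},
    {"clf": [RandomForestClassifier(random_state=13)],
     "clf__n_estimators": [100, 300]},
], cv=5)
search.fit(X_tr, y_tr)

# The student's job starts here: read the outputs, not the plumbing.
print(search.best_params_)
print(classification_report(y_te, search.predict(X_te)))
```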
So if that data has been built by somebody with the domain expertise to build a good data set, then I think you're home free. But if it's just some random data set that you throw in, you're going to get a pretty random result. And if it predicts well, it's because you were lucky, not because you did great work. I'm dead serious about this. I've seen machine learning models where it works because it randomly works, not because the data had any signal in it at all. Quite the contrary: you're lucky it works, and it will implode at some point, but you won't know when it implodes. So anyway, I'll back off. But I do think you need the domain expertise, to Tom's point. It never goes away. Speaker2: [01:01:03] Absolutely. Harpreet: [01:01:05] Marc, how are you finding all this input? Speaker2: [01:01:07] My gosh, I'm set up for success now. I'm super excited. Speaker5: [01:01:11] Don't get too eager. Speaker2: [01:01:14] Right now I'm writing all of this down. I'm excited because I feel like I just avoided a huge landmine in teaching them, so I'm very excited about that. There are probably more landmines out there, but it's all a learning process; I'm just happy for the backup, and I'm going to make this as good as possible for the students. Speaker3: [01:01:36] I'm encouraged, too, that the DataRobot evangelist takes a similar line: not against AutoML, but focused on how to handle it. Speaker5: [01:01:51] Ben and I have known each other a long time. He gets it. Trust me. Speaker6: [01:01:56] I get the sense that AutoML is a bad word [01:02:00] for you two, Tom and Joe? Speaker5: [01:02:02] Not really, no. In fact, I love AutoML, especially when you're talking about images. Here's where I make a distinction: I think AutoML really works well, I could write a poem about it, on problems where you're dealing with unstructured data. Images, audio: basically things where you can take a set of, say, pixels, unwind them into a vector, and feed that into a neural net. That's actually a much simpler problem to solve than feeding structured data from a business system into an AutoML program. Classifying images and audio and so on is, I'd say, actually the far easier problem. So I think AutoML works great for a lot of those use cases, especially when you have pre-trained models where you can just do transfer learning: for example, going to Google Cloud and just using their AutoML image classification. It works most of the time, and if the pre-trained stuff doesn't work, they've got enough robustness behind it. But I've not seen the same success with structured, tabular data sets; those tend to implode in on themselves, for the very reason I described. Every structured data set is different, and figuring out whether your data is linearly separable, for example, or whether you can pick up any signal from your features at all, is a much different story, because it's inherently messy, human-generated data, not pixels. So sorry, Tom, but... Speaker3: [01:03:31] I was too busy making fun jokes in the chat about Joe being a rapper and a poet and a heretic and all of those fun things.
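Joe's "it works because it randomly works" failure mode can be probed directly: retrain on shuffled labels and see whether the real score stands out. A sketch assuming scikit-learn's permutation_test_score, run here on deliberately signal-free noise:

```python
# Probing whether a model's score reflects real signal or luck:
# compare the real score against scores on randomly shuffled labels.
# Assumes scikit-learn; this dataset is pure noise by construction.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import permutation_test_score

rng = np.random.default_rng(13)
X = rng.normal(size=(200, 20))       # no signal exists in these features
y = rng.integers(0, 2, size=200)

score, perm_scores, pvalue = permutation_test_score(
    RandomForestClassifier(random_state=13), X, y,
    n_permutations=100, cv=5, random_state=13)
# If the real score sits inside the shuffled-label distribution (large
# p-value), the model "works" only as well as chance -- Joe's lucky model.
print(f"score={score:.3f} p-value={pvalue:.3f}")
```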
But it's in the spirit of the thing, and I think Brandon was echoing what I was saying with the finite elements stuff. It's the spirit of: what is AutoML good for? It's good in the hands of the data scientist, to help them go faster. Otherwise it would be like handing a non-violinist the best violin [01:04:00] ever made and saying: here, play it. You need a lot of training first. AutoML helps good data scientists go faster, because they already know what they're doing. Speaker6: [01:04:09] Yeah, I definitely agree with that. That's kind of my sentiment as well. Harpreet: [01:04:14] When you work with these tabular, structured data sets, the reason classical machine learning works so well is because of the features that you engineer: you're building out the complexity, using subject matter expertise and domain knowledge to help the model learn. Um, yeah. So, back to Joe's point there. I've got questions coming in from LinkedIn. This is an interesting question from D'Wayne that might incite a riot: data science or data engineering, which is easier, and why? If I could riff off a post somebody put up a while ago: if you have to ask that question, then you probably shouldn't be in the field. Does anybody want to take a stab at this? Data science and data engineering: which is easier, and why? Speaker5: [01:05:05] I can speak to this, because I've done both. I think it's a tricky question to answer, because it depends on the background you're coming from, so I don't think you can universally say that one is easier than the other. It's just a hard one to answer. But I'll use this litmus test: if you have a strong math background or a strong analytical background, data science is going to be easier. If you have a stronger software engineering background, data engineering is going to be easier, because it has the word engineering in it. I'm kidding; it's because you have systems thinking, and that's a different type of thinking than analytical, mathematical thinking. They're similar, but different. That said, I think both of these fields are changing because of the way tooling is being abstracted, [01:06:00] and the role of a data engineer, I would say, is becoming less about engineering and more about higher levels of abstraction and governing data. I was actually just talking about this with the CEO of an up-and-coming, big-name data startup, and we're both in agreement. Speaker5: [01:06:19] I think data engineering over the next several years is actually going to be more about old-school stuff like data governance and data management, and less and less about data pipelines. That's very much a solved problem; you can find products that just do it. So you're moving up the value chain in that regard. But data science is the same; it's the same thing we just talked about with AutoML, and libraries are increasingly making the role of a data scientist much easier. When Ben Taylor and I were cutting our teeth in machine learning, there weren't really libraries; it was the proverbial hands-on-the-car era. It was a much different time. Now everything is very easy.
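Harpreet's point above, that on tabular data the engineered features carry the domain knowledge, as a small pandas sketch; the columns and features here are hypothetical:

```python
# On tabular data, hand-built features encode what a domain expert knows
# matters. Assumes pandas; all column names below are hypothetical.
import pandas as pd

df = pd.DataFrame({
    "signup_date": pd.to_datetime(["2021-01-04", "2021-03-15", "2021-07-30"]),
    "last_order":  pd.to_datetime(["2021-08-01", "2021-08-10", "2021-08-12"]),
    "n_orders":    [12, 3, 1],
    "total_spend": [480.0, 75.0, 19.0],
})

# Each engineered column bakes domain judgment into something a classical
# model can actually learn from.
df["tenure_days"] = (df["last_order"] - df["signup_date"]).dt.days
df["avg_order_value"] = df["total_spend"] / df["n_orders"]
df["orders_per_month"] = df["n_orders"] / (df["tenure_days"] / 30).clip(lower=1)
print(df)
```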
And so I think the role of the data scientist is going to change too, into being more of a domain expert with excellent analytical and mathematical skills. So to know where the puck is going, I would say, diagnose where you want to fit into that world. But to actually answer the question: there's not really an answer. It depends more on what you want to do and on your temperament. Speaker2: [01:07:17] So, yeah. Harpreet: [01:07:17] I think that's honestly the only right answer to that question. I would have answered along the same lines. So let's keep it moving; a couple more questions are coming in. D'Wayne, hopefully you were listening to this great response, and I'm sure a lot of us would have agreed with it. If anybody wants to chime in on that particular question, let me know; otherwise, I'll move on to the next one. OK, let's move on to the next one, then. The next question is coming from Mahomet on LinkedIn: any recommended resources for people starting data science and machine learning who want to familiarize themselves with computer science jargon? Something that would help people with absolutely no background in computer science [01:08:00] get started in data science. I would say there are probably two books I'd recommend in your situation. One would be Hands-On Machine Learning with Scikit-Learn and TensorFlow, and the other is Python for Data Analysis, written by Wes McKinney, the guy who created pandas. I think if you read those two books, you're off to a good start. And if you really want to get comfortable with computer science jargon, Grokking Algorithms is probably a resource I'd recommend as well. Speaker3: [01:08:27] Sorry, I thought maybe you'd seen the chat: someone mentioned The Pragmatic Programmer. Mark Freeman. Harpreet: [01:08:35] Oh, yes, yeah. The Pragmatic Programmer, that is also a good book. And heads up: keep an eye out for the interview I have coming very soon with the author of that book. Speaker2: [01:08:46] I was just going to mention The Pragmatic Programmer. It's a book I go back to all the time, just to pick up good coding habits. It's less about jargon, and more about the thinking of a programmer. I read that book and I was like, oh wow, I've been approaching my code the wrong way for a long time. It's really good. Harpreet: [01:09:09] Yeah, that's definitely a good book, and Andy Hunt is amazing. Well, he's got many books, but besides The Pragmatic Programmer there's Pragmatic Thinking and Learning, which is all about how to learn, develop, and get better. I had an epic conversation with him months ago; that conversation will probably be released, I think at this point, a year after I recorded it. So subscribe to the podcast and keep listening. John says Computer Science Distilled, by Wladston Ferreira Filho, might be good for you. Anything else to add on that question? You want to chime in here? Yeah, absolutely, please. Speaker5: [01:09:49] So I'll do my normal thing, which I seem to do every time here: if you can, set yourself up on a team with data engineers, software engineers, a business-owner kind of [01:10:00] person, business analysts, yourself, and maybe a few more data scientists.
And I do most of my learning that way, just because I've joined integrated teams where everybody has the same goal, as opposed to orgs where the teams don't meet until the SVP level; I've been in those kinds of orgs too. On an integrated team, the data engineers and everybody else I just mentioned essentially report to the same person, and it just makes everything easier. I learned so much about software engineering from those people. I thought I knew about software until I worked with them, and then I realized I don't really know much at all. Harpreet: [01:10:29] Let's go to the next question; it's a great one, and I feel like it'll spark off a really good discussion. So please, go for it. Speaker5: [01:10:37] OK, yeah. So we were talking earlier about how you start with a problem that you think is a data science problem, and then everybody jumps in saying, you know, this is a people problem, this isn't mine. I jumped in and said this is an organizational problem, and this happens a lot, I think, depending on your situation: startup or big group, small group, et cetera. But I wanted to hear other people's experiences. You joined a company or a project thinking it was going to be all data science, and you're doing your work and you're thinking, you know what, this would be a lot easier if, let's say, the health care workers entered the data more cleanly. Or, in my case, it's the customer support people: they're supposed to mark an issue as a duplicate when it is one, and they don't always do that. Speaker5: [01:11:16] It's throwing off my machine learning, to the point where I can't even do it anymore, because the data is too messy. In any case, there's this debate. Some people say, well, you know what, this is real-life data science; this is data science in real life. And then there are other people saying this just isn't data science, and maybe this company isn't ready for you; you should go to a team or somewhere that's more mature. So I just wanted to know: at what point do people start thinking, you know, maybe this isn't data science, I should look for something else? Harpreet: [01:11:43] Yeah, I mean, we were having quite a conversation about this offline, in messages. We were talking about how I left my job just a week ago, and it became exactly that situation, where it was literally not data science anymore. I joined the company, I did a machine learning project, [01:12:00] and people loved the results of the machine learning project; it did well. But it turned out that in order for me to do any more machine learning, I would have had to do the data governance and all that type of stuff first. And it was turning into a situation where it was like: you guys probably hired wrong. In the hiring pipeline, a data scientist should have been the third or fourth data professional you hired. And it just became a role that did not resonate with me anymore, because I did not want to do data management or anything like that. I don't know if I'm answering the question or just going off on a tangent, but I guess once you get to the point where it's like, well, you hired me for a particular capability, and now you're putting me on work that does not reflect my true ability to execute and do something good, something meaningful, then you need to begin to look for another role. That type of thing.
Speaker3: [01:12:53] I think this is common, Harp. Yeah, definitely. Speaker5: [01:12:56] It's super common. There's a lot of bait-and-switch, I would say, in data science. I mean, it's a popular honeypot, right? What better way to get smart people in the door than to just throw up a "data scientist wanted" sign and let them all apply, whether or not that's actually what they'll be doing. I've personally been hired into roles like this, where it was expected that you'd be doing data science, and you end up doing everything except that. And so I would say it's a tough one, right, because there are some situations where people don't know what to expect out of a data scientist, and, as Harp indicated, instead of being the third or fourth hire, that's the first hire they make. We saw this a lot starting around 2015, 2016, when the hype was really taking off and everyone felt like they needed a data scientist. And so you saw a lot of hires, including myself and other people here, and I can't say it was the best [01:14:00] hire at the time. You didn't need that. I mean, I started a whole company around this fact, right? There's a reason I call myself a recovering data scientist: because I don't think that, in half the jobs I did at the time, I was needed at that stage. Speaker5: [01:14:12] What they needed was a data engineer, somebody to help set the foundation upon which to do data science. To expect a data scientist to come in and solve all your problems is ludicrous. Thankfully, more and more companies seem to be catching on to this, but it took a long time, and probably countless expense, to get to that realization. And it still happens. We still rescue a lot of companies and help them hire data engineers after they've hired a whole team of data scientists. They're like, oh jeez, maybe these guys weren't the best decision; and the data scientists agree: we don't know why you hired us. We're here, we'll happily collect our paychecks, but we want to be productive, right? There's a sense of worth in your job as well; it's not just showing up to collect a paycheck. People want to have an impact. And I'm sure you saw this, Harp: you went to this company with great intentions and whatnot, but unfortunately the timing just wasn't there. Harpreet: [01:15:08] Exactly. Mark has been commenting in the chat; go for it. Speaker2: [01:15:12] Oh, I didn't. Sorry. Speaker6: [01:15:15] OK, I'm going to try this, because, like I said, I've just recently started as a lead data scientist, and I'm the first one in, literally building the team from scratch. We're trying to digitally transform the organization. And yet I'm finding that, four months into the job, I've not done a single bit of real data science in the traditional sense. A lot of it has been around regulation and governance. I'm trying to set up a cloud platform, and there's a lot of internal inertia, just because of the way [01:16:00] we're operating in a regulated environment. But that is the reality of a lot of data science jobs. That is what it is, and as people have expressed, it is very common.
So I think maybe you almost need to redefine what you define as data science, and see whether that's realistic for the job postings you're applying to, or for the job you're actually in. My new working definition of data science is: what can I do with data to help the business achieve its goals? Sometimes that's not machine learning. It could be some form of higher-level statistical analysis that helps the business make a strategic decision. Machine learning is usually about automating decisions at scale; if you just want to make one key decision, you don't necessarily need a machine learning model operating at scale. You could build a Bayesian network, do some causal inference, and that could drive the decision as well. And that's adding value, and it's doing data science, just not in the traditional sense that I imagine a lot of people have. Harpreet: [01:17:03] Causal inference and Bayesian networks: cool, cool stuff to do. Also, shout out to Dana Mackenzie; you can tune in to the podcast episode with the coauthor of The Book of Why in the near future. It's coming out. But back to you: were you able to sniff out a situation like that and save yourself from it? Speaker6: [01:17:28] Yeah, I don't know for sure, but definitely, once you guys started talking about it... I got an offer a couple of weeks ago, and actually sought out Tom to talk through it. There were a couple of offers that came in all at once, and after my twenty-four years in the military, I was not used to any of that, so I needed some sage advice. One of them was a data scientist position, by title. I had applied to the company for a different position, [01:18:00] an analyst position, and the company said, we'd like to put you in for this other job. I said OK, and it was just a whirlwind: immediately, the next day, I was in an interview with a board. I looked at the job description and thought, I might have thirty percent of the technical skills you're looking for. So I went into the board having decided I wasn't going to lie to anybody if they asked about those other skills: I'm happy to learn as quickly as I humanly can, but it's just not something I possess. They didn't really ask about it, and it quickly became apparent that the skills I had that they needed were military experience, the ability to speak to senior officers, and the ability to lead and organize. Which is fine; I've worked hard on those things, they're good skills, and I'll use them again. The issue for me was that I was not going to be a data scientist. I was going to be in charge of some low-level staff data scientists who were doing some stuff, sprinkling some AI on things. Not only was it not going to be me being a data scientist, it was going to be me in a supervisory position for which I didn't feel equipped. And I felt like there was going to be enough work that it would rob me of the chance to develop my own skills; everything was just going to be panic mode, and with the products I got from my team, I'd look at them, do the best I could, say "yep, looks good," and fire them off. All of that was a situation I wasn't comfortable with. But it certainly was part of the carrot that the title said data scientist.
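The "one key decision via a Bayesian network" idea a few turns back, as a toy sketch. It assumes the pgmpy library; the network structure and every probability below are invented for illustration:

```python
# Toy sketch of driving one strategic decision with a Bayesian network
# instead of an ML model at scale. Assumes pgmpy; the structure and all
# CPD values are invented for illustration only.
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

# Campaign -> Sales, Season -> Sales: does running the campaign pay off?
model = BayesianNetwork([("Campaign", "Sales"), ("Season", "Sales")])
model.add_cpds(
    TabularCPD("Campaign", 2, [[0.5], [0.5]]),
    TabularCPD("Season", 2, [[0.7], [0.3]]),
    TabularCPD("Sales", 2,
               # P(Sales | Campaign, Season), one column per parent combo
               [[0.8, 0.5, 0.6, 0.2],
                [0.2, 0.5, 0.4, 0.8]],
               evidence=["Campaign", "Season"], evidence_card=[2, 2]),
)
infer = VariableElimination(model)
print(infer.query(["Sales"], evidence={"Campaign": 1}))
```

A real analysis would elicit the structure and probabilities from domain experts or data; the point is only that a single strategic question can be answered without deploying a prediction service at scale.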
Speaker2: [01:19:35] And it was slightly more money than the other two jobs I was offered, Speaker6: [01:19:38] so there was that staring at me as well. But yeah, in the end, I just decided that the opportunity to learn, to earn that title, and to build the skills the way I want to is more important. It was a tough call. Harpreet: [01:19:53] Congrats on the other offers, man; multiple offers at the same time, [01:20:00] that's huge. Speaker6: [01:20:01] I was very blessed. Harpreet: [01:20:03] And I saw some good comments, and then realized you're still here; go ahead. Speaker2: [01:20:08] We're still here; I'm not going anywhere. No, I was just thinking, you know, about stage one versus stage three. Right now, I've been looking at this one particular spreadsheet that's been used for the past fifteen years to do the complete planning for audits in Norway. And, you know, this spreadsheet is collecting the correct data, but it's structured in the wrong way. Now, when we're talking about stage one and stage three: I have several ideas of what I could do with that data if it were just structured correctly. So right now I've developed a whole new structure, a new method of collecting the data, structuring the same data in a different way, which will then allow me to do all the things that I want to do and deliver value to the users. And this is, to me, what it comes down to: when you go into a job, are you going in at stage one or stage three? At any given level, it is your job to bring people up to stage three. If you come in at stage three and you realize that stages one and two are not being properly performed, well, as a responsible person at stage three, it's your job to make sure that stages one and two are working. That is part of your job, because once you get one and two working, then you can actually start performing. Now, whether you perform stage one or two yourself, that's a different story. You may go in at stage three and realize, oh, this is way beyond me; however, I am here to give the guidelines for stages one and two to work. And when those two steps are working, then I can come back in and [01:22:00] really create value at stages four, five, and six. So this is how I work. It doesn't matter; whatever your skill set and job responsibility are, you have to tune them to the role and the function that you have. That's how I view it. Harpreet: [01:22:18] Thank you so much. Awesome, guys. That was some great discussion today. I think we'll begin to wrap it up. Hopefully you get a chance to go back and listen to all the great advice that was given today, some fire advice. And hopefully you get a chance to listen to the interview I released earlier today with Jaclyn Wells; she's the author of the book The Fearless Factor. That was kind of a personal episode for me; I feel like I was very, very open with her, discussing some shit I was going through. So hopefully you get a chance to tune into that; I really enjoyed speaking to her. Also want to share something real quick: I am launching something very soon here, a course called The Employable Data Scientist. It's coming out; I'm making this a reality.
And it's all about how to essentially turn your project experience into actual work experience: how to get job-ready and create a portfolio in such a way that you actually get hired, and how to complete a portfolio project and/or take-home assignment so that you get hired. It breaks down like this: how to think like a project manager; how to work like a data scientist, covering the data science project lifecycle and how to work within it; how to think like a scientist, all about the scientific method, problem framing, question design, how to ask good questions, how to make an effective analysis, how to create an analysis plan, and where to find data for your project. Harpreet: [01:23:36] How to think like an engineer: all about how to code for reproducibility and reusability, when you should and shouldn't use notebooks and scripts, how to use GitHub, a quick introduction to Docker, and what you'd even need to deploy a model to production. Then: how to think like a business person. We talk about the importance of an executive summary, guidelines for creating one, and then how to tell stories with data. And then [01:24:00] I'll have a bunch of project templates. The project templates will not be completed; they'll be blank templates, but I'll be coaching you along the way with suggestions in the form of comments and such. And finally: how to navigate the interview process. So, yep: The Employable Data Scientist is the name of the course. It'll be launching soon, and I'll do a pre-launch at some point. If people want to join the pre-launch, it's ninety-seven bucks; after that, it goes up to one hundred ninety-seven dollars. Just putting it out there, guys, that it's happening. Super excited to do this, and I'll be posting a few logos in the Slack channel that I would love to have you guys' input on. Harpreet: [01:24:49] There are three logos that I'm working with right now: this logo, this logo, and... that's a crepe, that's not a logo. This logo, this logo, and that logo. I'll put these in the channel and have you guys weigh in; I appreciate that. But yeah, I'm trying something different with my course. There's a lot of great content out there. Shout out to Andrew Jones, shout out to Avery Smith: they've got the type of courses where they'll teach you all of data science from the ground up. My course is geared more towards people who've already learned the basics: maybe early-career data scientists, a junior data scientist, or a data science intern who's trying to figure out how to take themselves to the next level. So, you know, this is kind of for you guys. Keep an eye out for The Employable Data Scientist. What do you think of the name, by the way? I'd love some feedback on that. Speaker5: [01:25:51] Yeah, I like it, because that's what it's all about: making yourself employable, or marketable, one of those two. [01:26:00] Getting yourself out there. Harpreet: [01:26:02] And Mark's made some good comments in the chat; thank you guys so much. Yeah, it's one of those things where, while building the course, I keep wondering, is this even good? You know, I'm enrolled in Andrew Jones's course, and it's amazing; even just the market research behind it is amazing.
I was looking at Avery's as well, because it's amazing. But then I'm like, I'm doing something completely different from them. I'm not doing the same thing; I'm doing something different, and I feel like I'm addressing a need that isn't quite being met yet. Then, after this, I'll have another course launching that'll be all about how to learn more effectively. That course will be super cheap, like fifty bucks: an entire series talking about everything from how to read a book, to how to take notes, to how to actually learn effectively and quickly. It'll be a lot of fun, and I'll make it a super cheap course. I guess I'm jumping on the bandwagon, but hopefully I'll reach out to a couple of folks here to do a test run of it and get your feedback, so you can expect some messages from me. Guys, thanks again for joining in, thanks again for hanging out. It's been an excellent session as usual, my friends. Remember: you've got one life on this planet, so why not try to do something big? Cheers, everyone.