march05.mp3 [00:00:09] Oh, yeah, what's up, everybody? Welcome. Welcome to the @TheArtistsOfDataScience happy hour. Super excited to have all of you guys here. Man, I can't believe it is Friday. I really can't believe it's March already. It is Friday, March 5th, 2021. I don't know how the first two months of 2021 went by so quickly, but they have, and I'm excited to have all of you guys here once again. It's definitely the highlight of my week to see all of you guys here. So thank you so much for taking time out of your schedule to be here, guys. Hopefully you got a chance to check out the episode that I dropped today with Dr. Sutton. She is a gelotologist. I did not know what a gelotologist was and I had to look it up. But apparently what a gelotologist does is study laughter and humor. [00:00:54] So we had a really interesting conversation about all the things that we can learn from stand-up comedians, because a huge part of the research that she does involves her literally going to comedy shows and studying the way comedians tell jokes. So there's a ton to learn from her and from her research on comedians. Super excited to have you guys here. I see a bunch of old friends. I see my good friend Accum right there, front and center. Quyen, good to finally see you in one of these. If you guys haven't listened to the interview that I did with Quyen, please do. She is wise well beyond her years, definitely a future influencer in this space. So check out that interview. Shout out to Christian. Christian's in the house, I see. Then I see Greg, David Knickerbocker, Tom Ashan, Eric Altobello, me driving. What's up? It looks like a beautiful sunny day. We've got a bunch of friends in the chat as well. We got Mikiko, we got, um, Carlos here. But wow, man. Good to see you again, Carlos. Right. Oh, man. Well, super, super happy to have you guys here, man. How's everybody been? [00:02:01] My Labra. I just started it just started.
I was just like a punctual group of friends, bro. [00:02:07] They were calling you out, Carlos, for not showing up the last two, three weeks. Yeah, it's a long time taking a roll call. [00:02:14] I got five thirty meetings now. [00:02:16] It's a monster now, man. A meeting on Friday, that does not sound enjoyable. Catching every other Friday there. Yeah, guys. Well, welcome to the, uh, to the outside. I see some new faces as well. Happy to see you guys here. So, yeah, man, how's everybody doing? Any questions? If anybody wants to take the floor, go for it. You guys know how it works. [00:02:37] So I have, I think, probably a small question, but I was thinking about it earlier today. So somebody messaged me on my terms like, hey, I found this course. Do you think I should take it? It will teach me everything I need to know to be a data scientist. I was like, oh, so I looked at it on Udemy. And it has a curriculum that was like a mile long, including this whole unit on, like, more distributions than I have ever heard of. And so I was just curious, in your mind, you could say in the last year or maybe even in the last five years of your work, how many different kinds of distributions have you used, and which ones have you just not worried about? I mean, normal distribution for sure. [00:03:13] I remember I worked as a statistician for a while, so I've seen a bunch of different distributions, and as an actuary as well. But the ones that, like, I could say people should most definitely spend their time on: obviously normal distribution, uniform distribution. I'd also say gamma, exponential, Poisson, and chi-squared. Um, I mean t, F. I think those would probably be the more common ones to know, that I think we should just at least have an awareness of, and what their shapes look like. I mean, to a certain extent maybe the Tweedie distribution or any zero-inflated distribution, because you'll see that a lot, I think, in e-commerce most likely. Carlos, what do you think?
[00:03:52] I'd love to lead a confession on this. [00:03:55] I was just on to chat. [00:03:56] My answer, to go ahead: distributions are great things to look up in your statistics book. End of story. I can't stand memorizing all the different types. It's, OK, I'm going to need a distribution like this. What was that? Oh, I'm so glad I have these books to go refer to. [00:04:13] Oh yeah. Definitely use these textbooks as reference. I mean, I was studying for the actuarial exams. I took like four of them and I had to memorize the distributions, and they were just nightmares. But it's good to just have a couple in your back pocket. Um, just to know, like, oh, that looks interesting. What would that be? [00:04:32] Yeah, I mean, this is why, Harpreet, I save all of the most famous data science student on the planet's posts, Greg Cuyo. He just shares the best posts. [00:04:44] Something he's learning, it's like, wow, I should try to remember that, because I rely on you guys to remember on my behalf, because I can't remember any of it either. So I wouldn't dare use it in any projects anyways. So it's not a compliment. [00:04:58] So, Carlos, you were mentioning some as well. You put them in the chat. Let's have that vocalized for the others. And to the person who asked, hey, is that [00:05:08] one course going to work for me? Just tell them yes, no matter what they send you, because it's going to get them started, and I don't like wasting time trying to find, like, the perfect one. So just tell them yes, no matter what they send you. But just to talk about distributions: normal, Poisson, chi-squared, Weibull distributions, they're really useful for A/B tests. [00:05:27] The thing to know is mostly, like, go ahead and open up R and just look at all the distributions that come right out of base, and just, like, go and plot those. And like, OK, that's cool.
And, like, there are different parameters in there, because a lot of distributions are just, like, special transformations of more fundamental distributions. So you don't need to memorize them all. Just, like, see a few of them and then learn which ones are kind of related to the other ones. [00:05:48] And then Google has made the, I forgot about the Weibull distribution. It's been a while since I used that one. But yeah, I mean, I like what Carlos said. If this is somebody just wanting to figure out how do I get started, where do I get started, just telling them to take that course looks good. Because, see, like, and I like, kind of, so I don't like to argue, but I'm going to push back on it just a little bit. [00:06:07] So he sent me that course. And I have, like, there was like one specific course that I used when I got started and it was really helpful. It's like the Coursera and Professional Certification. It was a really helpful start. If I would have had to go through 20 videos talking about everything from the Bhanumati distribution and Data distributions and I don't even know what else distributions, just to get to the first part where it talks about, like, how to write a for loop, I would have quit. I would have just quit and gone home, thinking I am too dumb for data. Right? You know, and so it's like, I think it's, yeah. I was like, OK, happy to see if we can maybe potentially find something that would put you on less of an intimidating path. An intimidating path, I think, kind of sometimes sets people up to fail. Unless you're happy to jump around the table of contents, you don't have to eat the sandwich in order, then it's fine to do whatever you want to do. But I just think that that's almost, it's like put there to be impressive, like, hey, look, you're going to learn all this stuff, which is why you should sign up for my course. But that might be a little overwhelming.
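The point above, that many distributions are special transformations of more fundamental ones, can be checked with a few lines of simulation. This is only a minimal sketch and was not shown on the call: it uses Python's standard library rather than the R session mentioned, the helper names and the sample size are made up for illustration, and it relies on two textbook facts (a sum of k independent exponentials is a gamma with shape k, and a sum of k squared standard normals is a chi-squared with k degrees of freedom, which is itself a gamma with shape k/2 and scale 2).

```python
import random
import statistics


def sum_of_exponentials(k, rate, rng):
    """One draw built as the sum of k independent Exponential(rate) draws."""
    return sum(rng.expovariate(rate) for _ in range(k))


def sum_of_squared_normals(k, rng):
    """One draw built as the sum of k squared standard-normal draws."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))


def mean_and_variance(samples):
    """Sample mean and sample variance, for side-by-side comparison."""
    return statistics.fmean(samples), statistics.variance(samples)


rng = random.Random(0)  # fixed seed so reruns give the same numbers
n = 20_000              # illustrative sample size (an assumption)

# Gamma with shape 3 and scale 1: drawn directly, then rebuilt from exponentials.
gamma_direct = [rng.gammavariate(3.0, 1.0) for _ in range(n)]
gamma_built = [sum_of_exponentials(3, 1.0, rng) for _ in range(n)]

# Chi-squared with 4 degrees of freedom: rebuilt from normals, then drawn
# as a gamma with shape 4 / 2 = 2 and scale 2.
chi2_built = [sum_of_squared_normals(4, rng) for _ in range(n)]
chi2_as_gamma = [rng.gammavariate(2.0, 2.0) for _ in range(n)]

for label, samples in [
    ("gamma(3, 1) drawn directly", gamma_direct),
    ("sum of 3 exponentials", gamma_built),
    ("sum of 4 squared normals", chi2_built),
    ("gamma(2, 2) drawn directly", chi2_as_gamma),
]:
    mean, var = mean_and_variance(samples)
    print(f"{label:28s} mean={mean:.2f} variance={var:.2f}")
```

Each pair should print closely matching means and variances (theory says 3 and 3 for the first pair, 4 and 8 for the second), which is one way to see which ones are related without memorizing them all.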
[00:07:05] I actually a hundred percent agree with you on that, because I think some of these courses do try to just boil the ocean. But I mean, that to me just sounds like, all those different distributions, I didn't learn that in one semester of school. It was like three or four semesters of grad school that I had to go through to be exposed to all these different distributions, not, I mean, one, you know, Udemy course. Yeah, no, definitely not one Udemy course. No. But Mikiko, looks like you have some great points in here as well. I'd love to hear what you have to say. [00:07:32] Yeah. And I think, I feel like at least 50 percent of the people on this call have gone through the gauntlet of, like, how do you offer, it's like the bias-variance trade-off, right? How do you offer advice that's generalizable, but also not, like, super specific to that one person, and you're also not spending a hundred hours a week? Right. So I agree with the whole, like, if they send something that's above the bar, you know, just say, yes, this is great. Where I like to intervene is in cases where it's really costly, you know, because I think for me personally, I want to maintain data science and machine learning as being kind of an open space. So if someone's going for like a five, six hundred dollar course, or if they're like, hey, is this boot camp good or whatever, you know, I like to either go, hey, there's this zero dollar one that maybe you want to take a look at first, just to sort of gauge your interest level. And if they've gone like, yes, I've done the whole gauntlet of stuff and I'm looking for something a little bit more robust, then I'll take a look at the resource, because I'm always on the lookout for, like, interesting stuff. [00:08:43] But then usually I would try to, like, just gauge their interest, you know, like what type of work would they be doing?
And then I will either say thumbs up to the resource or just point them to, like, one other thing. I think, like, I've had this habit in the past of recommending five to ten things to people, and then they just get super overwhelmed. And so for me, I'm just like, I'll go: these are the tried and true resources, zero dollars. And then just, like, an additional reality check with them. Like, for example, if they want to be doing more engineering work, I'll just kind of steer them towards thinking about that. Or if they want to be doing more research work, thinking more about that, and then kind of leave them at that point to take on the next step. Because, as someone who has paid, you know, a lot for self-development, there's no reason why that should be the first thing people go to unless they're super, super committed to the path. [00:09:40] You know, there's a lot of free resources, because most university courses will have their syllabus and coursework just up and available for people to get for free, class notes and all that stuff. But I mean, it's hard to kind of put together a learning path as well, which is a bit of a challenge. But that is an excellent skill to develop, how to pave your own learning path. A lot of great comments in the chat regarding, you know, all these different distributions. Russell, you have some great points. Do you want to go ahead and share? [00:10:11] Yes, I was just talking about the more basic distributions with statistics, you know, for probability, estimating with a single point, three points, etc., which I tend to use a little bit more in the job that I do more commonly, which is kind of catalyzing the data science and the business analysis and the business intelligence stuff. So I'm kind of a little more deeply rooted in those, so those are kind of like second nature, but I just do more in those fields.
So when you were talking about distributions, I just wanted to kind of pull it back to that basic level also. And I'm trying to remember the other point I said. I mean, I'd have to go back up in the chat, where there's been a lot of messages. What was the other thing I said? It was actually just the point that you were talking about, trying to get to the kinds of distributions. [00:11:00] There's relationships between them, right? And I think that's. [00:11:03] Oh, sure. Yeah. That was the other thing. Yeah. So, yeah, just don't try to understand every single distribution at once, you know. Start with some and try to understand them, because there are connections and relationships with some. So there is kind of more of an organic learning path between them. You go from one and it's similar to another. So then learn the differences between them rather than take everything in at once. That is to say, you know, you can't pick up a dictionary and try to learn absolutely every word at the same time. You go through sections and find the relationships, and it helps you get that intelligence in the mind, you know. [00:11:37] So, thank you. So before we go to Greg's question, I want to shout out Arsenis in the building. Haven't seen you in a while. Nicholas, what's up, man? Good to see you guys here again. I want to ask you, man, if you were to relearn data science and machine learning, what would be, like, the first three things you would want to focus on learning? I'd love to hear from then on this. And Tom as well. [00:12:05] I, you know, I don't know if I'd have three that were original, but I think I have one at least that's interesting. I wish somebody had sat me down and told me how much of software development is B.S. and how much of corporate culture is B.S. and how much of, you know, everything from research to applied, putting things to production, is B.S.
and how many layers of that, you know, how much skepticism I should have had coming into the field. [00:12:34] I wish someone would have taught me to become a professional skeptic and have a way higher, or excuse me, way lower trust setting when I heard something, because I spent a lot of my very early career and very early learning chasing what very smart, well-intended people put out that ended up being garbage, and participating in corporate culture in a lot of ways that were not productive. [00:13:02] And so, like I said, I wish there was a class where you sat down and, instead of teaching you all this wonderful rosy picture of the corporate world and work world and the science world and the research side, the math side and everything else, I wish there was a reality check class. So I would say seek out people who have had enough of the garbage and, like, listen to them for a couple of months, because they will narrow down so much of what you have to learn to the very basics that will get you into the field, and then you kind of get your own filter down. [00:13:38] I like that. David wants to chip in here as well. So David, let's hear from you. [00:13:42] So just real quick. There actually is a course on that. And I'm reading a book right now called B.S., but spelled out, and it's a bright orange book. You can find it on Amazon. And I started reading it yesterday. But it seems like the whole intention of the book is to call out B.S. in STEM jobs and help people be able to figure it out. And I agree with you as well. And that's actually one of my biggest annoyances, that I actually thought that I missed the boat on the whole A.I. thing because of how much people were talking about it a couple of years ago. And I was furious with myself because I was too busy uplifting ancient servers at the time. And I thought somehow I had destroyed my career by completely missing this whole thing.
And then I was reading a book called Superintelligence that was saying that the A.I. revolution hasn't started yet and it's not really going to be here for a while, and true AGI is not going to be here for another eighty years or whatever. And so there is so much hype and bull crap in this field. And being able to cut through it is important. And I sometimes feel like I come across as trollish on LinkedIn because I just can't be around B.S. And so, yeah, as part of your career, you're going to build up good defenses against B.S. So there is a book on it. There's a college class on it that I've never taken. But there are people that are just as furious about this stuff. [00:15:05] So is that the book called Calling Bullshit? [00:15:07] I think it's just Bullshit, but maybe it's Calling Bullshit. [00:15:12] Yeah, it's The Art of Skepticism in a Data-Driven World by Carl Bergstrom and Jevin West. Does that sound like that book? [00:15:21] Possibly. Is it bright orange? This was yellow or yellow? Yeah. [00:15:25] So I'm trying to get those guys on the show. Cool. Thank you. Good looking out for that one. So that's Tom. Tom, do you want to talk about what you would study, the first two or three things, if you were to go into data science? And we'll get to Greg's question, I promise. Greg, little question. And while Tom is answering this question, if you guys have questions, go ahead, type them out into the chat and I will put you into the queue. [00:15:49] I want to add that I've got a friend on here today. He's not very well known, but Ben Taylor was talking about the myth of job security, and that is a fact even for a PhD. In fact, this is perhaps just one of the many targets I've created for myself by being a PhD. But I felt like I got grilled unfairly, harder, on technical interviews, because I almost felt like there was maybe a joy in taking me down, which, I confess to everyone, I'm easy to take down.
I don't know everything. All I've proven by getting my PhD is I can go deep, and, you know, that's all it proves, really. That's why it does stand for piled higher and deeper. But the real point here: I started doing data science work my freshman year in undergrad, but we didn't call it data science back then. Why do I say that? I was learning least squares. What I'm getting at is, it's not reasonable to expect your first role to be as a data scientist or a data analyst. Get a good role that's close to data where, you know, you can use data to make a difference, and build a name for yourself as someone that's really good at using data to make a difference, and then build your résumé saying, I was in this role and I did this with data. Now go find another role where you can advance from there. [00:17:09] I did this with data. Before long, you've got a data science résumé. You don't really have to have a data science title to build data science or data analyst experience. So to me that's the biggest thing to look for. And so there's some great points in the chat here about bootcamps and various courses that aren't university, and that's good too. If you want to go back, get a master's so you can get a Ph.D. role, excuse me, get a data science role, do that. That's great. But if you just want to get into data science without getting a master's or PhD, there are ways to do it. You just got to get pretty creative. And I think, to Ben's point, the way you create that job security is do a good job in your roles with data practitioner type work, really solid work, and then build up that portfolio. It can be abstractions of what you did on your jobs, so you're not giving away company secrets. It can show the spirit you have about learning the latest things outside your job, in your GitHub profile and your Stack Overflow answers, in writing for various blogs and creating your own blog.
Really, creating your own online portfolio has endless dimensions, and I feel like I've talked too long. But there's so many avenues. Don't think there's one best avenue to live at. [00:18:39] Great message. Tom, thank you so much. I think Eric is actually someone who's really good at just making opportunities happen in any job he is in, to use data and just get after it. So he's definitely somebody you guys want to reach out to. I just want to read a couple of things from the chat. Kristen says that it's tough as someone entering the field: the entry-level jobs out there are asking for a master's or Ph.D. It seems to disconnect from the community, which really embraces the power of self-learning over degrees, with these hiring managers emphasizing the degree, not the passion and self-learning. Excellent point. I think Eric Weber had made a post about this actually earlier today, and I thought that was a really good post. Somebody should link to it if you get a chance. [00:19:17] I would like to add to that. Yes. So I think, like, plus one. The way I like to say it, OK, I think this one is right, because I was surprised in a way at how people actually seek talent. So before, I tried to make my résumé look nice and stuff, like put it up pluses and there, and then applied for a job. But they didn't reply to me. But when I started coding my articles and, you know, just posting a lot, I received like four or five offers just because they read my article. That's it. So I think now people feel more creative. Ken Lay rather than the résumé. So I think they'd like to see them in a place where they can see how they can continue the skill that they're working on, which they'd add to the organization. [00:20:28] Excellent point. Thank you very much. Really appreciate that. Greg, let's get to your question, man. Sorry to keep you waiting. [00:20:34] No problem. I just wanted to build on Eric's question earlier.
I guess my question is, how do you feel, or are there any courses out there to help you build intuition for connecting the dots between real-life use cases and distributions? [00:20:55] So to me, I think, you know, you can be very good at the theoretical side of things. You can learn a lot of distributions and things like that. But building intuition for real business use cases, to me, is a very good way to position yourself to create value for any business. Any materials that you have for that would definitely help me in that sense. So just a general question for anyone. And let's hear from Ben Taylor on this then. [00:21:30] Let's hear from then and Mikiko. Sorry, guys, put out a fire. [00:21:35] Can you restate your question? You've got my full attention. [00:21:37] No, I was thinking about, so one of the key things for me is, there is a lot of material out there, but I do value being able to build intuition for connecting the dots between real-life business use cases and, say, the technical side, like understanding which distributions might be a better fit to investigate and solve a business case. So are there any materials that we can use to kind of build that intuition, or what is the best method to be able to do that? So, look at a problem and say, oh, this distribution, I can say that if we kind of research it and dig into it, could be a good starting point to solve problems. [00:22:25] That particular problem. So, more closing the applied gap. Yeah, that's a great question. I think there's definitely a gap there. I've heard people complain a lot about this, because you have new students that read the content and they're still unprepared. And then you have people that have a lot of gray in the industry, where we've made all the mistakes, we've paid the tuition. And so I'm curious what other people on the call think for applied resources?
Because unfortunately, I feel like I just rely on my own mistakes and experience, and on other people on the call sharing from their mistakes and experience. And that's definitely a gap that's been called out before. And I think actually that goes all the way back to that question on some employers wanting previous work experience. Like, we have roles open right now where we want four years applied experience, where it's kind of hinting at this. I'm curious what anyone else has to say about this year from Whitecap. [00:23:12] Let's hear from then and then Tom. Looks like Tom had some great insight in the chat as well. [00:23:17] And then so we'll do then Tom and Mikiko. I think this is one of our biggest problems: there is no unified definition of data science. There is no one way to do data science. Everything from company size to availability of data for the company to capabilities, I mean, there's so many factors that go into how a company decides it's going to do data science that trying to design a curriculum around it is really, really hard to do. In order to have the resources to build that comprehensive curriculum, you almost have to be a university. But the problem with the university is that their curriculum has to be much, much broader, because they have so much coverage that they have to do in a data science program in order to attract enough candidates to make it viable. And so the ability for them to change and adapt as fast as the field is going is limited. So you have on one side bootcamps that really are under-resourced to tackle the problem. And on the other side you have universities, where just the reality of being a university makes it very difficult to keep up with the rapid pace of change. And then you have this final problem, that all of us in the field haven't figured out how we really want to define the data science lifecycle, the machine learning model development lifecycle, the research lifecycle, the data lifecycle.
We haven't defined these in any sort of unified way. And so those three competing forces mean that it's mission impossible. We all say we need this, but you'll hear the education institutions kind of push back and the bootcamps push back. And I've had two different colleges ask me if I'll build their curriculum for them. And so you can tell I've rehearsed this answer a few times. You can't. There's no win here. [00:25:05] Well, is the gap, in reaction to [00:25:07] that, is the gap the missing applied data sets, where the real-world data sets are so much different? They're so much nastier. They have all these things that are missing from Kaggle and kind of the educational approach. Even in college, they may not be hammering on what we see in the real world, where there's a lot of oh-crap moments and yikes, and why is this this way? The real world was never nice, but in college, everything felt nice. Everything felt safe. [00:25:30] I don't think there, yeah, I don't think there's any safe spaces left. A quick answer when it comes to data: I don't think there's any safe space, you know, not from the social side of safe space. But I'm saying, you know, a safe and easy route where if you do this, it's safe and you'll probably have good results. It doesn't exist. [00:25:50] Yeah, there's no safety. Ben was actually giving you the best answer. Just because I have a few years on him, I can summarize it a little better. Greg, you've signed up to become, as our dear friend Eric Sims put it, a mental fighter. To get good at mental fighting, it's going to take blood, sweat and tendonitis, and your tendonitis is going to happen right here in your mouse finger. But I can tell you something that I wish someone had told me when I was at your point: more visualization.
I can't tell you how many times I've wasted more time than I needed to because I didn't take the additional effort to put more visualization at each step in my process so I would gain more insight into what I was doing. And so you get in there, you work hard, you do visualizations, you ask yourself a lot of questions. What does this mean? It's like planting a seed in your brain. You might not get the answer till a day later to two years later, but it will come. Those insights will come. But it's a long hike. And the more questions you force upon yourself, the faster you'll grow. [00:26:57] Absolutely. And one of the reasons I'm asking is because I grew up in the French system, and that system just trained me to memorize the theoretical world, while I was cut off from the practical use of that theory. So even now in the industry, it takes me a while. [00:27:17] And visualization is one thing that can help me, like, plug it in a little bit better, that theory with that real-life application. So to me, it's really helpful. [00:27:30] And this is something I wish that people who are actually going to school in that system can use to be better. [00:27:38] So thank you for your responses, guys. [00:27:40] So if I can just go ahead, Harp. I'm just going to say, sorry, it was from above, Greg: that just comes with doing real-world projects over time and thinking a lot and doing tons of visualizations throughout the process. [00:27:52] Go to the office and say like, in terms of, like, a book or resource, one thing that I've been reading is Pragmatic Thinking and Learning. It's by Andrew Hunt, who wrote The Pragmatic Programmer, who I'm interviewing next week for the podcast. And this book is all about how to go from a novice to an expert. And in this, he talks about how experts have developed this level of intuition, and he really breaks it down.
So are you in the Slack community, Greg? Because if you are, I think there's a link to this book in there. So, and sorry if I'm pirating your material and giving it for free to my community, but yeah, this book is really, really cool. [00:28:37] I highly recommend checking it out. So I see Tor's hand is up. I'll add you to the queue, but before we get to Tor, I've got up for a question. I go for it. [00:28:47] OK, I have a question. I work for a small biomedical company and there is no data team in there. There aren't any data engineers or data scientists. So I'm trying to create a data project myself, and I'd like to find out from all of you, or anyone, how to create a data project in my company. What do I need to do to get the management buy-in? So the other thing is, this is a family-run company, so data is protected. But I did suggest that data can help the company. I want to create a project for them, but I need to get some buy-in from them. So I need to know some steps, if anyone has had this experience before and can share with me a little bit of how you got the buy-in and how you created the project, etc. [00:29:42] Yes, that's a pretty big question: how do I convince management to take on a data project? And you have to be a salesperson in this case. [00:29:52] Right. So I'll open it up to the floor. But I think definitely start by maybe identifying a pain point that they are facing. Right. Connect that to whatever business strategy they have. Right. Because ultimately, nobody's really going to endorse or sponsor your project unless it really helps them achieve whatever objectives they have. So if you can tie it specifically to whatever objective they have in the next quarter, in the next two quarters, by the end of the year, and then sell it to them: you guys are trying to achieve this. Well, guess what?
If we look at this data over here and then I do these things to it, we can do this, which is going to exceed or at least meet your goal quicker. Let's open this up to, uh, let's open it up to Mikiko and then we'll go from there. [00:30:38] So just one question I have. So as a person who believes data can do a lot of things, are you wanting to do a data project for yourself or for the company? And what I mean by that is, are there specific pain points that you are observing at the company due to a lack of data? [00:30:58] Yeah. So they use, yeah. Yeah. I want to do a project for the company because I think they would get value from it, and I feel like I want to add value to it and let them see how data can help the company. So I see they do a lot of stuff on Excel, for everything from customer complaints, and everything's on Excel, and creating documentation is all in Excel. And they create essays because they're biomedical. So everything is done on Excel. So basically I want to take them one step up and show them, you know, hey, you can do this better than Excel. And I see a few things that I think I can help with. So basically, I'm trying to do a data project for the company. [00:31:45] So, yeah, I know, Carlos, you have some good points, I guess. [00:31:48] Like, what I would say is just kind of to reiterate Harpreet's points, which is that, at least from working with start-ups and early-stage businesses and all that, a lot of times the minute you start bringing cost and sort of effort into the conversation, sometimes mom-and-pop and small businesses will sort of shy away, because they're like, we're kind of already cash and energy strapped. So what I would first do is focus on, like, what are some low-hanging-fruit sort of pain points, because a lot of times if you sort of come in with that first huge data project, there's a risk of failure, but also, you know, it can be kind of challenging.
So I would say first, go with some low-hanging-fruit problems and focus directly on what they will get out of it. What are the specific pain points that you are solving? And then at that point, Carlos, you can chime in here. I just feel like usually you want to center the conversation on value and pain points. And then after that, you can go to strategy and action, because there's a lot of tools out there, especially open source and all that. [00:33:03] Let's hear from Carlos and then Tor. So I had this problem, but not in the biomedical context. I was in a terrible job situation and I was working with Word documents and Excel documents that were not structured at all, no template. And everything was done by hand, pretty much. And I wanted it to be a data problem, too. I was like, I can make an app out of this. And I did eventually want to make an app out of this, blah blah blah. The first thing I did was ask what's in my control, like my input. They don't care how I do it as long as it's the right output. [00:33:31] So I just started there, made my own templates, and studied how to get Word and Excel documents to be read by my R script consistently. Once I got that figured out, I just asked for more work that I could just eat, because I already had it all scaled up. And once they saw that, it was easy to say, oh, I want to do data and all this fancy stuff. But let's start with the obvious process improvement: templates.
If they're using Excel files, instead of opening a new Excel file every time, can you get them to upload their Excel files to an Access database that you make, or can you get them to use Excel templates that you designed for them? And then you just do it, and only tell them once you generate something of value. Just do it yourself on the side a little bit to prove it out, because otherwise you're going to have this whole thing of trying to sell them something when they're not in the mindset to buy. [00:34:19] OK, thank you, Carlos. And it's always a nice thing, right? [00:34:22] Like when you build something on the side as a pet project and then you present it to your boss or the key stakeholders, and they're like, oh, that will take forever to build. And you're like, no, it was like a week or two. Good. [00:34:35] I like how much value... I could add something real quick, unless you want to go before me. [00:34:41] Just briefly. I mean, Carlos pretty much covered it right there, because to me, the key here is that if you're sitting in an organization thinking, this sucks, but I'd like to make my day more efficient by creating these standards, etc., I would start by going and talking to certain people who have a huge workload, where they use these tools, whether it's forms or methods, and take that and start there. Maybe take four or five samples and just start building something for yourself on your own, and when you have a result, then go and present it, because now you have a use case. You can actually show that if they were using this method, they would be more efficient. And then you can start indicating the other benefits that come along with it: by getting centralized data, it's more in control, you can manage it better, et cetera, et cetera. That is the key.
Whenever I've done something myself, I've developed a bunch of models, I developed time sheets, et cetera, et cetera. Unfortunately, I've always done it in Excel for the last 20 or 30 years. But when you can go and show that instead of spending a week or two on budgets, you can actually do it in one day by using Excel, interlinking documentation, and standardizing the format that's being used for input, then the feedback you get is not going to be a question of how much it is going to cost; it's going to be a question of what can we do? Can you run with it and make a presentation and a budget for it? And at the end of the day, I don't know what kind of turnover you have here, or how much data, but start there. That's what I would highly recommend. [00:36:25] Exactly. I mean, we do a lot of product documentation. I do a lot of internal audits and there's a lot of customer complaints, and everything is everywhere. And I think we can definitely leverage data to help with those. I want to help them because we of is really good to me and I want to help her out and make it better for her. So, yeah, and I do have a question, because during the Sunday mentoring call, I believe it was Mark Freeman, he mentioned that Eric Sims might have some ideas. So he told me to reach out to you too. So. Yeah, yeah, I see all that. Yeah. But, you know, your name came up, and apparently you were doing a lot of data stuff or creating data teams at some of your companies, and I wanted to get your feedback. [00:37:10] Cool, OK. I didn't expect that to come along. Maybe we should chat after. I'm totally caught off guard, but I mean, if you want to chat. [00:37:23] Yeah, no problem. [00:37:24] I'd love to. So Greg, let's hear what you have to say. I definitely want to hear from Ben on this as well, around the topic of how to get stakeholder buy-in for data projects. Go for it. [00:37:35] Yeah.
I think we all agree that the low-hanging fruit is definitely the way to go in building your own tool. [00:37:43] Running that in parallel to what's currently being used by the many is probably the best strategy. [00:37:51] You build a scrappy solution and you showcase how it's running better. Now, once you prove that it's running better, here is my favorite way of convincing business folks when you want to be sponsored for a bigger project. If you want to scale your scrappy solution, you have to gather data for sure. And on that pain point, you have to showcase what the risk is. So you have to put the fear of risk into the business person responsible for that product. Because a lot of business folks want to say, OK, if I give you money to execute this project, how much are you going to save in terms of headcount, et cetera, et cetera, or how much more money will you generate for the business? It's not always about that. There's also the underlying risk of not doing that project, because there could be a loss. It could be a human error, based on the current processes that your team is using, that causes, I don't know, a ten-million-dollar loss. If you can get data around that and put the fear of risk into them, they will get aligned on your scrappy solution, if your scrappy solution is already working and showing better performance than the current setup. [00:39:02] I love that. Greg, thank you. So let's hear from Ben. [00:39:06] I usually approach it this way. Greed, for lack of a better term, is good. There is someone along the way who makes a bonus every year based on some metric or measurement, someone who is bonused off of something that you can change. And if you can figure out what it is that incentivizes a person who might be one of the gatekeepers, who helps you get access to whatever resources you need, or the green light, or, you know, maybe it's funding, maybe it's data.
I don't know which one's really the biggest barrier to you being able to move forward, but figure out what it is that they care about from a monetary standpoint, and a lot of times that can be tied to compensation. So you've heard a lot of great suggestions. Take pretty much everybody's feedback so far into account, because it's all really good. But if you find yourself struggling, occasionally the bureaucracy can be moved forward by figuring out how people are bonused, how their compensation is figured out, and how their promotions are figured out. That may seem devious, but it really isn't. Everybody wants to make more money, and you helping them is not a horrible thing, and that can oftentimes get things moving forward rather quickly. [00:40:23] I love what you just said, Ben. One of the things I've heard people talk about is that statistics have to move to KPIs. But what you're talking about is the step past KPIs: the OKRs. So literally quarterly bonuses. I've got quarterly bonuses myself, and I can measure exactly what matters. And I can imagine if someone's pitching something to me, the reaction would be: I'm too busy, not a priority, too busy. But if they pitch something to me that lines up with my OKRs, they suddenly have my full attention. And the other thing I was thinking about is you don't actually need their buy-in, necessarily. Maybe it just goes back to something that Greg hinted at, or that I was thinking when Greg was talking: if you have a good idea and you think you can get to success, or to something that will get their attention, you don't need their approval. You can go do stuff, work on the weekend, but really try to align with the problem. I think a common mistake in the industry is AI looking for a problem, and that's bad. That's really bad. You don't want to be there.
You want to find a problem that matters and back into the appropriate solution. So start with the greed, or what matters to them, and back into the solution. And if you can compress that into 30 days and show a proof of concept, then have confidence behind it. Confidence is kind of a tricky thing, because if you're very timid and you're kind of suggesting, oh, look at these results, they could help, they're not going to feel confident betting on you. But if you have some signal of success, and it aligns with their OKRs, and you have confidence that this will work and these are the time scales, I think that works. [00:41:51] Then Nicholas has some really great points here, too: marketing mantras. Do you want to share those? All right. [00:41:56] Thank you, everyone. The marketing mantras are really applicable to trying to get something across the line. First, sell the benefits, not the features. We typically try to fall back on the features of what it is that we're trying to sell, and we'll talk about things in terms of technology. But actually, if we can get as close to the wallet as possible with the people we're trying to sell to, we can talk about the bottom line, we can talk about cost savings. These are things that are going to matter to people who've got decisions to make. And secondly, it's a bit of a funny one, and I'm probably butchering the saying because I'm not in marketing, but you often hear that people will walk towards pleasure but run from pain. That's often the best way to get somebody to act. I think it's a little bit similar to what Greg was talking about. If you can frame your idea in terms of how refusing to act will result in something terrible happening, that will often spark action. Right on. [00:42:49] So I think that was a ton of great advice from everyone.
[00:42:53] Don't worry, this will all be put right up on the podcast and on YouTube along with everything in the chat. Speaking of the chat, shout out to somebody who went out of his way to do something awesome, and that is Emmanuel. I don't want to butcher his last name. So he made this thing that just parses the text, sorry, the chat transcript text, and pulls out all the links and stuff like that. Emmanuel, if you could link to your project that you did for the @ArtistsOfData right here in the chat so people can check it out, I'd really appreciate that. Thank you, by the way, for doing that for me. Let's go on and move to Tor's question. After Tor, we'll go to Akshay, then after that we have Vivian. So Tor, go for it. [00:43:36] Actually, it wasn't really a question. It's more related to the previous topic, and I just kind of wanted to follow up. But I'll let it go for the next one. [00:43:47] But I don't think you think I taught literature. [00:43:50] All right. First off, I want to start by sharing the news that last week we had some participants show their projects from a data challenge. And with everyone's feedback on this call, they were able to make enough changes and present their analysis, and they won the award for the best project that drove results. So thank you, everybody, for last week. And I hope that students can participate in our calls more often. Moving on to the question that's been bothering me since this afternoon. I'm working on a project where I'm scraping data from an HTML file. I'm using Beautiful Soup to make the JSON and everything look pretty and just output the files. [00:44:30] What I want from the output is just numeric text which follows a certain pattern, which is basically an order number, and the name associated with that order. So I'm trying to output this into a dictionary. And so far what I have is an output from the demo.
But I'm kind of struggling with parsing out the unwanted tags or unwanted text that's in the output. So if anyone has any suggestions on how Beautiful Soup can be combined with regex to get what I want... Ben, go for it. [00:45:02] So these scraping problems can consume a lot of your time. I paid a woman in India ten dollars to scrape the entire US sex offender registry, all states. Ten dollars. It would have taken me half a day to figure that crap out. I'm smirking over here and laughing because what you bring up is actually a hard problem. But if you go on oDesk or Elance, and hopefully this burns a hole in people's brains: ten dollars to do something that I don't think I was capable of doing in half a day's work. [00:45:31] And she did the work before I paid her. Sorry, I didn't mean to ruin that question. That's what I always think of when I think of scraping. [00:45:39] Scraping is like problem after problem after problem after problem. [00:45:43] Sorry, can I add a bit? A lot of people say this, and in consulting we hear these problems all the time, like, oh, can we use RPA to do optical character recognition, blah blah blah. Are you sure it's big enough to where you just can't have someone do it? For commercial clients, we say Amazon Mechanical Turk, Upwork, Fiverr. We don't shy away from saying that it's probably not a problem bad enough to warrant programming. That's just reality. Like Amazon Mechanical Turk. [00:46:07] That's the whole premise. I'd ask if it upsets you or if it brings you joy. If you enjoy doing the work, then do the work. But for me, it would make me mad. I hated that work. [00:46:16] I just get so frustrated. That's happening with me right now. I've spent like two hours looking up the code and I think I have the right direction, but it's just not working in the code. So I need someone who's done something similar to just tell me where I'm going wrong.
But I would love to know who that person is. Ten dollars? I would do it if I can get the work done. [00:46:36] So I have a question. Are you just trying to get the text out of a page of HTML? Is that what you're trying to do? [00:46:42] Yeah. So not just the text. There are two parts. I'm looking at an order number which follows a certain pattern. So there is a text starting with a few letters which have a pattern, and then there's a number, which is an order associated with each person. And then I also want the name of the person, which is in that tag. And I could parse that out. So in my output, I see the order number and I see the name of the person. But in between there is a lot of other garbage. So I'm trying to clean that out. Even though it's part of the same td tag, I'm not able to filter it out. [00:47:14] So I'm thinking of how I can maybe use regex or some other approach to get that. OK, because I was working on a classifier last year where I had two pieces to it. One was to extract the raw text, I mean only the text, and in the other I was actually interested in the tags as well. So if you only want to pull out the text, that might make it easier for you for entity extraction or something like that. Just disregard the HTML entirely, and you can do that with Beautiful Soup's get_text. It's just called get_text. And that will throw away all of your HTML, if you wanted to just try that approach. [00:47:51] OK, of course. And there's also another approach I saw, html2text, and it's a Python library that you can add in. Has anyone used that? Would you recommend that over Beautiful Soup, or is Beautiful Soup [00:48:04] the better approach to this? Just use Beautiful Soup. You can also end up in a weird thing where you're running like a V8 Chromium thing, like, you don't just use.
[00:48:15] Yeah, you can also do a combination. I mean, what I've done for this is requests, json_normalize, and then test out the patterns with a regex cookbook, and then essentially you can do some kind of enumeration through the lines. I mean, if you're doing this for a commercial project, I would definitely just pay someone to do it. But there is value in learning how to solve the problem, just because in some of the FAANG interviews, for example, they're hard core on data structures and algorithms and things like that. So it's useful as a toy project. For me, I found Beautiful Soup to be a little hard sometimes. But the combination of requests, json_normalize, and enumeration through the lines with standard Python is actually really, really good. And it's forced me to get better at my bash scripting as well. So a lot of good things. [00:49:09] OK, I'll definitely check that out. Thank you, everyone. [00:49:12] So one of my more repeated phrases on this show is "what Mikiko said." Seriously, think about it this way. If you expect that you're going to need to do some web scraping in the future, get into Beautiful Soup and get good at it. It's not that hard. If you're doing one-offs, it's a pain in the ass, but it's worth learning once. And then once you have a library of routines, you'll be able to leverage those in the future, and it can be quite powerful. But again, I like Ben's solution for one-offs. And I feel your pain, Ben, for not being able to verify that. But we do believe you, Ben. We really do. [00:49:52] Yeah. Ben also cosigns Mikiko's approach. So there you go. Try this approach. And I think there's a text in the channel, Web Scraping with Python, I think, which is all based on Beautiful Soup. So check that book out in the channel.
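[Editor's note] The approach suggested above (drop the markup with Beautiful Soup's get_text, then pull the order number and name out of the remaining text with regex) could look roughly like the sketch below. Everything specific in it is an assumption made for illustration, since the real page was never shown on the call: the sample HTML, the "ORD-12345" order format, the "Name:" label, and the helper names cell_texts and extract_orders are all hypothetical. If bs4 is not installed, the sketch falls back to Python's standard-library HTML parser so it still runs.

```python
import re

try:
    from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

    def cell_texts(html):
        """Return the visible text of each <td>, with nested tags dropped."""
        soup = BeautifulSoup(html, "html.parser")
        return [td.get_text(" ", strip=True) for td in soup.find_all("td")]

except ImportError:
    from html.parser import HTMLParser

    class _CellText(HTMLParser):
        """Standard-library stand-in, used only when bs4 is missing."""

        def __init__(self):
            super().__init__()
            self.cells, self._parts = [], None

        def handle_starttag(self, tag, attrs):
            if tag == "td":
                self._parts = []

        def handle_data(self, data):
            if self._parts is not None and data.strip():
                self._parts.append(data.strip())

        def handle_endtag(self, tag):
            if tag == "td" and self._parts is not None:
                self.cells.append(" ".join(self._parts))
                self._parts = None

    def cell_texts(html):
        parser = _CellText()
        parser.feed(html)
        return parser.cells


# Hypothetical formats: three capital letters, a dash, five digits,
# and a name that follows a "Name:" label at the end of the cell.
ORDER_RE = re.compile(r"\b[A-Z]{3}-\d{5}\b")
NAME_RE = re.compile(r"Name:\s*(.+)$")

# Made-up sample page: each cell mixes the wanted fields with other text.
SAMPLE_HTML = """
<table>
  <tr><td>ORD-10482 <span>status: shipped</span> <i>ref 77</i> Name: Jane Doe</td></tr>
  <tr><td>ORD-10483 <span>status: pending</span> <i>ref 78</i> Name: Raj Patel</td></tr>
  <tr><td>Totals and footer notes</td></tr>
</table>
"""


def extract_orders(html):
    """Build {order_number: name}, skipping cells that lack either field."""
    orders = {}
    for text in cell_texts(html):
        order = ORDER_RE.search(text)
        name = NAME_RE.search(text)
        if order and name:
            orders[order.group()] = name.group(1).strip()
    return orders


print(extract_orders(SAMPLE_HTML))
```

Working cell by cell keeps each order number paired with the name from the same td, and the regexes simply ignore whatever sits between the two fields, which is the "garbage in between" described in the question.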
I'll try to find it. So the question goes to Vivian. [00:50:16] OK, I put this in the chat, but I would just be curious to hear from anybody. What are your "oh, crap" moments? Do you have any times when you did a project and then later realized that you made some mistakes, but it was already too late, implementation had begun or completed, and what did you do? And bonus points for whoever has the biggest mistake. Who wants to go? [00:50:40] I think my biggest mistake... I've got a short one. [00:50:44] I know everyone's thinking they're going to talk about code problems. 3:00 in the morning, I'm in our data center and I'm racking a server, and I don't know how to rack a server. And it falls out of the cabinet and 20 terabytes of drives go spilling out and land on the floor. And these are physical drives, and all that data is bad. So I destroyed 20 terabytes of data at 3:00 in the morning, and it was important data. I can't remember what I said, but it wasn't good. [00:51:07] So if any of you have seen the movie Annapolis: Ben, I want to thank you. I thought I could beat you, but you're my Mississippi, dude. Thank you for sharing that. [00:51:18] You went with a physical failure, not a software failure at all. I love that. I like all of us on this call. [00:51:25] Sometimes we think, as data scientists, we're so smart. And racking a server, if you don't know how to do it, when someone shows you how to do it, you're like, well, obviously. But if you don't know how to do it... I felt like a real dumbass. And plus, how much was that server? It was like a one-hundred-twenty-thousand-dollar server that I dropped, too. Luckily it worked, but the data hurt the most. [00:51:42] So Dave Knickerbocker, go for it I think immediate. [00:51:45] OK, so this was like twenty years ago, and it was something that I created, and we still had very, very few users at the time. So this was right after version one even hit production.
But at the time it was a MySQL database and I was using a timestamp as a primary key, which was a really stupid, stupid idea. And so eventually I saw that I should probably just make the primary key an auto-increment integer. And when I did that, I kept the timestamp field and I just added the primary key. But my old method of updating data was off of the timestamp. And so somebody asked me to update their data for them, and it updated every single row in the entire database, which is the same thing as destroying all of the data. And this thing was so new that I didn't have good backups at the time or anything like that. And so we just had to learn from our mistake. There were only a few hundred rows in there at the time, probably, but it was pretty embarrassing having to explain it to my boss, and there is no coming back from that. So that's, to this day, one of my horror stories. [00:52:56] Yes, I've had some good ones. I would definitely want to hear from anybody else who wants to go for it. But I guess my biggest one came at the end of my tenure at a job that I had a couple of years ago, which frigging was hands down one of my favorite jobs I ever had. And I had worked on some stuff that I had not yet pushed into a repository. So there's a bunch of unpushed work. And since I was leaving the job, I just wanted to remove all of my stuff from my laptop. And I did an rm -r on my workspace without pushing my changes, and it completely left them hanging. They had to redo a bunch of stuff from scratch. I feel bad for that one, but I was like, oh shit. Rodney, let's hear from you. And then if anybody else wants to go, go for it. [00:53:44] Yeah. So my biggest mistake was while grading a stats class of around eighteen hundred undergrad students. We had a sorting error when we were finalizing the grades, and as a result of that we had to rewrite everything by hand.
So that was in Excel. [00:54:04] After that, I switched to using databases to keep the data, which is a bit safer. Also twenty years ago. Who'd like to go up next? Carlos was less than you fucked. [00:54:16] I had a really bad one. It's actually funny. And the winning work now. But I had been doing this modernization project. I'm trying to get a bunch of really smart people to agree on how to make a decision. Very hard to do. They're all very smart. We're like, OK, let's use the analytic hierarchy process, and we explained to them what it was, with the YouTube videos and everything, to get them to do it. So we spent weeks of time burning hundreds of staff hours on this, like a thousand dollars hours. At the very end, [00:54:43] we were shovel-ready to make the Qualtrics survey and implement the process, so they'd have a way to make a decision, comparing apples to apples across all these very different variables. And they were like, you know, this stuff stresses me out. I don't think we're ready to make a decision, blah, blah, blah. And then I said something where I thought I was on mute, and I was like, this guy's such a douche, always saying that. And he's like, well, that's because I don't agree with you, Carlos. And then the project ended. I think the manager quit, and we never did anything with that project, but I felt that way. So, yeah, we just wasted hundreds of hours and destroyed our relationship with the team. But I think it was worth it. [00:55:22] That's hilarious, man. Let's hear from Tor, then Vijay, then Ben. [00:55:27] I mean, we've all had our experiences, I'm sure. But, you know, I was getting training on using the back end and, you know, SQL, et cetera. It was a program that was being implemented, and technically in the back end they have a delete button. And while I was getting training, I was pretty quick on the trigger, a little bit too quick, because I figured I knew.
So they were just about to talk about the delete button. So I hit it and deleted pretty much everything that was there. That took us about three weeks to get back up and running. And luckily there was a backup from a week before. [00:56:06] So those things happen, you know. It's like we say, shit happens. Let's hear from you. [00:56:17] All right, guys, I was just trying to plug in my camera, but it doesn't want to run, so that's OK. [00:56:22] So this is roughly 10, 15 years ago, and I will share my experience with you. It's partly about communication, and how bosses should behave. I was working as a web application developer at the time, and I got a project to design the intranet for a very large local bank in New York. I was given the project specs. Somebody had already worked on some part of it, and I just had to do some SSIS packages, transfer data, do some ETL. So I do that, test it. Everything is good on my local machine. I am signed off to go and install everything, which I did on the production environment at the bank. I check everything at the bank, check my database, copy everything, run my scripts. Everything is good. I get the OK from the bank's desk team that yes, the new functionality is working fine. I come back to the office, and I ask them the next day whether things are going well or not. They said everything is kind of not good, because they had to do a restore. I didn't get why they had to do a restore. So after three or four days I'm called back again. I do the same steps which I had done before: make a backup of their database, run my scripts. And the second time, they had to restore the database again, because my scripts had wiped out their banking customers' information. The reason for that was that I was told that this is a separate intranet application, which we are running separately from the regular banking information. So I was not aware of that.
That was not communicated to me at that time. So sometimes you do the things which are right, you check everything out. But if you don't have the right communication and the right specs, you ultimately have made a boo-boo in that sense, and somebody has to get fired. So the boss couldn't fire his son-in-law. [00:58:26] So he had to fire me at the time, and the same thing happened. [00:58:31] Just as we were going to the bank, right, I knew about my boss and my boss's boss, how they were related. So I was just telling them that I like the Japanese way of promotion. The first way is getting married to the boss's daughter, and the second one is to get your boss promoted. So I told him that while we were going towards the bank. They thought it was funny. [00:58:58] It was not. It's funny now, when I'm telling you a couple of years after that, but when I replay the whole scene, it was not funny at all. [00:59:05] That's a good one. Before we get to Ben, I like this message here from Ben. I've got plenty of Lennix. Camastra I thought I was a Linux badass and executed terrible commands. Yes, that's happened to me many, many, many, many times. [00:59:19] Ben: Is it bad that everyone's mentioned something that I've done at some time? I've learned the hard way about rales on Cervarix. I have also made some mistakes with primary keys that I regret at this stage in my life. I have deleted things. I trained a model on production data once, and that tends to slow web apps down. I just want to put that out there in case anyone doesn't know that, because I did that once. My worst "oh no" moment was this: I was working with a client back in twenty sixteen and they didn't have a data set that they needed. We didn't really even know what kind of data they needed. They didn't have the data set. We weren't sure they were going to get access to it, and they said, can you do that? I said, oh yeah, no worries. I'll just go find it. No problem.
How hard can it be? It's hard. Data sourcing is really, really hard. There's no such thing as "Google, where am I going to find this data?" There's no Stack Overflow for data brokers and data sourcing. So never, ever, ever promise that you can find data, because you may not be able to. And it's so hard. I learned over three very painful months how hard it can be to source data that's high quality from third parties and that doesn't cost a billion dollars. [01:00:36] This is getting really good. You've kicked off something awesome. Probably the best segment from the happy hours so far. So let's hear from Mark, and then after Mark, we're going to hear from Greg. [01:00:48] This is actually pretty funny, because I broke our product this week with the refactor I pushed in. And funnily enough, the fix to the refactor actually broke something else. But I'm going to save that story for another time, when the wounds aren't as fresh. I'll tell a different story. Something I'm really big on on my LinkedIn is stakeholder management. And the reason is that I've been burned very heavily on this, the biggest one being almost losing one of our largest contracts at the org I was at, not my current one but a previous one. Long story short, I used to work in health care for ophthalmology, so they had ophthalmologist records for all of the United States. And the interesting thing about ophthalmology is we have two eyes, at least most people do. And so that makes a really funny thing for denominators when counting diseases. So I had this project where I needed to count a certain type of procedure for eyes, and basically get the counts. And the way this worked is I worked with sales. Sales would come to the data science team like, hey, we're trying to sell our data; can our data assets answer this? And many times the people they're selling to are PhDs, professionals of 20 years. So they're testing us: will we get the right answer? They know the answer; they want to see whether we get the right answer. [01:02:04] So I pull the numbers.
And this is my first time working with ophthalmology data, completely unaware of the two eyes. I think it's called bilaterality. And so my denominators were completely wrong, because I was counting per person but needed to count per eye. And so I essentially provided these numbers. But in big, bold letters on the top, I said: do not share until reviewed and approved. But the salespeople are super like, cool, the numbers are in here. I can finally get this client off my back and give them these numbers. And the numbers were shared, and the shitstorm that ensued happened right before I went on vacation. So I delivered this, went on vacation, and got plenty of calls like, yo, what happened? So essentially, since that experience, I've really put in a process of setting expectations, setting "do not share," and also knowing when to say no to things, like a crazy, crazy task. And then finally, putting in processes where things can be documented and there's no cutting steps. And so from that specific experience, I've taken stakeholder management to the next level, because that was an intense experience. [01:03:20] I did not want to experience it again. That's a good one as well. So my wife listens to every single one of these office hours. I don't know why she does, but she does. But she's also a doctor, and I know she'll find that story hilarious. And also it's interesting now that you've connected the work I do to the work she does. So that's pretty cool. Thank you for sharing that story. I appreciate that. [01:03:44] The only one I can think of, and I try to keep it deep in my memory so I don't have to live it again: [01:03:50] earlier in my career, I was in supply chain, and I was this cocky young Haitian trying to make a difference. [01:03:57] And I decided to automate scheduling for my guys who were actually producing the products we sold, and it ran all weekend.
So it ran five million dollars' worth of products that we were not supposed to produce and sell. So I came in Monday after the weekend to a floor full of bad materials, and we were trying to figure out what to do with it. So, blame the new guy: five million dollars down the drain. And what we were trying to figure out was actually simple, because it was the labeling industry. We were trying to find schools who would take these labels, because the glue we produced was just trash. So, worst memory ever for me. [01:04:42] Did you say five billion or five million? [01:04:45] Not a billion. A billion just for a weekend, come on. [01:04:48] That's crazy, man. [01:04:49] I wouldn't be here if it was five billion. [01:04:51] No, you wouldn't be, absolutely not. So, someone who we haven't heard from and would absolutely love to hear from: Tom, what have you done? [01:05:03] I actually was trying to get the attention off me, because I thought Ben's story was so good. And Ben, I believe in you. You can top my Transformers book commercial. I appreciate the praise, but here it is. Every time I've made a mistake, one of my hairs has turned white. I think that explains it. No, Ben, I got you beat, buddy. Don't even try. You still have dark hair. But, and this is kind of embarrassing, and I'm going to blame you for this, I discovered some very dark things about myself in the middle of my career, and I had to deal with them. And the mistake, character-wise, was that I was really upset with the way things were going at a company I was at. And even though I think I knew better, I just started allowing their bad actions to justify my own bad actions. And that was really hard to recover from, attitude-wise. And in retrospect, I wish I'd had a spirit of: it doesn't matter how bad a situation you're in, if you're not trying your best to be your best and make situations better, you're wasting your life. You're wasting your time. You're hurting your teammates.
Even if they're all toxic, by not trying to make a difference you're really hurting yourself. Because if you can just keep doing your best to be your best and try to make positive changes when you're in a toxic-environment hellhole, think how much better you'll be when you're out of that environment. But, Vivian, that is worse than Ben's mistake. I tell you, I'd rather have made Ben's mistake than that mistake. It's the hardest lesson I've had in my life, but I did recover from it, and I'm glad for that. [01:06:54] Thank you very much, Tom. A lot of great stories. And I guess since we're talking about white hairs, Ben being 37 and having all this gray, I've got a confession to make, people. This, see, on me, this is dyed. My beard is actually quite white. [01:07:10] It's completely white right here. But then the rest of it is, like, super patchy and white in weird areas. Once it is entirely white, evenly, I'm just gonna stop coloring it. [01:07:22] But until that happens, I... You know, Harp, you're making a big mistake. From what you're describing, that would look so incredibly badass. You should just let it go. [01:07:32] Dude, I'm all about being badass, so I think I might do that for us in style. [01:07:37] Let it go. [01:07:39] So I got a question here from Waco. Waco could not join us today. He's actually closing on a house, so congrats to Waco. I'm going to pull up his question here. Where did it go? Here we go. Let's see what he says. Am I wrong in thinking that monitoring memory utilization would be one of the top priorities in selecting the right instance or instances to cater to my workload? My experience with Selenium is that it eats up quite a bit of RAM, maybe due to using Chrome.
I started working with auto scaling and Elastic Load Balancing, but figured that since my load doesn't change drastically from day to day, that amount of flexibility might be a little bit of overkill. So now I'm trying to focus on simply getting the right instance to fit my needs, and happened across a white paper addressing this. Monitoring memory doesn't seem to be a standard CloudWatch metric, though. I wanted to make sure my logic made sense before writing my own custom metric to monitor my RAM usage and using that as a measurement to scale up or down as needed. Also, I was wondering if there might be any other things to consider going down this route that you fine folks might be familiar with, that might shed some light on any potholes before I take this path. Thanks again, Homy, for everything. Ah, thank you. You're welcome. I've got no clue what that question is even trying to ask, so I'll leave it up to people smarter than me. Ben, go for it. [01:09:15] So the first question I'd ask him is, is this a personal passion project or is he doing it for his employer? [01:09:22] So, yeah, this was that question that he brought up the last couple of weeks, when he was scraping. Yeah, yeah. Like, scraping data from a website or something like that. [01:09:30] Yeah. I'm going to assume it's personal. Well, I do remember him saying that this was for a client. Oh, it's for a client. OK, because scraping with Amazon instances, and some of the complexities he talked about setting up, that gets complicated. I've scraped at least a terabyte of data that I've downloaded before. [01:09:49] And I did that at my personal home, where you have a machine, you have fast internet, you're not nickel-and-dimed. And so that's the question I'd ask him. And maybe he would say he just doesn't want to do that, he doesn't want to have a dedicated system. Because if you have a dedicated system, memory isn't an issue.
It's so easy for you to have 128 gigs of RAM and a gigabit connection. So I think I would just experiment if I was him. So I'll just wrap up. That's what I'd do: I'd experiment, because the issue is he was crashing, right? He was having issues where the instance would crash, and I think it's way too small. So just experiment. Find the minimum instance you can go with where it doesn't crash, where it runs for hours without going down. And then there you go, that's your instance. Figuring out auto scaling, some of the stuff he talked about, I'd say that's overcomplicating it. [01:10:35] Does anyone want to add anything here? I see you in the chat now. [01:10:39] Nothing really. Just those three off the top of my head. And I'm kind of shotgunning this. So I assume at some point memory, and then eventually network bandwidth, and then eventually drive I/O. Those are going to be the three constraints that I can think of for scraping. [01:10:55] Waco, hopefully that helped you out. If not, we'll be back here next Friday. Any other questions from anybody? I see you unmuted. Did you want to add something in there? [01:11:07] Oh, I was just saying that because he was doing this for work, and initially I think he was on the free plan. And so he was kind of asking, like, does he just need to scale up, because he was kind of running into issues, and maybe he's trying to figure out, like, is it his memory? Is that the main thing he has to worry about when it comes to this scaling? And I think Ben is talking about finding the minimum instance that you need to be on to get your scraper to run, and then you have kind of a level from there, and you kind of experiment. As Tom is saying, just figure out what works for you to get it to fully run. [01:11:43] Right. Awesome. So if you guys have got any last questions, go ahead and put them right there into the chat. What I put in there just now, I'm reposting again.
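For anyone following along with Waco's question above, the custom RAM metric he describes could be sketched roughly as follows. This is only a minimal sketch, not anything shown on the call: it assumes a Linux instance where /proc/meminfo is readable, the function names, the metric name MemoryUsedPercent and the namespace Custom/Scraper are made up for illustration, and the final publish step assumes boto3 is installed and AWS credentials are configured.

```python
"""Sketch: compute RAM utilisation on a Linux instance and shape it as a custom metric."""


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: value in kB}."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def memory_used_percent(fields):
    """Percent of RAM in use, based on MemTotal and MemAvailable."""
    total = fields["MemTotal"]
    available = fields["MemAvailable"]
    return 100.0 * (total - available) / total


def build_metric(percent, instance_id):
    """Build one datum in the shape boto3's put_metric_data expects."""
    return {
        "MetricName": "MemoryUsedPercent",  # hypothetical metric name
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Value": percent,
        "Unit": "Percent",
    }


def publish(datum, namespace="Custom/Scraper"):  # hypothetical namespace
    """Send the datum to CloudWatch; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the helpers above run without it

    boto3.client("cloudwatch").put_metric_data(
        Namespace=namespace, MetricData=[datum]
    )


# Made-up sample in the /proc/meminfo layout, used only to try the helpers.
SAMPLE = (
    "MemTotal:        8000000 kB\n"
    "MemFree:          500000 kB\n"
    "MemAvailable:    2000000 kB\n"
)
```

On a real instance the text would come from reading /proc/meminfo, and publish would be called on a schedule (cron, for example). In the spirit of Ben's advice, just logging this number while the scraper runs may already be enough to pick the smallest instance that doesn't crash, without wiring it into auto scaling.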
This is a link to go and essentially vote for the Data Community Content Creators Award. So I'm going to tell you how this came up. Right. So I saw some random, like, awards being handed out on LinkedIn. And then I thought to myself, wait, hold on. You can just give awards to people? Like, nobody gives you the power? Yeah. You can just give awards to people. So I was like, shit, I want to give awards to people, but I want to treat this like it's, you know, the People's Choice Awards. And since it's me and I got swag, I was like, all right, let's make this like the MTV Awards as well. But I figured no one would do it if it was just me. So I reached out to Kate and I was like, look at this crazy idea, let's have an awards ceremony. And she was 100 percent on board with it. And it's happening April twenty-seventh. This is powered entirely by you guys. So if you guys do not go and vote, then we're going to have a shitty award ceremony. So please help us make this awesome. We've got a bunch of categories here, because I know, as a consumer of information, as somebody that's self-taught, these are the places that I go to for learning. And I know that a lot of you guys go to the same type of places for learning. So, you know, these are some of the categories that I think will have some good votes. So please help us make this happen, guys. Yeah, I see somebody is unmuted, so go for it. [01:13:09] And this unmuted one is Tom. I just want to say I got out and voted, and I just forgot to put on my "I voted" sticker for the day, but it's a great initiative, thanks to you and Kate. [01:13:20] Thank you, guys. And if you guys could just help spread the word: post it on your LinkedIn rather than sharing the post, just post it, because that'll get more traction. If you have anybody that you follow in particular, send them a link to this thing so that they can then send it to their audience, so that we can get more and more people.
What I'm hoping happens from this is that we just get exposed to a bunch of people who do stuff that we haven't heard of before. So that's my biggest goal from this: let's hear about people that we have never heard of who are out there doing some amazing things. So that's the biggest hope I have. So, guys, vote. I'll post it again. Share it with your friends, family, data lovers. Of course, only for data lovers. Let's make it happen. It's happening April twenty-seventh, live on LinkedIn, powered entirely by you. I think it's gonna be awesome. That's going to be a really, really good time. Ben had a fake social media account that was cleared by USA Today. That's a good one. I don't see any other questions in the chat. So if anybody has a last-minute question, now is the time to go ahead and unmute yourself and ask it. If not, we'll go ahead and wrap up. You guys, don't forget that we also do this on Sunday mornings. It's just as awesome, but I know it'll be more awesome if all of you guys are there. So please do come through. Go for it. [01:14:39] Yeah. So my question was just kind of around what's helped people develop themselves and reach the success where they find themselves in the data community. Have mentors been the biggest thing? Has it been managers that have helped let you explore what you want to do and where you want to go? I'm just curious what has helped lead people in this group to be where they are and where they see themselves going. We'll start with Mikiko on this one. [01:15:15] So it's, how did I wander into the data community, the data group? So it's funny, everyone earlier was talking about their screw-ups, right? And I think my biggest screw-up in my career is actually what got me into the data community, indirectly. And, you know, my first job graduating college was working at a hair salon.
And the day I knew my time there was limited was when I actually spilled my coffee over the entire internet. And so all the planning systems, sales, were down. The hairstylists wandered around going, like, I can't get to my clients, my clients are canceling on me, they're trying to make appointments. And most of my clients were tech people at that time. And I was like, you know what, I'm tired of living here. I want to go into tech. You know, I want to try this new thing. But it was hard, because I didn't have a master's, didn't have a PhD. I was a pretty nontraditional candidate. So I kind of had to find my own way. And the reality is that the thing that will always push me is challenging myself and doing the things that people did not think were possible for someone like me, you know? And that's ultimately what got me into data science and machine learning, because it seemed very much filled with, like, Stanford PhDs [01:16:30] going out there doing, like, image-like projects or whatever, deep learning. And I wanted to go in and explore, and I came into the space self-taught, doing the boot camps. And that's still at the heart of me growing in this career: being the person who always wants to just kind of break the new glass ceiling, break the glass barrier, but to keep learning. And so, you know, now my focus at work is on engineering as opposed to strategy and analytics. But the thing that keeps pushing me is people within this community being entirely unlike what I thought the community was, but also being constantly supportive of my efforts to constantly push myself. And I find resources in books or classes and, you know, in communities like the Artists of Data Science. But it's the people. And so I think we should all take it upon ourselves to always encourage people to do their best. Sometimes that can be loving and kind, but sometimes that can be like a boxing coach yelling at you, saying, you know, I know you can do more.
So I would encourage all of us to do that, because it helps people like me for sure. Yeah. [01:17:38] Yeah, definitely. I want to hear from some more people for sure. But for me it's been all the people out there who make the videos, who do the hard work, who write the articles, who are posting content on LinkedIn. Those are all the people that have helped me. And then, I mean, I launched the podcast because I needed to find more mentors, and I needed to find some way to talk to people and make it worth their while. So I was like, oh shit, I'll make a podcast. People will talk to me then. And that's the reason I did this, and it has blown up into this, which has been absolutely beautiful. But I mean, everybody here has definitely been a mentor to me to some extent. But who else? Let's hear from Ben and then Greg. [01:18:18] Ben, Ben. Ben. OK, good. Sorry. I don't know why I have a hard time hearing the difference between my name and Ben. Wow. How did I wander into data science, and what was helpful? People being really understanding of me making mistakes along the way. We talked about all the mistakes and what our worst mistake was. But I don't think any of us would be here if a lot of those mistakes hadn't been accepted by somebody. I understand some of them have led to other people being terminated, but I had a lot of good mentors. Well, not a lot of good mentors, I take that back, but good mentors led to good mentors. And a lot of people tried their best to mentor me. I'm not the easiest person to mentor. So it was a whole lot of people putting up with me. So find people that don't mind when you make mistakes. Those are really the best teams to work on, teams and people who talk like everyone here. I mean, you talk to each other, talk openly. [01:19:13] All of us talk about mistakes.
Find a team that doesn't just talk about when they messed up, but that has that culture of: if there's been a mistake, we're all going to fix it. Why? Because we're probably going to be the next person to make a mistake. And so we're going to hop on and help fix it, because we know we're going to need the same next time, and we don't want you to keep quiet and cover it up. So, you know, find those teams that are not only mentors, and not just supportive, but where, when things get ugly and you do the dumb things that I've done, there's a team there, and they're really a team, and they're really there to support you. And they're there for the weekend, because I may have made some mistakes with primary keys in the past that I regret. So find that team, one that isn't just talking to you about support, but really is willing to teach you when you want to learn, and sometimes when you don't. Having that psychological safety is, for sure, [01:20:09] super, super important. Tom, because Tom is like the [inaudible] on here, for sure. [01:20:17] Well, it's just that in the beginning, God created dirt, and then I was born. So, yeah, no, for me, it was in high school. I was an awkward jock, a swimmer, a competitive swimmer, and my swim coach, who was like a second dad to me, was my biology teacher. So obviously I made A's in that class. But where did I suck? In physics. But it was so cool. And so, yeah, the first C I ever got in high school was in physics. So what did I do? I went into mechanical engineering, and I loved it. It was a struggle. I actually struggled a lot with statics and dynamics, and later I could just think about it like it was a second brain. But there wasn't really data science when I started. But I fell in love with predictive analytics and first-principles physics. But in grad school, we were starting to see serious limits to traditional methods.
And so expert systems, fuzzy logic, neural networks came on the scene, and I just latched on to that. And after knowing Fortran and BASIC and stuff like that, I learned C on my own and was doing some pretty cool advanced neural network programming back in the mid-90s, not at the level that Yann LeCun was doing, but I just fell in love with it. [01:21:40] And, um, but they are two very close cousins: the multiphysics system modeling and control system design, all the simulation you do with that. It's just that it's first-principles based, and those first principles are based on, you know, basically data science principles. But then over the years, I started to see the shift. Wow, wouldn't it be nice, I was thinking, working with all this data now that I didn't generate, what if I could create some visualizations and some simulations? What if I could help my mechanical engineering brother over here that's got these great statistics models, you know, these huge statistics models he was trying to run in Excel? I said, no, Marcos, we're not doing this anymore, we're moving over to Python. But I started using Python twenty-plus years ago, and I felt like I was set free, and I still love C and C++ to this day, and even Fortran. But boy, Python was just this brave new awesome world. It's like a warp drive for coding, and all the rest was history. When people that are new ask me what they should focus on: fall in love with Python and get really good at it. It's my top advice. [01:22:54] Thank you very much, Tom. How about we hear from Curtis or Greg? Does one of you want to share here? So it sort of criticism of me. [01:23:05] I'm very, very good at finding people that are ahead of me in areas that I want to go into. [01:23:15] And I'm very, how can I say, inquisitive by nature. So I would do as much research about, like, someone who is where I would want to go.
[01:23:25] And just when I'm ready, I'll ask them questions. What also helps is just writing; that has been my biggest thing. [01:23:32] So I took a lot of courses in the beginning, but I found myself getting into a loop where I was just taking course after course after course and not doing anything practical. Then when I started writing, it was almost like, article after article, I felt like I had a little bit of responsibility, because I started to realize people were actually listening to what I was saying. I was like, what the hell, you're listening? And yeah, from there, I guess it was kind of like the Mikiko effect, where you're just thinking, right, now I need to just break new glass ceilings and keep getting better and better. And that's literally it. [01:24:05] Oh, thank you, Curtis. Yeah, I guess I have multiple things that drive me. A bit of it is a little bit clichéd. [01:24:15] So, my parents: my father is an economist, my mom is a chemical and civil engineer. So I was born in a family that, you know, dealt with numbers, dealt with, you know, a lot of things. They pushed me to be the best because they knew somehow they would send me to the US to evolve. So the first half... I just closed the first ten years of my career. When I look back at it, it was based on, you know, this guy who shows up very motivated to do the job well and kind of leaned on the manager to push him upwards, when that wasn't even the case. I did not take charge of my next level and got frustrated because of that, because at every yearly review, you get the maximum, but then again, it's three percent, five percent, with the same job responsibility. So one thing for sure I didn't know how to do was network, because I watched people who I felt were not necessarily as smart getting ahead. Which means I was missing something.
[01:25:17] The networking, and understanding, you know, what is the gap between me being stagnant and knowing what I know, versus moving around, getting in touch with people who look at me and move me the right way. So the next 10 years, what I've been focusing on recently, even prior to me joining Amazon, was being very curious and starting to ask questions. And you do that enough, you'll find people who will take an interest in you. And that's how I was able to find mentors who believed in me, because I was interested in solving issues. And that led me to create, you know, cool tools that tackled a lot of stuff. One of them was even a web app for customers that simulated pricing of products, so they could kind of shop around and build contracts with us. And I was a product manager then, not knowing too much about data, but I forced myself to learn it, did some coding. But for the things that I couldn't do, I got the experts in. Because if we think about simulation, we're thinking about a lot of statistics and data, and being able to rally people around an idea that could really do some good things for the business. [01:26:36] Now, at Amazon, I'm focusing on looking for people who are willing to see what I can do and help me transform my trajectory for the next five, 10 years. And they're really open to seeing what I'm writing right now, guiding me, making corrections, etc., etc. And those are folks that have really been pushing me. And the last one is social media. Right now, I feel like this is my main source for getting so much information that I can gobble up, whether it's articles or the content you guys are putting out there, and me trying to put the pieces together. And I think that's what I'm better at, or best at, which is understanding industry trends and where to plug in, especially in the world of data science, how to plug that in to make a business work.
So that's one thing I know for sure I want to continue doing, and continue working with you guys, the professional ones. [01:27:34] So thank you, man, absolutely love that. Some great responses there. I see somebody else... OK, well, guys, thank you so much for sitting here chatting. And apparently there's a recipe out there to clone Mikiko. It's thirty percent Netflix, thirty percent rom-com, and forty percent personal self-help section. So if you guys want to clone her and her spirit, that's what you need to do. [01:27:58] I guess we need her reading and watch lists now. [01:28:01] As I say, swap out fifteen percent for the Art of Manliness podcast and Zen and the Art of Motorcycle Maintenance. [01:28:08] So what's the percentage that goes to @TheArtistsOfDataScience? [01:28:12] Yeah, I mean, what's the referral rate, right? [01:28:17] Yeah. This is my podcast. All right, guys. Well, take care. Have a good rest of the weekend. Happy to see you guys here. Don't forget to vote for your favorite content creator, and don't forget to help us spread the word about this. April twenty-seventh you're going to see me, like, dressed up for the red carpet. It is going to be amazing. So guys, vote, share with everybody you love. Hope to see you guys also on Sunday for the Comet ML sponsored office hours. That's bit.ly slash comet dash ML dash H. Register, come through, hang out. You guys take care and have a good rest of the weekend. Remember, you've got one life on this planet. Why not try to do something big? Cheers, everybody.