comet-ml-feb28.mp3 [00:00:09] Welcome, everybody, welcome to the Comet ML office hours, powered by The Artists of Data Science. Super excited to have you guys here. Shout out to Ayodele, good to see you again. Now, in the house we got Christoph Tor, Barbra and Robert. Super excited to have all you guys here. So, Ayodele, how's your week? [00:00:30] It's been good. It's been a busy week, but it's been awesome on my end. Just lots of moving parts, wrapping up February and moving into a lot of work we have planned for March. So there will be some Comet news soon. I can't really discuss it yet, but I'm really excited for when we are able to share it. [00:00:48] All right. I'm looking forward to hearing some awesome news. What else have you been up to this week? Did you learn anything new, do anything interesting in terms of work, or just big deals at Comet? [00:01:01] Yeah, I'm mostly just trying to create webinars and get some of our more educational content out there. So we're really working on some new product videos and some guides to help people get started. [00:01:17] And I can't wait to start sharing that because, I mean, having an experimentation platform for machine learning projects is super, super important, especially, you know, having had to do it by hand and manually previously. So this is awesome, what you guys are coming up with. So guys, if you have any questions, go ahead and put them right there into the chat. We'll get you queued up for questions. But in the meantime, I was wondering if you'd be interested in talking about the data science hiring process and what your thoughts on it are. So I want to caveat this by saying I absolutely love my current job. I love my company, I love my job, probably the best job I ever had, best company I ever worked for. 
That being said, one of my other jobs, you know, one of my entrepreneurial endeavors, is being a mentor at Data Science Dream Job, and part of being a mentor at Data Science Dream Job means that I'm mentoring and coaching people through the job search process. So in order to keep current on the job search process, I will randomly just apply for jobs. Just one-click applies on LinkedIn, Easy Apply, whatever, just to do it, just to get interviews so I can practice interviewing, so that when I coach people on interviewing, I'm fresh. [00:02:29] The last couple of jobs I had in data science, I got a rigorous interview process, but nothing really crazy. And I think now that we're in this remote world, I've been just applying for jobs kind of everywhere, just getting to see what hiring practices are like all across North America, and it's been interesting for me because I'm like, damn, some of these questions are insanely difficult. Um, you know, I'll be getting these take-homes, and not necessarily a take-home assignment, but the initial, like, tech screen. It's one of those things you do on Codility, HackerRank, what have you. So I've gone through a few of those and, you know, they give you a generous amount of time to finish it. And I'll probably get through maybe four questions out of five. But, like, my brain is sweating at the end of every question. They're pretty difficult. And in the years I've been working as an actuary, as a statistician, as a data scientist, that kind of stuff never came up in my day-to-day work. So why is that? Why are we making it so difficult for people to even get considered for an interview? Like, what are your thoughts on that? [00:03:38] Yeah, I think first, it is a really competitive space, as a job that's kind of seen as sexy and seen as the hot new thing. I think that definitely has a large role to play. And I'm with you. 
I mean, even in the past, I've gotten technical screens that were maybe twenty-five deep machine learning questions in a row, and I'm like, I can't answer these. And then obviously it makes you feel really insecure. It kind of brings up those feelings of imposter syndrome. Like, am I really supposed to know all of these? I do think part of it, and what I've experienced and learned from all of the interviewing, is that it's also OK and more accepted to say that you don't know something. And even in the moments I've done that, the person interviewing me has been like, oh, you know, when I interviewed here, I didn't know the answer to that. And I'm like, oh, OK, I totally get that. I think sometimes I just preface that with, you know, maybe I was doing more traditional ML and they're, like, a deep learning shop. And so it's understandable that my experience has not been all deep learning. So being able to kind of assess why, and separate that from the feelings of being an imposter. But I think we're in an incredibly competitive space that, really, as an industry has largely modeled itself on software engineering, so I think they've just kind of doubled down on what those practices look like. So instead of whiteboarding, it's take-homes, but they're really difficult. And I think so many times hiring managers and, well, really, people in HR don't totally know what they're looking for. And so you read a description and think it's the right kind of role for you. And then you get to that tech screening and you're like, is this anything like the things I was reading when I applied to this job? So there's also a big mismatch there. [00:05:41] Yeah, I mean, definitely I get the take-home assignment, like, that totally makes sense. I think that should be a part of every job screening process. But having these, like, crazy brain-teaser problems, to me, it does get discouraging when I come across one and I'm not solving it as quickly as I think I should. 
Or maybe I can't pass any of the test cases and, you know, I'm just throwing in the towel, and it just becomes really, really discouraging. The imposter syndrome kicks in, like you mentioned. And this reminds me of about 10 years ago, when I was studying to be an actuary. In actuarial science, the actuarial profession, they've got a series of exams that you have to take in order to become fully qualified as an actuary. So I'm wondering, do you think data science can benefit from something like that? Like, should there be industry-specific examinations that we can go through so that we can minimize all this kind of work up front? I don't know if my question makes sense, or what you think about that. [00:06:41] I definitely think that would be helpful, in that for a lot of other roles, like you mentioned, there are certifications, and once you pass them, it is basically a checkbox that you have a certain baseline understanding of specific concepts. So you wouldn't have to be asked, like, what is the central limit theorem? Things that are fairly standard across the vast majority of job roles. But I think the biggest thing holding us back is that from one job to another, data science looks drastically different. So even though we can have these baselines, how we're working in organizations doesn't necessarily match up for every job title that's data scientist or even data engineer at different companies. [00:07:30] That's a very good point. Like, data science is different depending on the company, the different industries. No two data science roles, I think, are actually the same. And I think that's an important point to make. And it's likely why the hiring process is as intense as it is. Um, so we've got a couple of questions here in the chat that I'd like to turn over to the audience. Just go ahead and go for yours. [00:07:53] Yeah. I have two questions. 
One question is, how does one prepare for these technical questions? And, you know, how do you study for it? I mean, I don't want to memorize, but I guess some you have to memorize. I mean, how do you prepare, and how long do you have to take to prepare for it? So this is where I struggle. It seems like they're asking you for everything, and when you're studying, you know something, but not everything. So how do you prepare for these kinds of questions? Like what Harp was saying, some of them are pretty insane. And I was like, why? I mean, there's a mix. I mean, it doesn't make sense to me. [00:08:33] So, yeah, I know what you mean, because these questions that we get, you know, some of these HackerRank challenge questions, coding challenge questions, unless you take one of these specific HackerRank-style courses, or whatever it is, or go on those practice sites and do a bunch of those, you really won't know how to do them. I mean, I don't know if anybody goes to school or comes out of school just knowing how to solve some of these problems. Maybe if they do a computer science major, I'm not sure. But that's a very good question. For me, when I think of what I should know from, I guess, a statistics and machine learning standpoint, I kind of just focus on the basics and the fundamentals. Um, you know, those are the things I want to concentrate on: core statistics foundations, classical machine learning foundations, the data science process. Right? How do we go from raw data to decisions, and what does that pipeline look like? And do I have a principled approach to make that happen? So I feel like it's important to know the process, right, and how you go from data to decision. So that's what I like to study for. 
[00:09:36] But I don't know how to prepare for some of these coding challenge questions apart from that. I think it feels ridiculous, because in real life, we are pretty much Googling stuff for our code and stuff like that. I just don't understand why they cannot emulate that. I mean, I guess you need to answer these interview questions, but yeah, it just seems ridiculous. Yeah. [00:10:00] I think it also depends on the kind of company that you are targeting. So when I have interviewed, I noticed the starkest difference between going for, like, a Google or Netflix in comparison to even a mid-sized organization or a startup, with drastically different kinds of interview processes. So I would say for a lot of people who are looking for their first job with the title of data scientist, it's often better to scope, like, the mid-sized companies, in that, in my experience, the hiring manager has a better idea of what the day-to-day is like. And when I've interviewed at companies of that size, it's been more basic questions about statistical understanding. And then they've had me really dig into past projects and really assess my ability based off of how I'm able to communicate why I made certain decisions in feature engineering, or why I chose certain modeling architectures over others. So that's my two cents. If you're targeting one of, like, the big tech companies, I think the most successful advice I've seen is to sit and go through, like, HackerRank challenges to get past that code screening. And then you can show off your, like, ML abilities when you're talking about past projects after you've made it past that. [00:11:35] So how long do you have to spend on it, like, two days, three days practicing this? I mean, I don't want to start too early, because then by the time it comes you forget everything. 
So, I mean, how long do you have to prepare, and what do you need to prepare for more, statistics or programming or things like that? [00:11:58] I would say from what I've seen, and when I say what I've seen, this is from reading a ton of those articles, hey, here's how I got my job at Netflix, here's how I landed the job at Amazon, it seems like most people tend to study for this for a couple of weeks to a couple of months. And from a lot of the advice I read, if you are going for, like, a data analyst or data scientist job, you might be going through the medium and hard questions on HackerRank. So making sure that if you can tackle one of those medium or hard questions, you can probably tackle what they throw at you in the interview. And the same thing goes for doing the Python-style questions, being able to tackle, like, creating a palindrome and those kinds of general coding questions, at least for the technical screening. That's what I've seen work a lot quicker than other strategies. [00:12:58] And something I always like to tell my mentees is, use the job posting itself as if it were a syllabus for some exam that you're studying for, because that will likely be an indication of what you might get asked during the actual face-to-face or during a coding take-home challenge. So that's one bit of advice, too. But, Ayodele, a question for you here. So when we talk about medium-to-hard SQL questions, what constitutes a medium, or what would be a medium concept? Right? So I think we can safely say that selecting data from one table might be an easy concept, but maybe doing a self join on one table might be a bit of a harder concept. Right? So what are some other medium-to-hard concepts that we should prepare ourselves for? [00:13:50] I would say at least for SQL, doing things like selects with a lot of filtering or having to join on multiple tables. 
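
As a rough illustration of the SQL difficulty tiers discussed in this exchange (a single-table select as easy, a filtered multi-table join as medium, a self join as harder, plus the rank-style window functions and common table expressions that come up right after), here is a minimal sketch using Python's built-in sqlite3 module. The tables, columns, and rows are made up for the example and are not from any real interview, and the last query assumes the bundled SQLite is version 3.25 or newer, which is when window functions were added.

```python
import sqlite3

# In-memory database with two small, made-up tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    name TEXT,
    manager_id INTEGER,   -- points back at employees.id
    dept_id INTEGER,      -- points at departments.id
    salary INTEGER
);
INSERT INTO departments VALUES (1, 'Analytics'), (2, 'Finance');
INSERT INTO employees VALUES
    (1, 'Ana', NULL, 1, 150),
    (2, 'Ben', 1,    1, 120),
    (3, 'Cy',  1,    2, 110),
    (4, 'Dee', 3,    2, 90);
""")

# Easy: select from a single table.
easy = conn.execute("SELECT name FROM employees ORDER BY id").fetchall()

# Medium: filtering combined with a join across two tables.
medium = conn.execute("""
    SELECT e.name, d.name
    FROM employees AS e
    JOIN departments AS d ON d.id = e.dept_id
    WHERE e.salary >= 100 AND d.name = 'Analytics'
    ORDER BY e.salary DESC
""").fetchall()

# Harder: self join, pairing each employee with their manager.
harder = conn.execute("""
    SELECT e.name, m.name
    FROM employees AS e
    JOIN employees AS m ON m.id = e.manager_id
    ORDER BY e.id
""").fetchall()

# Common table expression plus a RANK() window function:
# rank employees by salary within each department.
ranked = conn.execute("""
    WITH dept_pay AS (
        SELECT d.name AS dept, e.name AS employee, e.salary
        FROM employees AS e
        JOIN departments AS d ON d.id = e.dept_id
    )
    SELECT dept, employee,
           RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS pay_rank
    FROM dept_pay
    ORDER BY dept, pay_rank
""").fetchall()

for label, rows in [("easy", easy), ("medium", medium),
                    ("harder", harder), ("ranked", ranked)]:
    print(label, rows)
```

Practicing each tier against a tiny table like this makes it easy to check a query's result by eye before moving on to timed platform questions.
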
Those are probably the things I think are in that medium-ish range, in that the vast majority of roles will expect you to know how to do them by the time you start. [00:14:12] Yes. Or stuff like maybe window functions. Yeah. Yeah. Aggregates and, let's see, there's window functions, but then there's, like, rank functions as well. CTEs, so, common table expressions, and subqueries and things like that. So I've got a couple of questions here in the chat, but part of the reason I really wanted to bring this up was I got an email earlier this week from one of our community members who is actually in the chat right now. Her name is Asha. So, Asha, the reason I kicked off this conversation was the email that you sent me. So if you guys want us to talk about something, make sure you send us an email during the week to let us know. But Asha, go ahead and tell us about the situation that you're in. [00:14:57] Can you hear me? Yeah, I have an interview tomorrow and it's for a lead data analyst role. I don't have much experience in being the lead. I do have experience in basic day-to-day operations, but never as the lead. I applied for the role, didn't even think I'd get this far. And that's when I sent the email. I was actually panicking. How do you know what to check for? I had multiple tabs open, because in my mind I've been going through so many machine learning examples, anything that might come up tomorrow, and it's a fintech company. So on top of that, I have been doing the finance. I have no background in finance, so I had to do a lot of those classes. [00:15:47] So my question, I mean, I would say, sorry, I didn't mean to cut you off from your main question, but definitely, if you can share the job posting with us so that we can maybe look at it, if not, that's completely OK as well. 
But that would help us a little bit. But yes, sorry I cut you off. Go ahead with your main question. Go for it. [00:16:02] Oh, I'd love to share that. I will definitely do it. Let me just copy it. But a lot of the tasks, from the person who's interviewing me, after checking, a lot of the things I've been doing have been pulling and manipulating data, but they expect me to have the finance side of it down. So my main question was, how do you prepare for interviews like this? I think someone already asked the same question. I've been preparing for weeks and I still come across things I do not know yet, still come across new things. Am I expected to have an answer for everything? [00:16:37] I mean, I don't think you need to have an answer for everything. Don't feel like you have to. I would also say that if this company has progressed this far along with the process and selected you to be interviewed for this role, they obviously think that you, at a bare minimum, meet their qualifications on paper for this role. Right? So that should be a boost of confidence there. Like, nobody would block off time from their team's calendar just to bring in somebody that they thought was completely unqualified for the job and then, you know, waste everyone's time. Right? So they brought you in for a reason. And that's because you have probably demonstrated, through your past experience and maybe through the initial phases, that you're cut out for the role. So that's a huge plus. So use that as a bit of a confidence booster. And again, like I was saying, use the job posting itself like a syllabus. Go through that job posting. Are there any words on there that you don't understand, any combination of phrases that you don't understand? Research and look them up, and go to the company website to understand their products. What type of products do they have? What type of business model do they have? 
If that makes sense. Ayodele, got anything to add to that? [00:17:46] Yeah, I would say, too, don't be afraid to say that you don't know something as well. So I think a lot of hiring managers and people who are interviewing you would rather hear, you know, I don't have a background in this, I'm unfamiliar with that concept. However, I'm really interested in this team and in working in fintech. I'd love for you to explain more about that to me. You know, that I think has gotten me out of, especially, not having experience in a specific industry. Still showing that I'm interested in this thing and I want to learn this thing also is kind of a good flag for them. And they're like, OK, you're self-aware enough to say, I don't know this, I'm not going to pretend I do, and I'd love to learn. And they're hearing the enthusiasm, that you want to learn more, despite the fact that it can feel really bad when you don't know the answer to something. So I would err on the side of, it's OK to not have all the answers, especially for more of the finance stuff that you don't have experience with yet. [00:18:58] So it does come down to just a lot of research, and just really, really looking at that job posting and connecting what you've done previously to the line items on that job posting, making sure you've got stories, in the sense of narratives about your previous work experience, that you can tie into how that previous work experience will make you successful in this role. Right? So don't forget that a lot of the interview is not just about all these random technical questions they want you to answer. It's also an assessment of, you know, what type of scenarios you've been in in your previous work experience. Right? So be prepared to adequately demonstrate your capabilities through stories about what you've done in the past as well. Right? 
So be sure to spend some time brushing up on some of those narratives, the stories I was talking about. Oh, thank you so much. Was that helpful? Did that answer your question at all? [00:19:49] It's helpful, although the nerves have really kicked in. Hopefully they'll be gone by tomorrow, but that really helps. Thank you. [00:19:56] Yeah. I mean, again, they're bringing you in. They're scheduling time on people's, you know, calendars to interview you. They think that you've got what it takes. You have to think of it like this, right? This is an organization, they've got a role, they're trying to fill this role. They have some requirements for it and they just need to find the right person who will either match those requirements or not. It's not an assessment of, do we think this person is smart, or do we think this person is awesome? [00:20:21] It's not just, let's see if this person has a friendly attitude, looks like they're pleasant to be around. Obviously that's important. But also, have they done work that will make them successful in this current role as well? Right? What do you think, Ayodele? [00:20:35] Yeah, I agree with you. I think you kind of hit the nail on the head with that. [00:20:39] But, Ayodele, there's another question we got from Quentin. Quentin, do you want to go ahead? Yeah. Hi, everyone. [00:20:46] Um, it's in line with what you guys have been talking about. Basically, it's what to focus on, because right now, like, I took two weeks of vacation from my job to focus on putting everything in order and prepare to search for something else. But I don't really know what to focus on. So should I focus on doing more projects and add them to my GitHub and build narratives around them that I can explain to, uh, to the next hiring managers? 
Or should I focus more on, like, answering questions and being more technical, and look at the syllabus, like the job postings that you're mentioning, and maybe try to answer these technical questions? Like, it's very difficult to know, I think for everyone, where to go. Like, do I add more projects, like, how many projects should I have on my GitHub, or should I focus on more technical questions? [00:21:40] I'll take the part about the projects and then I'll hand it over to Ayodele for the questions and stuff. But when it comes to projects, don't have any more than, like, two, maximum three projects on your GitHub. Right? That's the most you need. And the reason is, let's say that I'm a hiring manager and I get your résumé, right, and I look at it, like, cool, looks good. I'll go ahead and spend some time looking at projects, and I click on the link. It takes me to your GitHub profile, and there's, like, thirty repositories on your GitHub profile. Right? And I'm like, all right, I don't know which one to click, so I'm just gonna click on this random one. [00:22:16] What if that random one I clicked on is your absolute worst representation of your work? [00:22:20] I don't leave this kind of work in my... [00:22:23] Oh, I keep them private anyway. Yeah. So I mean, that's good. But a lot of candidates don't do this, right? So what you want to do is have only your absolute best work on your GitHub profile, so that when a hiring manager or a reviewer goes and randomly clicks on one of the repositories, it is of equal quality to everything else that's on there. [00:22:43] So everything you have on your profile adequately demonstrates your abilities as a candidate. So, um, that's going to be my big point there. So I would say definitely two or three projects at the most, and then spend some of your time going through practice problems, like on HackerRank or Codility. 
Um, I'll drop a couple of links here as Ayodele goes and talks about the questions. [00:23:12] Yeah. So I would say, again, kind of based off of the actual job description, you might want to take a slice from a couple of different categories. Right? So you may be asked a couple of statistical questions, a couple of questions on data engineering and data wrangling. And then obviously a couple of questions on machine learning. I would also prepare for having something around data visualizations or communicating results. One of the good ways to prepare for that: there's a lot of good texts out there about data storytelling. I would try and prep a couple of big areas in each of those, for each stage of the interview process. So you may come across a company where they do an HR screening and then the technical interviews. So you can work on the technical questions first, and then going into your past projects and the statistical questions for the later stages. So is that kind of helpful? [00:24:23] Yes. Yes, it is. It is. [00:24:25] Yeah. So in terms of how to allocate your time, you really do have to allocate it equally to each part of it. Right? Because all of that stuff is, I think, equally weighted when it comes to the interview process. I mean, the projects, realistically, why are you doing the project? The project isn't necessarily to impress a hiring manager or impress external people. The project you are doing primarily for you to think like a data scientist, for you to put yourself in the process of thinking through problems, so that when you're given a take-home challenge or given hypothetical scenarios in an interview, you can go, OK, well, you know, I've done a few projects. I've got, like, a framework in my mind, a set of principles that I can use to tackle this problem. So. 
Projects, you want to think of it as, you're doing these for yourself, so that you can develop and mature your progression towards mastery on top of that. [00:25:17] I think that's a very interesting point of view. Oftentimes, we focus a lot on the technical things and, as I mentioned, we can look for answers on Google. I mean, the most important thing, as you just mentioned, is the thinking process. Do you have any recommendations for books that explain better how to think about business problems? I know there are many case studies on the Internet. That's nice, too, like, I'm reading some of those. But do you have a book about the mindset, like the thinking process that you have to go through, to really improve on that? [00:25:56] Yeah, that's a great question. A couple of books come to the top of my mind. There's one book I've got on my bookshelf here. It's a book to prepare people for consulting interviews, which are heavily case-study based, and that one is called Case in Point. So that's a good one. Case in Point was recommended to me by Brandon Kotch, who was a regular at the Friday office hours, and somebody I've interviewed for my podcast. He's a Ph.D. from Caltech and head of a data science department. And he said that book really helped him think through how to go through, not coding challenges, but take-home challenges and things like that. Like, it helped him develop his problem-solving strategies. So Case in Point is a good one. Another one is a book by Andrew Hunt, and it's called Pragmatic Thinking and Learning. This is a book more about how to think clearly and how to learn. He's also the author of The Pragmatic Programmer, which is a great book. He's also going to be on my podcast; in a couple of weeks I'll be interviewing him. So those two are great books. Ayodele, or if anybody else here has any recommendations, I'd love to hear them as well. 
[00:27:07] I would say my first recommendation is Data Science for Business. I really like that. It goes through why businesses are even investing in data science and machine learning, and there are some really good case studies in there that help you see data from a business perspective. [00:27:26] That's by Foster Provost and the late Tom Fawcett, who passed away in, maybe, June or July 2020, sadly. So that's an excellent book as well. Yeah, great. Great tip. I've got another book here that is on my bookshelf. I bought it, but I haven't really looked at it yet. It's been sitting there for a couple of months. It's called Heard in Data Science Interviews by Kal Mishra. I thumbed through it. It looked kind of interesting. Um, I realized that all of the coding examples are done in C. So it's like, okay, well, I don't know how to do C, so whatever. [00:28:01] I've got it, and it's a great one. I like the answers a lot in that book. [00:28:08] Yeah. Yeah, definitely. Check that out, because it looks like the questions were really good. Sorry. [00:28:13] What's the author for the last book that you mentioned, Harp? Oh, I don't... [00:28:17] Yeah. Kal Mishra. M-I-S-H-R-A. And what's the title of the book again? [00:28:23] Heard in Data Science Interviews. [00:28:26] I'll be dropping some links in the chat. [00:28:28] Thank you so much. Yeah. [00:28:30] Mark, that's great. By the way, I didn't even know Mark was here. Mark is asking: would hiring managers prefer Jupyter notebooks or scripts or both? I think hiring managers, from my perspective, would prefer a well-organized repository. So making sure that your repository is clean and structured, and making use of notebooks when you need to use them, and maybe abstracting away a lot of code in terms of, like, helper functions and scripts. Right? So in a Jupyter notebook, you want to keep it clean, and maybe you're importing your own helper functions or your own modules. 
Um, into that Jupyter notebook, and keeping the code out of the Jupyter notebook, if that makes sense. But obviously in some situations you might need to leave the code in the notebook. So I think a mix of both. But what's most important from my perspective, with respect to notebooks or scripts, is a very clean, well-organized repository structure. What do you think, Ayodele? [00:29:22] Yeah, I would say I agree with that. But I think in the past, when I've been asked to go through my projects, there has been a strong affinity for Jupyter notebooks just for presenting. So even if you are creating a lot of scripts and calling APIs in the background, you can start in a Jupyter notebook and create all your functions and have an awesome script that works well, and then have a separate Jupyter notebook just for presenting the project that is super high level. I think I've gotten feedback that sometimes I include too many visualizations, or I didn't pick some of the best aspects from all my experimentation. So if there's any tip: you can have whatever you experiment in, and then create a pared-down version just to show off. But make sure your repo in general is organized. [00:30:23] Well, I'd say, also, take it a step further and even just record yourself giving the presentation, put it on YouTube, and have an embedded link in your GitHub, have a video embedded there. That would be awesome as well. I think that's doing something that the competition isn't doing. Also, Mark, you can check out the talk I did maybe two or three weeks ago now, called Tips for Creating a Portfolio Project That Will Get You Hired. So look at that as well. So there's a couple of questions here from Nissan. 
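
The advice in this exchange about keeping notebooks thin and moving code into importable helpers could look something like the sketch below. It is only an illustration: the file name `src/helpers.py`, the function `summarize`, and the sample numbers are hypothetical and not something the speakers described.

```python
# Contents of a hypothetical src/helpers.py, kept outside the notebook
# so the presentation notebook stays short and readable.
from statistics import mean, median


def summarize(values):
    """Return basic summary statistics for a non-empty list of numbers."""
    if not values:
        raise ValueError("values must not be empty")
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }


# In the presentation notebook, a cell would then only need something like:
#   from src.helpers import summarize
#   summarize(monthly_sales)
if __name__ == "__main__":
    monthly_sales = [120, 95, 130, 110]  # made-up numbers for the demo
    print(summarize(monthly_sales))
```

With this split, the experimentation can stay as messy as it needs to be, while a reviewer who opens the notebook sees a few imports, a few calls, and the results.
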
[00:30:53] There are two unrelated questions. I'm going to pick the one that I think is most relevant to what we're talking about right now, and that is, could you please share the way of explaining projects in both the resume and the interview? The STAR format, I think, is the best way to do it: situation, task, action, result. And you can make a narrative in STAR format, you know, with maybe five lines on a resume. And it's like, the situation for this project was dot, dot, dot. My task was to do dot, dot, dot. The actions I took or the analysis I performed was dot, dot, dot. And as a result, I observed dot, dot, dot. And you can use that same framework in an interview as well. And, like, I'd be more than happy to go in-depth on any part of that, feel free to ask. Right? So that's situation, task, action, result. What do you think, Ayodele? [00:31:42] Yeah, I tend to have kind of two buckets of advice, at least for your resume, depending on how much experience you have and how much in-depth experience you have. So if you were at a company where you had one main project you worked on, I would do just like Harp mentioned and go through the STAR format for that one project. But when I've had roles where I've been there a year or a couple of years and have had a lot of different projects, I would basically write a good, short, one-sentence summary of what I did, with something measurable. So, this increased profit by X, whatever that is. If you have four of those, that can be four bullet points on your resume. But then when you get into the interview, for each of those, you're going to want to STAR it out and explain. And that has been the method that I found has drastically changed the outcomes of my interviews, because it's really easy when you're nervous, at least for me, to get flustered. 
And I think a lot of people tend to deviate from actually checking the boxes that your interviewer is looking for and looking to hear. Something that can be really helpful for you guys is, you know, just take some time to imagine yourself in an interview. [00:33:03] Right. So maybe spend 20 minutes a day, or maybe when you're on a walk, spend 20 minutes just thinking: OK, here's this project I did. How would I explain this in the interview? And based on how I would explain it, what are some questions people might ask me, and how would I respond to those questions? Now, based on my responses to those questions, what are some more questions that somebody might ask me? So really, you have to think about yourself in these interview situations, because there's science and research that shows the brain can't tell the difference between a real experience and an imagined experience. There's some neuroscience stuff there. But if you could just imagine yourself in an interview and just imagine how you'd explain something, [00:33:46] it's almost like you've already had the experience, so that when the real time comes, it's like, I've kind of been there, done that, I have anticipated these questions. That goes a long way. So that's what I would say there. And I sound crazy because, you know, yes, I actually sit there and talk to myself when I'm on walks, like, mumbling to myself. And to his second question: what's your take on AutoML? [00:34:09] Is it going to replace data science jobs? No. That's all I got to say about that. [00:34:16] Yeah, I think there might be some organizations that try to replace data scientists with AutoML, and I think they'll just find they need a ton more data engineers than they expected. And it's still hard to get the same level of fine-tuned results or domain expertise and a lot of the things that humans can bring to that process.
[00:34:43] Yeah, you can't automate creativity and problem solving. You could automate steps all day long for sure, but you can't automate someone's creative problem-solving approach. And for that reason, they're always going to need data scientists. And so what if data engineering jobs are on the rise? That's great. That means shortly after, data science will be on the rise again. Exactly. So, Mark, you got a question here. [00:35:07] It was the follow-up on talking about ROI. I'm just curious what strategies you have when it's very hard to tie it to a number. So if you're building a project that's tied directly to sales, that's easy: they sold this much because of this kind of delivered product. But for other things, when I'm at a startup where it's hard to get those metrics... To give an example, I basically recreated a metric to understand our product's health. Before, it was using just email opens. And now I've created this kind of nuanced metric that essentially looks at all the different aspects of our product, and that's replaced our North Star now. But it's not tied directly to sales numbers. It's a cultural shift more than a financial shift. And that's one use case. But how do you essentially figure out how to put those into numbers for showing off, like, hey, this is the impact, and having those bullet points on your resume? [00:36:03] That's a great question. I think I would start first with what you mentioned, that you kind of replaced the North Star, and that is impact itself.
It may not have an X percentage in a certain direction, but even making the point that something you built was adopted company-wide, I think hiring managers read that and they're like, OK, regardless of the fact that you are on maybe a small team, you are doing things that everyone else is using. I think that still shows ROI, even without having it be as measurable as net profit or reducing the amount of spend on something. [00:36:53] I'm curious to see how you came up with this metric. I don't know if you can share what the metric is, but I'd love to know what the thought process was behind creating this and changing the North Star. That's huge, man. What was that like? [00:37:07] Yeah, I mean, I think the challenge of being at a startup is that there's really not that much ground truth. You kind of have to create it, throw it out in the world, experiment, and see what sticks. And I think that was the biggest challenge. Something I noticed was, you know, we're only using one aspect to understand our data, but in the past three months we've built out all of these new features that didn't exist before, and so this process was never updated. And so my manager is the grand genius behind this, and the other teams were like, hey, we need to build something else, and I was tasked with it. So I got lucky in that sense, and it was my first project. When tasked with that, my first step was just going into our data warehouse, BigQuery, and doing a whole bunch of SQL queries just to understand our data and what it all means, and what good-quality data looks like. And then I talked to all these different business stakeholders. I'm like, hey, I have this metric. Does this impact you? What's something that's wrong with the data? What's correct? So basically creating all the ground truth for the data first and talking to business stakeholders.
And once I got that ground truth, I basically created a composite score. I can't really go into details of the score itself, but a major step of it was, when I built those composites, I'd actually create Excel spreadsheets and essentially just create a quick MVP with if statements, like: if I change the score this way, [00:38:36] what would happen? And I'd go back to the business stakeholders and say, hey, what do you think of this score? What do you think of these weights? Does this match your expectations? And thankfully, we have people scientists, and this is their role, organizational psychology, so I relied on them a lot. So I did an interview process for about a month before I even wrote lines of code, because I got a really strong understanding of what the business use case was and how the data fit within that business use case. And once I understood that, then I wrote the code and embedded it within our workflow, and it now goes in our product to our customers. So there is a blog for that; I can share a blog on that metric. Then, once we finally got that metric and put it back into our data warehouse, which is a whole other process in itself, we were able to pull in those scores and actually use them for analysis. I use it for our product health dashboards, for our team meetings, and our company-wide meetings. So to summarize real quick: get your ground truth on your data, get the ground truth from the business stakeholders, iterate on that process, put it into code, into production, pull the data that you created back in, then analyze it, and then share it broadly. [00:39:58] That's awesome. That's a really, really good process. And that's not something you can learn how to do in a boot camp or in school; you can't.
I've learned that you can't teach that. You can figure it out and learn how to do it on your own, but you can't be taught how to do what it is that you just did. That's awesome. And I might be reaching out to you in the near future, as I have very similar challenges at work. I've got to help create a data strategy for a behemoth 70-year-old organization with a billion-dollar valuation. [00:40:27] And so the only way through is a modern data strategy, which is helpful. I think one key skill here, because this wasn't really a technically hard project, but it took months to get these simple pieces right, is stakeholder management. That's kind of been my go-to skill as a data scientist. Having that stakeholder management, knowing who to bring to the table and also knowing who's not at the table, and really just talking and talking like crazy, will make your life easier when you finally code the thing. And I learned that from being burned a whole bunch of times, where I didn't talk to enough people or talked to the wrong people and built the wrong things. So it's about messing up a few times to finally get it down. Tor has an excellent question I'd like to ask here. [00:41:16] So if you hadn't made this, what would it have taken to do it manually or in an alternative way? [00:41:24] I think this sounds like a cop-out answer, but because we're a startup, there is no other alternative. It just wouldn't have happened, and we'd have kept on using this metric as a proxy. Many times, when you're just kind of building a plane as you fly it, you kind of get the next best thing as a proxy, knowing it's not the real truth but that it gets you closer than before. So for me, I know this metric I created is a stronger ground truth, a stronger proxy, but there's probably going to be something better when we get more data.
And so that's going to be replaced eventually, as it should. So I think that's the alternative for that. [00:41:59] And I think one more thing, for the love of metrics, is that I essentially have a whole set of saved queries. And so when people ask questions about data, like, hey, Mark, what do you think about this? I'm like, here's a saved query. [00:42:11] There you go. And so that was also really helpful as a data scientist, to actually dig into that data warehouse and create saved queries, because now I've built domain expertise within the organization for our data. So when people have data questions, they come to me, and that happened in the first few months because I took the time, and my manager was like, please do this, just sifting through all of our data. And so I guess the alternative would've just been people asking me for a whole bunch of ad hoc SQL queries, which I wouldn't want. [00:42:42] I like that idea that you talked about, this metric that was a proxy, almost like a vanity metric. Email opens: what does that actually tell you? Is that a metric we can actually impact? We can't impact whether somebody actually opens our email or not. Would you say that's a vanity-metric type of situation you were in? OK, define vanity metric. Like, for example, metrics that don't really tell you much. Like maybe website visits or the number of downloads of a podcast, right? [00:43:11] Yeah, I think it was a strong metric for where they were at the time. Where the product was at, it was all delivered through email, and so email opens made sense, because if you didn't open the email, nothing else happened afterwards. So I think that was a really strong metric. And then once we changed to different formats for receiving our product, it became less strong, and that's when the need for this other metric really rose. And so with startups, everything's on fire and everything is a priority.
And so you need to choose which one you're going to work on today. And when we added more products or more features, that priority rose up. [00:43:52] I think Tor might have a follow-up question. So Tor, go ahead. And then Jay has a question for Mark, or related to Mark. So we'll go from Tor to Jay. [00:44:03] Mark, very interesting. But initially you asked, how do you value it, is that correct? I want to understand what you were asking: the value of what you have done, how to evaluate it and how to value it, either monetarily or in some other way. [00:44:17] Yeah, that was the main question. I really struggle with it because I know how to solve problems, and sometimes those problems aren't directly tied to a specific number. I follow pain points rather than dollar signs, if that makes sense. I don't know if that's sometimes the wrong approach, but that's the approach I currently take right now. [00:44:38] I know the reason why. I just wanted to clarify this before I give my answer. You know, I don't want to feel like I'm way out there. I'm not a technical guy [00:44:46] like the rest of the group here, but working on projects in general in my field of work, it's difficult to measure monetarily how much money you're earning based on what you do. In my job as an auditor, I don't really earn any money. I'm just costing money. But there are ways of measuring results: by how much of an impact you have on reducing other people's work, how you are improving processes and procedures by reducing the time spent on a task. Now, in one instance, for example, I created a tool for my own job. Initially, it took me three or four hours to give a proper, simple management report on a request: how many resources do I need, how much is it going to cost me to perform an audit. And then I would have four or five parameters. Now, that would normally take me about three or four hours to summarize and write up manually.
And like everybody else, I'm a lazy bastard, pardon my French, which basically means that, if it wasn't for laziness, we wouldn't have invented the wheel, right? So I created the tool, a simple Excel tool. It's not complicated. For four or five parameters, I plug them in and it automatically generates the resources based on the number of weeks and the data. And now, having created the tool, [00:46:13] I can generate a report just by typing in my summary, etc. So instead of spending four hours, I'm now spending two, three, or four minutes to do the same. That is not reflected in my revenue stream, because I'm not making money out of it. However, for a person looking at it, or myself, it saves me three hours, and my hourly rate is X. So this way I can measure how much money it's actually saving. So sometimes it's saving money, sometimes it's saving time; those are other parameters that you need to use. Especially in a startup, you can't really measure it, because you have no idea what impact it's going to have revenue-wise. But on the other hand, you can look internally at how much it saves, because your organization is now probably working more efficiently. You are working more efficiently, you're not wasting your time, which frees up time to generate revenue and do other things. And so to me, it's always a question of looking at the impact of what you have done in the past. [00:47:14] Now, if you want to calculate the return on investment, the investment being the time you have spent, your return on investment is basically what you have saved in the future, not just for yourself, but also for the other partners. So that's how I approach that valuation question. [00:47:33] Kind of a back-of-the-napkin calculation. Like, OK, before I created this thing, it took one person X amount of hours to do this thing. Right.
There are ten people doing this thing, and that's ten people times ten hours times, you know, 52 weeks of the year. This is how much time per year was spent doing this task before my invention, and the average salary of these people was this amount. Now I've got a baseline amount of time and dollars that it took. And now, afterwards, it takes, you know, a fraction of that time, and that's the dollar savings. That kind of method? [00:48:10] Yeah, that's exactly what you do. You kind of find the cost element; people are generally the cost, and of course the electricity. It can go into great detail, but on a very high level, try and just keep it to the time that is spent, because time is money. Time also means that if you're working on repetitive tasks, that is costing you revenue, because you're wasting time on something that can be automated. In my world, I've always looked for ways to improve my own day-to-day work by developing my own tools, my own processes and procedures, just to simplify every day. And whenever I get a new problem in my face, I always think of it in two terms. If it's something that's very likely going to happen again, then I will spend extra time to try to solve it, to see if I can make something that will minimize the future work, because I expect it's going to happen again and again. [00:49:14] On the other hand, if it's something that I don't foresee is going to be a repetitive thing, I will just get it done as fast as possible without taking time to build something or evaluate something. It will just be done and delivered right there. So it's that kind of balance every time you face a problem. I don't think it matters whether it's my type of problems or your type of problems in data and analytics. Sometimes you just get the data out there because you don't expect that's going to happen again.
But if you start seeing a trend that it happens over and over again, well, then it's time to build those models that actually simplify that job, because you will invest a lot of time in developing solutions for those repetitive tasks, and that time has to be recovered in the future from the savings. [00:50:01] That's awesome. And that's how you think like a business person, guys. So that's awesome advice. Thank you, Tor. Let's go to Jay's question. By the way, you're welcome. [00:50:10] Yeah. So my question is related to Mark's. He's able to get the data within his company and do an analysis, get the metrics and stuff like that. Mine is the total opposite. I work for a biotechnology company and I'm trying to create a data project within the company, you know, to upskill myself a little bit and get some experience. How do you get the managers' buy-in for creating a data project? I've kind of alluded to this during my reviews and stuff like that: I've got this idea and I want to create a data project within the company, can you share some of your data, etc. How do you get the buy-in so that, hopefully, the bottom line gets better or improves, something like that? [00:51:00] So that's always a tough question. It's a tough thing to do, how to get executive buy-in. I mean, I'll toss this one over to Ayodele; she's got more experience getting executive buy-in. [00:51:10] I think a big part of it comes from showing them the potential benefits or the potential ROI. So if you're building this data project to save them money, or if you can in any way tie it to a goal like that, it's a lot easier to get buy-in and you'll face a lot less pushback. But on the other hand, you'll face a lot more expectations to have this project actually work and reduce the amount of cost or something like that. So that is one way. I think another way is a little bit more just general data education.
So it sounds like you might be in an organization that's not very focused on data science and machine learning. I think just showing other examples, maybe other similar companies who have fully fledged small teams and are doing work that they publish on Medium, being able to show them successes in the same industry, might be helpful. As well, someone I like who writes a lot in this space is Tom Davenport. [00:52:22] He's quite well known. He's written a couple of books: Analytics at Work, Big Data at Work, Competing on Analytics. I think Competing on Analytics is the most recent one, or maybe it's Big Data at Work. But he's got this framework of analytic maturity that goes from level one to level five. Right. [00:52:38] OK, real quick, Jay, I think you should talk to Eric Sims. He's on the Friday office hours. I talked to him, and he had an interesting quote when I had a one-on-one with him. He said he somehow always found the data in his roles, and that's why he started to pursue data science, because in his roles that weren't data roles, he kept on solving business problems with data. And so he's really figured out how to bridge that gap of, hey, we have this data here, I can solve all these problems without even being in a data role. So I think he'll be an awesome person to talk to within our own small community. [00:53:17] OK, thank you, Mark. I will definitely get in touch with him, because my problem is they have the data, and it's a family-owned company, so they protect their stuff quite a bit. So I have that issue. They seem to be open to the idea of me helping them with data stuff, but they are not ready yet, I think partly because I need to first show them how data can improve their bottom line or their business or whatever. So I need the data. So yeah.
So I've got a few ideas with this company and I'm trying to at least get one or two projects off the ground. I see things in the company that they're doing that are very manual, and I want to automate that and some of those things. But yeah, I'll touch base with Eric and see how he can help. [00:54:07] I'll definitely be happy to discuss this with Eric on Friday as well, at one of the Friday office hours. And Eric, if you're listening to this, and I know you probably are, you need to come hang out on Sundays as well. Great questions. So, next question. [00:54:21] I see here it's from Christoph, so go for it. I already said I was a little bit late with this question, but I wanted to ask: how do you relate a passion project to the four stages that you mentioned, Harpreet or Ayodele? You said that you need some number at the end, that it increased revenue or decreased costs. I mean, passion projects don't really work like this. So how do you sell them during the interview? [00:55:00] I think the STAR format would still definitely apply. Instead of just having a financial result, you can have a result of, maybe, it made your life easier in some way; you did this and you learned something new, or something to that effect. I think the STAR framework still applies there. Here's the situation I was interested in. Here are the tasks I had. Here are the actions or analysis I did. And as a result, I observed this or I caused this to happen, so on and so forth. Another thing you could do is take that framework that Tor talked about and just say, you know, in an imaginary world, if a company was doing a process that was manual, and a company was to implement this thing that I've done with my passion project, it could save them this many hours per week, which would result in this much money saved at the end of the year, in a hypothetical type of way.
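The back-of-the-napkin savings estimate discussed in this part of the session can be written out as simple arithmetic. What follows is a minimal sketch, not something shown in the session: the function names, the hourly rate of 50, the one hour per week after automation, and the 180 build hours are illustrative assumptions. Only the ten people, ten hours, and 52 weeks come from the conversation.

```python
def annual_savings(people, hours_before, hours_after, hourly_rate, weeks_per_year=52):
    """Estimate yearly money saved by automating a recurring weekly task.

    hours_before / hours_after are hours per person per week spent on the task.
    """
    hours_saved_per_week = people * (hours_before - hours_after)
    return hours_saved_per_week * weeks_per_year * hourly_rate


def weeks_to_break_even(build_hours, people, hours_before, hours_after):
    """Weeks until the time invested in building the tool is recovered."""
    hours_saved_per_week = people * (hours_before - hours_after)
    if hours_saved_per_week <= 0:
        raise ValueError("the tool must save time for a break-even point to exist")
    return build_hours / hours_saved_per_week


# Ten people, ten hours a week each (from the conversation).
# Assumed for illustration: one hour a week afterwards, a rate of 50 per hour,
# and 180 hours spent building the tool.
print(annual_savings(10, 10, 1, 50))          # 10 * 9 * 52 * 50 = 234000
print(weeks_to_break_even(180, 10, 10, 1))    # 180 / 90 = 2.0
```

For a passion project the inputs are hypothetical, so a resume line built on them is better phrased as "could save roughly N hours a week" than as a measured result.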
[00:55:49] Well, let me take a stab at this as well. [00:55:52] I think what passion projects are great for is showing off really two aspects of you as a data scientist. One is that you are passionate about this issue: up front, talking about why you chose this project and why or how you found the data. And then I think you can also show off if you learned something from it. So let's say, and I see an NLP question in here, you've never played with NLP before. You could say, OK, what I took from this, despite not having that impact, is that now I understand the data preprocessing for text data. And now I understand that if I'm looking at one scenario versus another, I should use tokenizers or bag of words, and being able to speak intelligently to that. So being able to have a takeaway of something that you gained that you didn't know before working on that project, and then explaining that in the STAR format, I think, is helpful for making those passion projects look a little bit more serious or get taken seriously more often by interviewers. [00:57:03] Hopefully that helps. So what you're saying is that a learning experience is also very valuable to talk about, I mean, to clearly explain what you learned during the project. Absolutely. OK, thank you so much. And I love NLP questions also. [00:57:24] And I know precisely zero things about NLP. So if anybody does, take a stab at Mark's question. Go for it, Mark. [00:57:32] Well, it could also be a production question as well. Essentially, I recently got an NLP model into production for our product. It's a super simple V1, but it's in production, which is a whole beast in itself. And the challenge was that the first run of this in our product was for our biggest client, and my code basically broke the whole pipeline, because NLP is very complicated and expensive. Thankfully, I was able to work with our team to figure out a workaround to get it going through.
But my question specifically is about one area that's really impactful, and I know it's slower than I want it to be: that tokenization step. So for context for people, in NLP you have your text or corpus, and tokenization is essentially taking all the words, split wherever the spaces are, and creating a token from each. From there, your NLP model can do various things like remove stop words or lemmatize. And that tokenization step... I have logging now for all of it, because it takes a long time and the engineers need to know what's happening, and the slow step in the logs is that tokenization process. What are ways to speed up the tokenization process for NLP? Because that's a necessary step for what I'm doing, so I can't just get rid of it. From what I've read, it looks like parallelization is the way to go about it. And for context, I'm using spaCy, which is deemed as this production-level NLP tool. I'm just curious about ways to speed up the tokenization, because for now it works, but my data science senses are telling me it's going to blow up again. Maybe I'm being too pessimistic, but I want to get ahead of this. Or maybe it's my perfectionism: I want it to get better, I want it to run faster. [00:59:27] Definitely, I'll open this up to anybody who has experience working on this. [00:59:31] But before I do that, I'll say that there's an awesome blog post on Comet ML that was done by Nicholas Lascaris on getting started with NLP: US airlines sentiment analysis. That's right there in the chat, guys. Check out that blog post and check out how Comet ML can help you experiment with NLP. Go for it.
[00:59:49] Yeah, I was going to say, I have not put a ton of NLP models into production, but I think you hit the nail on the head: finding ways to parallelize it, if you can. I'm not sure if you're working with clusters, but yeah, I think that might be the fastest, easiest way to not have to deal with the issue of things just breaking, because it does take a long time, and unfortunately it's like the most computationally expensive part of NLP when you're in production, working with live data. [01:00:23] And I think, I forget his name... Denis Rothman. He wrote a book called Transformers. [01:00:32] He does NLP stuff, so maybe you can check out his book. There might be some clues in there because, yeah, tokenization is a big thing. I don't know too much. I did one project in NLP and I was using RNNs and LSTMs and doing the tokenization for that. But I think RNNs and LSTMs are kind of getting old, so I think people are moving towards transformers. I can't say too much about it, but check out his book. He has YouTube videos too, I guess. [01:01:01] So Christoph, I saw you shaking your head rather vigorously during this conversation. Is this something that you might have experience with? [01:01:09] No, because I don't have any professional experience yet, but NLP is definitely what I'm aiming for, because it's just so interesting, and there are endless ideas and possibilities. It is really crazy how great NLP is for the future, and I just find it really interesting. And Mark, you mentioned spaCy. I discovered it like three weeks ago and it was like, wow, because it is so crazy, so well documented, and you can do so much with it. It is great, but unfortunately I cannot tell you more about it. [01:01:57] Yeah, no problem. I see Carlos unmuted. Do you have any insight here, or was that just an accident? That was an accident. Anybody else have any tips on how to get tokenization faster for Mark? [01:02:10] I have a comment for people, then. [01:02:12] Yeah, just one major bug that I spent a lot of time on when putting this in production. So spaCy is deemed as this production-level library; essentially they take all the research.
What's the best of the best? And they're like, here's this one model, you don't have to think about it. And that's why I used it for this use case. But the challenge is, when you have things in production, you put your requirements into a file, and the script says, hey, here are the requirements. You have the spaCy package, but then they have their separate models, which are essentially the serialized machine learning models, right, that you have to load in. [01:02:49] And that loading step does not work well with production systems. The reason being that they're so massive, because it's a huge NLP machine learning model, right, they can't have a regular package for it. You load it in via a GitHub link, and pip freeze, which is typically used for a lot of production setups, doesn't play well with that link. I dug for hours through GitHub links. One way to work through that is essentially to create your own PyPI repository locally. If you want to put it in production a quick way, what we did was, after the freeze, we replaced the entry with the correct term to get it going through. Again, we're a startup, trying to make things move fast. But that was one of the biggest hurdles I experienced when putting spaCy into production: the requirements. So if anyone's listening and thinking about spaCy, hopefully I saved you hours, because it took me a couple of days and digging with a couple of engineers to figure this out. [01:03:58] Excellent, excellent tips. Thank you very much. So, yeah, hopefully if anybody listening on the podcast or on YouTube has some insights, send me an email and I'll get you connected with Mark. Or it's something you might be able to bring up at a Friday office hours; there are other NLP enthusiasts on Friday as well. It does not look like there are any more questions in the chat, so I'll go ahead and wrap this up.
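Mark's quick fix for the requirements file, replacing the model entry after the freeze, can be sketched as a small post-processing step. This is only one possible reading of what he described, since he does not say what the replacement term was. The function name `patch_freeze_output`, the `MODEL_URLS` mapping, and the specific model name, version, and URL are hypothetical placeholders. The `name @ URL` form is the standard direct-reference syntax that pip accepts in requirements files.

```python
import re
from typing import Mapping

# Hypothetical mapping from a model's package name to the URL it is installed
# from. The model, version, and URL below are illustrative placeholders only.
MODEL_URLS = {
    "en-core-web-sm": (
        "https://github.com/explosion/spacy-models/releases/download/"
        "en_core_web_sm-3.7.1/en_core_web_sm-3.7.1-py3-none-any.whl"
    ),
}


def _normalize(name: str) -> str:
    """Normalize a package name so that '-', '_' and '.' compare as equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def patch_freeze_output(freeze_text: str, model_urls: Mapping[str, str]) -> str:
    """Rewrite 'name==version' lines for known models as 'name @ URL' lines.

    Every other line is passed through unchanged.
    """
    patched = []
    for line in freeze_text.splitlines():
        name, sep, _version = line.partition("==")
        key = _normalize(name.strip())
        if sep and key in model_urls:
            patched.append(f"{name.strip()} @ {model_urls[key]}")
        else:
            patched.append(line)
    return "\n".join(patched)


frozen = "numpy==1.26.4\nen-core-web-sm==3.7.1\nspacy==3.7.2"
print(patch_freeze_output(frozen, MODEL_URLS))
```

The other route he mentions, hosting the model in a private package index, avoids the rewrite step but means running and maintaining that index.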
Guys, thank you so much for spending part of your Sunday with Ayodele and me. I really appreciate having you guys here. Take care and enjoy the rest of your weekend. And if you guys found this useful, if you guys enjoyed this, do me a favor: shout us out on LinkedIn. Tag me, tag Ayodele, tag Comet ML, spread the word. Let's get these things popping every Sunday. There's a lot of people out there who want to have conversations like this, so I'd appreciate it if you guys help us spread the word. You can use the short link, which I'll put here; it's http://bit.ly/comet-ml-oh. So check that out, guys. Help us spread the word and get these sessions bigger and better. Take care, everybody. Enjoy the rest of the weekend. Remember, you've got one life on this planet. Why not try to do something big? Cheers.