emilyrobinson-jacquelinenolis-2020-07-07.mp3

Jacqueline Nolis: [00:00:00] There's a lot of bro-ey areas of data science. I think there are certain subreddits you can go to where they very much have this macho, the-bigger-the-data-the-better attitude. How many layers is your neural network? Did you only go to a boot camp? Then you're not a real data scientist. There's a lot of that energy out there, but that is not the entire field. And don't let those areas intimidate you or make you think that you are less of a data scientist because you don't engage in those games.

Harpreet Sahota: [00:00:44] What's up, everybody? Welcome to the Artists of Data Science podcast, the only self-development podcast for data scientists. You're going to learn from and be inspired by the people, ideas and conversations that will encourage creativity and innovation in yourself so that you can do the same for others. I also host open office hours; you can register to attend by going to bitly.com/adsoh. I look forward to seeing you all there. Let's ride this beat out into another awesome episode. And don't forget to subscribe to the show and leave a five-star review.

Harpreet Sahota: [00:01:46] Our guests today are both data scientists who have collaborated on an amazing book on how to build a career in data science.

Harpreet Sahota: [00:01:54] One of them has a PhD in industrial engineering and has over a decade of experience helping companies like DSW and Airbnb use data. She's currently a principal data scientist at Brightloom, where she creates models to help restaurants and retailers improve their customer experience. In her spare time, she likes to use data for humor, like using deep learning to generate offensive license plates. The other has earned a master's in management with a specialization in organizational behavior and has worked at companies such as DataCamp, where she built and ran their experimentation analytics system, and Etsy, where she worked with the research team. She's currently a senior data scientist at Warby Parker, where she works on a centralized team tackling some of the company's biggest projects. In her spare time, she can be found kidnapping her parents' dog or regularly giving talks on A/B testing, programming in R, and sharing data science career advice at conferences and meetups. So please help me in welcoming our guests today, the authors of Build a Career in Data Science, Dr. Jacqueline Nolis and Emily Robinson. Thank you guys so much for being on the show. It's the first time I've done an interview with two guests, so this is really exciting. I appreciate you guys taking time out of your schedules to be here.

Emily Robinson: [00:03:12] Thank you for having us. We're both excited.

Harpreet Sahota: [00:03:15] So talk to us about how you guys first heard of data science. How did you get involved with data science, and what drew you to the field? We could start with Jacqueline and then go on to Emily.

Jacqueline Nolis: [00:03:27] Yeah, actually, I got my undergrad and master's in math, and when I graduated, I really wanted a job using mathematics to help companies. And this was before data science was even a term, so it was called analytics when I started. But I really just had this idea that I really liked what I learned with a math degree and I wanted to use it to help companies.

Jacqueline Nolis: [00:03:44] And I had this idea that there's someone in a boardroom who gets to be the person who says, oh, you want to do that idea?
But I will use math to prove whether it's a good idea. I just had this desire to be that sort of person, which now we call a data scientist.

Emily Robinson: [00:03:56] And for me, as you mentioned, I have a master's degree in organizational behavior, and that was part of a PhD program I was doing.

Emily Robinson: [00:04:03] But after the two years, when I earned my master's, I decided academia wasn't quite for me, and data science actually was a pretty natural next step coming from the social sciences, which does surprise some people. But the quantitative social sciences are quite similar. You're thinking of a question that you want to investigate, you're gathering data to answer that, whether by running an experiment or using archival data, data that already exists, analyzing it, and then you're presenting it to make a case for what you found and what should be done. That being said, what drew me to industry was that those types of problems in academia can sometimes be a bit artificial, versus in industry you're working very applied, often directly with the teams who are facing these issues. And the life cycle of a data science project is usually more like a couple of months, maybe a year, versus in academia, where you might be working on the same paper for seven years. So that's what drew me into data science. And yeah, it's been a good, now about four years that I've been in the field.

Harpreet Sahota: [00:04:58] So how did you two meet?

Jacqueline Nolis: [00:05:00] So we met because we both were speakers at a conference together, Data Day Texas 2018.

Jacqueline Nolis: [00:05:09] First I saw Emily in the audience when I was giving my talk, and she asked a really good question at the end, and I'm like, oh, that's a smart cookie. And then later I sat down to hear a talk, and it turns out it was by the same person who'd been in the audience. And so we talked a little bit after.

Emily Robinson: [00:05:23] And then Manning had reached out to Jacqueline about the book. And so even though we'd only met this once at a conference, I think we did talk a little bit, but not, you know, a long time.

Emily Robinson: [00:05:31] But Jacqueline reached out to me and asked if I'd be interested in coauthoring a book, and that's how we got started.

Harpreet Sahota: [00:05:37] So you guys live across the country from each other, right? What was it like collaborating on a book together across space and time? What were some of the ups and downs that you guys had?

Emily Robinson: [00:05:47] I think actually it wasn't so bad. So, as you mentioned, normally I'm in New York City, Jacqueline's in Seattle, and we did all our collaboration on the book with GitHub.

Emily Robinson: [00:05:57] That's how we wrote it. We were writing in Word documents, but we were saving to GitHub to make sure we always stayed in sync and could see the changes. And the other thing I think that really kept us on track was having a weekly call, because it was really good for just figuring out, OK, where's the other person at, how is it progressing. I think we actually stuck pretty well to the schedule that we had, which was something like a month and a half at a time. We split up the book in two, so we each had half the chapters, and when we were done with the first draft of a chapter, we sent it to the other person, and they came back usually with a lot of edits, which we then incorporated.
Jacqueline Nolis: [00:06:27] Well, and it was nice because, I was kind of expecting before starting the book that Emily would say something and I'd disagree, like, oh no, the opposite of that is the advice I would give in a book like this. But that's largely not what happened. Generally, we would add to what each other said, like, oh, you said this, but what if you add this part to it, too? It was never, no, actually, I strongly disagree with the way you put it. Also, I think it's funny because the time zones really worked out, because Emily was on the East Coast, but she tends to stay up later than me. And I would wake up super early in the morning, because I had a kid, an infant when I started working on the book who became a toddler. And so I would always wake up super early because of having a kid. So, yeah, if we'd each been on the other coast, it would have been twice as hard.

Emily Robinson: [00:07:12] Yeah, I do think that worked out well. And Jacqueline and I have talked about this before: we took on this project having never worked together and having met in person once, and it could have gone really not that well.

Emily Robinson: [00:07:24] But I think actually both of us were responsible, but also flexible. Like, I took my honeymoon while we were finishing up the book, or someone took a vacation or had a busy week at work, and it wasn't like, oh, now you go into the shame corner because you didn't finish exactly on time when you said you would.

Jacqueline Nolis: [00:07:41] Yes. There's some lesson here about finding someone to work with who you click with, and we just happened to get really lucky, because we didn't do the vetting first.

Harpreet Sahota: [00:07:50] So was there at all any particularly frustrating moment that you look back at now and it just makes you laugh?

Jacqueline Nolis: [00:07:57] Oh, frustrating moments. People say writing a book is a lot of work, and this is true, it is a lot of work, but I don't think I really understood it. I thought it was a lot of work like running a sprint is a lot of work, and it's more like running a marathon: every week you have to write more, for a year. And when you're like 60 percent of the way through, so a little bit more than halfway, you're just like, oh my God, this is so much. I don't think there's one funny story that was the most frustrating. It's really just that around the 60 to 70 percent milestone, you're like, oh my God, will this ever be over? But I mean, I like the book, I like how it came out, but that I think was the hard part.

Emily Robinson: [00:08:36] Yeah. I also think one of the benefits of writing the book was, what Manning does is they send your book out for an informal review after you finish each third.

Emily Robinson: [00:08:45] And generally our reviews were pretty positive, and there were obviously some comments for improvement. But I remember one, I think on the final manuscript, that was just really negative. And they actually recommended this book that, I'd never heard of it, and it had terrible reviews on Amazon. And I think it would have been easy for me as a solo author to really personalize that and find it really frustrating.
But with Jacqueline, both of us being, optimistic isn't quite the right word, but more confident in the work, we could be like, OK, we should hear the criticism, but just because someone wrote it doesn't mean it's valid.

Jacqueline Nolis: [00:09:20] Which is, I think, a good lesson I would put as advice in the data science book we already wrote. A lot of people have imposter syndrome, this idea that, oh, I'm not a real data scientist, everyone's going to figure this out.

Jacqueline Nolis: [00:09:31] And it's very much just a moment of, OK, I know who I am, I believe in it, and if someone has some advice for me on how to change, I will listen to that advice. But if someone just tears me apart, I'm going to be like, well, there's something going on with that person that maybe isn't related to me, and I'm going to accept that that's who they are and not let it change how I think of myself. And I think you kind of have to grow that. I don't think that comes very naturally to most people, including me.

Harpreet Sahota: [00:09:52] So how did you guys divvy up the chapters? When it came time to figure out who works on what part, was it just what you guys felt most passionate about, or was there a drawing-straws type of thing?

Emily Robinson: [00:10:02] Yeah. So some of them were really clear. Like, our last chapter is moving up the ladder, and it includes being a manager, and Jacqueline's been a manager and I haven't.

Emily Robinson: [00:10:10] But how we did it, which I thought worked well, is once we came up with the outline of the chapters, we each put a rating on each chapter, one being we really don't want to write this chapter and five being we really do, and we did that without seeing the other person's ratings. Then we compared, and for most chapters one of us had a preference. And in terms of the ties, I think Jacqueline had just ended up with more chapters tilted in her favor, so the ties went to me.

Jacqueline Nolis: [00:10:35] Yeah. It's fun because we used a sort of data analysis and an algorithm to decide who wrote what, which is great.

Harpreet Sahota: [00:10:42] Thanks, guys. So let's get into it. There's a few chapters in particular that I think the audience would really like to hear about: making an effective analysis, deploying a model into production, and working with stakeholders. But before we get started, it seems like you talk about three different types of data scientists, like protagonists in your book. Can you briefly describe these archetypes for us?

Emily Robinson: [00:11:03] Yeah. So the three types map somewhat to the three areas of data science: an analyst, someone focusing on analytics; a decision scientist, really focusing on statistics and inference; and a machine learning engineer, focusing on machine learning.

Emily Robinson: [00:11:19] And why we divided this up is because data science is a really broad field, and it's helpful often to delineate between the different types of roles. And some companies do this formally. Airbnb, for example, puts a specialization after all of their data scientist titles, like Data Scientist, Analytics or Data Scientist, Machine Learning, because often they're looking for people with very different backgrounds.
A machine learning engineer may have a computer science background and may have been a regular software engineer for a while, versus a decision scientist could be someone with a strong background in statistics who would never want to work on engineering a big thing that's going to go into production. The final one, the analyst, is someone who can really do everything from making a dashboard to making reports for the executive team, often finding what value we can get out of the data we already have, or collecting new data. I think people can underestimate this, but especially early on, for a company that's getting into data science, this is often the most valuable work, and there's a lot of low-hanging fruit in the analytics space.

Jacqueline Nolis: [00:12:22] And I would just say we included these three definitions intentionally, all under the umbrella of data science. I think there are a lot of people in the world who are weirdly gatekeeper-y, right? Like, you're an analyst, you're not a data scientist; analytics isn't data science, what you're doing is just analysis. And we really feel that if you're creating dashboards and reports and thinking about what visualizations to show executives, that's very much the same style of work as data science. Similarly, there's the idea that machine learning engineering is so different, and sometimes people think it's superior and more important and complicated than decision science. We don't think that's really true either. It's a lot of the same ideas as decision science; you just take a little bit more of a software engineering approach to them. So we really intentionally brought down those gates. We think they're all in one bucket, just different flavors of the same idea.

Harpreet Sahota: [00:13:09] Absolutely love that you guys have that philosophy. That point of view, I think, is really important, especially for people trying to break into the field who run into that gatekeeper mentality, you know, oh, I'm just a data analyst. Well, no, analysts play an important role in the process as well. And I really like how in your book, after every section, you break down how that particular thing applies to these particular roles. I thought that was really cool. But yeah, let's go ahead and jump into making an effective analysis, starting with a real, seemingly easy question: what is an analysis, really?

Jacqueline Nolis: [00:13:43] What is an analysis?

Jacqueline Nolis: [00:13:45] So we talk about this in the book. In general, an analysis is a thing that answers a question, right?

Jacqueline Nolis: [00:13:51] So if a business stakeholder is like, we want to know why this product is doing poorly or why these customers are leaving, an analysis is like a file, a PowerPoint or an email or whatever, something that methodically walks through and gives the answer. Reporting is very similar, but it's the automated version: you have a Python script that each week calculates the average number of customers and what percent of customers have left and automatically puts that into an Excel file. That weekly thing is a report. An analysis is more for answering questions; a report is more for getting data to people on a repeatable basis.
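For a concrete sense of the kind of weekly report script Jacqueline describes, here is a minimal sketch assuming pandas; the file names and column names are hypothetical, and a scheduler such as cron would run it weekly:

```python
import pandas as pd

# Hypothetical weekly snapshot of customers; churn_date is empty
# for customers who are still active.
customers = pd.read_csv("customers.csv", parse_dates=["churn_date"])

# The two numbers the report promises every week.
summary = pd.DataFrame({
    "total_customers": [len(customers)],
    "percent_churned": [customers["churn_date"].notna().mean() * 100],
})

# Write the result to an Excel file to send out (requires openpyxl).
summary.to_excel("weekly_customer_report.xlsx", index=False)
```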
Harpreet Sahota: [00:14:29] So what are the traits that separate a good analysis from a bad analysis?

Jacqueline Nolis: [00:14:35] Oh, there are so many. And Emily, I want to name a few, and I'm sure you're going to be like, oh, Jacqueline, I have the one that bothers me the most. I think for me, one trait that really bothers me is an analysis that's not repeatable.

Jacqueline Nolis: [00:14:47] Right. If I do an analysis on my computer one time, I make a PowerPoint of it, and then I delete the analysis so you can never see the calculations and the results and try to repeat it, I would consider that a bad analysis, because we haven't kept the evidence of how it works. Another sign, I would say, the difference between good and bad, is how much the business stakeholder can understand it. Are you using easy-to-understand language?

Jacqueline Nolis: [00:15:08] Are you making it clear, like, here's the result you should take from this, and here's how you get there? How clear it is, is a distinction between good and bad. Can other people besides a data scientist understand it? That's another one, I would say.

Harpreet Sahota: [00:15:21] Jacqueline, how are the types of analysis different for the different types of data scientists?

Jacqueline Nolis: [00:15:29] So I would say a decision scientist's whole job is to make analyses, right? A decision scientist is using data to answer questions, so it's very natural that what they do is create analyses for each question they have to answer. But a machine learning engineer makes analyses too. They may use analyses to make some documentation: why is the model working poorly in these situations, or how is my training accuracy improving over time, or why should I choose model A over model B? These sorts of things also take the form of, hey, here's some document that I share with non-data scientists to prove results. And that's true for analysts as well; they also have to make one-off analyses that show why things are happening.

Harpreet Sahota: [00:16:08] So I'd like to get into asking questions and how to translate requests from the language of a business question to a data science question. Emily, do you think you can share some insight on how we could do that? How can you take what a business stakeholder tells us and convert that into a data science problem?

Emily Robinson: [00:16:27] Absolutely.

Emily Robinson: [00:16:28] So we adapted this framework from Renee Teate, which is exactly as you said. Junior data scientists would probably realize this if they thought about it, but unlike a Kaggle competition, which is like, well, here's all the data that you need, and here is the exact question, predict this outcome, the questions you'll face in industry are going to be a lot more vague. So an example we give in the book is: how do we split our customers into different groups to market to? You translate that to the data science question of, how can we run a clustering algorithm to segment the customer data, and then you get the data science answer from that. For example: OK, with k-means clustering, I found three distinct groups. But that's not very useful to the business, because with clustering those groups don't come with handy labels. So instead we want to give the business a translation back, like: here are three types of customers, new, high-spending, and commercial.
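A minimal sketch of the business-question-to-data-science-answer translation Emily describes, assuming scikit-learn; the customer features and the labels you would attach at the end are hypothetical and would come from inspecting the clusters:

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical customer features; in practice these come from your warehouse.
customers = pd.DataFrame({
    "tenure_months": [1, 2, 48, 36, 3, 60],
    "monthly_spend": [20, 35, 30, 25, 500, 450],
})

# Scale the features so no single column dominates the distance metric.
scaled = StandardScaler().fit_transform(customers)

# The data science answer: three clusters found by k-means.
customers["cluster"] = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(scaled)

# The business answer comes from inspecting each cluster's averages and
# attaching human labels (e.g., "new", "high-spending") before presenting.
print(customers.groupby("cluster").mean())
```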
As for the type of skill that you need for that: you definitely need communication skills throughout, and also business domain knowledge. And domain knowledge is something that's going to be hard to get before starting your data science job, and even if you've worked in data science before, if you transition to a different industry, you may have to learn new things. But it's really important to work to understand your data, because that can help you in the process of figuring out, OK, what problem are they actually trying to solve by asking this question? Because sometimes they will do that translation for you and sort of say, oh, can you pull these numbers on this?

Emily Robinson: [00:17:59] But if you dig into it, it turns out those numbers are actually not going to be helpful for solving the problem they're having. So occasionally, when they bring you a data science question, you need to work it back to the underlying business question and potentially retranslate it.

Harpreet Sahota: [00:18:15] And sometimes when I get a data set, I'm kind of guilty of just cracking my knuckles and starting to type away at the keyboard and get down and dirty with it. So, Emily, what are some foundational questions that we can answer for ourselves before we get down and dirty writing code?

Emily Robinson: [00:18:32] Yes. So definitely, as I mentioned, like, OK, what's the problem that we're actually trying to solve here?

Emily Robinson: [00:18:38] And the importance of writing an analysis plan is that it can give you some direction, but it can also keep the analysis from going on for too long, because that's another component of a good analysis: it can be done quickly. You know, there are some data science jobs that are very research-heavy, and it may be totally fine to go into a cave and work for a year on one problem, reading a bunch of academic papers. But most jobs are not like that. So an analysis plan means that you're limiting your scope, and you come to an agreement with stakeholders on how you're going to be tackling the problem and get that sign-off from them. And that makes it easier when you've reached the end of it. Maybe you come back and you're like, you know, I looked at these things, and unfortunately there's not really the data to answer this question, or I've eliminated these possibilities. Having them sign off on that analysis plan can help prevent the situation where the business stakeholders are just like, keep working on it, keep looking at new things, because at some point it's not worthwhile to just keep digging.

Harpreet Sahota: [00:19:39] Really appreciated that part of the book. And I like how you guys had a sample analysis plan that outlined how you should structure an analysis plan and what it should look like. And it was really cool to see that it looks like my analysis plans, so there's a bit of validation there from some actual data scientists. But I want to jump now into talking about deploying models into production. This is something that doesn't get a lot of coverage in many books. I think a lot of books just present algorithms and code, but you never really get the other side, dealing with deploying models into production. So, Jacqueline, what the heck does deploying a model into production mean anyway?
Jacqueline Nolis: [00:20:19] So this is actually a great segue into how my career played out. As I said, my undergrad and master's are in math, and I went and worked as a data scientist, basically doing all sorts of analyses, making reports and analyses and sharing them with executives and things like that. Then I got a PhD and moved into consulting, but still, every consulting engagement I had was more or less some form of: take data, do some modeling on it, put it into a PowerPoint, and share that with executives. That was up until about two years ago now, maybe longer, some number of years ago, when my career started shifting into: hey, instead of writing code to make some results you put in a PowerPoint, let's actually write code that then has to run continuously in a way that customers will actually hit. So going from writing code that runs once to make some charts you put into a PowerPoint, to code that may have to run multiple times a second in systems where, if the systems go down, customers are actually hurt.

Jacqueline Nolis: [00:21:19] It's a very different experience. This is much more the work of what you'd call a machine learning engineer: actually taking models and putting them into systems where they run continuously. I was very scared of this kind of work for a long time, because it just seemed so different from the decision science kind of work I was doing, of analyses to show executives. But after getting into it, I realized that most of the principles were the same. It's still very much about doing good modeling and getting good results. The only difference is, are you going to make a PowerPoint file, or are you going to make code that turns into an API, which is basically like a fancy website, where every time you go to that website it hits your code, which causes your code to run, and then you return a result that way. So the wrapper is a little different, but the basic work is largely the same. And that ended up becoming a whole chapter of this book.

Harpreet Sahota: [00:22:05] And can you provide us an illustrative example of putting a model into production? What does that look like?

Jacqueline Nolis: [00:22:14] Let's say we're at a company and we have a churn model, right? So I make a model that takes a set of customers and predicts whether they'll churn or not. As the decision scientist, I may do an analysis that predicts the churn of all the customers, and make a report that says we expect that half our customers are going to churn in the next year. So that's the decision science side. To put it in production would be: OK, I want to actually take this code and make it so that at any time, a software engineer can do something that will predict the churn of a customer. And the way I'll do that is I'll make something called a REST API, which is some code that, when you run it, basically hosts a fancy website. When a software engineer goes to a URL like churnmodel.com/customer, with the information about the customer in the URL, like that they've been with us for five years, my code will then run, and it will return the predicted probability of that customer churning. So that's the basic idea.
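A minimal sketch of the REST API Jacqueline describes, assuming Flask and a pre-trained model saved with pickle; the route, file name, and single feature are made up for illustration:

```python
import pickle
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical pre-trained churn model, e.g. a scikit-learn classifier.
with open("churn_model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/churn")
def predict_churn():
    # The caller passes customer information in the URL,
    # e.g. GET /churn?tenure_years=5
    tenure_years = float(request.args.get("tenure_years", 0))
    prob = model.predict_proba([[tenure_years]])[0][1]
    return jsonify({"churn_probability": float(prob)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```

In R, the plumber package Jacqueline mentions below plays the same role that Flask plays here.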
And so the work behind that is: I have to take the code that I've written in Python or R or whatever language, presumably Python or R, and I need to get it running on a server in the cloud, as opposed to just running on my laptop. And the actual steps to having your code run on the cloud: you could literally just go to Amazon or Google Cloud, pay for a virtual machine, install R or Python, start your Flask or plumber service, and actually have your code run that way. There are very straightforward ways to do this. But for enterprise companies and production systems that are large and integrated, there are more elegant ways that are much more manageable. The point is, you're basically setting up computers somewhere in the world that are going to be running this code for you continuously.

Harpreet Sahota: [00:23:50] And now that we've got the thing in production, Jacqueline, right, we've got the model, it's in production. How do we keep the thing running?

Jacqueline Nolis: [00:23:58] OK, that's a great question. So ideally, if you have it running on a server on the Amazon cloud, it should just continuously always be running, right? You hit start on your web service, in R it's called a plumber service, in Python you'd use Flask, there are other ones too, but you hit start on the script and it'll start hosting this website. And your service should always be running so long as that server is running. And any time you hit it, you go to that URL and pass it the parameters, you will get your prediction back for the customer's probability of churning. That said, you want to do things like: well, if we get a lot of traffic, we want to scale that out. You want it so that software engineers, people who are not data scientists, are controlling that server. There's a whole type of job called DevOps to make sure these things are managed. And the modern ways of doing this use things called Docker containers and Kubernetes, which you may have heard a lot about, systems to help do this. There's a whole universe of people who have thought very, very deeply about how to have this stuff continually run, and the software engineering world has done this for a very long time. So as a data scientist, you really just get to piggyback off that and set your code up to work in whatever way your company's DevOps or IT teams have already set up for all the other code they continuously run.
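And from the software engineer's side, "hitting the URL and passing it the parameters" could look like the following sketch, using Python's requests library against the hypothetical endpoint from the example above:

```python
import requests

# Call the hypothetical churn API with customer information in the URL.
response = requests.get(
    "http://churnmodel.example.com:5000/churn",
    params={"tenure_years": 5},
    timeout=5,  # fail fast if the service is down
)
response.raise_for_status()
print(response.json())  # e.g. {"churn_probability": 0.42}
```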
Harpreet Sahota: [00:25:17] And this next question I'll turn over to Emily. So we've done all the math, all the statistics and whatnot to build an actual model, and now this thing has been released out into the wild and it's doing its thing. What are some things that we need to monitor, statistically and from a business standpoint, when that thing is in production? And at what point do we retrain the model?

Emily Robinson: [00:25:42] Yeah. So the first thing you want to monitor is: is it even working at all? Not, is it accurate, but are there errors happening?

Emily Robinson: [00:25:51] Like, is it still serving recommendations or giving back results? So you can use logging to record if there are any issues when errors happen, and you can monitor that. And there are a lot of tools out there that can help with that, like Datadog. Now, even if it keeps working and it keeps returning results, it may be getting less accurate over time. And that's a pretty common thing, often called model drift. You may find that, yes, these were the accuracy metrics when we put this into production, and they were good enough that we wanted to do this, but they've gotten worse since then. The simplest fix is to basically run the training steps for the model again, but with new data, say the last couple of months of data. You may find that this is actually enough to solve the problem: you're giving it more examples, and more recent examples, since maybe something's changed over time. As for when you do this, one option is to just keep an eye on it, like build a dashboard to monitor it. But we think a better practice is to set a standard schedule for when you're going to retrain the model, maybe every however-many months, and you can even automate that process. If you have the script that loads the data and builds the model, you can actually put that on a schedule, so you don't have to spend time doing the work yourself. You can just have it run automatically and have it send an alert: OK, here are the new accuracy metrics, here's how the new model is performing. If it doesn't improve anything, or somehow gets even worse, you manually take a look at it. But if that doesn't happen, you can have it run on a schedule and be fine.

Jacqueline Nolis: [00:27:25] I want to add two points to that. I think Emily is totally right, just two points on it. One is, having that model automatically retrained, as Emily said, is a common practice and a good idea. But what you are now doing is you have two systems in production: you have the production model, and then the production retrainer that automatically retrains it. So that is twice as much upkeep. You might need two virtual cloud computers running, or whatever, and you now have two codebases to maintain. That can generally be fine. The second point I would make, though, is that people tend to err on the side of over-retraining, retraining too frequently, and getting to the process of automatically retraining too early. Drift does happen, but I think people in general tend to overestimate how big of a problem it is. And so a lot of the engineering work people do around automatically retraining, like having the model retrain every day, a lot of that stuff, I would say, is premature optimization. You may be fine retraining it by hand once a year, and if that's fine, then building and maintaining an automatic retrainer that retrains once a day is a lot of work when you could have just done it by hand once a year. And that's fine, who cares?
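A minimal sketch of the retrain-and-compare step Emily and Jacqueline describe, assuming scikit-learn; the file names and the data-loading step are hypothetical, and a scheduler would run this script at whatever cadence you choose:

```python
import pickle
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical loader: the last few months of labeled customer data.
df = pd.read_csv("recent_customers.csv")
X, y = df.drop(columns=["churned"]), df["churned"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Retrain on fresh data.
candidate = RandomForestClassifier(random_state=42).fit(X_train, y_train)
candidate_auc = roc_auc_score(y_test, candidate.predict_proba(X_test)[:, 1])

# Score the model currently in production on the same holdout.
with open("churn_model.pkl", "rb") as f:
    current = pickle.load(f)
current_auc = roc_auc_score(y_test, current.predict_proba(X_test)[:, 1])

if candidate_auc > current_auc:
    with open("churn_model.pkl", "wb") as f:
        pickle.dump(candidate, f)  # promote the new model
else:
    # Alert a human instead of silently shipping a worse model.
    print(f"Alert: candidate AUC {candidate_auc:.3f} <= current {current_auc:.3f}")
```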
Harpreet Sahota: [00:28:36] Thank you so much for that. Yeah, this is a topic that I feel does not get enough coverage in many books, so I appreciate that you guys included it, and I love the treatment you gave it in your book as well. And just for the audience that's listening, when it comes to monitoring things like you mentioned, model drift, data drift, concept drift, things like that, do you want to shout out some metrics that we can track and maybe look up, so that we're aware of these things when they're brought up in an interview setting?

Jacqueline Nolis: [00:29:03] OK, so I will sort of answer this, but push back on it a bit. Yes, you can monitor the root mean squared error of your model, and the percent of the time your model gets the answer right, and there are a lot of things you could measure. But the thing is, there's really no right or wrong answer here, and it's so dependent on what your model is. The things you want to measure for drift in a natural language processing model are totally different from the things you might want to measure for drift in, say, a churn model. And even if you're in the world of natural language processing models, it is so different whether you're using NLP to predict which category a customer is in, or to predict what response you should say to them. It is so context-dependent that if an interviewer were to ask, what metrics would you want to measure for an arbitrary system to check drift, I would, and I'm senior enough to do this, push back. I don't actually think that's an appropriate interview question; you need much more context before you can make a statement like that.

Harpreet Sahota: [00:29:57] I find that with a lot of data science stuff, the answer typically starts with "well, it depends," right? So another thing that doesn't get covered in a lot of books is how to interact with and speak to non-data scientists, and you guys have an entire section dedicated to working with stakeholders. So, Emily, who are the various types of stakeholders that we may encounter in our data science careers, and what do they care about?

Emily Robinson: [00:30:23] Yeah, so we talk about four different types of stakeholders. The first is business stakeholders. This is a little bit of a catch-all,

Emily Robinson: [00:30:30] but it's, for example, people who work in the marketing department or sales or customer care, so kind of non-technical colleagues. Then there's the engineering side, which is maybe the most familiar for folks. There's leadership, so executive leadership at a company. And finally, there's actually your manager, who may be a data scientist, but often people don't necessarily know how to work effectively with their manager, and that can be one of the most important relationships and really determine how your career is going to go.

Harpreet Sahota: [00:31:05] What's up, artists? I would love to hear from you. Feel free to send me an email at theartistsofdatascience@gmail.com. Let me know what you love about the show, let me know what you don't love about the show, and let me know what you would like to see in the future. I absolutely would love to hear from you. I've also got open office hours that I will be hosting, and you can register by going to bitly.com/adsoh. I look forward to hearing from you all, and I look forward to seeing you in the office hours. Let's get back to the episode.
Harpreet Sahota: [00:31:49] And can you share any tips for our listeners out there who might find themselves in a room full of executives, and how they should tailor their communication when they're in front of that particular audience?

Emily Robinson: [00:32:06] Yeah. So the thing to know about leadership is that they're very busy.

Emily Robinson: [00:32:10] This is the case whether it's at a startup, which is usually where you're more likely, if you're junior, to be talking to an executive, or at a huge company. All executives have very busy, tight schedules. So this means that when you communicate, you should focus on being brief, on getting to the point. With that, you could have, for example, an appendix, since some leaders do like to go more in depth, so maybe have that available. But if you're sending a report, have an executive summary at the top, which is a paragraph; that's a good practice in any case. If you're doing a PowerPoint presentation, just keep in mind that they're going to be really focused on getting to the point. They need to know, what should they do with this data? Versus maybe the marketing department, where you form a relationship, where you have multiple meetings every week and they want to really dive into the details.

Emily Robinson: [00:33:03] The executive is not going to be taking on the project themselves, but they want to understand the implications of it and what needs to be done.

Harpreet Sahota: [00:33:10] And Jacqueline, what are some questions that we can ask ourselves when we're considering possible tasks to work on? Because sometimes people are just so excited to finally have a data scientist in the organization, and they're trying to have the data scientist do some work, and we might get bombarded with a bunch of things to do.

Jacqueline Nolis: [00:33:27] Yeah. So I think this is an interesting question, because it may not feel like you actually have that much control over what you work on as a data scientist. Especially as a more junior data scientist, you may feel like you are just given a task and you have to do the task. But you may not realize just how much control you usually have in terms of, well, what's the order in which you're going to work on the tasks you have? And when you have a little time and you're not busy, what are you going to do in that extra space? So even if you don't directly get to say the team strategy is to focus on X, and maybe you don't have that level of control, though you might if you're the only data scientist, you may indirectly get to contribute a lot to what exactly gets worked on. And so there are really two axes we cover in the book. One axis is, how impactful is this work going to be? Is this something that is going to help everyone immediately, that's really going to change everything? Or is it more low-stakes?

Jacqueline Nolis: [00:34:21] And the second is, how innovative is it? Is it really new and exciting, or is it kind of mundane, like we have to do the same task again and again? And if you cut by those two axes, you get a quadrant system. One quadrant that data scientists fall into a lot is this ivory tower sort of quadrant, and this is the one that I feel very passionately about.
Jacqueline Nolis: [00:34:42] And this is the kind of work where data scientists have this grand idea, like, what if we build a model? If we're a car company, what if we made a model that designs the car itself? Right? This grandiose idea that no one's asking for; they just have a hunch that it might be good, and they spend years working on it. A lot of data scientists get stuck in this ivory tower. That work is interesting, but it's not really impactful. It's not providing value right away, because it's so ivory tower, so innovative that you kind of get lost in the weeds. Whereas, you know, there's much more practical work, like, hey, the business is really asking for this model and we can build it today. That's kind of the best kind of work. And even the mundane stuff, like, look, it's not interesting, it's not new, but hey, we have the standard reporting that just needs to get done, let's keep getting it done, maybe improve it a little. All of those sorts of work are things you should think about, and really just try to ask, hey, am I avoiding these bad states, like the ivory tower, where I'm doing work that's fascinating intellectually but maybe doesn't provide value to anyone?

Harpreet Sahota: [00:35:41] I found that decision matrix that you guys laid out in the book to be really cool; it reminds me of the Eisenhower decision matrix. So I really, really enjoyed that. Your book is chock-full of amazing content and figures and illustrative examples, and I encourage everyone to go out there and get their hands on it. So thank you for diving deep on your book. Shifting gears here a little bit, I want to talk about building data science teams from the ground up. Doing research on both of you, it seems like you both have some great experience when it comes to building teams. Emily, what do you think are some of the essentials to lay the foundation on which the house of data can be built?

Emily Robinson: [00:36:18] So I think the first thing that any house of data needs is good data engineering. And this can either come in the form of there already being a data engineering team, or one being hired at the same time as the first data scientists, or it will be your job as the first data scientist to build that data engineering part. Why do you need data engineering? A couple of reasons. First, you can't do much data science without any data. So, for example, if you work for a company that is mostly a website, an online business, they're certainly tracking things like conversions, right, they're tracking when people buy, but maybe they're not tracking page views. Maybe they're not tracking how people land on their site. Maybe they don't have good tracking on their email. And in that case, before you can answer any questions about that, you have to implement the ways to record that data. But even if that data does exist, it may not be in the right format or stored in the right way for you to use as a data scientist. For example, I talked to someone who was one of the first data scientists at a company, and the data was stored in tables, like, for example, page views.
But it took them six minutes to run a SQL query on a table that was only about a million rows, just because the data wasn't optimized to be queried; it was optimized for use by the website. So I think that's really the first fundamental thing, because as you build a team, you either have to look for other people who have those skills, or you have to be the one laying that foundation, or working with the data engineering team to lay it, because a lot of data scientists don't necessarily have that background.

Harpreet Sahota: [00:37:56] And Emily, say I'm somebody who's the first data scientist in an organization and I'm supposed to build a practice from the ground up. What are some challenges that you foresee me facing? And can you share some encouragement, some words of advice to keep me going?

Emily Robinson: [00:38:13] I'm actually going to punt this question to Jacqueline, because we have different takes. I am someone who really never wants to be the first data scientist, versus Jacqueline is like, I love this, and has been the first data scientist. So I'm going to let Jacqueline make that pitch.

Jacqueline Nolis: [00:38:29] Yeah, I've been the first data scientist like three, four times, depending on how you count. So when you're the first data scientist, you're juggling a lot of things. One is you kind of need to figure out what your job even is, right? Because if you're in a company that has had no data science before, they don't necessarily know what you're supposed to be doing. Should you be making models? Should you be showing reports to people? What exactly is the goal of your job? Usually you have to kind of define that. So that's thing one. Thing two, you actually have to do that job, right? You actually have to be making models, and you actually have to be creating reports. You actually have to be doing the work. And then lastly, you are setting precedents with everything you do for the future data scientists who will be hired. If you use R, future data scientists will probably have to use R; if you use Python, they'll have to use Python. So you have to be constantly thinking, is this decision going to be good for everyone in the long term? And the job of the first data scientist is to balance those three things effectively.

Jacqueline Nolis: [00:39:25] And so the pitfalls, I would say, to watch out for: you could easily get stuck on any one of those things, right? You could spend so much time trying to figure out what you should be doing that you don't actually do any work. You could spend so much time doing actual work that you don't figure out what the right precedents are, and when people join, the team can't be built; they can't use the code, there's no precedent set, and things are difficult as you scale. And you could spend so much time trying to set the right precedents that you don't actually do work, and no one at the business knows what you're doing, and they never decide to hire a second data scientist. So that continual balance is just enormously important to being the first data scientist, and really something I would say you should be thinking about every hour, to really try and get this right.

Emily Robinson: [00:40:02] Jacqueline, that sounds like so much work. Why would anyone ever want to do that?

Jacqueline Nolis: [00:40:06] Oh, that's a great question.
Why would you want to do it? I will tell you, as a person who likes doing this. One, it rules, because you get to set the precedents. You like R? The team uses R. You like GitHub? The team uses GitHub. And this isn't entirely true; as people join, they may want to change things. But being the person who gets to say the first thing is very different from being the person who says the second thing. So it's enormously valuable to get to craft the decisions the way you like. Second, it is enormously fun to figure out what your work should be. If you're the kind of person who likes solving problems, then the puzzles of, hey, should we as a company even be thinking about churn right now? Should we be building a forecast model? Which of these are good places to put our effort? That is a fascinating problem to solve, just as much as should I use a random forest or XGBoost. And if you like those kinds of puzzles, then being the first data scientist is really good. And lastly, it can accelerate your career very quickly. By being the first data scientist, you have to learn a lot of things, including how to manage stakeholders and how to manage other data scientists as they join. So if you want to become, like, a director or things like that, having the experience of building a team from scratch, being the first person, is a great way to get into that kind of role.

Harpreet Sahota: [00:41:11] Thank you so much for that, I really appreciated that. So what do you look for, then, in data science candidates when you're trying to bring more people onto your team? Emily, do you have any tips on how someone can cultivate the qualities that you're looking for within themselves? I know technical skills are definitely a must, but apart from those, what are the skills or traits that you're looking for?

Emily Robinson: [00:41:32] Yeah, so as you mentioned, technical skills are kind of the base level. We talk about this in the book:

Emily Robinson: [00:41:37] for example, can you be a data scientist and not do any programming? And one, we think there are reasons to program, but two, practically, most data scientist positions now require you to program. So yes, as you mentioned, that's table stakes. But OK, what else do you need? And I think a big part that can be overlooked is that almost all companies will have a behavioral interview as well as the technical interviews. Behavioral interviews are things like people asking, what's a project you worked on and what did you learn from it? Or, tell me about a time you disagreed with a coworker. And the idea is they're looking to understand, first, can you communicate, because you have to communicate that answer, and second, what experience do you have? So, for example, with the disagreeing-with-a-coworker question, do you give an answer like, oh yeah, I've worked with a lot of really stupid marketers and it was so annoying, they didn't get this R thing and they disagreed on this? Just sort of showing that you don't have empathy for the stakeholders you worked with, that you didn't try to see their side, or that the way the situation got resolved was, yeah, I got them, I got them fired because they didn't know how to do this thing.
Emily Robinson: [00:42:46] So those are the types of things where you can practice how to frame your answer, and also think before you go into interviews: what are some situations, some projects that you worked on, difficult situations, that you could use to answer multiple questions? Because that can really help you prepare to give a thoughtful and informed answer.

Emily Robinson: [00:43:07] The method that I like to use for answering these types of questions is called the STAR method: situation, task, action, and result. So basically, you describe what was the situation that happened, what was the problem, what was your responsibility in that situation, what did you do, and then what was the outcome, what was the result of that? And that can really help you tell an effective story to the person asking the question.

Harpreet Sahota: [00:43:34] Thank you so much for that. Jacqueline, I'm wondering, how do you view data science? Is it an art, or is it purely a hard science?

Jacqueline Nolis: [00:43:42] Actually, I've got to say, I really got a kick out of the name of this podcast, because I am actually an amateur artist in my free time. So truly, I am an artist of data science. Anyway, I don't know if I'd call it an art or a science; it's much more like a form of communication.

Jacqueline Nolis: [00:44:01] Yes, it's building models, and yes, it's being thoughtful in how you design things, and there's creativity and architecture. But it's very much, hey, you have a person here who disagrees with you; how do you convince them to agree with you using these numbers? That process of convincing, negotiating, with a little bit of charisma in there, using data as a way of changing someone's mind. To me, that's the essence of data science: getting people to change their minds with the numbers that exist in the world. I think of it very, very deeply as a human thing. When people hyperfocus or hyperfixate on the exact algorithms and the exact accuracy, you know, do I use this method or that one, I mean, that stuff matters. But it matters just as much whether I show ten graphs in this presentation or one really thoughtful graph that proves the point better. That sort of stuff is so valuable, and that to me is data science.

Harpreet Sahota: [00:44:55] And Emily, how about you? How do you view data science? Is it an art, or is it purely a hard science?

Emily Robinson: [00:45:00] It is definitely not purely a hard science, and I think that's also where you can get issues.

Emily Robinson: [00:45:05] So a big topic that's been coming up recently has been bias: biased algorithms, biased data, and the effects that has in the real world.

Emily Robinson: [00:45:15] And I think it's very dangerous when folks, usually people outside the field, are like, it's an algorithm, it's scientific, it's objective; there's no such thing as a discriminatory algorithm. I do think people who work in the field have moved past that point and recognize, oh no, there are definitely algorithms that do discriminate. Cathy O'Neil has a great book, Weapons of Math Destruction, that talks about the effects of some algorithms on things like teacher pay or promotions.
But then there are other ones, like certain automated tools that don't work for people with darker skin. Or there was a recent one with a facial depixelation model: the idea was, OK, we could take a pixelated face and get back what the face actually looks like. And someone put in a pixelated face of Obama, and it turned into a white guy. There are a lot of reasons why that is, and probably one of them is that the training set did not have a diverse set of faces to learn from. So I do think we want to be careful when we say things like, oh, this is objective, this is scientific, this is just the facts, this is just the statistics, because you really need to think about all the possible ways that bias crept in. Data is not the ground truth; data is not objective. And there are lots of issues lying there. So I think you can approach it in some ways scientifically, but there is some of an art of communicating with people, of understanding context, that's really important when you work on projects that affect people's livelihoods.

Harpreet Sahota: [00:46:43] And Jacqueline, how would you say the creative process manifests itself in data science?

Jacqueline Nolis: [00:46:51] As a person who, like I said, does art in my free time: when I'm painting, like doing an oil pastel of a landscape, there's a certain amount of, OK, I'm going to split this painting into first the mountains and then the trees, and the trees, I can tell, have two colors in them. So creative endeavors still have a certain amount of problem solving, of breaking things up into smaller ideas and executing those ideas. And I think that same sort of thinking happens when you're building a machine learning model. The painting was to paint a picture of some mountains; with the machine learning model, I'm trying to make a model that predicts churn. And so I'm going to need to break it up into creating the features and then the model choice, and the model choice I have to break up into, do I use hyperparameter tuning or not, you know, just creating all these smaller problems out of the big one. And to me, the creative process is very much how you decide to split the big problem into little things. There's a lot of creativity, I think, in that. And the difference between a model that technically gets a result and a model that is really easy to maintain, really thoughtful in its decisions, and not hyper-complex, I think there's a lot of creativity there too.

Harpreet Sahota: [00:47:58] So even if we don't think we're creative in our work, we actually are, because it seems like there's a lot of choices, right? And it's the choices that make it creative, in a sense.

Jacqueline Nolis: [00:48:10] I mean, if your job every day is literally to run a linear regression, copy-paste the results into a PowerPoint, and then email that to someone, then maybe you don't have as much creativity. But if you're designing something new, then I think there is inherent creativity in the work you are doing.

Harpreet Sahota: [00:48:26] I absolutely love that, thank you so much for that. So for the next questions, let's go in alphabetical order: Emily and then Jacqueline for responses.
I'm first wondering if you can speak to your experience being a woman in tech, and whether you have any advice or words of encouragement for the women in our audience who are breaking into or currently in tech. Emily Robinson : [00:48:49] Sure. The two things I would recommend are, one, definitely find a community. Maybe you want a community of other women or non-binary people. For example, I'm a big fan of R-Ladies, which is a global organization promoting the advancement and inclusion of gender minorities in R. So that's one. But it doesn't necessarily have to be a group like that; it could just be your local meetup group, or folks you meet at a conference. Just having other people there, I think especially other women, is really helpful: to have folks who are at the same level as you, to have people who are maybe coming up behind you to remind you, oh yeah, I have grown, I've learned some stuff, and I have things to give back, and to see women who have been very successful, which fortunately, especially in areas of data science that are more statistics oriented, is quite common. Emily Robinson : [00:49:41] And my second piece of advice is to find sponsorship. Mentorship is something most folks are very familiar with, you know, I want a mentor, getting advice from people. But sponsorship is actually often much more powerful, and research has shown that women are over-mentored but under-sponsored. Sponsorship is when someone actually gives you resources. That could be a financial scholarship for a conference, bringing your name up in a meeting, recommending that you work on a project, putting your name up for promotion, or recommending you speak at a conference. We talk about this a bit in the book, and I also have a blog post on the topic. But I think it's worthwhile for folks to keep in mind and be on the lookout, thinking, OK, how can I get a community, but also some folks who can help me advance in my career. Jacqueline Nolis: [00:50:32] So my advice for women and gender minorities in data science is a little biased, because I have never been both a woman and a junior data scientist, which is usually who this advice is for; I've never been both of those at the same time. But that said, the thing I would say, in addition to Emily's good advice, is that there are a lot of bro-ey areas of data science. There are certain subreddits you can go to where they very much have this macho, the bigger the data, the better: how many layers is your neural network, did you only go to a boot camp, that's not a real data scientist. There's a lot of that energy out there, but that is not the entire field. Don't let those areas intimidate you or make you think that you are less of a data scientist because you don't engage in those games. So much of that bigger-data-is-better bravado exists out there, and I'm a principal data scientist, I've been in this field for many years, and I've barely touched most of those cutting-edge technologies that people talk about in those spaces. And I've been very successful and just fine, I've helped out many companies, and I'm no less of a data scientist because I don't use whatever tool they use. It's fine. So really, by being in the space, as Emily suggested, by finding more women, you will get other voices. Don't let that bro-ey attitude define what the field is for you.
Harpreet Sahota: [00:51:51] I absolutely love that, Jacqueline. That's really good. And Emily, to your point, there's a book I really liked by Sylvia Ann Hewlett with exactly that title, Forget a Mentor, Find a Sponsor. I definitely recommend that book for anyone out there listening. So what can the data community do to foster diversity and inclusion in our space? Emily Robinson : [00:52:15] Yeah, there are a lot of different things. One is the creation of meetup groups, these spaces where you can see yourself represented. Emily Robinson : [00:52:25] I mentioned R-Ladies; there's PyLadies, and there's a newer group, Data Umbrella, for all underrepresented minorities in data science. So that's one. The second is to be very conscious of it, and especially, maybe you're a white man listening to this podcast: how can you support people who are underrepresented, and think beyond just giving advice? Because on the sponsorship versus mentorship part, people may think, I need to help underrepresented folks become qualified. Actually, a lot of them are already qualified, and what they need are opportunities. So what they need is for you to recommend them. Say you're a regular conference speaker and you see someone give a lightning talk; maybe talk to them and say, hey, can I recommend you for this bigger conference? Or bring up their name in a meeting, or refer them to a position. I actually know someone who tells me they keep a list of great women speakers in the data science field, because they're a prominent speaker and are often asked for recommendations. I think that's one concrete way you can help: use whatever privilege and opportunities you've had and pass them along to other people. Jacqueline Nolis: [00:53:40] And I want to double down on this, because I think Emily is right. But let's say you're a white man and you're wondering, well, what can I do? Having formerly been a white man in data science, you're going to find there will be a point, a day, a moment where you want to help, but that help will cost something, right? Like, you want to support women and minorities, but in this situation, supporting them means turning down a client; let's say we have a client who's sexist. If we turned them down, we would lose money; there's a material loss there. There are going to be these moments where you have to put things on the line, put your career on the line, your business on the line, and take hits in the name of women, minorities, and diversity. Those days will come, and whether you are willing to take the hit to help people out or not, that's the real difference in helping. I've seen both. I've seen people be willing to say, yes, it's worth it for us to turn down this client to support the minorities in our company, because we know this client is offensive. And I've seen others say, well, yeah, this employee has been saying racist stuff, but we can't fire them, they bring in sales. Those differences are where you're going to help out more than anything, I would say. Harpreet Sahota: [00:54:50] Thank you so much for sharing that.
I know our audience is going to gain so much from that advice and those words of encouragement. So, last formal question before I jump into a quick lightning round: what's the one thing you want people to learn from your story? Emily Robinson : [00:55:08] That's a good question. I think I want people to learn that they can be ambitious, that they can get more done than they think, and that it doesn't have to come at the cost of having no life outside of data science and spending all your free time training models. For example, we wrote this book, which obviously was pretty time consuming, but Jacqueline was raising a toddler while doing it, and I still went out to dinner and went out on weekends. And now I get to look back and say, wow, in my first couple of years as a data scientist, I published a book. So sometimes you just set some ambitious goals, but find people who can help you along the way; you don't need to do it alone. I'm so thankful I had Jacqueline as a coauthor, and we also knew some other folks who wrote data science books and asked them for advice. So I would say: you can do maybe more than you think, but you don't have to do it alone, and it doesn't have to take up all of your time and block out everything else. Jacqueline Nolis: [00:56:14] For mine, we don't really talk about this much, but the chapter I wrote that I'm most proud of is the chapter in our book about failure, and what to do when your data science projects fail. I've been a data scientist for many years, and many, many, many of my projects have failed. It's been a lot of career growth to go from, wow, this project failed and I'm a failure and I don't know how to do data science, to, hey, failing is a very natural part of this. The fact that I'm failing means I'm trying, which is important; successes periodically come, and this is just an acceptable risk. There's a whole chapter about how to handle that and what to do with those failures. The thing I want people to take from my story is, hey, let's be open about failures, let's talk about failures, and let's make data science a more emotionally vulnerable space where we can talk about our weaknesses, areas where we're feeling weak or anxious, and feel safe talking about them. You should definitely buy the book, it's so good; but if you don't want to buy the book, I do have a talk version of that chapter online. It's about an hour, where I talk about the five biggest failures in my career and what I learned from them. And that's the sort of thing I want people to get from my story: we're all people, this is about humans and not just about numbers, and let's care about our emotions. Harpreet Sahota: [00:57:27] I absolutely love it. And that chapter is helping me go through my own failure right now. Yeah, it's helping me. Jacqueline Nolis: [00:57:36] Yeah, I wrote that chapter for me ten years ago. Like, past me, this is the chapter you need. So it's really nice to hear that it's also helping other people. Harpreet Sahota: [00:57:47] I'll definitely go look for that talk as well. And I encourage everyone to check this book out; it's probably the most different and most amazing book on data science, because it's not even about data science per se.
It's about the person, about the data scientist, which I think is awesome. So thank you both for writing this book. Now let's jump into a quick lightning round, and again, we'll keep the alphabetical order going here. The first question is: what do you believe that other people think is crazy? Jacqueline Nolis: [00:58:19] I think that probably about 80 percent of the work that data scientists do is not useful to anyone. Not like, oh, data cleaning isn't useful; no, like 80 percent of the models data scientists build are just not useful, churn models that never get used, for example. As a field, we make so many things that no one ever uses. The 20 percent that end up being useful are so useful that it makes the rest worthwhile. But I think 80 percent of what we do is things we should never have started on and are just unwilling to quit. As a field, we haven't really grappled yet with how to quit early. Maybe that's my hottest data science take, one that people might think is just buck wild. Yeah, that's what I've got. Emily Robinson : [00:58:58] Yeah, that's so much more original than mine, Jacqueline. I'm thinking of stuff and going, wow, I saw this person on Twitter once who also said they think this. So I think you have a good hot take. Jacqueline Nolis: [00:59:08] I've thought about that. When people ask for your, quote, unpopular opinions, like 90 percent of the time when someone gives an unpopular opinion, it is actually an extremely popular opinion, just within the subset of people they're part of. So it's like, no, really say something unpopular, like, I think dogs should vote. Emily Robinson : [00:59:27] Yeah. Emily Robinson : [00:59:27] And I guess mine is still TBD. Emily Robinson : [00:59:33] Well, actually, probably a lot of people agree with me on this, so maybe it's more of a hot take than a wild take: data science needs to shift out of just being in San Francisco, with New York as sort of the second biggest hub. I don't believe in this "data is the new oil" idea, but I think, set up correctly, a lot of companies could benefit from it, maybe specifically data analytics. Maybe that's my hot take. A lot of folks focus on machine learning, like, oh my gosh, we've got to get machine learning, we'll have these awesome algorithms. No, no, just get someone who can actually pull the numbers from your Excel spreadsheet of doom, which probably has a ton of errors because it's fifty sheets long and all color coded. Just get someone who can wrangle the data effectively, and that's going to deliver so much more to your business than hiring some machine learning engineer. Jacqueline Nolis: [01:00:22] I actually think this is fascinating, because at some level we are saying opposite things: mine is we need less data science, and yours is we need more data science. I think it's pretty easy to reconcile as: we need less useless fancy data science and more simple, basic data science. Jacqueline Nolis: [01:00:38] But at some level we are saying opposite things, and I love that.
Emily Robinson : [01:00:42] I mean, a lot of basic data science also never gets used. Just ask any data scientist about a dashboard they've built, and they'll be like, yeah, it was used for like three days and took three months to build. Harpreet Sahota: [01:00:55] I'm still licking some wounds about a dashboard, and about not knowing when to quit on projects. So yeah, a lot of stuff is just super hyped up. I think we should keep it simple, parsimonious, and deliver results. That was very entertaining, thank you. So, if you could put up a billboard anywhere, what would you put on it? Jacqueline Nolis: [01:01:15] I think I would put something nice that makes people feel better, like "Keep at it" and a person giving a thumbs up, with a nice mountain picture. Let's get a little extra cheer out there; things are grim. Emily Robinson : [01:01:29] It's hard to think of what's best for everyone, but I guess something along the lines of: there's room for everyone, and other people's success does not diminish yours. Emily Robinson : [01:01:40] Like we talked about, there's all this gatekeeping in data science, and I think people feel really threatened, like, well, they get called a data scientist but I worked so much harder than them, I did this fancy thing, and that's not fair. Emily Robinson : [01:01:50] But there is room for everyone, and everything is not a zero-sum game. Harpreet Sahota: [01:01:56] What's an academic topic or area of research or interest outside of data science, mathematics, and statistics that you think every data scientist should spend some time reading up on? Jacqueline Nolis: [01:02:11] I have an extremely hot take on this. So this is not academic, but so much of data science is storytelling, right? Like I was saying, getting in front of executives and convincing them of something with data. And so often people come to me, and I assume Emily too, with, well, how do I get good at storytelling? What books should I read about storytelling? You know who's really good at storytelling? Stand-up comedians. Stand-up comedians are amazing at it: here's a premise, then this happens, and there's a funny result, and there's a timing element to it. They're always captivating your attention; there are beats to it. The way a stand-up presents information is very similar to what I do when I'm giving a conference talk or presenting to an executive, though generally less funny. That style of, hey, I'm going to engage with you, captivate your attention, and get you to think about something is something data scientists really have to do. So, strangely, if you're a junior data scientist who wants to get better at storytelling, I really recommend watching stand-up comedians and trying to dissect what they do in any given minute. Is there a particular stand-up we can learn from? I just watched one, oh my gosh, it's on Netflix, what was the name of it, Emily? Did you watch it? Emily Robinson : [01:03:19] Douglas, by Hannah Gadsby. Jacqueline Nolis: [01:03:23] Yes, Douglas. That stand-up special on Netflix is very funny; she tells a bunch of really great stories in there. Just watch it. But yes, really, go on Netflix, find stand-up.
I think that's a great, great thing to do. Emily Robinson : [01:03:35] I like John Mulaney a lot. Like people put it, he's the only white man I trust. Harpreet Sahota: [01:03:42] And the same question for you, Emily. Emily Robinson : [01:03:44] Oh, yes. Emily Robinson : [01:03:44] So I'm a bit biased, but I would say my background, organizational behavior, is something pretty much anyone working in industry should read about, because organizational behavior covers things like negotiation, and it covers the idea of passion. Emily Robinson : [01:04:01] It's not exactly organizational behavior, but a book I really like is called Unlocking the Clubhouse, by two Carnegie Mellon professors, one I think in computer science and one maybe in sociology, studying why there weren't many women in the computer science undergraduate program. They did a qualitative study with lots of interviews, and it was really interesting. A big thing they found was that women would say, well, I haven't been wanting to do this since I was five, and I don't dream in code like they do. So there's this idea that you need to be passionate to be successful in computer science, and that passion is a very specific thing, a myopic focus on computers that you've had since you were a kid. So I really think there's a lot people can learn there. And the Harvard Business Review is a great place to start, because they publish for a more popular audience and it's a lot less dense; they might give an overview of a bunch of studies on negotiation rather than you having to read twenty different academic papers. Harpreet Sahota: [01:05:03] My next question was going to be the number one book, fiction or nonfiction, that you'd recommend our audience read, and your most impactful takeaway from it. Does that book recommendation stand, or is there another one you want to add? Emily Robinson : [01:05:16] That recommendation stands, but I would also love to recommend one of the books we suggest in the resources section of our book, which is Bird by Bird: Some Instructions on Writing and Life, by Anne Lamott. Emily Robinson : [01:05:26] It's a mix: obviously it's instruction on writing, and it's also a bit of a memoir. It's just a really excellent book. A big takeaway comes from the title, from when she was growing up. I think it was her brother who had this project on birds due, a really big project he was supposed to do over the course of months, and he just hadn't done any of it. It's the night before, he's freaking out because he's supposed to write up something about, I don't know, thirty birds, it's a huge thing, and their father just said to him, you know, just take it bird by bird. So basically, take it piece by piece. I think that might be really helpful for folks looking to get into data science, because if you look at the whole field, at all the things on some list of things you have to know, it's completely overwhelming. But remember, you can take it piece by piece, you can build incrementally. And also, to stick with data science: you don't actually need to know that top-twenty list. I don't know half the things on it, and I am still a data scientist.
Harpreet Sahota: [01:06:22] I love it. Jacqueline? Jacqueline Nolis: [01:06:23] OK, so I've got two, one nonfiction-helpful and one fiction. The nonfiction-helpful one is the book Difficult Conversations: How to Discuss What Matters Most. As I think I've said eight times in this podcast, I really believe so much of data science is the work of convincing someone else of an idea using data. This book discusses how to think about conversations and how to express ideas that may be controversial to others. Especially, right, if you have a model that shows something but the executive believes the opposite, these can be very difficult conversations. So I found that book to be illustrative. And the other book I would recommend, totally unrelated fiction, is The Fifth Season, by N. K. Jemisin, which I just really love. It's so good, and it's a series; that's the first book of three, and they're fascinating. Jacqueline Nolis: [01:07:11] I burned through them so fast. She's a Black woman author, so you can support Black authors, and it's just a great book. I highly recommend it; lots of fun. Emily Robinson : [01:07:21] Yeah, and in that trilogy, I believe each book came out a year after the one before, and I think she's the only author ever to win the Hugo Award, the biggest award for sci-fi and fantasy writing, three years in a row. She won it for each of those books. Emily Robinson : [01:07:37] So it's just an experience. Harpreet Sahota: [01:07:38] Yeah, I'll definitely be adding those to the show notes, and I'll check them out myself as well. Thank you so much for that. So if we could somehow get a magic telephone that allowed you to contact your 18-year-old self, what would you say? Emily Robinson : [01:07:51] You know, maybe: don't stress so much; things will work out. In general, I'm pretty happy where I am now. Emily Robinson : [01:08:01] There are some experiences where I think, oh, I wish I had known that earlier; maybe taking another computer science class would have been helpful. Emily Robinson : [01:08:08] But overall, I like the Steve Jobs quote: Emily Robinson : [01:08:11] you can only connect the dots looking backward. And looking backward, I can really see how everything led up to where I am now. But if I had set out with a plan, like, I want to be exactly here ten years later, I don't think it would have happened. So I think just going with, OK, this is what I kind of want to do right now, this is good experience, and not worrying about a five- or ten-year plan, because so much changes, especially in a field like data science. Jacqueline Nolis: [01:08:37] Yeah, I think mine is similar. I've also had the thought that if I went back in time, I wouldn't change things: the bad things that have happened to me, the things that have been difficult, are by far where I've learned the most. So I don't want to avoid the difficult things in my life, because those have in many ways been the most helpful. But I think maybe that would be my advice to my eighteen-year-old self.
Eighteen-year-old Jacqueline was so obsessed with optimally living her life, getting into the best college, getting the right major, and all of that, doing this real optimization. I think it's kind of ingrained into us that you have to do everything right if you want to be the most successful in life. And I think the message I'd try to get her to receive is: hey, you learn the most from the failures. Don't stress so much and try to optimize everything, because there is no optimal thing. There's no one parameter you can optimize; every decision is going to have different outcomes in different ways, and stressing about A or B doesn't really help you in the end at all. Harpreet Sahota: [01:09:31] What's the best advice that you've ever received? Harpreet Sahota: [01:09:35] You want to go first, Jacqueline? Jacqueline Nolis: [01:09:37] Yeah, sure. So this is maybe not directly advice, but years ago I had a boss, and I was more junior. Jacqueline Nolis: [01:09:44] I had just started doing leadership things, like the first time I helped start a team, but I was still pretty junior at being a leader. And my boss, I thought, was a really great leader, just really fantastic. He didn't give me advice, but he was such a good role model that I basically just tried to mimic exactly what he did. And what was that? He was very open with his employees about when things were going well and when they weren't; he was very clear and very cordial with here's what's going on, here's why, and here's how I feel about it. Similarly, when he was in a meeting with clients, he was very cautious and slow and methodical in the way he talked and thought, and he really took his time. If someone said something to him, he would wait, think about it for a second, and then give a response. So this wasn't advice, but by watching him act, I realized that so much of my own behavior was anxiety-induced: I'm moving so fast, I want to do everything as fast as I can, because you've got to do things fast. Allowing yourself to slow down, take time, not rush, and give things the care they're due makes an enormous difference in how you work as a leader. That totally changed my perception of how I should think about working. Emily Robinson : [01:10:53] Mine is also not exactly advice I was given, but something from watching other people and being pushed into it myself, which is what I call talk-driven development. Emily Robinson : [01:11:01] There are some data scientists I know who will sign up to give a keynote on a new package they have not actually written yet, but now there's a deadline, and they have to get working on that package because they've committed to speaking about it. In my case, the first talk I ever gave was about six months into working at Etsy. It was at a local meetup, because the organizer, Jared Lander, asked not if but when I would be speaking. And I don't think, without that push, I necessarily would have done it. But I was like, all right, I have a couple of months, I'll sign up for August, I'll talk about A/B testing, this is what I've been doing at Etsy. And that was really helpful to me.
So I think what that comes down to is knowing yourself and knowing what motivates you. Sometimes being pushed a little bit to share externally can really help, at least for me, to motivate myself and also to give back, which is something I really care about. Harpreet Sahota: [01:11:56] What song do you currently have on repeat? Jacqueline Nolis: [01:11:59] So I have a toddler, which means that every time we get in the car he wants to listen to the same song over and over, and that song just happens to be "ME!" by Taylor Swift and that guy from Panic! at the Disco. So yes, "ME!" by Taylor Swift is absolutely what's on repeat in my life right now. Emily Robinson : [01:12:13] And for me, it's funny, Spotify makes you an On Repeat playlist, and I often put that playlist on, because I'm definitely someone who can listen to the same song over and over again for hours. Some of the ones on it: Emily Robinson : [01:12:25] we just watched Hamilton, I saw the show about four years ago and we just watched the movie, which I highly recommend, so that's one. And then also some songs from Crazy Ex-Girlfriend, which is a TV show that I really recommend. Harpreet Sahota: [01:12:37] Yeah, definitely check out Hamilton. I noticed that it's on Disney+ on demand now, so that is lined up for this weekend; looking forward to it. Thank you for those song recommendations. So where can people find the book? Emily Robinson : [01:12:50] This depends on your personality. If you're more like me, serious and career oriented, you can find it at datascicareer.com. Or, Jacqueline? Jacqueline Nolis: [01:13:00] And you can tell which one of us is which. Jacqueline Nolis: [01:13:04] Or if you go to bestbook.cool, you get the more zesty, hey, we're going to do great with our careers version of the book. Both of those URLs bring you to the exact same site, but which aura you bring with you as you purchase the book depends on whether you use datascicareer.com or bestbook.cool. Emily Robinson : [01:13:23] And both of those lead to Manning. It's also available on Amazon, but we recommend Manning, because you can also get forty percent off at any time with the code buildbook. Emily Robinson : [01:13:34] Forty percent, as in four-zero percent. Jacqueline Nolis: [01:13:38] Yeah, and if you buy the print version on Manning, you get the print and the e-book together, so it's cheaper than Amazon. Emily Robinson : [01:13:45] And it's OK if you get the book on Amazon; if you get the physical copy there, they do this weird thing where you also get the e-book copy, because they have a special code hidden in the book. Well, I learned something from listening to this podcast. Harpreet Sahota: [01:13:59] So how can people connect with you or find you online? Jacqueline Nolis: [01:14:02] We're both highly Twitter people. Very Twitter. Emily Robinson : [01:14:05] Yes, so Twitter's a good place. I'm robinson_es. Jacqueline Nolis: [01:14:10] And I am skyetetra, that's S-K-Y-E-T-E-T-R-A. Emily Robinson : [01:14:17] And we also both have blogs and websites. Mine is hookedondata.org, Jacqueline Nolis: [01:14:24] and mine is jnolis.com. That's where you can find our talks, videos, and blog posts. Harpreet Sahota: [01:14:29] Yeah, exactly.
And thank you both so much for taking time out of your schedules to be here today. I really, really appreciate everything; this was a fun conversation, and my first time interviewing two people, who just happen to be the two coolest people ever. So thank you. Emily Robinson : [01:14:44] Thank you so much.