open-office-hours-dec11.mp3 [00:00:07] What's up, everybody? Welcome to the Artist of Data Science open office hours, or shall I say the @ArtistsOfDataScience happy hour. I'm so happy you guys are all here. We got the room filling up, people coming in from everywhere. Thank you guys so much for joining in today. I hope you were able to tune into the episode this week. I released an episode with Donald Robertson. He's the author of How to Think Like a Roman Emperor and Stoicism and the Art of Happiness. It's one of my favorite interviews that I've done this year, because I'm a huge fan of Donald Robertson's work. So definitely check out that episode if you haven't gotten a chance to. We've got people coming into the chat and into the room. Thank you guys so much for swinging by. Before we get started, folks, I just want to take a second to call out that the largest protest in human history is happening right now in India. I'm not an economist. I don't know the details of all the bills that were passed there. But I do know that hundreds of thousands of poor farmers are exercising their right to peaceful protest. They're being met with undue aggression and violence by the Indian government while being portrayed as terrorists. The vast majority of these people are my people, people from Punjab, and all they want is to exercise their right to be heard by their government. It's a peaceful protest, and they're being met with violence and aggression. This is a violation of human rights. [00:01:36] I just want to say that injustice anywhere is a threat to justice everywhere, and that I stand with the farmers. That being said, guys, thank you so much for taking time out of your schedules to come into the @TheArtistsOfDataScience happy hour. I'm so happy you guys are here. Welcome. We've got a lot of people in today, so thank you guys for coming in. Who's up, and who would like to ask a question? All right.
[00:02:01] So we've got a bunch of new faces here, that's cool. We got Camille, we got a bunch of others, so shake hands and say hi. So many people, man. How are you guys doing? Thanks for the update. Hey, how's it going? Good, good. [00:02:18] Great. I have a question to kick it off. Yeah, absolutely. Man, you know, I'm an analyst, I'm not a data scientist yet, but I look at these charts that you see on the Internet about the daily breakdown of a data scientist's time. Like sixty percent of it is data cleaning. So, any data scientists on here: no matter how mundane, what's the latest ugly data task you had to overcome? Get specific. What kind of commands were you using to clean up the mess you had to deal with? [00:02:53] Yeah, I'd say for any project that you start out with, the first chunk of time is definitely spent on that data cleaning aspect. The biggest challenges that I've had to face are, first of all, people not knowing where to get the data that I need to do the job right. Because you'll start on a project like, oh, we think we have the data to get this project solved, but we just don't know where it is. And you'll spend a lot of time interviewing people, I guess, quote unquote, interviewing people, trying to figure out what they know about the data. And then once you do identify where the data is, you then have to figure out what the hell each column represents. Right. So I've found a lot of times that the last couple organizations I've been at don't necessarily have really thorough documentation in terms of data dictionaries. [00:03:44] So that's always been a challenging task: OK, first I need to figure out what data it is that I need to make progress against this problem statement, and then I need to figure out what these columns mean to give it some context.
And then from there, it's just a matter of, OK, how do I parse through this data? How do I combine it in a meaningful way? And that in itself takes a huge amount of time. So I guess I don't really spend sixty or eighty percent of my day actually cleaning data, but on any given project that I start working on, yeah, maybe about 50 or 60 percent of it is just data understanding and data preparation. Dave, what do you think? [00:04:25] Fifty percent, man. I envy you. That's what I think. Yeah. If you think of data management all up, acquiring data, understanding data, cleaning data, rinse and repeat until you get a significant enough data set to actually conduct your analyses, for me the 60 to 80 percent figure, which has been quoted since the nineties, has been accurate all along. So it takes a lot. And especially if you're lucky enough, I would say, in your analytics work to actually grab new data sources that are from maybe outside of the firewall of your organization, then the number really goes up after that. But that's also the fun part. You come up with a hypothesis and you say, hey, if I grab weather data or whatever, I could maybe do something awesome, and people say go for it. And then you're grabbing weather data and you're cleaning it. It's fun, but it's also a lot of work. [00:05:19] Mikiko, what do you say? How much of your time is spent cleaning data? And let us know if that touches on your question. No, you're asking about tooling and stuff like that? [00:05:29] I just want the specifics, too. I mean, almost like in an agile fashion: what did you do this week? So if you were cleaning this week, like, I had three thousand states in a state field and it was another mess, or something like that. I'd like to hear real war stories of the kind of cleaning exercises people had to go through. [00:05:50] All right.
Oh, I'm in one of those trenches right now. Want to hear about it? [00:05:57] Yeah. So I think the percentage of data cleaning, and the extensiveness of data cleaning, really to some degree depends on the maturity of your company or organization or client that you're working with. So for example, currently I'm cofounding a startup, launching one within the next two or three months. Right. And so the thing is, money is a bit of a constraint, and so we can't immediately go to, like, an AWS or a GCP stack, especially because we're dealing with data that, first off, is regulated. There are some security concerns around it, not because of concerns around the data itself, but in general security concerns about how we handle the pipelines. And also, there were things I learned as a data scientist that are great when you're working on independent research projects, but do not translate very well when you are talking about scalable, efficient pipelines that have to meet latency requirements and all the other stuff. Right. So right now, in setting up the pipeline, we're dealing with, for example, description fields that have special characters all over the place. So the minute you try doing a COPY FROM into Postgres, it just dies. But probably down the line we'll eventually have to do more. For example, there's cleaning the data to make it useful for modeling and analysis. [00:07:35] And then there's also cleaning and managing the data so that, for example, you don't have bias that comes in. Right. So the stuff I work on is real estate, and bias around gender and race in the US especially. There are actually a lot of different factors and characteristics and attributes that correlate with race. Right.
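Mikiko's special-character problem has a common first-aid fix. This is a rough sketch, assuming a text-format COPY into Postgres; the function name and sample rows here are made up, and it only handles the control characters and backslashes that most often break a load:

```python
import re

def clean_description(text):
    """Strip control characters and normalize whitespace so a raw
    text-format COPY into Postgres doesn't choke on stray bytes."""
    if text is None:
        return ""
    # Replace non-printable control characters (tabs, carriage returns, bells)
    text = re.sub(r"[\x00-\x1f\x7f]", " ", text)
    # Escape backslashes, which COPY's text format treats as escape starts
    text = text.replace("\\", "\\\\")
    # Collapse the whitespace runs left behind by the replacements
    return re.sub(r"\s+", " ", text).strip()

# Made-up description fields of the kind Mikiko describes
rows = ["3bd/2ba\tgreat light!\r\n", None, "corner \x07 lot"]
cleaned = [clean_description(r) for r in rows]
```

In practice you would run something like this in the staging step of the pipeline, before the COPY, rather than hand-fixing rows after the load fails.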
And when we're talking about real estate data, you know, it's something we just have to be very careful about, where if we're recommending things to people, we want to make sure that those services or those products that we're recommending are not biased on whether someone is male or female, whether they are a gay or straight couple, right, whether they're married or unmarried, whether their wealth is in equity versus cash. Some of those things are kind of interesting, but they're the kinds of things that, when you're cleaning data, you do have to be aware of. Right. But I think it does depend on what kind of organization or team you're working with. If they're super developed, they might already have analytics engineers, and so cleaning is just formatting it nicely. But if you're working really bare metal, in a more bootstrapped situation, I would say cleaning and munging is like eighty percent. You're lucky if you get twenty percent that's modeling and analysis. [00:09:04] I would even add to that: another issue is just dealing with duplicates. Are they true duplicates, or do they just look like duplicates? That's always an issue as well. Monica, how about you? Do you have any stories from the trenches? [00:09:21] Yeah, I have a recent example, actually. So it depends on how you're receiving the data. Hopefully you have access to a database within the company that's already clean, already maintained, and easy to get access to. But in some cases you don't have that, so you have to get reports that are coming from management, whether that be in an Excel file or a CSV file. And the thing with Excel, and I'm sure Dave knows this very well, is that it likes to manipulate your data behind the scenes without you knowing anything. So, one example: I recently got a data set.
It had some dates in it, and I was working with a client in Mexico. The US writes dates as month, day, year, and everywhere else it's day, month, year. And for some reason, every other row was switching formats. So we were really trying to figure out how to get around this. There was no way to just cast it or do string manipulation or anything like that. So what we did to resolve it, actually, is we were able to pull the records into a text file instead, and that came out right, with all the dates in the same format. So that helped us there. It's a little tricky. [00:10:47] Also, you have to watch out for those carriage returns. If you end up with one of those inside of your data frame, that can be quite messy. UTF encoding is also a thing that is a pain in the ass. Hopefully that answered your question. So I've got a great queue of questions here, and I'm going to go down the list. We'll start with the next person in the queue. Go ahead and unmute yourself, and let us know what your question is.
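The mixed date formats Monica describes can sometimes be untangled in pandas when one reading is impossible. This is a sketch with invented values: rows where the day exceeds 12 force the day-first reading, while truly ambiguous rows keep the US-style guess, so it is a heuristic, not a full solution:

```python
import pandas as pd

def parse_mixed_dates(series):
    """Try US-style month/day/year first; fall back to day/month/year
    for values that only parse with the day leading (day > 12)."""
    us = pd.to_datetime(series, format="%m/%d/%Y", errors="coerce")
    intl = pd.to_datetime(series, format="%d/%m/%Y", errors="coerce")
    # Wherever the US reading was impossible, take the international one
    return us.fillna(intl)

# Invented sample: row 2 only makes sense day-first; row 3 stays ambiguous
raw = pd.Series(["12/25/2020", "25/12/2020", "01/02/2020"])
parsed = parse_mixed_dates(raw)
```

For genuinely ambiguous rows like the third one, there is no purely mechanical fix; you need an outside source of truth, which is exactly why pulling the raw text export worked in Monica's case.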
[00:11:14] Hello, everyone. So my question is basically just to understand how to break into the data science industry. I'm currently working in product management at one of the global human capital management firms, and my current profile revolves around gathering requirements. So basically just doing product management work. It involves some sort of analysis as well, where we do some product analysis, and the tool which I use is basically SQL, and then some plotting if I want to know some trends. But my interest lies, basically, in growing into the data science or predictive analytics field. So I just want to get some understanding from the experts on the call: should I follow the path of learning new tools, or should I just focus on the old-fashioned way, learning statistics, learning a new language? So the old, ancient way, the classics, right? [00:12:23] I wouldn't call it ancient. I'd call it the tried and true principles that have withstood the test of time. That being said, I would say you don't actually need to have a job in data science to do data science work. It sounds to me like a significant portion of your job responsibilities involve working with data, and there's nobody really over your shoulder telling you, no, you can't use this methodology, or no, you can't use this tool. You can use the tools that you want to learn to get the job done and deliver value for your organization, and nobody is really going to care that much. All right, Tom, what do you think? [00:13:04] Sorry for being slow there, I was typing an answer to someone. I loved where you were going, Harp. And I think it's an opportunity to do what Nick and I were talking about today in a one-on-one chat. It's good to know your internal partners that are decision makers, your business counterparts. Basically, you're developing your internal customers. Go and find out where they're hemorrhaging, and find that perfect intersection between what you can do now with data and what their needs are. And then just give them the most stupid, ridiculously simple visualization you can. It sounds like you already have skills in that area. Get results, then build that up, and make that one of your first data science problems. It may not turn into that, but they'll start to see you as the person that can bring data to answer questions. Then you look for opportunities to do some predictive analytics. But sometimes just telling the story well with data makes you a hero. [00:14:07] So, another thing. I've actually done this.
[00:14:10] I've done operations and program management. I love data, and it's kind of like where you are: I had data to play with, I could experiment with it. [00:14:20] I could do what I would call minimal analytics. [00:14:26] And so I started talking to people within my company that were hiring for the kind of jobs that I wanted to get into. And I asked them: what kind of skills are you looking for? What kind of projects are you working on? Is there anything that you need help with? Exactly what Tom was just saying. I got a job out of one of those by accident. So use your network, listen, go in with questions, and never be afraid to reach out and ask other people's opinions, because you may stumble onto something. [00:14:58] Thank you very much. And Dave, it looks like you dropped a great comment into the chat. Do you mind sharing that with us? [00:15:03] Sure, why not? So I used to be a PM at Microsoft, don't hate me because of that. What I put in the chat was that product management is actually, in my opinion, an awesome place to start building your analytics portfolio, because you have a ready-made business problem for you to use your analytics to improve: the product that you own. And I'll give a specific example that might not necessarily be intuitive. I used to run a team that was in charge of all the BI, data warehousing, and analytics assets for Microsoft's supply chain operations, building Xboxes and all that. And we implemented some telemetry on our SQL Server data warehouse. We put in database audit specifications, which allowed us to track every single query that was issued against the data warehouse. And then we used that to do a couple of things. One, we did some process mining to understand how people were using the data, in terms of flows of queries, as part of a socialization exercise.
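Dave's query-telemetry idea can be sketched with pandas. The log below is a toy stand-in for the audit log he describes, with hypothetical users and table names; real process mining tools go much further, but the core move is ordering events per user and reading off the flows:

```python
import pandas as pd

# Toy stand-in for a query audit log: one row per query against the warehouse
log = pd.DataFrame({
    "user": ["ana", "ana", "ben", "ana", "ben"],
    "ts": pd.to_datetime([
        "2020-12-01 09:00", "2020-12-01 09:05", "2020-12-01 09:02",
        "2020-12-01 09:10", "2020-12-01 09:20",
    ]),
    "table": ["orders", "shipments", "orders", "returns", "inventory"],
})

# Order each user's queries in time to read off their "journey" through the data
journeys = (
    log.sort_values(["user", "ts"])
       .groupby("user")["table"]
       .agg(list)
)

# Simple usage counts flag rarely touched tables as candidates to archive or drop
usage = log["table"].value_counts()
```

From here, counting the transitions between consecutive tables in each journey gives you the flow graph that process mining visualizes.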
We also were able to determine which tables weren't actually used very much, so we could drop them, saving money from a SAN storage perspective, and all kinds of things. So I'm just using that as an example: all kinds of product management roles have an opportunity to use data and use analytics. And if you have access to event logs and telemetry, and I just put this in the chat, you've got a gold-mine opportunity, because I want to give you two words. If you're not familiar with this, you want to study it: process mining. Good, good stuff. Right. You take your event log, and it allows you to transform it into useful information around the customer's or user's journey through your products. Wildly awesome stuff. And I'll be quiet now. [00:16:42] Thank you very much, everybody, for that. So it sounds to me like the big takeaway here is that if you want to start breaking into data science, you are positioned already to do so. It's just a matter of starting to use the methodology in your day-to-day. Was that good? [00:17:03] Absolutely. Thank you so much, guys. It did, actually. That clearly explains that probably everywhere we can use some sort of analytics, but as I think everyone has mentioned, product management as a whole has the kind of background where we can use analytics every day. So thank you so much for your guidance. [00:17:23] All right, let's go to who's next in the queue. I've got Naresh. Naresh, are you still here? Go for it. [00:17:28] Yeah, I'm still here. My question is about trying to start working on a data portfolio. I've been digging through data sets, and there are a whole bunch of them, but I couldn't figure out which ones to use. So my question is: does anybody know the easiest data sets to start a data portfolio with?
[00:17:52] So, you probably don't want the easiest data set to start a data portfolio with. You'd want the easiest data set for you to start just messing around with some of the methodology, so you can get familiar with, say, the scikit-learn API or the pandas API. [00:18:03] But ultimately, for the data set that you use... OK, let me rephrase this. If you use the Titanic data set, the MNIST data set, or the breast cancer classification data set, those are probably not going to make for strong or interesting portfolio projects, because they're cleaned up, they're ready for you. The type of portfolio you want to build out should be built around something that you are interested in. So it's going to have to come from a place of introspection. I think you need to sit back and really think about: what kind of problems am I interested in solving? And then from there, try to find the appropriate data to then make, I guess, a statement or progress against that problem statement. I hope that makes sense. Let's open it up and see what Brandon has to say about that. Welcome back, by the way, Brandon. [00:18:57] Hey, everyone. One interesting presentation that I've seen recently came from a person within the company I'm currently working at, and he wanted to get into data science. He has a tracking device that he uses when he goes on walks with his kids, and he found a way to get at that data. He just started collecting it, and I think so far it's only been a month, but he plans to collect it for longer. Then he just started making all these graphs, you know, with the various columns there. And a lot of the conclusions were obvious, right? There was something like how many animals did I see as a function of the time of day, things like that. And I was like, OK, you know, there's nothing groundbreaking here.
But it showed that he had the initiative to find some data, real-world data if you will, and then learn all the graphing, just like you had shown, Harpreet. Right? He started to use Jupyter notebooks and all the different plotting options. That doesn't mean that you shouldn't use the other data sets that are available, but I just thought that was a creative way for somebody to break into the field and show their interest. And you can see the person's interest in the data set, right? That's another one of your points: he was just curious. [00:20:05] It's like, every day I take this walk, and what if I just captured the data and looked at it? I think another interesting project: if you have Spotify, for example, you can pull your last year's worth of Spotify listening data right from their API. They have a very robust API, and each track has a wide number of metrics associated with it. So if you tie in weather data with your listening data, you can maybe come up with a project to see how weather affects your listening behavior on Spotify. Do I tend to listen to more acoustic songs when it's raining outside, or do I listen to more upbeat songs? Something like that. Just something interesting along the lines of what Brandon was talking about. Hopefully that answers the question. I'm going to go ahead and move on to the next question. We've discussed projects and data for projects in a number of other office hours. [00:21:02] I urge you to go back and look at those, and you'll get a bunch of other ideas. Let's go to Jake now. Jake, or Jacob, whichever you go by, are you still here? It looks like Jake might have disappeared. OK, let's go to the next person then. OK, how's it going? [00:21:22] Yeah, I guess so.
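The weather-and-listening project suggested above mostly boils down to a join on date plus a group-by. The frames below are hypothetical stand-ins for a Spotify listening export and a daily weather feed; the acousticness column is meant to mimic one of Spotify's per-track audio features:

```python
import pandas as pd

# Hypothetical listening history: one row per track played
listens = pd.DataFrame({
    "date": ["2020-12-01", "2020-12-01", "2020-12-02"],
    "acousticness": [0.8, 0.7, 0.2],
})

# Hypothetical daily weather feed
weather = pd.DataFrame({
    "date": ["2020-12-01", "2020-12-02"],
    "condition": ["rain", "sun"],
})

# Join listening data to weather by day, then compare listening by condition
merged = listens.merge(weather, on="date", how="left")
by_condition = merged.groupby("condition")["acousticness"].mean()
```

The interesting work in the real project is upstream of this snippet: pulling your own history and the weather feed, and deciding which audio features to compare.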
My question is quite similar to the previous question. I just wanted to know what I should do to get a first interview. I'm a graduate student right now, looking for opportunities, and my previous experience was completely in software development. I have done a couple of projects in data science in my academics, so I just wanted to know what I should keep in mind to get a first break into the data science field. I just wanted some expert guidance. [00:21:56] Yeah, definitely. I'd open this one up to Mikiko, but it looks like Mikiko is having some audio issues. How about either Tom or Dave? [00:22:05] OK, I'll go ahead and jump in. So we've talked about this in previous office hours, and if you're on LinkedIn, you've seen it all over the place, right? You have to have projects, and they have to be really good quality projects. You've got to have a good GitHub, you've got to have good code, and it's got to be documented. You've got to have nice documentation to communicate all the things that you can do. You can't just throw up a Python notebook and say, oh, there you go, I'm great now. That's not going to work. You need to have a great project. Unfortunately, these days I would argue that's not enough either. That's just table stakes; it just gets you on par with everybody else. So, as I talked about last time at happy hour, I'm a big fan of taking it to the next level and starting to create some content, doing some content marketing for yourself. See if you can find some sort of virtual meetup where you can do a talk, where you can explain something, where you can demonstrate your knowledge and your communication skills. YouTube tutorials, I mean, all kinds of stuff. So you've got to have the projects, absolutely, but the portfolio may not be enough. How do you differentiate yourself?
Take it to the next level with some content marketing, an output that demonstrates your knowledge and starts establishing what influencers call authority. [00:23:20] Monica, any other tips on how to break into data science? And then we'll move on to the next question. So go for it, Monica. [00:23:30] Yeah, so: just be curious. I myself am a generalist. I know other people like to be specialists, or maybe T-shaped, I forget what it's called, where they're generalists across the top but find one area and go deep down into it. But just learn all that you can. [00:23:53] Be sure to show that you are willing to learn and that you do know how to learn, because once you get into a job you kind of start at ground zero anyway: you need to start gaining domain knowledge and understanding that particular business and all of that. So if they know that you have that curious mindset, that you are eager and willing to move forward, those are really good soft skills to hone. [00:24:22] Let's get Mikiko; she is back online. Go for it, Mikiko. What does someone need to do to break into data science? [00:24:30] Hey, can you guys hear me now? OK. I'm on my worst, terrible pair of speakers, too. But this is also for the other questions that I'm seeing in the chat, and from before, about how you break into data science in general. So, two things to think about. Number one: data science and machine learning. We want to think of those as meta-skills, not necessarily as a single group of functional roles. And I think what we're going to see is that, more and more, data science and machine learning skills are going to be required as part of jobs, but not necessarily as their own jobs by themselves. What do I mean by this? Right.
So what I like to ask people is: what do you want to do with data science and machine learning? What is the kind of work that you want to do? Because a lot of the roles fall around three personas. There's strategy and analytics: you're working with internal business teams or you're working with clients, helping them leverage data to drive their business or drive decisions. There's research: you're trying to figure out answers to novel questions or novel uses of existing technology. Or there's the engineering side. [00:25:52] Right, because essentially that's how you deliver value at scale. So if you're thinking about how to break into data science, the first question to ask yourself is: what do you want to be doing? Right. If you're not already working as a data scientist, what is the kind of work that you want to be doing that would fall under the label of data science? And I think that's a little bit more important than the title. Do you want to be doing research, do you want to be doing more of the engineering side, or do you want to be working with business partners and teams to make decisions? I've been through those first two; I tried them and I didn't like them, and so now I'm moving more toward the engineering part. But the other part to think about is where you're starting from. The bigger the gap between where you want to go and where you are currently, the more you're going to have to prove, whether in the form of projects, referrals, Kaggle competitions, whatever. So a lot of times it's just easier to start with the track that is closest to your existing skill set.
[00:26:58] So if you have experience, for example, in project management and working with analytics, then I think the best thing to do is figure out, OK, what are the questions I can solve within that area? And then you figure out the skills you want to learn that go along with that. If you're starting from software development, it's a very similar thing: what kind of problems do you want to be solving, what kind of products do you want to be delivering, and what are the skills you need to go along with that? That kind of hopscotching is one way to ease your way into data science without feeling like you're jumping off the deep end into a pool. So for everyone who is thinking about how to break in: there isn't going to be one clear-cut answer, because your situation, where you're coming from with your skills and experiences and also where you're going, is going to be unique to you. And so it's best to figure out: how do I leverage what I already have to take smaller steps forward, and what's the closest milestone on that path? [00:27:57] Thank you very much, Mikiko. Tom, go for it. [00:27:59] Yeah, real quick. I get asked this question a lot, with people reaching out to me on LinkedIn. And, Harp, did you call on me? I apologize, I was deep in thought about something else. A lot of people get this switched around: just be a data scientist, always. Your current role doesn't define who you are. Always be a data scientist. If you're a data scientist, what do you do? You do data science, every chance you get. Once you have that mentality, you will eventually have a role with a title like data scientist. Now, what do I mean? Let me tell you about someone that's like a daughter to me.
Manpreet had to take a job, so she got a data engineering job. And she came to me with her tail between her legs and said she had to take a data engineering job. What do you mean, had to? I'm ecstatic for you. That's going to be great industry experience, and you'll be able to apply some data science skills and do better in that role. Well, it totally flipped her perspective. She started to realize: yeah, I'll know how to work with data better, I'll understand how to work with my data engineering counterparts. And I think, as Mikiko was saying, the field is so broad. And someone else was saying in the comments: don't think you're going to get your dream job right away. Just keep doing your best work in every role you're in. But remember: are you a data scientist or not? Then do what data scientists do. [00:29:37] Absolutely love that, Tom. Excellent advice. I think people tend to overestimate how long a year is when you're looking at a career of twenty-plus years. So if you have to get on a parallel track for a little while just to get adjacent to where you want to go, whatever, chalk it up. It's all good. [00:29:55] Next up: so, just a heads-up, Dave, looking at the chat, people want you to go deep into process mining. We'll do that at the very end; everybody stick around for that. Right now, let's get through the rest of the questions. Next up, we've got a question from Mashariki. Go for it. [00:30:09] Hi. So I was reading this article, and it said that the variables need not be normal; it's just that the errors after modeling have to be normal, and if they are not normal, we cannot draw a conclusion by hypothesis testing. But in my practice, I always used to normalize the variables before starting and then run the test. So it's kind of contradictory for me. Should we normalize, and if we don't normalize, how would that affect things?
[00:30:53] So, stop reading articles, and read books instead, or things from university websites. [00:30:58] I think anybody can put out an article on Medium and just write whatever the hell they want, and people will look at it as if it were the truth, and it's not. You can easily find university websites, and they're much more reputable sources. To answer your question: in order to do an ANOVA or a z-test or a t-test, you are not required to normalize your variables prior to conducting the test. You can do the test, and then you may notice afterwards that the error of whatever it is that you're testing is not normally distributed. That means that you're violating the assumptions, so whatever conclusion you draw from that test is probably not going to hold. That's kind of what I think you're asking. If anybody else would like to chime in, or if anybody thinks I was completely wrong, let me know. [00:31:55] Please be careful, right? A lot of people will say, oh, you have 30 observations, so under the central limit theorem you can assume normality. Be careful about that, especially in the business world, because a lot of times the underlying assumption is that you have homogeneity of variance, which often isn't the case in business data. So I'm with Harpreet: take a look at some books. In particular, I've rediscovered econometrics, for no other reason than that econometricians have to deal with imperfect data and have come up with a lot of techniques to work with it. And generally speaking, business data looks an awful lot like economic data. So I would say getting an introductory book on econometrics and studying it is a good idea. Or you can use process behavior charts, from statistical process control, because they do not assume homogeneity of variance, so they're wildly, wildly useful. [00:32:44] Brendan, any thoughts on this? [00:32:46] Yeah, it seems like kind of a specific question.
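The advice above, fit or test first and then check the errors, can be sketched on simulated data. Shapiro-Wilk is just one normality check you could use here, and nothing in this sketch is specific to the questioner's data:

```python
import numpy as np
from scipy import stats

# Simulated data with genuinely normal noise around a linear trend
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 3.0 * x + 2.0 + rng.normal(0.0, 1.0, 200)

# Fit first: nothing was normalized before modeling
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)

# Check the residuals afterwards; a small p-value would mean the
# normality assumption behind the usual inference is violated
stat, p_value = stats.shapiro(residuals)
```

If the residual check fails, that is the point at which you reach for a transformation, a different model, or the robust techniques Tom mentions, rather than normalizing the inputs up front by reflex.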
Generally, as I go through my day-to-day work, I always think about: this is my problem, so what should I do? And if it comes to me that, yeah, I think what I need to do is normalize this for this reason, then I'll do it then. So without the whole context around what the goals are and what the data looks like, it's kind of hard to give a general answer. [00:33:07] Yeah, yeah. I guess for this question, if all you're concerned about is: I'm trying to do a hypothesis test, whether that is an ANOVA or a t-test or a Z-test, and you're questioning whether or not you need to normalize your data before conducting your test, the answer is no, you do not need to do that. Do you need to then look at the distribution of your errors afterwards? And this is specific to, you know, linear regression. Then, yeah, look at the distribution of errors. And if you notice that they are not normally distributed, right, then you've violated your assumptions and whatever results you get are going to be garbage, essentially, to your question. [00:33:51] Yes. Yeah, I got it. Thanks. [00:33:53] Yeah. Any other input on that, anybody? Let me know. So Christian had a question, but I think he left. Hopefully, Christian, you got your question answered in the chat there. Everybody, I'll post the chat in the show notes as well, so we'll be able to keep up with that. Next up is Ferrars, everyone. [00:34:17] Can you hear me? Yeah, loud and clear. [00:34:20] This is my first happy hour and I'm excited about it. I have a question pertaining to Covid and situations like this: how does one account for events like this in predictive modeling? Do you just use a few months of data, or come up with a correction factor for your models? [00:34:39] This is specifically for time series models? [00:34:41] Yes. Even models which account for behavior of customers.
For example, I'm looking at what sort of products I'm selling and looking at the Data. Now, if I'm standing in December looking back at the last six to twelve months, the data is confounded because of it, right? For some sectors it's just a general question, but it does apply to different scenarios of predictive modeling. I just want to get your thoughts on that. [00:35:12] Yeah, I don't have an answer for that one, so I'm going to throw that right back out there to people who are far smarter than I am. [00:35:18] I can speak to this one. This is Ben. Oh, Ben. Hey, how are you doing? Hey, I'm in the car, so... I love this question, by the way. So one of the things we talk about is that you actually can't anticipate something like Covid coming. Think about the models you built pre-March: they were all invalid as soon as March hit, if you're doing logistics or something where Covid impacts your model. And so one of the things we talk a lot about is that you need to be able to catch your model going off the rails. So we talk about feature drift, prediction distributions. What are the alarms in place? How can you stop your models from doing bad things, like sending an infinite toilet paper request, something where you're like, oh crap, I never thought this would happen. So you want checks and balances in place. But then, when the crap hits the wall and you've got pressure to retrain the model and redeploy: a lot of people were unable to redeploy models in March because they had to wait to get enough data. One of the tricks you can do if you've got a lot of urgency comes from the financial world: they'll put exponentially decaying weights on their observations. So you still train on a big data set, maybe a few months, but you put exponentially weighted factors on the observations so the most recent data carries the most weight.
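Ben's exponential-weighting trick might look something like this sketch. The half-life, the model, and the synthetic data are all illustrative assumptions; the point is passing decaying weights via scikit-learn's `sample_weight`:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)

# Daily observations, oldest first; the last rows are the most recent.
n_days = 180
X = rng.normal(size=(n_days, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(0, 0.1, size=n_days)

# Exponentially decaying weights: the most recent day gets weight 1.0,
# and the weight halves every `half_life` days going back in time.
half_life = 14
age = np.arange(n_days)[::-1]          # days elapsed since each observation
weights = 0.5 ** (age / half_life)

# Still train on the full window, but let recent data dominate the fit.
model = LinearRegression().fit(X, y, sample_weight=weights)
```

The same `sample_weight` idea carries over to most scikit-learn estimators, so the trick is not tied to linear models.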
That's one of the ways you can get around sparse data, because if you're dealing with, like, two weeks of data in March, you're kind of screwed, but you can still take advantage of it. So that's a trick. But you can't predict every regime change. [00:36:42] That's my two cents. That is solid gold, and I appreciate that. Anybody else have any thoughts on how to deal with this? No, you go, buddy. [00:36:54] OK, one thing you can do is engineer specific features. So, for example, if you're using a tree-based type of algorithm like XGBoost or something like that, you can literally put in a feature that essentially says, this is the Covid time period. And sometimes what that will do is push the boosted trees to say, look, I'm getting all this wrong, so I'm going to focus a bunch of extra trees in the boosted ensemble just on that, and you can get some temporary uplift. Now, another thing you can do, and this is often not very popular, is just break the truth to people: I can't really do anything right now. I'm sorry, the world is different now and I don't have any Data, and therefore I can't actually improve the model. So we might have to do something old-fashioned, like actually use a big dashboard for the time being until we get enough Data. [00:37:44] I really love these previous two answers. One thing that occurs to me, and I'm glad Tim's out there because we've done this presentation together before, is going through your machine learning modeling pipeline and just starting to work it. In other words, you've got to become friends with your features first and look at your distributions. You may not even know what features you're going to use yet. You may have several models with different labels to help deal with the issue at hand.
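The Covid-period feature described above can be sketched like this, using scikit-learn's gradient boosting as a stand-in for XGBoost. The demand series, the cutoff date, and the feature names are all made up for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(7)

# Two years of daily demand with a level shift during a "covid" window.
dates = pd.date_range("2019-01-01", periods=730, freq="D")
day = dates.dayofyear.to_numpy()
covid = (dates >= "2020-03-15").astype(float)   # illustrative cutoff
base = 100 + 10 * np.sin(2 * np.pi * day / 365)
demand = base - 40 * covid + rng.normal(0, 2, size=730)

# Engineer an explicit regime flag alongside the normal features, so the
# trees can split on it and model the covid period separately.
X = pd.DataFrame({
    "day_of_year": day,
    "covid_period": covid,
})
model = GradientBoostingRegressor(random_state=0).fit(X, demand)
```

Flipping the flag at prediction time, for instance `covid_period=1.0` versus `0.0` on the same day of year, shows the regime split the trees learned.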
But if you just start going through the methodology, going through the pipeline, looking at potential features and looking at some unsupervised learning, some k-means, some k-nearest neighbors, stuff like that, look at clustering basically, then just start marching through it. After a while you start to get a feel. I can look at collinearity, I can look for features doing the same job. And over time you might get a feel for, oh well, I'll just throw this out there. The nice thing about throwing something out there is that it's probably going to get shot down, but at least it's a Thomas Edison mentality. Then it's like, OK, that's not it, but you're not shooting in the dark. You have a good clue about what's important, and you can ask for feedback right away. It really comes back to the thing we were talking about earlier: if you have a spirit of just trying to meet the need, and you do micro releases and put something simple out there and ask for feedback, you do better. And the community, this community especially, will rally around you. [00:39:24] Man, those are such great responses. I definitely learned a lot from that. Brandon, I see you're unmuted there. Do you want to jump in now? [00:39:32] I'd just say we did something like this earlier in my career, around 2010, two years after the financial crisis, when we were trying to build these risk models to see who should get a credit card. So we're looking at the past few years of Data and asking, how much of this is valid? And nobody knows, right? So we just had to make our own best guesses: OK, there's going to be some sampling that we do. We're going to have to select certain periods that are good versus others, while at the same time accounting for the seasonality of things, right? Like, OK, December is a special month in terms of whether people get credit cards and what ends up happening to those credit cards. OK, so let's just choose some. And, you know, I kind of hate to say this, but it wasn't based on Data. It was just a couple of us sitting around thinking: we want one December, let's pick that one, but let's not pick that other December. We don't want this month, because the stock market did something crazy or the economy was going crazy in it. We probably want two of these months; these look normal. And then we used that as our training sample. And then the weighting, as Ben mentioned, right: we did the weighting stuff later, and we're like, OK, I think this is what we can do now. And from then on, it's that monitoring, right? It's, OK, I think this will hold us over for a few months. Maybe we'll have to retrain more often than we would have in the past, because things are changing so much. And then you just keep monitoring, and eventually you'll get to some normal. But as you can see from my comment, right: at what point are you back to normal, guys? We had an extraordinary event back then; we have an extraordinary event today. So I guess all my answers come down to an educated guess at what you think is most reasonable. [00:41:09] It depends so much on so many things, and I think the selection criteria for the Data that you decide to put into the model is definitely very important. Mikiko, go for it, as you're unmuted. [00:41:20] Yeah. So this is where it really pays to have a good relationship with your business partners. And I feel like everyone who's smiling understands what that means. So, for example, it could be an extraordinary black swan event. It could also be that you're opening up into a new market and you have to give them pricing assumptions, and you could do a look-alike analysis.
When I was working over at the solar company on the finance team, we were entering into five or six new markets, and, not to go too much into the solar-specific parts of the business, the pricing was different, the regulatory incentives were different, even the competition. Right. So whether it's a black swan event, a new market, or a new product into an existing market, having that relationship with your business partner is really important. Because essentially, for example, if they're in sales, right, they are very much tied to: what is my quota and target and commissions? You're telling me I'm not going to hit it. Why? Right. And if you already have an antagonistic relationship, they're not going to believe anything you say anyway. They're just going to be like, oh, I can't work with this analyst. Right. But if you do have a good relationship with them, if you can communicate respectfully and also do a give and take sometimes, like, for example, we're not hitting our numbers because of Covid, [00:42:51] they're going to be a lot more open to it. And it also still comes down to the fact that forecasting is still kind of a human art. It's a human business. Right. So when you say forecasting to a statistician, a lot of times they think time series analysis. When you say forecasting to a business partner, a lot of times they think, oh, this is going to be an artifact that I work on with the data team, where they'll include assumptions, like, for example, we know this product is probably going to hit the market earlier. Right. So that's something to really think about: are you trying to account for it in your model just to understand where things are going, or how far off you were?
Or do you need to include it because you're giving strategy and decision-making advice to a business partner who has KPIs that they are accountable to? Right. Those two things will play out slightly differently. So that's just, I think, a good thing to think about. [00:43:53] Thank you very much for that. Awesome, awesome advice from everyone there. Greg, I'm curious, how are you guys handling this at Amazon? [00:44:02] So, simply put, there's this massive team of experts in econometrics, and they're pulling all sorts of data from a raw materials perspective. I can't speak too much about it, but all I can tell you is: say, for example, you are reselling products that are based on copper as a raw material. How can you go back and track the impact on copper, whose price or demand might go significantly up or down, to predict what the finished goods will do? Knowing when those triggers happen for the raw materials will help you position your modeling for the finished goods a little bit earlier, so you can capture some spikes in demand way sooner than being a reactive company. So there's a lot of analysis being done where we go beyond just the finished good and go upstream, even to the raw materials, to capture some spikes. [00:45:21] Hopefully that hits your question. [00:45:25] Absolutely. Love the perspective. Thank you so much. [00:45:29] Right on, man. Thanks for asking. Next up, we got Eric. [00:45:32] Hey, so this is kind of an explain-it-like-I'm-five question. I am trying to understand the different degrees of deployment, whether it's a model or an app or whatever. Because at some point, you know, somebody's putting something just up on GitHub so you can kind of see it, or so you can clone it and work on it yourself.
Then there's somebody who's making an interactive dashboard, something like that, which is something I've been thinking about doing, and they're setting it up on a website like Heroku or something. And then you have something cloud-based. And I just don't exactly understand how those are all similar and how they're all very different from one another. And I'm not talking about enterprise-scale stuff; that's totally foreign to me and not really that useful right now. Just kind of what might be within my world. I just would like some explanation of those different things. [00:46:33] Definitely. I think this would be a great question for Ben. Ben, if you're still around? [00:46:39] Yeah, I'm here. Can you hear me? Yeah, yeah. I love this question. So think of it as a dial from experimental to applied. On the experimental end, [00:46:49] it really doesn't matter what you're doing. You're doing a notebook and a one-off, and you don't care if it runs on this version or doesn't. I'm going to throw TensorFlow under the bus: early on, TensorFlow 1.0, Google would post blogs that wouldn't run. Literally, within a few weeks you would run the code and it wouldn't run; very, very buggy. Some of these software packages had a lot of bugs early on. And then when you go applied, you actually run into things like user experience, things outside of the machine learning world that fall into the app design space. But in the machine learning world, the thing with teeth, the thing that keeps people up at night, the nightmare, the skeleton in the closet, is service level agreements and customer experience. [00:47:27] And so, with that, the service level agreements. Imagine deploying a model where you actually are on... sorry, I've got two kids interacting with me.
So with service level agreements, you actually sign contracts where you will be sued if your API goes down. There are clawbacks. So literally, if a region of Amazon goes down, it doesn't matter that Virginia went down; it's actually your fault and you'll be sued. And so with a lot of these model deployments, we actually have to have triple redundancies in place. [00:47:58] So it brings the engineering mindset. Kind of a sloppy answer, but I like the scale of zero all the way to ten being applied. It brings a level of sophistication that sometimes requires teams to support. [00:48:11] Eric, does that answer your question? [00:48:13] There's a little bit of a follow-up clarifying thing. So, like, I was looking at this guy, Sean Sullivan, on LinkedIn. He's got a project that's a dashboard made with Streamlit and hosted on just a website. Right. What is the difference between, like... it's a static dashboard, I think it's pulled from a static data set, as opposed to... I'm really trying to figure out what the heck a cloud-deployed model actually looks like, and, practically, when have I seen one of those and just not recognized it, because I'm the customer and I don't have to know that, or something like that. [00:48:52] Ben, I think you cut out there. [00:48:54] My audio glitched out. I was going to say there are a few ways models get deployed. We care about cost, so costs can fall into an instance that's always on, or think of serverless instances like lambdas, where they're fleeting.
And so for people that do very high-volume inference: with our startup, we were doing a hundred million inferences per month, and we had to run on lambdas, because the cloud cost would have killed our startup. It would have been tens of thousands of dollars a month in always-on instances, whereas lambdas were really, really powerful, just existing for a second to fulfill the prediction. That probably, again, doesn't answer your question directly. But I feel like as soon as you go applied, all of the terrors of running a business come after you. What is the cost? What are the contracts? What are the agreements? Whose fault is it if the models are not working? It's always your fault. [00:49:45] I reckon that the models at Amazon, the recommendation engines that are running when you're on the website, that's probably all in the cloud. Likewise for the Netflix movie recommendations and things like that. So hopefully that's clarifying. Yeah, that helps. Thanks. [00:50:00] So can I take a stab at it? Yeah, absolutely, go for it. All right, Eric, are you familiar with Python at all? Oh, yeah, of course. That's too bad. OK, I'm joking. In very crude terms, if I understand your question correctly: imagine, if you will, that you're a developer working on your local laptop, and you create a model using scikit-learn or something like that. You could literally save off that object as a pickle file. That's literally the binary representation of that trained model, and you can move it around. So imagine uploading it to the cloud, and there's some sort of service, and what it does, essentially, is load the pickle file and cache it in memory for you, so it's constantly ready to be hit for low-latency calls.
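The pickle workflow Dave describes might look like this minimal sketch. It's a toy model on toy data; a real serving layer adds an API, caching, and versioning on top of the load step:

```python
import pickle

import numpy as np
from sklearn.linear_model import LogisticRegression

# Train a model locally on the laptop, as described.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
model = LogisticRegression().fit(X, y)

# Serialize the trained model to bytes: this is the "pickle file" you
# would upload. A serving layer loads it once at startup and keeps it
# in memory, so each request is just a low-latency predict() call.
blob = pickle.dumps(model)

loaded = pickle.loads(blob)        # what the service does at startup
print(loaded.predict([[2.5]]))     # prints [1]
```

In practice you would write `blob` to a file or object store with `pickle.dump`, which is the same idea with a filesystem in between.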
That's conceptually how most of these machine learning pipelines work at the major cloud providers. For example, Azure Machine Learning, which is the one I'm most familiar with: that's basically what happens. You create this workflow and it executes, and in the end there's some sort of binary representation of your model. And of course they save it off to disk for permanency, but they also load it up in memory and cache it, so that any time it needs to be called, it can be called. And you, as the customer, will never know whether that's an on-premise server hosting it, or some VM, or some service in one of these cloud providers. I don't know if that exactly answered your question, but I thought that's kind of what you were asking. [00:51:22] Yeah, that gets closer to where I can actually put my hands on it. And, you know, Dave resurfaced some scar tissue from deployment days. [00:51:34] So the other thing, too, is: what version of the model are you running? As soon as you throw something on the cloud, there's this idea of constantly retraining different versions, and so that version management is an issue you have to take on as well, besides just having one model. [00:51:48] Yeah, and the SLA thing that Ben mentioned is immense, because, true story, by the way: when I was working at the Evil Empire, somebody wanted us to give them four nines when Azure itself was only three nines at that point. So how do you make that work? [00:52:03] Eric, pickles in the sky. That is your answer. [00:52:07] There was only one problem with Dave's answer: he tried to bring up the R versus Python war. [00:52:13] I thought it was Python for production, R for research. [00:52:17] Yes. Oh, yeah. I deserve what? [00:52:22] I deserve it. Guys, it's the holiday season, man. Let's be friends. Let's all just be friends. Right. So next up, we got Abe.
Is Abe still in the building? All right, so it looks like Abe and Jake bounced out, but I was able to find their questions. Jake's question is really cool, so I'd like to open this up to anyone in the room right now. His main question is: what's the most rewarding part about being a data scientist? Why are people going into this field? I'll kick it off by saying that I just love solving problems. That's the reason I got into data science. I just love really challenging problems that force me to learn new things. So that's the most rewarding part for me: solving interesting, fun, challenging problems, and forcing myself to learn and grow and become better every single day. I don't think there are many careers out there like that. But then again, I haven't had many other careers, except actuary and biostatistician, and even in those roles I felt a bit stagnant. So data scientist definitely pushes me to become better. Why do you guys love being data scientists? Who wants to open it up? Mark, what do you love about being a data scientist? [00:53:44] Yes. Yes. OK, yeah. I mean, I think the big reason why I pushed into data science, and my background is health care, is that I just saw the writing on the wall: there's about to be this big change, and our system is going to be shaped through Data. And I felt like communities I care about, underrepresented communities, weren't represented in that Data. So I wanted to get ahead of that disparity and see what I could do about it. And so it's a very mission-driven thing. I work on products that have value. I'm not in health care right now; I build AI products to help people be happier in their jobs, which is also tied to well-being. And so, similar to you, I'm constantly learning every day, but it's a nice bridge between coding and research and stats, but also business problems and talking to people and being very interdisciplinary.
So it's honestly a really rewarding career, and it's constantly mentally challenging. And I think the scary part is that there's so much to learn, I can't learn it all. But that's also the best part: I get to learn something new every single day. [00:54:50] Monica, what do you love about being a data scientist? [00:54:53] The continuous learning, obviously. That's my jam. I just love learning new things every day. And to your point, for those people out there that wanted to become detectives but couldn't get into the field: you can put on your Data detective hat and solve some business problems. I really enjoy helping others. So if I can help others solve business problems, or help others learn how to learn, that's really what I like. It's rewarding to me. [00:55:24] Tom, what do you love about being a data scientist? [00:55:26] So I started out... gosh, I'm older than dirt. But I got really passionate about physical system modeling and design, control systems stuff. And there was a point in my career where I was in my heyday at that, and it just all went away because of corporate stuff. And I found myself in a role where I just magically migrated into Data, started doing more and more Data. I think I was doing dashboards before they were a thing, and I was actually studying Data science techniques before we even called it Data science. But it was a way to stay alive, because I'm just addicted to predictive analytics and making the data flow and doing anything with Data that tells a story. So it was just my way of survival. And then just imagine my thrill when all of a sudden it became the sexiest thing you could do. I said, are you all just now figuring this out? [00:56:24] The sexiest damn thing to do. Mikiko, what do you love about being a data scientist? [00:56:29] So the thing is, I didn't start out loving it, right? So there's my favorite book... he's like my professional crush, I guess.
Right. But when I graduated college, my parents wanted me to become a doctor. The tiger parent dream, right? Doctor, lawyer, engineer, the trinity, or in finance someplace. Right. So I kind of had to figure out what I could even be good at. And the thing that I'm very appreciative of is that, while Data science and tech have a lot of problems for sure, right, around sexism and racism and so many problems in tech, I feel like, for me, it's been the closest path to the American dream, you know. My mom immigrated, and I'm trying to live the American dream as her child. And it's one of these careers where it really is more dependent on your personal initiative: how dedicated you are to learning, how willing you are to pick up skills and apply them. And it's really not an age or gender or anything thing. Right. It's just you as an individual: are you willing to dedicate yourself to, number one, kind of always feeling stupid, and number two, not really being the smartest person in the room? [00:57:50] I regularly feel dumb, especially when I have to do daily stand-ups now, and I'm like, this is the progress we made with my code, you know? But if you're willing to dedicate yourself to that, and also to help people and learn from others... and I did work different jobs, right, I don't know if there's any other job where I would have met so many smart people who have bootstrapped themselves into applying these skills to do really cool things. And so for me, I'm very appreciative. It's a very emotional journey. But it's also something I can connect with, because I know I just need to learn the skill or solve the problem, and it's always there.
It's not dependent on, like, my gender or my age or my looks or my family's wealth, all of which are very, very broken gatekeepers, you know, even education. You'll sometimes get people that go, oh, you're not Stanford, you're not Harvard, or whatever. But I think there are fewer and fewer of them, to be honest, which is great. [00:58:55] California State University and the University of California system right here, baby. So, being an immigrant myself, of Indian heritage, I can definitely resonate with the doctor, lawyer, engineer thing. But I think lawyer has fallen out of favor, and it's now doctor, lawyer, data scientist, if I can say so. I'd love to hear what Brandon and Dave love about being data scientists, and then after that I'll open the floor to questions I might have missed. I've been going through the chat and I think I've gotten everyone's question, but if I've missed one, we'll get to it right after we hear what Brandon and Dave have to say. [00:59:38] Yeah, I'm pretty surprised at a lot of the interest we've got; I didn't know it was going to get so personal. But for me, at this stage of my career, I like working in teams and leading teams. One of the most rewarding parts of my job now is just mentoring the people who are working for me and getting them through, from the classroom style of Data science they learned to here's how it works in real life. And then also, like many others, I like affecting people's lives. Some people come from more modest backgrounds, and I'm thinking, man, if I can teach this person these skills, then this person can really make a big difference for their family and for themselves. So I like that part. I also like working with other teams. I'm at a stage now where I don't do everything anymore. So if I need something from a Data perspective, I work with the Data engineer.
If I need deployment and such, I work with the software engineer. And when it comes to how this is going to affect the business, I work with, I'm just using agile terms here, the business owner. So I actually don't have to do everything now, and my role is more like the glue: I'm the one person who holds everything together, and everyone does their special parts. [01:00:54] And I like that kind of role a lot, because I get to see the different things happening across my team and then drive them together toward everyone's collective goal, which is our KPIs and such. One last thing I would say: I also like the organizational impact that it has. Depending on the situation you're in, for me, when I came in, executives thought: we have all this Data, do something with it. And the first thing you learn is that all this data is a big mess. So then it's like, OK, we need the people who are entering data into the system to enter it more consistently, because the different teams are entering it differently, and now I can't build a model because everyone's doing things differently. And I have to preach about Data as fuel and the need for clean Data: what you were doing before works if the audience is another person, but now the audience is a machine and an algorithm, so you have to do things differently. And that takes a long time, and some people won't buy into it at all. And half of the people will think, is what you're doing going to work? And your prototype sucks, which it does, because it's a prototype. So those are all challenges, and just working through all that is rewarding for me. [01:02:03] Dave, let's hear what you love about being a data scientist. And I'm sorry, I have camera-on bias; I keep forgetting that Ben is here, because I don't see him on camera. Ben, after Dave, let's hear what you love about being a data scientist.
[01:02:13] So I won't say that I'm as old as dirt, but I'm rapidly approaching the age of mud. So, like Tom, I've done a number of things in my career. I think this is a really important question, because, for example, some people want to get into data science because they think it's a lucrative career path. Well, actually, you can make more money as a Data engineer, and even more money as a software engineer, and you're still solving problems, still using your creativity, still doing a bunch of stuff; I used to be a software engineer back in the day. So that's not it. The reason I like being a data scientist, or an analytics professional, as I typically refer to myself these days, is that what really gets me excited is affecting the course of the business. More so than writing a cool algorithm, more so than training a model, more so than watching my cross-validation go, yeah, I've got great generalization estimates, awesome. That's all cool, don't get me wrong, the geek in me loves that. However, the thing I really, really like is when an executive comes to me and says, Dave, we've got a new product, how should we price this? And then I use Data and I influence the strategy and how things are going to happen at the business. That's what I really enjoy nowadays. Five years ago, I probably wouldn't have given that answer. So that's another reason why I like being in analytics. Thank you very much, Dave. [01:03:24] Ben, thanks. I love this question. [01:03:27] So my dad's a doctor, my mom's a lawyer, and I was a homeless hippie prophet or whatever. I lived in the woods and was figuring life out. I fell in love with high performance computing somewhere in there. But I feel like Data science is really magic, almost like Harry Potter stuff. The stuff we can do today would have been considered impossible five or ten years ago. And I just love that about the field, because it's a blank slate.
So think of a creative project you want to work on. It's not hard for people to come up with a lot of really good ideas that have never been done, ever. And so I think that's why this field is so exciting: the impossible is redefined every couple of years, and it just blows my mind. And then there's the good that you can offer to people; every discipline needs this. Whether you love philosophy, science, or engineering, or you came from law school, it doesn't matter: you can attach this to anything you love and improve it. So, yeah, super exciting field, super rewarding. [01:04:24] The homeless hippie prophet has spoken. A question now from, I think it's Didi. This question might have been answered already, because many people ask it in these office hours, but go for it anyway. [01:04:37] Hi, everybody. Thanks so much. Can you guys hear me? Thanks so much for having me. It's my first time; I got invited by Erichson. Really happy to be here. And I know some people have already covered the question, but I'm coming from the military. I was in for the past five years and just recently got out, and I'm looking to transition to data science. I'm specifically interested in being a data scientist in the financial industry, like investment banking and finance stuff. So I just wanted to get some advice on which way I should go. I really haven't done that much; I did something when I was in university, but not much since. So I wanted to get some advice today: should I think about going to school for a one-year master's program? Should I try to find some online courses to focus on? [01:05:39] I guess I'm just trying to get some advice on which way to go from here and how to start. Thank you.
[01:05:45] So in terms of the master's program: I don't think you really need a master's to get into data science, and I realize I've got master's degrees and I'm saying this. But I'm curious, what is your, like, status in the U.S.? [01:05:58] Like, are you already a citizen, or do you need to go to school in order to stay in the States? [01:06:04] Yeah, I'm already a citizen. OK, well, a lot of the resources are free online; you just need to make a road map for yourself to figure out what skills you need to pick up. And that stuff is covered widely in a lot of places, so I'll turn this back to the audience. But just to specifically address your question: I believe you're saying you're interested in using data science and machine learning in the finance sector, or something like that. A great book that I would recommend is by Stefan Jansen. He wrote a book called Machine Learning for Algorithmic Trading, and he has a very, very robust repository on GitHub, which is essentially the entire book outlined. So I'd point you to that resource to really understand how to use machine learning within that context. But for this question, I'll flip it back to whoever wants to tackle it. Go for it. [01:07:07] OK, I'll go. Sorry, I was typing too: I put a link to the book that Harp just mentioned in the chat, and then, Didi, I just sent you a big message that I copied and pasted, the one I send to people who ask me for help getting started. The big thing in the long run is that this is a lifestyle. If you don't like continuous learning, you need to find another field; just be honest there. And it's going to take a while, so pace yourself. Keep moving and keep your mind fresh, because you have to have a sharp mind, and a tired mind is not sharp. Sometimes it takes time to get into a steady study shape, but you know what I mean.
There'll be times when you can work eight hours, and you've got to make yourself take a break. There'll be times you can't focus more than ten minutes because it's super hard new material. But the other thing, which has been said many times already in these office hours, is don't be afraid to just start doing data science and sharing on LinkedIn: post what you're learning and how exciting it is. Just start there. Start writing blogs when you can, and start building up better and better GitHub repos. I know what Dave said earlier, but I think it's OK to recreate an existing practice data set; just do it really well. Show exceptionally good visualizations along the pipeline, and show that you tried different models and how they compare. Just showing that helps new people, and you can direct people to it. What I'm saying, and the real point that David Langer was making earlier, is don't just do another scikit-learn GitHub repo; do one that helps everyone get much better. There are other things, but I'm going to be quiet now. [01:08:57] Thanks, Tom. Who's got a very short answer? [01:09:02] Do like me: learn econometrics, study statistics, get into that, and you're going to be fine. All right. [01:09:10] Thank you. Thanks very much. Monica, go for it. [01:09:13] I just wanted to add to Tom's point. It's really good that you have a path you're going forward on, in that you're interested in the financial field. So with all of the advice that everyone has already given, I would add: focus on something in the financial field when you're working on those projects. By learning that way, you build a background that's relatable to what you want to get into. [01:09:40] Thanks, Monica, appreciate it. And thanks, everybody. Was that helpful, Didi? And did I get your name right, do you actually go by Didi? [01:09:50] Didi, cool. So hopefully that helped you out there, man.
So thanks very much. Awesome. So, a question in the chat here for me. But go ahead, Evangelos: unmute yourself, and first let me know how that wine is sipping. [01:10:09] It's great, and hi from London. It's quite late here, so I was just signing off, but it's great to be here. So I have a practical question, as Eric pointed out. I've recently come across this interesting sort of stakeholder who seems to not be that interested in what kind of data we have, or how well the data is understood, before making steps towards getting something, either a dashboard or a model. I think we're well far from doing any machine learning yet. But essentially, from a business point of view, they want to prototype different views, different mock-ups, and test different things. How do you balance that? What stories or examples do you have where you had to tick the boxes and produce a dashboard? [01:11:02] I've done some analysis; I know that 50 percent of my data is right because I've got, like, an A-or-B sort of thing, and I've got, like, three categories. And then I need to go back, extract some more data, label, train, and do all these things again. But at least I have a dashboard with some nice bars and graphs that we can iterate through. It's always a fine balance; I know there's no right answer to it. [01:11:26] I'd be keen just to hear some stories and experiences around that. I just give them the first, simplest model I can get, and they're going to be like, this model is shitty. And I'm like, well, if you give me more time, I can make you a better one, and here's how. And then I'll progress, and if they're pressuring me for time, then incrementally it's going to get better and better. That's the way I approach it when people put a time box on my work.
I just give them what I can in that period of time and then say: this is what I've got so far; it looks like it's doing better than random chance, and if it's better than random chance, that means we've got something here that we can work with. If you give me some more time to uncover more of the relationships in the data, and maybe play around to see what other useful information I can extract, I can definitely build something more accurate, for lack of a better term, and we can iterate towards that. That's how I would approach it. How about you folks out there? We'll start with Dave. [01:12:26] Oh, impatient business partners. So typically what I do is I try to map the techniques that I know, the types of analysis I can do, the things that I know how to do, to the situation at hand. So, for example, if I know that the data isn't necessarily clean, or it's not complete, or I just don't have time to do a full-blown model and go through the entire process of feature engineering and all that stuff, I won't use machine learning; I'll use another technique. For example, one of the easiest ones that I actually use quite a bit is a form of statistical process control called a process behavior chart. It has very, very minimal data assumptions, it can work on relatively few data points over a time series, and it can produce some relatively interesting estimates that you can then use to communicate with the business. Or I might use something like market basket analysis, or maybe even some process mining, some very simple techniques. But typically I try to avoid using machine learning as much as possible, mainly because most of the time, in my experience, people aren't really ready for the time and the expense of a production solution. So I tend to focus on other things. [01:13:43] First, I just want to add to Dave's excellent point.
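The process behavior chart Dave describes can be sketched in a few lines of Python. This is a minimal XmR (individuals) chart: the 2.66 scaling constant is the standard XmR factor, and the weekly numbers below are made up purely for illustration.

```python
def process_behavior_chart(values):
    """Return the center line, natural process limits, and any points
    that fall outside those limits (potential signals, not noise)."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    center = sum(values) / len(values)
    # Average moving range between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    # 2.66 converts the average moving range into 3-sigma-style limits.
    upper = center + 2.66 * mr_bar
    lower = center - 2.66 * mr_bar
    signals = [(i, v) for i, v in enumerate(values) if v > upper or v < lower]
    return center, lower, upper, signals

# Hypothetical weekly order counts: mostly routine noise, one real shift.
weekly_orders = [52, 49, 55, 51, 48, 53, 50, 54, 47, 78]
center, lower, upper, signals = process_behavior_chart(weekly_orders)
print(f"center={center:.1f}, limits=({lower:.1f}, {upper:.1f}), signals={signals}")
```

Points outside the computed limits are treated as signals worth investigating; everything inside them is routine variation you can report to the business without a model.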
If you do the kind of things that Dave encourages in his free courses on data visualization, those are the things we need to do on the pathway to coming up with good models anyway. So why not show those parts of the pipeline before you get to predictive analytics? [01:14:04] And if you're able to move beyond that, great. But I think, Evangelos, it's also an opportunity to do what Harpreet was saying: show them the shitty stuff, pardon my language, along the way, but then just say, I've got to remind you all, garbage in, garbage out. Do you want me to work with garbage data? And, you know, sometimes just cleaning the data will give you phenomenal results to work with. [01:14:33] So I'd like to add something real quick, from me on the business side, someone who likes to work with teams like you guys. You will always hear business folks in a hurry, because they are the ones taking risk. The issue is they are in a hurry, they have to make a decision quickly, but they can't put their finger on how to quantify that risk, which is why they come to you and put the pressure on. Sometimes they're looking at it as: I cannot quantify that risk, which is why I like to work with data scientists, because you guys have the ability to explore the possibilities and employ a risk-based approach to those possibilities, to tell me: if you choose X, this is what might happen; if you choose Y, this is what might happen; if you give me garbage data, this is what will happen. Based on what I know, I take the decision. Sometimes that's what we need in a hurry, to make a split decision. That's what I wanted to add before Mikiko. [01:15:34] Yeah, and to add on to Greg's point, he always brings in such fascinating insight about working with partner teams, because ultimately it's still a relationship between data and the business, right? And so what I've done in the past... well, OK.
So, to continue adding on to that: one thing that has been a sort of interesting tension in a lot of the teams I've worked in is, how do we balance the long-term needs versus the short-term, mission-critical firefights? Because a lot of times, when my business partner was in a pinch, it was because their leadership was pressuring down on them a little bit. So one thing that helps, first off, is actually understanding what the critical thing is that they need. What is the critical answer: is it directional, or is it super precise? A lot of times, if it's directional, my business partners have been willing to, I won't say compromise on quality, but they've been like, look, we will caveat everything we give to our leadership; we'll let them know what the assumptions are. And sometimes I've felt this pressure to give them a really good answer, to be really outcome-oriented, but sometimes that's not what they need. Sometimes they need something that's a little bit dirtier and rougher, just to directionally understand: OK, do we or do we not pull back our suppliers in this area? So that's one thing to consider: really talk with them and understand what the critical thing is that they absolutely need in the moment. And a lot of times, when you have that conversation with them, it'll bring up other things, like: OK, we know this is something that'll be important two quarters down, but it's not something we need to deliver on right now. [01:17:22] So why don't we take those parts of the requirements, put them in the backlog, and then, when we have that quarterly or monthly planning session and go through the backlog, bring them up at that point. But for this thing, let's just deliver what they need.
I think it's always good to not go model-first, because a lot of times, if you talk with the business partners and let them understand the costs and the trade-offs, all of them would probably go, yeah, we don't actually want to go model-first. What they want is outcomes, which don't necessarily have to be positive, but outcomes which we can understand and can communicate to our leadership, who might actually be even less data-driven. A lot of times, leadership on the business partner side can be less data-driven. Your business partner working with you is probably the most immediate person, so they can also serve as your advocate, because in a lot of companies, even though data is important, the business partner side is still the one that's driving the revenue, you know? And so it's one of those things where you don't want to tell them no, because they also have outcomes they're accountable to. You want to find the thing that they need, even if it's not everything that they want this year. [01:18:41] Let's hear from Monica and Brandon on this point. Brandon, you're here already, so go for it; then after Brandon, Monica. [01:18:46] OK, yeah. I just want to add on to Mikiko's point about cost. The way that I always work with my business partner is I give them the menu, and I say: here's option A, this is my estimate of the benefit, and this is the cost. Here's option B, maybe it's better for the longer term, and these are its costs. And so on for the other options. And I say: you can make the decision on which one you want, and I can provide all three of these; I just have to be upfront about what things cost, and then they can make the decision from there. The other thing I'll say is, if you're trying to convince anybody, prototypes go a long way.
Earlier I said that prototypes suck because they're prototypes, but a prototype is better than your slide, or me saying, I think this and I think that. They help very much. [01:19:28] Thanks, Brandon. Monica? [01:19:30] Yeah, I mean, I completely agree with Mikiko's points, very spot on. You just want to really understand from the business partners what exactly they need, to be able to provide them with that. And also, to add on: be very transparent about what you can provide to them. So if the data is really crappy, you need to let them know that there are limitations in what you can provide, and maybe that will help them, on their side, communicate further up to say: we may need to pivot and focus more on getting the data to a level of quality where we can then use it to make our decisions. [01:20:11] Evangelos, did that answer your question? Absolutely. I think everyone's input was great; excellent question. Shout out to those of you relentlessly active in the chat, thank you so much for your contributions. Let's open it up for one last question before we call it quits. There are a lot of people that my camera-on bias ignored. So Greg, Deepa, or Melania: if any of you guys have questions, whoever unmutes themselves first gets the floor. [01:20:39] I actually have a question. Oh, yeah, go for it, man. So I guess my question is: I've now been a data scientist for about a year, and I'm finally starting to catch my groove where I can actually work pretty independently. And now I'm having conversations with my manager, pre-planning, like: hey, what do you want your career at this company to look like? So the question I have is: what's the next level for a data scientist when you don't want to be a manager? And I want to move away from titles, because at a certain point titles are kind of meaningless; more so, what do the responsibilities and deliverables look like for someone who's trying to get to the next level as a data scientist? [01:21:18] Interesting question.
I know that there are companies, like Google for example, that have two separate tracks. They've got tracks for people who want to become managers of people, and those roles are where you're kind of orchestrating what is going to be happening. And then there are people who stay on the more technical route, and they'll still be climbing up and taking on more responsibility, but more on the technical side of things. Personally, I don't have much experience with what you're talking about here, but I'll open it up to see if Brandon or Dave or Tom or Mikiko or anybody has insights. Love to hear it. [01:21:55] Yeah, so I can talk a little bit about that. When I was at Microsoft, I was a principal-level IC for a number of years, and then I was also a manager at the director and senior director level. So a lot of it's going to depend, quite frankly, on the nature of the company that you work for. For example, big tech has a common title you'll see across the big tech companies: distinguished engineer, which is an IC that is, like, awesome with extra awesome sauce on the side. So there are definitely ways to do that. And typically what happens there is it's all about two things. One, it's about technical virtuosity: you just have to be unequivocally, objectively awesome at data science if that's where you want to go. You don't have to be a researcher, per se, but you have to be really, really awesome with the technical chops. And then it also becomes about communication, because to justify that level of salary as an IC, you have to be a force multiplier in some way, shape, or form. Whether that's on the marketing side, like maybe Ben Taylor, where you're kind of doing evangelism and you're a force multiplier that way, or, if you're on an internal team, then you're mentoring other senior engineers, for example.
Like, if I'm a senior engineer, who mentors me? Well, a distinguished engineer can do that sort of thing. So if you're not interested in going into management, it's really around this idea of: how can I be a force multiplier? That's really what you want to think about as an IC, and technical virtuosity is going to be part of that, and communication is going to be part of that. [01:23:29] So whenever we see a title like principal, that's kind of what it means: they're at that manager level, but they're not actually directly in charge of, like, people operations. They're more in charge of helping grow and develop more junior talent in a technical capacity. I guess that's what that means. Any other comments here from Brandon or Mikiko or Tom or Monica? [01:24:01] When I have these conversations with my reports, I always let them know what I observe. So I might have somebody who, I've observed, likes doing a lot of the research and doesn't like any of the implementation or the productionization, but does really good research. So for them, I think the research data scientist path is where they're going to go. And then I've met people who came in saying, yeah, I want to be a data scientist, but when I watch them work, I just see they're really more interested in engineering. And after a few years, I said: I think you're more of a data science engineer, or maybe even a software engineer. And, you know, I see if that resonates with them, and if it does, then I'll lead them down that path. So I think it starts with interest. And if you don't see it yourself, then somebody else, maybe your manager, can see it and help point it out to you. [01:24:45] Does that answer your question there? [01:24:49] Very insightful, thank you. [01:24:51] Awesome. Well, we'll have to wrap it up, guys.
Thank you so much for hanging out. To anybody who did not get their questions answered, I apologize; there were plenty of opportunities to get yourself in the queue, so it's really on you. We will be back next week for the last happy hour of the year: December 18th, 4:30 p.m. Central Time, right back here. The last interview of the year was released just earlier this week, with Donald Robertson. Check that interview out; that was probably one of my favorite ones of the year. Monday, I've got an episode releasing with a special year-end recap and a special treat for you guys, so definitely tune in to that. And look out for Camille. [01:25:34] Camille did a nice post about today's data science stars; it's near the bottom of the chat, and you might want to go support her there. [01:25:42] Oh, definitely. Camille, do you have a question? You won't be able to make it next week; do you have a question that we can help you with? [01:25:52] Yeah, so this is my first time attending, and this is really great. I was struggling with some stuff in R this week, so I have an R question, if you guys want to tackle that. But we're also at the end, so I'm happy to wait for another week. [01:26:05] David Katz is the guy for R. You can ask me too: are we connected? Oh, you can actually send a request; I have a few left, and I can help you out with the R questions, OK. That's my penance for what I said earlier, actually. [01:26:22] There you go, guys. Well, thank you again, everybody, for joining, and I look forward to seeing you next week. Until then, take care, and remember, guys: you got one life on this planet, why not try to do something big? Peace out.