office-hour-30OCT2020.mp3

[00:00:00] What's up, everybody? How's it going? Welcome to The Artists of Data Science open office hours. Hope you guys are doing awesome today. We had an awesome week on the podcast, with a couple of really cool episodes released. One came out on Monday, all about human prosperity — a cool episode packed with a bunch of tips on how to be more productive and how to be happier, so definitely go check that out. And on Thursday I released a super, super entertaining episode; the guest is pretty much a crazy mad scientist from Britain, but he's actually doing really great work in the realm of sustainability and things like that. So definitely check out that episode — chock full of insights from science plus statistics, and a bunch of British humor, which is always fun. On Monday I've got an episode releasing with Annie Duke, who wrote Thinking in Bets and How to Decide. Super excited about that one — Thinking in Bets literally changed my life, and it was actually one of the books that Vin had recommended.

[00:01:14] What's up, artists? We all know that cloud computing has changed the way we live, do business, and stay connected. With everyone using the same cloud platforms, winning and losing comes down to having the talent to build products better and faster. So whether you're an aspiring data scientist looking to build your skills or a seasoned veteran looking to level up, developing tech skills and being comfortable working in cloud environments has never been more important than it is right now. Cloud Academy has thousands of video courses, learning paths, and practical hands-on labs in real-world cloud environments designed to help you build critical cloud skills. They cover everything from major certifications to DevOps to security to programming languages. Cloud Academy is the cloud training platform of choice for Fortune 500 companies and thousands of tech professionals around the world. Don't just take my word for it — check out the reviews on G2 and get started now at cloudacademy.com. Listeners of The Artists of Data Science can lock in 50 percent off the monthly price for life; just put in the coupon code ARTIST when checking out. It's a great way to pursue certifications or just build your cloud expertise. Again, go to cloudacademy.com and use the coupon code ARTIST to lock in 50 percent off the monthly price. And Vin is here today at office hours, hanging out — super excited to have you here today.

[00:03:01] Doing good, man. How are you? Good to get to chat with you again. It's been a while — I think the last time we talked was right before COVID got insane. I remember I was still in the office at that time, so it must have been early March.

[00:03:18] It was March or April or something, but things were just kind of getting crazy.

[00:03:23] Yeah, it's been really, really insane here in Manitoba where I live. They just announced that we're going back into lockdown on Monday, because the situation here has gotten out of hand — it's been spreading like wildfire. We've got the highest incidence rate per capita now, which is insane. But yeah, man, welcome — super excited to have you here. We usually have a bunch of students pop in, and we just help them out and answer questions.

[00:03:54] I see we've got Tanisha in the chat as well. Tanisha, how's it going?
Not only do you get to chat with me, but we've got Vin Vashishta here — you've got two people to help you with whatever it is you need assistance with. How are you doing tonight?

[00:04:10] I'm good. How are you guys? I'm so happy to get to talk to you both.

[00:04:18] I'm excited to chat with you as well. How are you doing?

[00:04:22] Good, good. Yeah, go ahead, man.

[00:04:25] I was just wondering what we can help you with.

[00:04:27] Yeah, I just had a couple of questions regarding the Data Science Dream Job course, as I told you before. I just wanted to ask you about that, and then a few questions career-wise and such.

[00:04:44] Yeah, definitely — I'm happy to answer any questions you have, so go for it. So Data Science Dream Job: I guess in a nutshell, it's a mentorship platform — probably the most awesome mentorship platform, though, well, I'm a mentor on that platform — and it's geared toward helping you get a job in data science. It's definitely not a boot camp; it's not a university by any means. It's a mentorship platform where you can speak to mentors such as myself or some of my colleagues. We've got a whole library of videos to help you navigate the job search process, as well as roadmaps on the skills you need to be successful in data science, and a whole slew of technical videos. There's a really active community channel as well. And I host office hours just like this multiple times a week, and my colleagues host office hours throughout the week too. There's a total of — I mean, there are seven days in a week and I think we have eight office hours, which is pretty impressive. I'd say there's no other platform like it that I've seen or come across. But yeah, I'm happy to take any questions you've got.

[00:05:55] Yeah. So that's the main point, which is why I was considering it. Of course, I don't mind the platform as you described it, simply because I'm already worn out and stuck in my job search.

[00:06:12] OK, so a little bit about myself: I'm actually from India, and I moved to do my undergrad in engineering, and then I came to New York to do my master's. My master's, as I said, has a major — or like a concentration — in data analytics. I kind of want to move my career into data analytics or data science, or something in between those. I graduated, and I've been trying to search for a job, but so far I haven't gotten anything beyond the internship I'm currently doing. In terms of a job, I feel like I need more guidance, more of something from someone who has more experience, which is exactly why I'm considering this course. The major takeaways for me from this course would be the network — which I've read about and you actually just spoke about — the mentorship, and the one-on-ones. Is it that you have, like, one one-on-one per week?

[00:07:31] So they're not one-on-ones, they're many-to-one. There are group office hours — there will be numerous students on any office hour session, and they're all asking questions. And it's always good to be in an environment where there are multiple people asking questions, because people might ask questions that you didn't even think of that turn out to be helpful to you. So you have multiple of those throughout the week.

[00:07:55] Right. But I just want to ask: is it also one-on-one mentorship that the course offers, or not?
[00:08:03] There's nothing really that unique about your particular job search. Look — I've mentored two thousand six hundred plus students, and there's nothing unique about your job search process, I guarantee you. You've probably got a unique set of skills, you're a unique person, an individual, but you're facing many of the same challenges that everybody else breaking into the field is facing. So we don't do one-on-one mentoring, because it's just not the optimal use of the mentors' time when they can help many people at once. It'll be many-to-one, just like this — like Vin joining today on the office hours. He's not a mentor that's part of the platform, so you wouldn't have office hours with him, but you can ask him questions today, which is an opportunity you have here. But yeah, it's many-to-one, and it's usually about seven to ten people per office hour, and they last over an hour. So you'll get ample opportunity to get one-on-one assistance in that respect, but you'll also be able to hear other people's questions and the responses they get — and you can learn much more that way than if it was just one-on-one.

[00:09:15] Yeah, I mean, I understand that.

[00:09:17] But from my own perspective, the question for me was: if I wanted, say, a review of my resume, how does that work in a group mentoring setting? Is it the same thing?

[00:09:35] You'd pull up your resume and I'd look at it, and it'd just be like this, in a group setting. You'd share your screen, I'd look at your resume, and I'd provide feedback. And obviously everybody else would be looking at your resume as well, which is completely OK, because somebody else might have another perspective, or some insight to share that wouldn't be insight that I would share, and it would benefit you. So you get that added benefit when there are many heads put together.

[00:10:06] Yeah, yeah. That's one of the questions I had about the mentorship, because that's the main thing I was looking for — someone to guide me with my resume or my LinkedIn.

[00:10:22] So we have full-on modules for that as well. We've got modules dedicated to LinkedIn — how to properly do your LinkedIn profile — as well as modules for resumes. We've got resume templates and everything as well.

[00:10:37] I wanted to ask you guys personally: right now you're mentoring current students who are taking the course, and you also have students who've gone through the course and gotten their jobs. Did something change with the pandemic in terms of how the job market is right now?

[00:10:56] That's a great question, actually. Vin, want to chime in on that? What's your perspective on the data science job market due to COVID?

[00:11:04] You know, it's interesting. Some parts of data science haven't suffered at all. But if you look at some of the more advanced skill sets — things in the deep learning space, more niche, geared towards research — that's suffered. A lot of companies are scaling back. And what's interesting is they're splitting positions: instead of having one very, very advanced data science or machine learning researcher, they're splitting it into a couple of mid-level positions. So they're still hiring, and the budget in most cases is about the same.
So if you're looking for a mid-level or even an entry or junior level position, in some cases it's become easier to get into the field. And remote working is another piece of that — there are a whole lot more opportunities with different companies because they're hiring for remote work. A lot of people say that's only for six months, or until this is over.

[00:12:01] But I hate to say it — I mean, we're looking at probably some time next year before it's over, and remote work isn't going anywhere.

[00:12:09] In terms of — so you mentioned that a lot of advanced research positions are kind of being slashed, I guess, for lack of a better term.

[00:12:21] Why do you think that is? Is that because funds are not being allocated in that direction anymore, or is it because organizations are shifting their focus from exploratory work to more "how do we survive today"?

[00:12:35] Yes — it's the projects that are getting pushed back. Essentially the budget stays the same, but there's a reevaluation of which projects you're going to end up working on.

[00:12:44] As a business, a lot of the more forward-looking projects — where they would start building infrastructure, or start hiring for projects they're going to do in a year, or complete in eighteen months to two years — those projects are getting pushed out a little farther on the product roadmap.

[00:13:01] So the hiring isn't necessarily cut; the budget's already there, it's already been allocated. A lot of these companies are losing money, and so in those companies, instead of just saying this position's been eliminated, or reducing hiring, or doing a hiring freeze, what they'll do is split the position — because in a lot of cases, getting a really advanced deep learning expert is very expensive. And if you're pushing those projects out six or eight quarters into the future, it makes sense now to increase your bandwidth for near-term projects instead. That's filling the gaps; that's what's being pulled in — work that's more practical, shorter term, revenue generating. So those are the types of jobs that are opening up instead. Like I said, the budget really isn't being reduced by that much. In some cases they'll drop spending on infrastructure — that's one of the big pieces of those longer-term projects — they won't be buying as much software, and they won't be investing as much in on-prem or hybrid cloud. So there are some pieces being pulled back. But as far as staffing, everybody's looking at this as an opportunity to start poaching.

[00:14:09] Hmm, that's interesting. That's a question I hear people ask all the time, especially from mentees in DSDJ and in my messages: "Oh, well, I heard data science is going out of style, going out of fashion — nobody needs data scientists anymore." What's your take on that? Because I feel like, if anything, this whole COVID situation has probably increased the amount of data being generated in the world, since everything's happening virtually now. So that part's not slowing down — we're not generating any less data; if anything, it's probably increased since the start of COVID. What's your take?

[00:14:52] One of the big pieces of COVID is that a whole bunch of us stuck inside are doing research, because our workdays used to include two hours of commute — two and a half hours sometimes — and we don't have that anymore. So there are some of us who have a little bit more time.
A lot of research is coming out now. That's the biggest impact from COVID: we're making just a little bit more progress, a little bit faster, than we were before COVID. And that's going to accelerate the skills required to get into the field. So is data science obsolete? In a sort of way — when you look at what a data scientist was four years ago, what you needed to know four or five years ago: a lot of those types of data scientists are just kind of clinging on, not learning, not increasing their skill set, not improving or moving into more advanced, more specialized areas. That's been a big impact. Now you're looking at companies that are starting to cycle through talent, looking for low performers — people who have the data science job title but haven't produced much, haven't been able to put something into production, haven't been a team contributor. A lot of teams are really reevaluating each individual: how can I reallocate budget? Could I reduce headcount and increase another part of the team? There's this whole concept of cycling out and refreshing — I'm trying to remember the buzzwords — cycling out, refreshing, that sort of thing. What they mean is low performers: people who haven't advanced their skill set. So you're right, there's a piece of the field which is sort of obsolete, and has been for two or three years. It's not so much that the way the field does the more basic projects is obsolete — it's some of the talent, some of the thinking, that's really obsolete. And we have to do this in a more rigorous way.

[00:16:55] There are a couple of things you touched on that are very, very insightful, and I definitely want to dig into them a little bit more. But first I want to welcome Nicholas into the chat. Nicholas, thank you for coming by again — good to see you. We've got the pleasure of having Vin on today's office hours — one of the top voices for data science on LinkedIn, a LinkedIn Top Voice in data science last year — so I'm super privileged to have him here. If you've got any questions, definitely feel free to jump in at any time. The same goes for you, Tanisha: if you have questions, feel free to jump in at any time. But going back to what you were saying earlier about a particular type of data scientist — when you say a particular type, do you mean particular in the sense of the skills they have, or particular in terms of their mindset, personality traits? What did you mean by that?

[00:17:50] It's a little bit of both. There's the software-developer-heavy data scientist, who's not so much involved in the actual model design, model selection, model validation piece of it as they are in building the models or optimizing. They might have data-science-lite model and algorithm knowledge, but they do a lot of "import from". That's sort of an obsolete skill set. You're also seeing the analytics side of the field, where over time a lot of people who were analysts got sort of pushed into data science — because data science and analytics weren't that different three years ago in most businesses, really not a whole lot of delta between the two. So they got pushed into the data science role with the intent that they were going to grow, they were going to learn, sort of train into the role. Those are the two ends that I see.
There's the more software-development-focused, more technical end, and the more analytics-focused, more data-focused end — the data wrangling side of it: data wrangling, data analysis, exploratory data analysis, but not very strong on the modeling side, not very strong on implementation and deployment so far.

[00:19:21] Just so I understand and track: there are the types of data scientists who may have gotten that title but have not really increased their skill set, haven't learned or developed in their journey — kind of resting on their laurels, so to speak. And then there's the type that all they have are these hard technical skills and can't necessarily navigate what it takes in terms of soft skills, I guess. Right?

[00:19:55] They don't really understand the math behind the algorithms. "Import from" makes it easy, and there are a lot of great applications for "import from": it's quick, and you can get a lot of value out of two weeks of development time if you're doing a lot of "import from". You can customize a lot of the libraries — TensorFlow gives you so much control over customization — and that's the piece that a lot of the data scientists who came from the technical side of the world just didn't reskill into. "Import from" can sometimes make it so easy that you're leaning on it too much. So you're importing from and just doing a generic train-test. There's no concept of validation past model development. There's no real concept of going back and understanding the data a little bit better, of asking whether your data was gathered well enough. Did you really put in the work to make sure that your data represents what you think it does, that it was pulled from the data source you thought it was? Are you really modeling the data, or are you actually building a model that represents the system under measurement? There's a lot of complexity there that
someone who's very, very technical sometimes doesn't see the value in.

[00:21:09] That's very, very insightful — a lot to digest there. Actually, I was making a similar point last week during office hours, to the effect that you don't necessarily need to know all the math to get your first job in data science, but you had better learn and pick up the intuition behind the import statement. Otherwise you're going to find yourself in a position where you're just fucking shit up and not knowing why it's fucking up, and you're going to get fired.

[00:21:50] You know, it's funny — that gets into monitoring in production and stuff.

[00:21:56] Yeah, yeah. I had a model — I'm the first data scientist in my organization, literally the first one they hired, the only one — and I was working on a model that got put into production. It took me, from the moment I first started looking at raw data to having this thing deployed in production, with the help of a solutions architect and another software engineer, about six or seven months — which I like to think was kind of fast, given it was just a one-person team doing everything except the actual integration into the larger software. But I just recently started to collect data for model evaluation, model monitoring, and stuff like that. And that's something that doesn't get talked about much anywhere — the amount of literature that's out there is thin, and you kind of have to piece things together.

And it's at this stage where it was: OK, there might not be books out there, but I know a shit ton of statistics and I know a shit ton of math from being a graduate student, so let me fall back on what I learned in school to help me develop some way to adequately assess whether or not this model is performing the way I intended it to.

[00:23:17] And I guess that's where it becomes really important that — is it OK if I jump in? I apologize.

[00:23:25] It's not a problem.

That's actually exactly what I was going to ask about today, too. So I did a project: I took my personal Google data, downloaded it, and then did some topic modeling on it.

[00:23:39] The project allows me to put in a start date and an end date, and it shows me the different topics of my search queries between whatever dates I put in. And it works fine on my own data. Now, if I wanted to deploy that — I get how, for a simple classifier, you just save it: using Python, you pickle it, then you deploy it using Flask or Django or some web framework, load the model into that, and predict on new data. But for unsupervised learning, I don't know of a way of exporting the model. Can you do the same process?

[00:24:23] So you're saying with an unsupervised model you want to put in production — what are you trying to do, cluster new observations into an existing group?

[00:24:34] So the end goal is someone can upload their Google data, put in a start date and end date — like I did with my own — and it returns what their topics are. And I don't know why, but I got the feeling it's different from just saving and loading a linear regression model, for example.

[00:24:55] That's a great question. I don't have an answer to that off the top of my head. Vin, what do you think?

[00:24:59] It depends — how are you building the model? I know you're using Python, so: is it a supervised learning model that you're using, just regression, or are you using something else?

[00:25:11] No, no. I used NMF and LDA, and NMF performed better — but they're both NLP topic models. And I searched around the internet for different ways of doing this, and there's not much. There are tons of articles about classifiers and models, and when it comes to deploying there's less, and even less for unsupervised.

[00:25:38] There really is. So like I said, you built it in Python — what format is the built model in? OK, you do train-test and you have a model output: what did you put it in? Say, pickle. If you use TensorFlow, you can actually serve it out directly — use TensorFlow Serving to serve it as a service, as a REST service. So what did you get out once you were done? Where are you at once you finish the training and testing?

[00:26:10] I mean — sorry to jump in while you're putting that together, man — I want to welcome Curtis into the chat.

[00:26:20] He's joining in from the U.K. Curtis, how's it going, man?

[00:26:27] I'm good, man. Nice to meet everybody. Hello.

[00:26:30] Hello, and good to see you again. Curtis and I are going to be chatting on Sunday — he's going to be on the podcast for an episode. Curtis has some amazing writing; check out his work on Towards Data Science and on Medium, I believe.

[00:26:47] He's got some amazing articles — he's actually been ranked among the top contributors in AI
— and I mean, I wouldn't take it that far, but if you want to put it that way: Medium has its own ranking system where they give people little achievements for good reach and so on. In the artificial intelligence category, I'm ranked as one of the top writers so far. I mean, not globally, just on Medium.

[00:27:22] That's still huge, man. Yeah, yeah.

[00:27:27] I'm looking at it right now — I used the NMF implementation in scikit-learn. And what I was wondering is, because it's a model I created on my data specifically: can I just save the model and process new data with it, or can't I?

[00:27:45] Yeah — I mean, if it's a model that was trained on your data, it's not going to really generalize to me, or to Vin, or to anyone else.

[00:27:56] So I guess, in general, when you do clustering, how do you deploy a clustering algorithm?

[00:28:04] I've never deployed a clustering model. I've used the output from clustering as an input to another model — as a feature in itself. I'm sure people do it, but I've never deployed a clustering model as such.

[00:28:22] I don't think that many people do it, because there isn't tons of information online.

Yeah. A lot of times clustering models are basically doing classification — you're doing classification, right?

[00:28:35] Oh no, this wouldn't be classification. It would be more — it's topic modeling.

[00:28:44] Yeah, so I get where you're going. What you're actually doing is analytics. And that's why I was going to ask what the format of the output was: I had a feeling you were doing analytics about two minutes in, but I didn't want to make an assumption about what you're doing. The output of what you built is analytics, and typically you would serve that in a dashboard versus actually deploying it, because you don't really have a model to deploy — there's nothing it would really serve as far as inference. I mean, I guess you could if you extended it, but it's not really going to generalize that well, though you could improve it so that it did. You're on a good track, and those are good points.

[00:29:30] Yeah. So just one thing that comes to the top of my head: you can cluster your observations using your clustering algorithm, and then for any new point that comes in, are you trying to say, this new point — which cluster does it belong to? Is that the gist of what you're trying to get to?

[00:29:51] I could do that, but I didn't find that a very interesting project, so I didn't do it.

[00:29:55] Yeah — because then, would that be considered semi-supervised? Probably not, because clusters aren't really ground truth, right? So you have some clusters from your quote-unquote training data, and then anything new that comes in, you're trying to classify it as belonging to one of those clusters.

[00:30:14] I suppose at that point it's probably something as simple as that — but yeah, it just depends on what you're using it for and where you go with the topic modeling.

[00:30:27] I'd just say right now, what you've done needs a little bit more data fed into it, and you're trying to figure out topics based on — your search history, you said? Or just Google, like lifetime Google search history? OK, yeah.
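[Editor's note: a minimal sketch of the workflow discussed in this exchange — pickling a fitted scikit-learn model, serving it behind a small Flask app, and assigning new observations to learned clusters. KMeans, the file name, and the endpoint are stand-ins chosen for illustration, not details from the episode.]

```python
# Sketch only: persist a fitted scikit-learn estimator and serve it with Flask.
import pickle

import numpy as np
from flask import Flask, jsonify, request
from sklearn.cluster import KMeans

# --- offline: fit on historical data and pickle the fitted estimator ---
X_train = np.random.rand(200, 4)                 # stand-in for real feature vectors
kmeans = KMeans(n_clusters=5, random_state=0).fit(X_train)
with open("model.pkl", "wb") as f:               # hypothetical file name
    pickle.dump(kmeans, f)

# --- online: load the pickled model and assign new observations to clusters ---
app = Flask(__name__)
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    features = np.array(request.get_json()["features"], ndmin=2)
    labels = model.predict(features)             # nearest learned centroid per row
    return jsonify({"cluster": labels.tolist()})

if __name__ == "__main__":
    app.run(port=5000)
```

[Editor's note: the same save/load pattern applies to a fitted NMF or LDA object; the difference is calling transform on new documents rather than predict.]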
If you're doing Google search data, it's almost like you're doing search log analysis, right? Because you're just looking at what you put into the search bar — or are you looking at the results too? I mean, do you have the results too?

[00:30:56] I have them, but for this I just did it based on the cleaned queries.

[00:31:01] OK, and if you could name two example topics, what would they be?

[00:31:07] Off the top of my head, one of them was data science and one of them was fitness related. The algorithm would output the top five — or however many words I specified — that were the most common or characteristic of the topic, with the weight of each.

[00:31:24] All right, so you're saying that if somebody gave you their search history, you could tell them the top categories of the terms they searched for — like fitness and technology, or something like that?

[00:31:35] Well, the way it works is I would input my data and say: show me my search trends over 2011, all of 2011. And then it would show me the top X number of categories and X number of words in each category, based on my search history.

[00:31:56] So I could just put in different dates and see what I was interested in at that time.

Yeah — sounds like analytics type of stuff.

[00:32:02] Yeah.

[00:32:02] I don't mean that it's merely analytics — and it's not the end of the world for that particular project. A lot of times that's exploratory, right? You're looking at your data, coming to understand what you got from your data set and what you might see in new data sets. This is kind of how it starts. So when I say it's "just analytics," what it really means is that you're in the first phase of building a model. You start to look at data, you start to make assumptions about what you might see in other data sets if you gathered more, and you start to think about how you might gather more data and get more people interested in giving you data you could then turn into a larger project. So you have the beginnings of a project here. Like I said — when I said it's just analytics, research analytics, it's not that simple: you're doing something that's the very beginning of a larger project. So definitely keep going. You're probably onto something pretty interesting.

[00:33:03] Yeah, got it. Thanks — that's useful feedback.

[00:33:06] What's the main question you're trying to answer with your project? What's the ultimate objective?

[00:33:16] Well, first I was just curious what data I could get out of my Google history, given the California data regulations that give us access to a lot of what they have on us.

[00:33:28] So I looked at it, and they give you a nice big HTML file. It's not formatted well, so I worked out how to convert it into CSV, and then I realized there's actually a lot there, and it's a similar format for a lot of other products, like YouTube. And it shows not only what you searched but the websites you visited, and the exact location of the search in GPS coordinates. I was just playing around seeing what I could get out of that, and I realized I can see my searches from today, or what I was searching for a month ago. I can go back and look at all the searches, but that's a pain — not easy to understand. So I figured, if I process it, I can actually see the things I'm interested in.
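[Editor's note: a minimal sketch of the kind of pipeline being described — filter search queries by date, vectorize with TF-IDF, fit NMF, and list the top words per topic. The file name, column names, and parameters are assumptions for illustration; the comment at the end shows how the fitted objects can be reused on new text via transform, which is the closest unsupervised analogue to "predict on new data".]

```python
# Sketch only: topic modeling over date-filtered search queries with NMF.
import pandas as pd
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

queries = pd.read_csv("search_history.csv", parse_dates=["timestamp"])  # hypothetical export
start, end = "2011-01-01", "2011-12-31"
window = queries[(queries["timestamp"] >= start) & (queries["timestamp"] <= end)]

vectorizer = TfidfVectorizer(stop_words="english", max_features=5000)
X = vectorizer.fit_transform(window["query"])

n_topics, n_top_words = 5, 5
nmf = NMF(n_components=n_topics, random_state=0).fit(X)

terms = vectorizer.get_feature_names_out()
for k, component in enumerate(nmf.components_):
    top = component.argsort()[::-1][:n_top_words]      # highest-weight terms per topic
    print(f"topic {k}:", ", ".join(terms[i] for i in top))

# The fitted vectorizer and NMF can be pickled and reused on new text:
# new_X = vectorizer.transform(new_queries)
# topic_weights = nmf.transform(new_X)   # per-document topic loadings
```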
[00:34:12] So you're trying to get a sense of where your interests have been going, month to month, across time and whatnot.

[00:34:19] Yeah — or even years back. Say, what was I interested in in 2011, 2012. I got it.

[00:34:23] That sounds pretty cool.

[00:34:34] I'm still trying to get it done, but yeah — I'm just listening. This is cool.

[00:34:42] Yeah, it sounds like an idea a product manager would like. It does sound cool — like, have my interests changed over time?

[00:34:50] Yeah. And I also reworked my resume a bit and put it in the STAR format.

[00:34:54] Nice — the STAR format, yeah. I'm going to pull it up and take a look at it. And everything else going well? Pretty good?

[00:35:06] Pretty good. I was listening to your podcast episode on predicting churn, because I had actually done a churn prediction project.

[00:35:13] Oh nice, man — hopefully you enjoyed that. I interviewed Carl Gold, who's the chief data scientist at Zuora. He wrote the book Fighting Churn with Data, and his book is super detailed, super comprehensive. He really stresses the importance of feature engineering, and I think that, hands down, is one of the things that pays off the most when you're building a model. Vin, what's your take on that — on churn modeling, or on feature engineering in general?

[00:35:51] In terms of the payoff you get for the effort you put in — the thing about feature engineering, what scares me, is that you're introducing your own bias into it. The good side of feature engineering is that you have a whole lot of insight into how your model actually works, which is great: you can explain it a whole lot better if you've got that level of control over it. Unfortunately, you also introduce your own bias. There's a very large group of people who do this correctly, and a very large group who don't. And it worries me any time you put that much of your own thought into it without actually validating it — I'm kind of thinking in the causal way, because Judea Pearl has been active all of a sudden on Twitter, pointing out some good papers, and there's a lot of work we've missed over the last five years. So when you start talking about feature engineering, that's what it brings up — a totally different topic in my head. But yes — let me talk about some of the ways I've seen it done badly.

I'd love to hear about some of those things.

I've seen actual hand-built features — people looking at data, eyeballing it, building that as a feature, and not doing any sort of validation at all, just saying "I saw the relationship, therefore it's there." And that's not even the scariest thing I've seen, because in a lot of cases you can introduce that bias into feature engineering while using some of the more traditional approaches — approaches that look rigorous on the outside. But really what's happening under the surface is you're picking features and saying "my model will have this particular data point at this particular weight," and you're just hand-tuning rather than actually doing feature engineering. You can get confused so easily.

[00:37:56] Yeah — so when I think of feature engineering,
I think of it as a way to build out complexity that you can't necessarily get just from the raw data.

[00:38:06] Right. So for example, say we have transactional data at the granularity of one row per customer per transaction, over a number of transactions — that's what the raw data looks like. And we're trying to get to a place where we're modeling, let's just say, whether or not a customer is going to make a purchase in the next whatever time period. So when I think about feature engineering, I think: how can I take this historical data for a customer, at the granularity of one row per customer per transaction date, and turn it into something that's just one row per customer? How can I aggregate and capture that complexity in one row vector? Some things you could do: calculate the number of times they've purchased, the average purchase amount, the average time between purchases, the length of time they've been a customer, the length of time since their last purchase. That's where I go when I think about feature engineering. What are your thoughts on that? Would that be me inadvertently injecting my bias into it, or would that be kind of the correct way to do it?

[00:39:25] You know, it's funny — what you're describing I would call sampling: you're creating your cohorts, you're creating your segments. I don't know if that's feature engineering in my head; maybe I'm thinking of something totally different. But that sounds like what I would call sampling, where you're doing exploratory data analysis to figure out what cohorts or groups or segments to break your sample up into. Then you might want to use that for more of a sampling methodology, where you go through and make sure your data actually has representation for each one of those groupings, each one of those cohorts — because you're aggregating a whole bunch of data points to understand this person, but at the same time you're going to see that aggregation applies across multiple customers, and there will be similarities between those customers. So I see that as more of a segmentation or sampling methodology. Feature engineering — I'm thinking more of the old-school, BI-and-data-mining sense. There's a lot of that leftover mentality around feature engineering. What you're talking about has a whole lot of useful applications; what I was saying is that it's more of a segmentation piece.

[00:40:45] Old-school data mining has this concept of feature engineering where you're almost building an expert system, and sometimes you become the expert. And that's a really scary way of doing feature engineering, because you make the assumption that you have expertise in the features just because you've done some analytics or, like I said, some old-school BI — looking at a report. You think you know which features correlate well with each other, or which have some sort of impact on the system under measurement or whatever inference you're trying to serve up. A lot of people make that mistake of "I know it, therefore" — the person is going to use the data to prove a particular thing they see intuitively just by looking at it. So instead of using their prior knowledge as a starting point, they use it as the end point.
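[Editor's note: a concrete sketch of the transactional roll-up described at the top of this exchange — one row per customer per transaction aggregated down to one row per customer. Column names and the snapshot date are hypothetical.]

```python
# Sketch only: aggregate a transaction-level table to one feature row per customer.
import pandas as pd

tx = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2],
    "date": pd.to_datetime(["2020-01-05", "2020-02-01", "2020-03-15",
                            "2020-01-20", "2020-02-25"]),
    "amount": [40.0, 55.0, 30.0, 120.0, 80.0],
})
as_of = pd.Timestamp("2020-04-01")          # snapshot date for tenure/recency

features = (
    tx.sort_values("date")
      .groupby("customer_id")
      .agg(
          n_purchases=("amount", "size"),
          avg_amount=("amount", "mean"),
          first_purchase=("date", "min"),
          last_purchase=("date", "max"),
      )
)
features["tenure_days"] = (as_of - features["first_purchase"]).dt.days
features["days_since_last"] = (as_of - features["last_purchase"]).dt.days
# average gap between purchases (NaN for single-purchase customers)
features["avg_days_between"] = (
    tx.sort_values("date").groupby("customer_id")["date"].diff().dt.days
      .groupby(tx["customer_id"]).mean()
)
print(features)
```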
Then they go back and look at the data and "prove" it, rather than starting out with it as a hypothesis and then going forward, gathering new data, and proving it. I can't remember her name — somebody wrote a really wonderful post, the chief decision scientist at Google; she wrote a couple of weeks ago about data charlatanism, about doing the process backwards.

[00:42:04] Yeah, I read that.

[00:42:06] I read that post — it's a really good one. Some feature engineering is like that: that's what I think of — somebody who does the process in reverse.

[00:42:16] Yeah, that's interesting. See, feature engineering to me is just a way to build out complexity from the raw data into something that can be used by an algorithm to pick up some signal. For example, having just height and weight by themselves is probably not useful, but if you take a function of height and weight and express it as something called BMI, that might be more useful for a predictive model.

[00:42:42] And like I said, that's where you use your intuition — but then you have to go back and say: OK, I believe BMI is correlated with some outcome — call it heart attacks, or whatever it is you want to say BMI has a relationship to. A lot of times people just stop right there: BMI goes into the model, and they never come back to validate whether that aggregation did anything. BMI is actually a great example, because BMI has limited usage. There are a lot of flaws to BMI, and there are a lot of newer measurements where you're looking at more complexity: subcutaneous fat versus visceral fat, your BMR, the amount of muscle you have — because there are bodybuilders who are massive, absolutely huge, at eight or nine percent body fat, and their BMIs are off the chart.

[00:43:43] There's some evidence that that's unhealthy, and some evidence that it's healthy. So you bring up this wonderful thing: just by aggregating into that one data point, BMI, you've kind of unraveled all of these new questions.

[00:44:02] But if you go deeper into validating BMI, you're going to realize that there are more data points at play. There are other things you didn't know about — maybe in your data, maybe not in your data — that may be better to use than BMI. Or you may look at BMI and say, this is accurate enough. If you go back and do that validation, you're building more of an explanation between your aggregation — this new data point you came up with through feature engineering — and an actual outcome. Is there any relationship between the two? Is there something else that has a better relationship to that outcome? Is there a better way of aggregating those two into a better feature? That's where I think feature engineering is as powerful as you're saying, because you really are pushing complexity into something smaller. But you're also opening up a larger question, and a lot of times what ends up happening is you don't explore that question. That's where I get scared about feature engineering: the model gets the assumption built in. You just assumed BMI was useful, you build the model with that assumption baked in, it comes out the other end, and no one's sure.
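[Editor's note: one lightweight way to act on "go back and validate whether that aggregation did anything" — compare the engineered feature against the raw inputs it was built from on a held-out split. The data here is synthetic and the check is purely illustrative; as the discussion below makes clear, a check like this still only validates against the same gathered data set, not the system itself.]

```python
# Sketch only: does the engineered feature (BMI) carry more held-out signal
# than the raw height/weight it came from? Synthetic data for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
height_m = rng.normal(1.75, 0.1, n)
weight_kg = rng.normal(78, 14, n)
bmi = weight_kg / height_m ** 2
# synthetic outcome loosely tied to BMI, just to have something to validate against
outcome = (rng.normal(bmi, 4) > 29).astype(int)

def holdout_auc(features):
    X_tr, X_te, y_tr, y_te = train_test_split(
        features, outcome, test_size=0.3, random_state=0
    )
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])

raw = np.column_stack([height_m, weight_kg])
engineered = bmi.reshape(-1, 1)
print("AUC, raw height + weight:", round(holdout_auc(raw), 3))
print("AUC, engineered BMI     :", round(holdout_auc(engineered), 3))
```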
And so instead, it sometimes substitutes for the rigor of data science. Rather than opening up the complexity you're talking about and opening up all these questions — was that a good idea?

[00:45:26] Was there something better? Are there other numbers I should gather? Right — that's why I think feature engineering is such a double-edged sword.

[00:45:34] So to address that point of actually assessing and making sure the thing is valid: would you say that's kind of the feature selection aspect of it, or am I going down the wrong path there?

[00:45:56] You know, I'm going to say this very clearly.

[00:46:02] I think this is not a widely held opinion: you have to do experiments with models. What comes out the other end of the modeling process is kind of scary, because there's no real validation that was done — no experimental process as part of the model validation. You build a model based on a data set, and even if you do that 70/30 split with training and testing, that's not done. That model is the problem I'm talking about with feature engineering: that's where people stop — train, test, deploy — and there's no validation, no experiment that gets done. Sometimes it's a simple experiment, just running in parallel in production to see if it performs better than the alternative, and sometimes that's as far as a company needs to go, because the model is really simple — don't be unreasonable. But the more complex the model is, and the more data you're throwing into it, the more need there is for rigorous experiments beyond the up-front train-and-test. A trained, tested model has essentially modeled the data: you haven't really modeled the system, you've just modeled the data set you used for training and testing. So now you start talking about out-of-sample, you start talking about validation — there are a ton of ways to do that. But I think the most rigorous is to come back and say: I'm going to control the data gathering, I'm going to control each piece of this to verify the model. You build an experiment with the model as the hypothesis at the center. Sometimes that model contains multiple hypotheses, so you're working on multiple experiments to validate each core component — not going crazy, but the core components of the model, the core architecture, have to be validated, along with each major assumption baked into that model. And when you talk about feature engineering: by doing feature engineering you're baking an assumption into the model, and each one of those has to go through some sort of experimental validation. In some cases that's not possible.

[00:48:15] And that's what leads me to where Judea Pearl has started to very actively take on machine learning and say it's not rigorous enough, because you can't do the experiment. So what he's proposed is an entire framework for connecting the dots between building a deep learning algorithm and proving it out — doing this rigor — without being able to do an experiment. In my head, that's where feature engineering sits: you're making an assumption in the model, and that assumption needs a validation point beyond just training, testing, deploying.
And sometimes, like I said, it's as simple as just running it in production and comparing performance versus the current or alternative approach — but that requires a lot of monitoring.

[00:49:05] Have you really validated the assumption? You have to go a step further and show that the model performs well along with the assumptions inside it. When you do feature engineering, or any sort of aggregation process — aggregating data, grouping people together — that assumption is now baked into the model, and it has to be validated.

[00:49:26] So when you say we're "modeling the data" — I guess the argument is: OK, we've got this data set, and we've built a model that perfectly models this data set, but not the real-world data generating process that actually produced it. So when real-world data comes out of that process and hits our model, the result it gives us wouldn't be validated, because the way we built our model is based on just this one little sample of data. Do I understand that correctly?

[00:50:08] Correct. And even when you do more complex validations, you're still validating on the same data set. Even if you do hold-outs and have a more rigorous approach — if all you're doing is modeling one data set, from one particular gathering methodology, even over a long period of time, even if you have two years' worth of data to build on and then validate against a third year — your gathering has assumptions in it. That's why I go back to an experiment where the model is the hypothesis; if it's not explainable enough to build an experiment on, it's dangerous. Feature engineering was the micro view; looking at the macro, data gathering is feature engineering in some way, shape, or form, because you've decided how you're going to gather the data. But in a lot of cases there's no pedigree to the data — it's just sitting in a repository somewhere, and who knows what's been done to it since it was put together. So you have this huge black hole of assumptions: there's already been feature engineering done on the data. And so a lot of times what your model does is provide a representation of the data in a different form. That's all your model ends up being.

[00:51:34] And you have these gaps, because you assume that the space you're presented with in the data is exactly the same as what you need to model — and that gets baked into the model itself, because, like I said, it's just a representation of your data in a different form. Where that becomes problematic is that it's not measuring a system; it actually has no connection to the system under measurement. So if you're talking about something behavioral especially — back to the customer example — you're modeling a complex system that has no tangible connection to the data set you have. You have some purchase history, you have a little bit of customer interaction with the website. How often did this person come to the website? Did they check out this one particular product before they bought it? How long did they spend on the page? How much did they read?
How much did a recommender system help upsell them to a purchase? All of that is great data, but behaviorally you're looking at emergent outcomes of a decision-making process. You're not measuring that process — you're measuring a particular kind of emergent behavior of a complex system. So your data set hides what you're actually trying to model.

[00:53:08] Like I said, that's the piece you need to go back and rigorously validate. In a lot of cases, for a business, that's fine — the model gives you enough value and return on investment, so it's totally cool. You can bake in some assumptions; it can be way, way off from perfect; that's all good if all you need to do is increase margin by eight percent. You don't really need to understand the complexities of how Ryan, or everyone like Ryan, makes a decision. Sometimes modeling the data is sufficient, as long as you understand the limitations of that model. What happens in other cases is you overextend: you overextend your inference, you overextend your accuracy metrics. You say this model is accurate to X percent — but is it really? We don't know that, because it's accurate to X percent based on this particular data set, and unless I go through another step, I can't extend that to say anything beyond it. You can run it in production and maybe test it and make the customers angry — maybe not. And so there are all of these different validation mechanisms you use to make sure that what you're saying, and what you're expecting the model to do, it actually does.

[00:54:25] And that it'll actually live up to your expectations from a business perspective. Because when you're building a model, especially about customers, you're making statements about the customer — you're telling people things — and in some cases, and this is what that post was referring to, you're out over your skis. You're not actually able to make the prediction you're claiming to make. The data model is a complex version of analysis, and so you have to go back if you want to make some of these assessments about why a customer does what they do.

[00:55:02] And that's when you need to go backwards, do new data gathering, and begin to understand the system that you're modeling rather than the data that you built a model on.

That's super profound, man.

[00:55:15] Yeah — 100 percent agree with everything you're saying. So how can we — which question do I want to ask? OK, we've got a few more minutes left in office hours, but one question I think people would really benefit from: when you say a model is a hypothesis, explain what you mean by that. And if a model is a hypothesis, how can we conduct experiments to make sure that we're capturing the real-world data generating process — modeling the system and not the data?

[00:55:50] So when I say the model is a hypothesis: that's your version one. Like I said, you've gone through and iterated over multiple models, and you've figured out which one is the most capable, which has great accuracy —

[00:56:04] which one is the most capable of modeling your data in an accurate way, becoming the best representation of that particular data you have — and you're going to validate against that data. Sometimes you validate against the stuff you held out that hasn't been part of the train-test cycle.
And so you're going to apply levels of rigor when it comes to supporting your model. You want to be able to say what this model does and how this model performs.

[00:56:30] Well, this model seems to do X, Y, and Z — so that's your hypothesis. I've come out and said this model works. And now — I don't know if you've been beaten up by scientists, but they have destroyed my models in the past, because I didn't take this next step. The model works here; I've done all the rigorous validation I've talked about, held out data, tested it on a totally different set, run it in production; it works really well.

[00:57:01] Why? My model is —

[00:57:05] my hypothesis. Now I have to prove to you that it works, and to do that I have to understand why. So your model is your hypothesis: the architecture, the weights, all of those components make up your hypothesis. And my hypothesis is: this is why some behavior, or some prediction I'm making, is accurate.

[00:57:28] Now I have to go prove that. You have to create an experiment, and this gets very, very deep into explainable machine learning. A lot of deep learning models make that really hard — how do you pull one of those things apart when you have millions of features? It's terrible, and it's different for every data set you feed into it. But how do you unravel all of this to figure out anything you could actually test? There's a complexity there: no matter how complex your model is, the hypothesis has to be proved — even if it runs for six months in production, that doesn't matter. Think about a hurricane model. If I munged a whole bunch of data about hurricanes together, created a machine learning model — whatever algorithm or ensemble — threw it out there, and it's been accurate for six months, do you think anybody in that community would look at your model as valid? Would they look at it as valid?

[00:58:26] Vin, thank you very much. I want to open the floor up to Nicholas or Curtis — if you guys have questions, go for it, man. Now is the time.

[00:58:35] Yeah, I'm curious about the experiment design piece, because you said feature engineering — even with industry expertise — isn't enough, and the model is a hypothesis, and to be sure, you have to prove it correct. How do you go about doing that? Is it necessary for small-scale — really, really small-scale — things, or only for large corporate projects that are going to be hit millions of times?

[00:59:08] Well, I want to be reasonable — that's the number one thing I'd say, especially in corporate environments. I'm talking about the rigor because it has to be applied to everything you build, but there are also levels of rationality you have to bring, because your project's only got so much budget, you've got a time crunch most of the time, and the business doesn't understand what science is, so it's hard enough to get anything off the ground. But even on small projects, you want to make sure your model functions somehow. You don't want to just throw it into production — you don't do train, test, drop. You want to do some sort of production validation: verify how it performs versus whatever alternative they're using right now. That's kind of your most basic validation piece. Then you go all the way out to the more extreme point, where you design an experiment in which you now control all of the data — because that's what you didn't get the first time.
In most cases, you didn't get to.

[01:00:07] You don't get the data pedigree, so you don't really get to examine whether your variables have any sort of relationship to each other or to the outcome. You don't understand how aggregating them may have lost some of the information you need. In some cases — like, I built a pricing model where the basic assumption was that increasing price would increase margin. But that's not true, because there are more factors to margin. Now I've exposed something: I tried to validate my assumption, and my assumption was wrong. OK, let's go back. So, margin: we buy the product for this much, we sell it for this much; if I increase the price and the cost stays the same, then I increase margin.

[01:00:57] But there are also other costs, other variables involved in cost aside from just what I bought it for. In some cases there's shipping involved; in some cases there's the time it sits on the shelf and the space it takes up on the shelf. So by verifying all of these assumptions, doing these little simple experiments — if I throw in some mock-up data, assume something is going to happen, and it doesn't, then in the majority of cases I've done an experiment that has disproved a hypothesis of my model, or one of the hypotheses of my model.

[01:01:30] And so could this be a form of experimentation: maybe you go through a process and come up with some candidate models, and from that selection you identify models that seem to perform really well based on the train-test portion of the model-building process — similar performance across, let's say, multiple models. You decide to deploy those and serve the average of these models' outputs as the customer-facing prediction, or what have you, and then assess how each individual model performs in production. Does that make sense, what I'm describing?

Yeah — it's like shadow testing, essentially, with a champion model. That happens a lot.

[01:02:27] As far as how valid it is: it scares me any time you serve the average of multiple models — it's almost a decision tree that's starting to happen. There are so many holes there that I don't know about. Like I said, it's an average of a bunch of models, and I'm assuming I'm going to get a better model, or a better result, out of it. And there are cases where you won't, because one of them could be garbage in production — I'm trying to put it into words and I'm losing it right now — but yeah, you could have one that's absolute garbage all the time and you wouldn't notice it, because the other ones are just kind of beating it out. And how do you gauge which one is right? What's the right answer? Now we're talking about optimization, and in a lot of cases it's not as easy as: that's a monkey —

[01:03:28] that's somebody's hair, that's a pair of headphones. It's not so cut and dried, on the validation side especially. Take pricing as an example: what's your optimal price? What's the highest amount you can charge that individual? What's the right answer — how do you figure that out? You can't do an experiment like that. That's one of those experiments you can't really run, because there's no way to get people not to lie about it.
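[Editor's note: a minimal sketch of the shadow-testing idea named just above, assuming scikit-learn-style models — only the champion's prediction is served, each challenger's prediction is logged silently, and every model is scored on its own once outcomes arrive, so a silently failing model can't hide inside an ensemble average. The model names, log file, and metric are illustrative.]

```python
# Sketch only: champion/challenger shadow logging and per-model scoring.
import csv
from datetime import datetime, timezone

from sklearn.metrics import mean_absolute_error


def serve_and_log(features, champion, challengers, log_path="shadow_log.csv"):
    """Return the champion's prediction; record every model's prediction."""
    served = champion.predict([features])[0]
    row = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "champion": served,
        **{name: m.predict([features])[0] for name, m in challengers.items()},
    }
    with open(log_path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=row.keys())
        if f.tell() == 0:          # empty file: write the header once
            writer.writeheader()
        writer.writerow(row)
    return served                  # only the champion's output reaches the customer


def score_shadow_log(rows_with_outcomes):
    """rows_with_outcomes: dicts holding each model's prediction plus 'actual'."""
    actual = [r["actual"] for r in rows_with_outcomes]
    model_names = [k for k in rows_with_outcomes[0] if k not in ("ts", "actual")]
    return {
        name: mean_absolute_error(actual, [r[name] for r in rows_with_outcomes])
        for name in model_names
    }
```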
[01:03:52] And so you're now in this position where you have all these assumptions baked into the pricing model, and now you have to validate against some idea of a best, optimal measure. It's really hard, especially if you have multiple models playing together and serving up different prices, because, as normal pricing systems do, you've got a ton of different models doing a ton of different things, and sometimes they're both serving a price. [01:04:19] And that price, which is not really an average but an aggregate of both predictions, then turns into a feature and gets served up as a price, and there's no way to verify what's optimal. So validating that case is a nightmare. [01:04:37] Is it ever OK to build a biased model? [01:04:41] Yeah. Oh, yeah. Think of hiring: I want to bias towards the candidate. That's a great bias. I want to pick the best candidate, so there's a score, it's a ranking, but at the same time I want to give every candidate the best possible score, because relative to each other that bias washes out. [01:05:09] But it ensures that I'm not taking points away from somebody because of something small on the NLP side, you know, just because the language processing didn't work perfectly and that ended up feeding some feature computation further down the road. I don't want to discriminate against a person because of the way they use language; I want to make sure that everybody gets the maximum credit. So I want to bias that algorithm during training to rank the candidate as high as possible, rather than looking at eliminating the candidate and trying to rank that person as low as possible in order to eliminate as many candidates as possible. So that's an instance where I've used bias towards the candidate: I've made the model score the candidate as high as possible so that I was including as many candidates as possible in the review and selection process, rather than eliminating as many of them as possible from it. [01:06:08] Or, going back to the pricing model example, right, you want to set the highest price that will still result in the thing being sold, so you want to bias your model towards higher prices? [01:06:19] You know what's weird? No, not necessarily. OK, so there are multiple pricing models out there, Walmart has one, Amazon has one, and they all kind of talk to each other. And this is a kind of misunderstood space: algorithms talk to each other. They communicate with each other through their output. If you look at the stock market, the same thing happens: an algorithm buying and selling is, in a way, communicating with another algorithm, and in a lot of cases these algorithms collaborate. And so sometimes raising a price has a cascade effect that is not necessarily positive. An interaction between models may cause a competitor to undercut you, and through undercutting you they win more business and you lose revenue, lose margin, because that other model responded to your change. It saw you increase your price, and its rules, its hypothesis, led it to infer that a lower price would be able to pull customers away, because you're competing for the same dollars. So now you raising your price caused their model, which is looking at your prices on a regular basis, to lower their price. And so they end up getting the customers, because their price is better on that particular item and their marketing around it was better, because it was targeted: they understood your pricing model, and you raised your price too much.
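A toy illustration of that cascade, not taken from any real retail system: two repricing rules that only "communicate" through the prices they publish, where one side's increase triggers the other side's undercut. The fixed undercut amount and the price floor are arbitrary assumptions.

```python
# Toy simulation of two repricing algorithms that only interact
# through their published prices. Entirely illustrative.

def our_price(step: int) -> float:
    """We raise our price once at step 5 and hold it."""
    return 100.0 if step < 5 else 110.0

def competitor_price(observed_our_price: float, floor: float = 101.0) -> float:
    """Competitor undercuts our last observed price by 2, but never
    goes below its own floor (its cost plus minimum margin)."""
    return max(observed_our_price - 2.0, floor)

last_seen = 100.0
for step in range(10):
    ours = our_price(step)
    theirs = competitor_price(last_seen)  # reacts to our *previous* price
    last_seen = ours
    winner = "them" if theirs < ours else "us"
    print(f"step {step}: us={ours:.0f}, them={theirs:.0f}, cheaper: {winner}")
```

Before the raise, the competitor's floor keeps it more expensive; after the raise, its undercut rule wins the item.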
[01:07:45] The other example where something like that could be bad: a lot of times you have a recommender at the bottom of the screen saying, hey, what about buying this, why not buy that? And those items can be higher margin. So you might want to give somebody a deal on one product so they have enough left in their budget to buy a higher-margin product. If you look at electronics, TV sets are very, very low margin, while the accessories that go with them, the cables, are very, very high margin. So you want to take margin into consideration. If you cut a couple of bucks off the television set, you're not losing much margin; however, if that makes the person more likely to buy surge protectors or cables, now you've increased the overall margin of the sale even though the total price of the sale has stayed roughly the same. And this is the reason why machine learning is so much more complex, and why we need so much more explainability: because of these interesting impacts that you only see if you understand the larger system. [01:08:59] Something you said earlier really resonated with me: that merely by choosing what data you're going to collect, you're inherently biasing the hypothesis and what you're looking to solve. So how do you go about preventing that? [01:09:16] That's an art. In some of the projects I've worked on, I needed people way smarter than I'll ever be to help me not do stupid things. What you're hearing me say is a lot of what people have taught me through the years. Data gathering is an art: you control the way that you gather data, and it's not only the data that you've gathered, but also the conditions you gathered it under. So every data point has metadata that creates provenance for that particular data point. You can have one number, that's the data point you gathered, and you could have five data points around it that describe how, when, and what else was going on that might at some point in the future be important with respect to that data point. In a lot of cases the probability of one thing doing another thing is also relative to a third variable, and if you don't have that third variable, then what you're going to want to do in the future you won't be able to do, because you're going to find out, oh, I needed that other variable in order to do more of a causal analysis on this. [01:10:32] So there really is an art to doing it the right way, and a science, practiced by people smarter than I am. But the way that you gather your data is really, really important, and more important than the data point itself are all the other pieces that you gather around it in order to describe that data point in a way that someone else can use in the future.
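A minimal sketch of what one data point carried together with the provenance metadata around it might look like in code; the specific fields here (who or what gathered it, when, how, under what conditions) are illustrative assumptions rather than any standard schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Measurement:
    """One observed value plus the context that gives it provenance."""
    value: float                      # the number you actually gathered
    unit: str
    collected_at: datetime            # when it was gathered
    collected_by: str                 # person, sensor, or pipeline
    method: str                       # how it was gathered
    conditions: dict = field(default_factory=dict)  # anything else going on

# The data point itself is one number; everything around it is what lets a
# future analyst (maybe you) trust it, reweight it, or condition on it.
reading = Measurement(
    value=21.7,
    unit="degC",
    collected_at=datetime(2020, 10, 30, 18, 0, tzinfo=timezone.utc),
    collected_by="sensor_12",
    method="rooftop thermistor, 5-minute average",
    conditions={"weather": "overcast", "calibration_age_days": 40},
)
print(reading)
```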
[01:10:54] What's your philosophy on this? Because to me, when I'm building a machine learning model, I'm cognizant of the fact that reality is fucking complex. There's a lot of shit going on, right? And all I have to work with are artifacts from the real-world data-generating process that just happen to live inside of a database. And I'm trying to make an inference from these artifacts that we somehow managed to capture, to then make a statement about how the process that generated this data behaves, so that when a new data point comes in that is a byproduct of the thing I'm trying to model, I'm able to make an accurate prediction. That's kind of my philosophy around what I'm doing when I'm building a machine learning model. What's your take on that? [01:11:46] I describe machine learning, I guess the objective of machine learning versus the reality of what happens, like this: somebody randomly picks somebody off the street, hands them a basket, and that person walks around a park in San Francisco, or Central Park in New York, and just starts pulling leaves off trees and putting them in the basket. [01:12:12] And now the company wants me to explain trees, and I have a random basket of leaves. That's my data. That's really what it is. That's all you've got. If you're lucky, you might know that trees have wood. If you're lucky, that person might have been nice enough to gather something that was not what you told them to, which may lead you to a deeper understanding of trees than just the leaves. That's how I explain machine learning: you first get leaves, and then you ask the question, well, it looks like these were attached to something, it looks like somebody tore something off here. Maybe we need to go back and gather some more data to figure out what these leaves were attached to. And the leaves are attached to sticks, and the sticks look torn off too. And you kind of keep doing this; it's this unraveling. And if you unravel enough, you get deep enough, although you have to stop at some point, because there's a time limit and there's a budget, and hopefully you're really smart about where you stop. At some point you have to start dealing with the greater complexities of the system. [01:13:21] And it's this slow unraveling process of gathering data that tells you there's more data to be gathered; you go get that data, and it tells you there's more data to be gathered. You make mistakes and gather dirt when that's not part of the tree at all; it literally lives in the dirt. It's a system that interacts with the tree, but it's not part of the tree. So sometimes you gather mistaken data and have to do the analysis and the experimentation to realize, OK, that's dirt, that's the tree, we're talking about two separate things. But at the same time, sometimes all the business cares about is what color the leaves will be next. And all you need for that is leaves and temperature, and you can start figuring that out very quickly, even though those two really aren't that well related and those aren't great data points. If you give me a temperature trend and data about what color the leaves were at which temperatures, I'm going to give you a not-bad model. My model is not going to suck that hard. It's not going to be exactly right, and it's not going to work everywhere. So again, what I talk about is that it works as long as we know roughly where the flaws are, we know this model isn't great but it doesn't suck, and we don't overextend it. That's my metaphor for machine learning: you have to stop at some point. You will not get the full system, because with really complex systems we have a hard time getting enough data to model them correctly and verifying that the model really works and isn't going to get slapped in the face one day.
So that's where, especially in business, I compare it to this: a model that doesn't suck. You want to know where it stops working and what you don't understand, and you can't gather data forever. You're going to have to stop at some point, and you will sometimes gather data that doesn't have anything to do with what you're actually trying to model. And sometimes you have to not fall in love with that data, I think. [01:15:13] Awesome, man. I'll open it up for Nicholas and Curtis. You guys got questions? Go for it. If not, then we'll go ahead and end the office hours. So, Curtis. [01:15:24] Yeah, once again, thanks for inviting me, Harp, on such short notice. I mean, when I'm involved in conversations like this, I feel like I don't know a thing about what it is I do every day. So thank you. You were speaking about your models as your hypotheses and basically emphasizing explainability. I'm assuming you're not saying that you completely rule out deep learning then, because, as you know, the models used for deep learning are quite difficult to explain and get into. [01:16:04] I have a really hard time with deep learning models that aren't explainable, but I don't completely discount them; again, they're useful, as long as we have some concept of them. You don't have to understand 100 percent how they work; they don't have to be 100 percent explainable. But you have to have some concept of where they stop working, so that you don't overextend the inference you're making, convince the business that this model will do something that ultimately it won't, and cost the business money; that could be disastrous. So you don't have to understand everything. On the flip side of that, I do believe that deep learning in many cases leads to a component of a causal model, and so I think deep learning can be a stepping stone towards causal ML. That's a long conversation, but I think deep learning has a role to play. We're beginning to find out some fundamental patterns about deep learning models, some governing dynamics that are universal across different models and can be validated using things like calculus, and that whole process, like I said, you've kind of got me on a tangent here. I think deep learning models are useful, and I think they are the intermediary step that we need. So I don't think deep learning models are worthless. Even if you don't understand the model entirely, I think it can be reliable enough as long as you know where the gaps are and you don't overextend what it can predict or classify or whatever. And then, like I said, I think it's a stepping stone to something more complex and more causal. [01:17:50] Yeah, right on. Well then, thank you so much, man, for hanging out and dropping some knowledge bombs on today's office hours. Man, I really, really appreciate you swinging by and giving us all such an intimate lesson on your philosophy of machine learning. Really appreciate that. [01:18:10] Thanks for having me; I appreciate the invite. You know, I'm thinking through my process when you guys ask me these questions. I'm hearing myself talk, I'm hearing your feedback on it, and realizing maybe I haven't thought all of this through. So I really appreciate you reframing things, asking some of these questions, and sharing your insights. Like I said, it always makes me think about a lot of what I missed, what I messed up.
[01:18:37] Yeah, man, that's the one thing, I mean, there are many one things, but that mentality of: OK, what am I getting wrong? What am I missing here? How do I say this? You don't want to be unsure of your ability to get shit done, but you always want to have a version of yourself looking over your shoulder. Does that make sense? There's got to be a part of you that's always asking, do I really get this? Am I really understanding this the way it works? Hopefully I don't fucking sound like I'm crazy when I say that, but that's what I'm trying to get across. [01:19:17] I know what you mean, Harpreet Sahota; that's a crucial skill to develop, and it will refine our thinking over time. It's just a matter of iterating on our current beliefs. [01:19:33] Yeah, absolutely. [01:19:35] I mean, for instance, the questions about the connection between deep learning and causal ML: I should be able to summarize that better, in two lines, and I couldn't. And that, for me, points to a gap in my understanding. So those questions are great, because when I hear myself give a half-assed answer, it's one of those: wait a minute, I don't usually give that answer, I have a better one; there's something here I don't understand. [01:19:57] Well, sitting here from my perspective, I feel like you understand this shit so well; it's definitely evident in the way you speak about it. So, ladies and gentlemen, thank you for helping me in my thinking, and thank you for coming on the show. It's really an honor and a privilege to have you on. Like I said, you've got a permanent invite any time on Fridays. [01:20:21] During this time, if you're free, you are more than welcome to stop by any of the office hours. This will be up on the podcast on Sunday morning, as well as the YouTube clip. Guys, be sure you tune in on Monday. I've got an episode releasing with Annie Duke. Annie Duke's book Thinking in Bets changed my life, and I am beyond excited to release that episode. You know, when I was writing down a dream list of guests that I wanted to have, number two was Annie Duke. I haven't gotten Dan Ariely yet, but I did get Jeff Kreisler, who co-wrote a book with Dan Ariely, so maybe I might make that happen somehow. But having somebody who was number two on my list on the show is such a huge deal to me. So I hope you guys check that episode out. She goes deep for a good 15 or 20 minutes about elections, essentially, and all that goes wrong with polling for elections and decision making in that context. So it's going to be very well timed, with the US elections happening the very next day, the following Tuesday. So, again, guys, thank you so much for hanging out, man. Appreciate you coming on. [01:21:44] And I will see you around; looking forward to that Annie Duke episode. All right, I'll send you a link for sure. For sure. Take care. Bye.