HH51-24-09-21_mixdown.mp3 (from OneDrive)

Harpreet: [00:00:08] What's up, everybody? Welcome, welcome to the Artists of Data Science happy hour. It is Friday, September twenty-fourth. Oh my gosh, the end of September. This is data science happy hour number fifty-one! That means next week is data science happy hour number fifty-two, which means I've been doing this thing for a year. One year doing these office hours. Obviously, I've been doing the podcast for a little bit longer; the podcast had its first-year anniversary back in April. I didn't start doing the happy hour sessions until about five months after that. So yeah, man, next week is the first anniversary of these. Hopefully you guys got a chance to tune into the episode I released with Dennis Will, who is a data engineer based out of Berlin. Had a great chat. I actually met Dennis not through LinkedIn, but Instagram. He is Azure Will on Instagram. He hasn't posted in a while, but his content is amazing. So hopefully you guys get a chance to connect with him. I don't know if anybody caught the episode with Lex Fridman this week. Not with me, but Lex Fridman had an interview with Travis Oliphant, who is the creator of NumPy and SciPy, and that was just a really cool episode, man. Really great way to get some history on the stuff that we use on a daily basis. It was a really cool interview; hopefully you guys get a chance to check it out. I'm about halfway through it and it's been amazing so far. Shout out to everybody in the room. Tom, what's going on? We got Eric. Eric is back. Eric, congratulations on the move. How was the move?

Vu Nguyen: [00:01:52] Good to be here, good to be back. You know, life's good, can't complain.

Harpreet: [00:01:55] Awesome, man.
Also, shout out to Kausthub, joining us from Down Under; Russell out there in the UK; Matt; and A.A., I forget where he's at, but also somewhere in Europe. A lot of countries in Europe, I can't remember them all. Well, I'm happy to have all you guys here. Let's go ahead and get started. So if anybody has questions, whether you're watching on LinkedIn or on YouTube, or even right here in the room, feel free to drop your question in the chat or the comment box, wherever it is that you are, and I'll be happy to get to your question. But until then, man, let's go ahead and kick this off. Oh man, we've got an old friend of mine in the building. I used to work with Vu way back in the day, at the first job I had during and after grad school, at this company called The Warranty Group. That's what it was called. Good to see you here, my friend. But yeah, man, let's go ahead and get started. Anybody with questions, let me know. I came super unprepared; I don't even have a question to kick us off and get started with. So if anybody wants to take responsibility for that, I'm more than happy to pass that responsibility to you. Otherwise, I'm just going to keep on talking and filling up dead air until somebody's got a question, which I'm sure nobody wants.

Vu Nguyen: [00:03:17] It's my first office hours. Do people, like, drink beers or anything during it?

Harpreet: [00:03:23] Yeah, yeah. Typically just me. I think Russell might be having a beer. I'm not sure who else is, but we just hang out and talk about data science related stuff.

Vu Nguyen: [00:03:32] Sounds good. I'm going to grab one, man. Yeah.
Harpreet: [00:03:39] What time is it over there?

Kausthub: [00:03:42] Seven thirty. Maybe a bit soon for a beer, I think.

Harpreet: [00:03:46] Yeah. I mean, depends. It depends what your lifestyle is like, man. Matt's asking if I'm a Niners fan. Yes, I am, and I absolutely love the 49ers.

Vu Nguyen: [00:04:02] You a Niners fan too, Matt? That's great. That's impressive, all the way out in Canada. And I'm on I-80 right now, so I'm happy to see that.

Harpreet: [00:04:12] Yeah, man. I don't know if people know this or not, but I'm actually from Sacramento, California. Born and raised in Sacramento; that's my hometown.

Vu Nguyen: [00:04:21] Oh, no kidding. I know the I-80. I'm on my way there right now.

Harpreet: [00:04:25] Oh, to Sacramento?

Matt: [00:04:27] Yeah, I'm driving there right now.

Harpreet: [00:04:29] Yeah, right on. Well, just tell everybody I said what's up. I think Mark's actually from Sacramento as well. Yeah, man, let's go ahead and get this started. All right. So the question I want to kick off with, I guess: I've been heads down thinking and talking and writing about experimentation management, you know, just part of my job. And I'm wondering, before you guys really started doing hardcore machine learning and really building out models, how were you managing your experiments? Let's start with Tom, because I'm sure Tom has some great insight; that's typically what happens when you've got white hair, is you get good insight.

Thom: [00:05:22] Yes.
Each of these hairs that are gray, which seems like it's all of them now, each turned gray because of something I learned the hard way or the stupid way. I come originally from doing mostly physics-based modeling. When you meet an engineer that does a lot of predictive modeling, they do something very similar to what we do, which they would call empirical modeling, or design of experiments, or factor analysis with ANOVA. Very, very close to data science type modeling, but with very different lingo. And so when you're talking to them, you have to just be patient, like: OK, there are going to be semantic differences, stuff like that. But what was key is there was always a methodology. And when I started migrating to more and more data science work, I started looking for the methodologies and realizing: oh, this is so new, everyone's not necessarily communicating the wisest methodology in machine learning pipeline development. So I just started collecting them. And over time I saw how cyclic they could be when you're developing a machine learning solution, too. And then it was quite freeing. It was almost upsetting at first to realize: oh, you mean we actually use models that aren't in the ninety-five-plus percent accuracy range? And then it dawned on me: wait a minute, I'm spoiled by the engineering realm. If you get a model in place and you had no model before, sixty-five percent accuracy is a godsend; then you can improve things from there as you collect more data.
And so it was quite a bit of a change, mentality-wise. But really striving to capture the concepts that a data scientist has to operate in, the ones that are different from an engineer that models things: that was the first big thing. And then, once the concepts were there, really mastering a methodology. That was key, I found.

Harpreet: [00:07:44] About methodology real quick, I've got a question on that. There are many, many methodologies out there. Is it important that we all have the same methodology, or is it just important that when somebody approaches a problem and solves a problem, their methodology makes sense and is coherent? So is methodology something that is written in stone? Is it flexible? Is it problem dependent?

Thom: [00:08:10] Well, forgive me for leveraging wisdom from another field, but I'm going to lean over to Chuck Norris. If we were learning martial arts, he'd say: OK, go study one of the ancient, tried-and-true disciplines first, like maybe taekwondo, for example, and then, once you're good at those basics, then you can try some new fancy things. So here's what I'm getting at. If one of us was getting into car mechanics, there may be someone in our group who'd say: oh, let me tell you what tools you need initially and what things you need to focus on for the basics, so you won't get lost when you're working on cars. Yeah, you're going to work on something unique every once in a while. But if you know those basics and you know how to look things up, then the world's your mechanic... or oyster. It's the spirit of: if you build your shop, and you get the right tools in there, and you get the right manuals and the right learning resources, you can go at anything.
I think that's where new data scientists need to get: understand the concepts, get the tool sets that you like, get your basics. And yeah, every once in a while you're going to have to go learn something new. But if you know those basics, they're going to really help you. And I just want to give a shout out to my top data science student. He approached me months ago, and now we're like brothers: Greg Coquillo. He's an outstanding data science student, but obviously he's a leading integrator of data science into the business realm, and that's where I lean on him myself.

Harpreet: [00:09:58] Thank you very much, Tom. Great, great insight there. I would love to hear from Kausthub about this. So the question I'm asking pertains to methodology: is there one methodology that we should all use, or does methodology just need to be sound and reasonable from one practitioner to the next? Is it problem dependent? Is it industry dependent? Let's pick it up from there. After this, I'd love to hear from either Greg or Joe. Also, shout out to some new people that joined in, Joe being one of them. What's up? Vivian is in the building; I'd love to hear about Vivian's new job and how that's going. Also, we've got Matt Housley joining in, so I'd love to ask him this question as well. But Kausthub, my friend, go for it. By the way, everybody listening, we are taking your questions, so drop them in the chat, drop them in the comment section. I'll keep an eye out for them.

Kausthub: [00:10:56] So I guess, considering we've got so many Star Wars themed backgrounds,
I'll simply go by the axiom that only a Sith deals in absolutes, right? I mean, everything is a shade of gray. So when you're talking about methodologies and approaches, it's really what works for that particular business. What I've found, working across a few different companies (I'm only pretty early in my career, but I've worked across three or four different companies), is that they're all different sizes, with different mentalities, and their mentality specifically works for them. That's what makes them companies that work out well. And the same thing you can apply to your approach as a data scientist or your approach as an engineer. What works for one set of problems, like what worked in robotics, may not work in a non-robotics-related data science area, right? Sorry, guys.

Harpreet: [00:11:53] Right, appreciate that, Kausthub. Thank you so much. Let's go to Joe for this one. And by the way, if you guys are watching on LinkedIn, do me a favor: go ahead and hit share. Share this with your network, let people know that this is going down. Joe, go for it.

Joe: [00:12:10] Oh, hello. Yeah, so I think it's an interesting question. I mean, I'm not sure what spawned this; I sort of showed up a bit late to the party. Can I get some context on what prompted the discussion on methodology to begin with?

Harpreet: [00:12:24] You know how it goes. Someone says something and then it sticks out to you.

Joe: [00:12:28] Yeah, yeah. So, you know, methodology is an interesting thing, because as you know, my background has been a lot of different things. And here's what I've realized: there's not a universal methodology for anything.
I would say the best methodology is to adopt a lot of different methodologies and use them on a situational basis. It's sort of like, you know, Tom mentioned Chuck Norris, and I'll mention Bruce Lee and the Tao of Jeet Kune Do. He was an early adopter of what I guess is now MMA, and I kind of approach things like that, where one style is great, except it doesn't really work all the time. So I think being dogmatic and having one methodology actually works counter to you, especially in a world that changes this quickly. And it revolves around so many different mental models; the more mental models you can adopt, the more methodologies, the better, actually. Not to belabor this, but it also provides a competitive advantage: if you're in a room full of people who only know one thing, and you know how to approach things from, like, twenty different ways, who do you think's going to have a better outcome?

Harpreet: [00:13:45] So, great flexibility.

Kausthub: [00:13:49] Doesn't it come down to your ability to see that a situation doesn't respond well to a particular methodology, right?

Joe: [00:13:59] Exactly. You know, I got this from Charlie Munger, Warren Buffett's partner. He's sort of attributed as the kingpin of mental models. I think he says you need about ninety mental models to be effective in this world.

Kausthub: [00:14:17] So my question for you on that, Joe.
Is it more important to know which mental model to use, or more important to know when the mental model you're using isn't working?

Joe: [00:14:28] Both.

Kausthub: [00:14:30] If you could only have one rather than the other?

Joe: [00:14:33] Both. I mean, they're really two sides of the same coin. It's like: I need to know which one to apply to this situation, and I also need to know the limits of my mental model, to know when I don't need to use it.

Harpreet: [00:14:43] Yeah. And just for everybody listening out there, a quick definition of mental model: personal internal representations of external reality that people use to interact with the world around them, constructed by individuals based on their unique life experiences, perceptions, and understanding of the world. That's great. Let's switch the conversation to mental models now, since we're on that. So, talking about mental models: what are some mental models that we should probably keep in mind as data scientists as we're doing our work? Is there something, maybe a mantra or a mental model, that you apply regardless of what the problem is; a universal mental model, I guess? We'll start with Joe on that, then I'd love to hear from Matt. And there's a bunch of people popping in; we'll get to all of you guys. If you have questions, let me know. I'm keeping track in the comments and in the chat.

Joe: [00:15:37] One thing I also took from Charlie Munger (if you can't tell, he's like my favorite person on the planet), the mental model I took away from him, was simply: invert everything. So if you hear a question or a statement, what if you flip it inside out?
What does that look like? I think inversion is the most underrated and most powerful tool you can find out there, and that's how I would approach any problem, at least to start out with. Don't take anything at face value; flip it on its head and see what it looks like.

Kausthub: [00:16:08] I mean, that's basically one of the tenets of predicate logic, right? Like proof by negation.

Joe: [00:16:15] I mean, if you want to talk about proofs for a bit... Matt's the math professor.

Harpreet: [00:16:25] Real quick, though: can we get a concrete example of inversion, or maybe a concrete example of how you use inversion when you're faced with a data problem? How should we think about that when we're working?

Joe: [00:16:38] I mean, it's a simple thing, isn't it? Somebody makes an assertion, right? Like: this is what I propose. Well... give me a data science question that comes up a lot, for example, and maybe I can try and invert it for you.

Harpreet: [00:16:57] Yeah, how about this: which algorithms should be used to solve this binary classification problem?

Joe: [00:17:10] So I would probably flip it on its head and ask: what approaches would not work for binary classification?

Harpreet: [00:17:16] OK, nice. All right. Yeah, awesome. Thank you. Greg, go for it.

Vu Nguyen: [00:17:24] I totally agree with Joe on inversion. This is definitely a mental model that I apply when I build roadmaps.
Because you enter this framework where you're listing all of these great ideas that you feel will be transformative, based on feedback that you receive from the world, and you're excited about these ideas. But you don't realize that you're putting yourself in a corner, in this bubble where that idea is a very best-case scenario. So to keep yourself from staying in the bubble for too long, you have to invert each of these things by coming up with things that would go against the ideas you're coming up with. So you come up with idea X, Y, Z, and you create a list of things that might convince you that something else can work better, or something much simpler can work better, so that you don't spend too much time on these big ideas that you think will change people's lives when something much simpler could have done the job in the first place. So when you're building that roadmap, you have to constantly question yourself and evaluate the things that would go against it, to make sure that you're hitting on the right points. That's one mental model that I always keep when it comes to defining what the future needs to be for a technology or a tool, et cetera.

Harpreet: [00:19:08] Tom, go for it.

Thom: [00:19:12] Just briefly, this whole talk actually relates very closely to my favorite talk that I give, called Integrating Brilliance. I gave it a few days ago with your help on the Q&A, Harpreet, and I'm giving it again soon at Future Data Driven. I got inspired by tracking the growth of math and science thinking over the centuries, and I wanted to see: was there a pattern that caused the big jumps? A big key when we're integrating brilliance is to say, you know, wow, control system design, that's really cool.
But if I abstract it and really understand those concepts very well, I can apply the general principles to other areas of my life, to other areas of math and science. So I like a lot of what Joe was saying about how we only need X number of models to really make it in this field. The models are patterns to help you think; they're patterns we've seen over and over again. But I like what Russell's saying too: don't be ironclad. There are two problems: too much trust in the model, and misapplication of the model. I think the biggest problem with logic is people thinking they're being logical and not applying logic correctly, and I'm saying that of myself too. We need to always be suspect: well, yeah, I've got this great model, but am I really applying it well? We know the pain of troubleshooting our own code, even when we're importing modules from scikit-learn. So it happens; we misapply things all the time. We have to hold ourselves suspect constantly. But we sure make a lot better progress when we take this wisdom from the ages, abstract it, and try to leverage it, rather than just shooting from the hip all the time.

Harpreet: [00:21:24] Thank you, Tom, thank you. Kausthub, go for it.

Kausthub: [00:21:27] So as someone who's newer to all this and doesn't have all the gray hairs of experience, right: is there somewhere I can start looking to learn more of these mental models that specifically apply to data science? Obviously I'm going to pick them up as I go through my career, through experience, right? But how do we fast-track that? You guys spent years learning all these mental models, right?
How do we then pass that on? I mean, anytime we develop some kind of knowledge, we pass it on through teaching and through learning, right? So how do we do that? How do we do the same thing for mental models? It's not a technical skill, right?

Harpreet: [00:22:10] Go for it, Greg. Something was beeping like crazy.

Vu Nguyen: [00:22:15] Oh, I'm not sure what that was. So, I'm assuming your question is valid whether you're a data scientist or not, right? To me, it's about pulling these mental models from your experience as you go through things, whether in your professional life or your personal life. If you go through things that teach you certain lessons, you create mental models one way or the other. Another way is to talk to people who have been there before. One of the ways you can do that is to come onto a platform like this, talking to folks who have been there before, who have seen successes and failures and created their own mental models; you can learn from those to inform your own. When you're in a working environment, you're talking to your peers, your manager, your mentors, and they will give you things that will help you create your mental models. And those things should constantly be evaluated, right? You may forgo some that don't help you progress throughout your career, and keep some that do, and you'll have to pull some out of your hat depending on the situation you're facing. To me, it's about learning on the go, talking to people, and standing on the shoulders of giants; then you make your own and you move forward.

Harpreet: [00:23:52] Greg, thank you very, very much.
Joe: [00:23:54] I would also add: the way Munger describes developing mental models is to just read a ton, right? And read in areas that are outside of your normal discipline; that's where he got a lot of his mental models. It wasn't like he had a Mental Models for Dummies book or something. I mean, he's also, I think, one of the smartest people on the planet. But that notwithstanding, you just have to have a natural curiosity and read outside of the normal stuff you typically read. So if you're doing data science, I would say read stuff out of left field too. I don't know what; it could be anything, really. But it's about developing the habit, developing a natural curiosity. There's a notion of the compound interest of knowledge, right? Actually, there's a really good book here, called The Joys of Compounding. I can't really show it with my stupid background with the Star Wars guys over here. But in all seriousness, the whole notion is just: develop compound knowledge over time. That is your biggest investment. There's no shortcut; you're not going to get a hundred mental models in a day, or even a year. This takes an insane amount of time to build. And there's no one direction to choose, either. But Munger would bring up things like: know the basics of chemistry, know the basics of physics, know the basics of psychology; the basics of all these big ideas in the world, right? That's what shapes human knowledge.
And to him, that ultimately creates what he calls a Lollapalooza effect, where all of a sudden, because you have all these different ideas, you're able to synthesize new ideas that nobody's ever thought of before. But this is completely individual; there's no one way to do it. I read a ton. I probably read one or two books a week, in addition to a ton of articles, because that's just how I've been wired since day one. Not everyone's going to do that, but the most important thing is that you make the investment to learn every day. Even if it's twenty minutes, you're still better off than you were the day before, depending on what you're learning. Just don't read, like, QAnon or some crazy shit like that.

Harpreet: [00:26:09] To start with, you read what you love until you love to read, right? That's a good way to develop a habit of reading. Eric says he likes listening to Alan Watts to think about life. Alan Watts is awesome. You should listen to the Akira the Don and Alan Watts albums that are out there; they are friggin' phenomenal. I'll send you a good one in a second. The Joys of Compounding, by Gautam Baid: I'm going to add that to the cart, you know, get it delivered this weekend at some point. Eric has a question; let's go to Eric. Eric has an actual data science question, but you know I'm about these philosophical discussions. I love this shit. But Eric, let's get back on course.

Vu Nguyen: [00:26:50] Yeah. So let's see here, a little bit of background.
So my question is about Bayes' theorem, and the explain-it-like-I'm-five version of it, because I have a little bit of exposure to it, but I always get posteriors and priors turned around in my mind. And the reason I'm curious about it is because I read a really interesting article, I think from a few years ago, about how LinkedIn uses, or used (I'm sure they've updated it by now), an algorithm for detecting spammy accounts just by using your name. I guess they supplemented it with email, but they got really good results just by using your first and last name, so they don't need a lot of other information about you. And the paper they shared had them using a naive Bayes classifier, and they broke down the words, the names, into trigrams, including a start character. So if your name is Eric, it would be like: a dollar sign for the start character, then "Er", then "ri", and so on, and then a slash or something for an end character; a beginning and an end character. And I don't understand how a naive Bayes classifier works, and I was hoping to get the explain-it-like-I'm-five version of that from somebody here.

Harpreet: [00:28:15] Let's hand it to Matt Housley, if he's still in the building.

Joe: [00:28:19] Is he in the building? I'm in the building. I don't have a super good explanation, actually. Maybe I can prepare something for next week. But the original Bayes' theorem is actually just set theory.
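(Editor's note for readers following along: the start/end-marker trigram scheme Eric describes can be sketched in a few lines of Python. The "$" and "/" markers and the helper name are illustrative choices for this sketch, not LinkedIn's actual implementation.)

```python
def name_trigrams(name: str) -> list[str]:
    """Break a name into character trigrams, padding with a
    start marker ('$') and an end marker ('/') as described above."""
    padded = "$" + name.lower() + "/"
    # Slide a window of width 3 across the padded name
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

print(name_trigrams("Eric"))  # ['$er', 'eri', 'ric', 'ic/']
```

Each name thus becomes a small bag of overlapping three-character features, which is what the classifier actually sees instead of the raw string.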
I won't try to go through that now, but you can basically figure out the original Bayes' theorem just by using diagrams. But yeah, does someone have a better explanation of naive Bayes classifiers and how those work?

Harpreet: [00:28:43] Yeah. Tom, or Andrew, or anyone? Just an ELI5 of naive Bayes.

Thom: [00:28:53] Well, the prior probability represents what is originally believed before new evidence is introduced, and the posterior probability takes this new information into account. Guilty of Google searching that.

Harpreet: [00:29:10] So I guess, let's drill down a little bit further, Eric. Is there a specific part of the naive Bayes algorithm that's giving you a headache?

Vu Nguyen: [00:29:21] I guess I just don't understand how a naive Bayes classifier is classifying. I can't figure out what pieces of those trigrams, or anything, it's taking in to update its model every single time, to train on, you know what I mean? Like, I get the idea with a linear regression, where you've got your errors and you're minimizing your errors, but I just don't understand what's happening with naive Bayes.

Joe: [00:30:04] Do you understand what's happening in Bayes' theorem itself? That's good, because I would use that as the basis.

Vu Nguyen: [00:30:12] I wish I did, right?

Joe: [00:30:13] So I'd start there. Basically, you're just using Bayes to classify something at the end of the day. I mean, it literally is that, right? But in order to understand naive Bayes, you need to understand Bayes' theorem.
Joe: [00:30:25] I think that's sort of what Matt was talking about, and what Thom was talking about. So how much probability do you understand?

Vu Nguyen: [00:30:35] I'll tell you when I get stuck. OK.

Joe: [00:30:39] How would you describe this, man? I mean, set theory, I think, is a good basis for this. Maybe we don't need to get too complicated, but if you want to explain it like he's five, anyway, right?

Thom: [00:30:48] Just real quick, Eric. Let's say you're looking at a brand new field of data and you've got all these possibilities. But then one of those possibilities happens, and that possibility just kind of narrowed the field of what's possible after that. So when you get down to it, Bayes is like saying: what's the probability of something happening, given something that's already happened? Now, with these n-grams, it's just a chain of that. And this prior, oh boy, I can hear myself, anyway, the prior is kind of saying: based on what evidence I have, I'm going to make a guess. But the posterior is like saying: no, look back, now I'm going to use hindsight. Now I'm going to borrow some great wisdom, and I'm sure he got it from someone else, but: Bruce Lee. When he started out, a kick was just a kick. And then as he wanted to improve his knowledge, a kick was, yeah, exactly, thank you, Harp, a kick was so much more than a kick. And you know that thing about: I don't fear the man who knows ten thousand moves, I fear the man who has practiced one move ten thousand times. It's that spirit. So Eric, what we're sharing with you today is going to help.
Thom: [00:32:27] But when you really get in there and sweat and bleed through coding it from scratch, with the math and everything, without modules and stuff, and then you get to the end of it, you're going to look back at this conversation and go, oh, it's like what Joe and Matt and Tom were saying. It's just the probability of something happening given that these other things already happened. But now you'll be seeing it with the fog cleared. You've marched through the details of that canyon and gone through all those obstacles numerically, and now you're looking back and going, oh, OK.

Joe: [00:33:06] You start noticing Bayes around you all the time, though, right? So when you look outside your window, for example, you look at the weather. That's the classic beginner example when you talk about Bayes. OK, so it rained yesterday, so what's the chance it's going to rain today? What's the chance it's going to be sunny tomorrow if it rained today, and so forth? Because you're basing a prediction, an outcome, on past probabilities and what's happened. But are you familiar with what conditional probabilities are? A conditional probability is basically the probability that something may happen based upon something else. Coin flips are a good example, and so forth.

Vu Nguyen: [00:33:51] Sure. I was going to say, I always see it the same way too. For Bayes, my go-to use case is kind of like: what's the probability that I will bring my umbrella out today or tomorrow? And this is going to be based on the probability that it will rain. Right.
So if the chance of rain is high, then the probability of me bringing an umbrella would be high. That's how I understand it. And in the use case you're describing, it seems like they're looking at the probability of an account being fraud based on the structure of the account, or the past structure of accounts that were fraud, something like that. It's something I've never tested myself, but it's cool that you bring it up. It makes me reinforce my understanding of that notion. So thanks for that question. That's really cool.

Vu Nguyen: [00:34:53] So if I'm understanding, thinking back on the names thing: if we're taking little trigrams, then we're looking at these little chunks of three letters, could be four or five, but let's just say three. So we'd be taking these chunks of three and looking at 60 million names' worth, which is going to be well over a hundred million little inputs. And then is it essentially just making the model more and more confident with each one? Like saying: this is a good one, this is a good one, this is a good one; this person's name has "J.H.J." in it, it's probably a bad one. So it's training it, building up that probability so that it recognizes those strings of letters. Is that just kind of the basic idea? Like what you said about naive Bayes: it's looking back at what it's seen before and then updating itself with the new information it's received?

Joe: [00:35:50] By conditioning on it, basically, yeah.
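The mechanism Eric is describing can be sketched in a few lines. This is a toy illustration, not LinkedIn's actual system: the names, labels, and smoothing constant below are all made up, and a real model would train on millions of labeled names.

```python
import math
from collections import Counter

def trigrams(name):
    """Pad with a start marker ($) and an end marker (/), then slide a 3-wide window."""
    padded = f"${name.lower()}/"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

# tiny made-up training sets standing in for millions of labeled names
real_names = ["eric", "maria", "harpreet", "monica"]
spam_names = ["jhjq", "xkzv", "qqqp", "zxjk"]

counts = {"real": Counter(), "spam": Counter()}
for n in real_names:
    counts["real"].update(trigrams(n))
for n in spam_names:
    counts["spam"].update(trigrams(n))

vocab = len(set(counts["real"]) | set(counts["spam"]))

def log_score(name, label, alpha=1.0):
    """Log P(trigrams | class) with Laplace smoothing. The 'naive' part is
    treating each trigram as independent given the class."""
    c = counts[label]
    total = sum(c.values())
    return sum(math.log((c[g] + alpha) / (total + alpha * vocab))
               for g in trigrams(name))

def classify(name):
    # equal priors assumed, so the posterior is proportional to the likelihood
    return max(("real", "spam"), key=lambda label: log_score(name, label))

print(classify("erica"))  # shares trigrams with the real names
print(classify("xkzq"))   # shares trigrams with the gibberish names
```

This is exactly the "updating with each example" Eric asked about: training is just counting trigrams per class, and classifying a new name is multiplying (summing, in log space) the per-trigram probabilities.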
Joe: [00:35:52] I mean, what maximizes the probability of this incoming thing occurring, right, is what you're trying to get at. So if you don't understand Bayes, a really good example that I think hits home in today's world is to study how Bayes is used to determine false positive COVID tests, for example. Or false negatives, either one. False positives and false negatives are exactly where Bayes comes in, and I think it's a very good way to concretely learn the concept. And then naive Bayes, once you understand Bayes, is just a matter of applying that to multiple instances of probabilities and classifications. So that's how I would do it. But I don't really like a lot of the descriptions I've seen over the years, because they make it seem more complicated than it really is. That's just me; Matt may have a different opinion.

Harpreet: [00:36:50] Just think in Venn diagrams all day?

Joe: [00:36:51] I guess so. Well, I mean, stepping back from naive Bayes to the broader use of Bayes theorem in probability and research, one of the concerns is that sometimes Bayes theorem is used in research papers to sort of assert snake oil, by using a really ridiculous prior that's not justified, and you have to watch out for that and ask: well, what happens if I tweak this prior? But using the naive Bayes classifier algorithm is more robust, because you're doing iteration and other processes that are meant to solve this problem. You'll also find, if you dig into this, that there are religious camps of frequentists and Bayesians. They like to have holy wars with each other.

Vu Nguyen: [00:37:28] It's hilarious. I have read a fair bit about that.

Harpreet: [00:37:32] Yeah.
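Joe's COVID-test suggestion works out in one application of Bayes theorem. A quick worked version, with entirely invented rates:

```python
# All rates below are made up for illustration only.
prevalence = 0.01           # P(infected): the prior
sensitivity = 0.90          # P(test positive | infected)
false_positive_rate = 0.05  # P(test positive | not infected)

# total probability of testing positive (law of total probability)
p_positive = sensitivity * prevalence + false_positive_rate * (1 - prevalence)

# Bayes theorem: P(infected | positive) = P(positive | infected) * P(infected) / P(positive)
posterior = sensitivity * prevalence / p_positive
print(round(posterior, 3))
```

With these numbers the posterior comes out around 0.15: most positives are false positives, because the prior (the one percent prevalence) is so low. That is the prior-versus-posterior distinction Thom described, in one line of arithmetic.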
Harpreet: [00:37:34] But it looks like there is such a thing as a Bayesian frequentist. Here's an interesting paper you can read about it.

Joe: [00:37:44] That sounds like an interesting... schizophrenic or something. I want to read this. That's a great mental disorder. I don't believe you, I'm just kidding. But yeah, to what Tom was just saying about the mathematical explanation: the Venn diagram is really just, assuming two things are true, you're in both circles, or you're in one circle or the other, and it's just counting at that point. The theorem itself just comes from counting in a Venn diagram. That's it. I think he was, like, studying peas or something when he came up with this. Thomas Bayes, he was something like a priest, and I thought he was doing something as simple as counting something in a garden. Or am I getting confused with Mendel? Mendel, OK, yeah, I can't keep my people straight.

Vu Nguyen: [00:38:31] I think Bayes was a priest, but I don't think he was the peas guy.

Harpreet: [00:38:35] Yeah, that was Mendel.

Vu Nguyen: [00:38:37] Yeah.

Harpreet: [00:38:42] Eric's got us stumped, and it's further proof that you don't actually have to know every single thing about data science to be a data scientist. We still look shit up sometimes and then figure it out. But it helps.

Thom: [00:38:57] No one can do it all.

Harpreet: [00:38:58] It helps, yes, right? Does anybody else have a question? I mean, I've got a question, but, oh, actually, Matt, go for it. I see your hand's up.
Vu Nguyen: [00:39:10] Yeah, so I've been getting confused with PCA, principal component analysis, lately. I'm not very clear on how it works and how we use it for dimensionality reduction. Kind of like Eric, does anyone have a short explanation, something that can boil it down?

Harpreet: [00:39:29] Tom's hand is up, so let's go to Tom.

Thom: [00:39:35] All right, this is going to be more like a fairy tale, I hope you don't mind. So: do all the work you possibly can in original space. But when you're struggling with original space, there is a magical space called eigenspace. And in that space, you would be hard pressed not to find that all your features have become magically decoupled. But please don't make the mistake of thinking that, because all the features in eigenspace are decoupled, and you can figure out which of the PCA features can be removed because their eigenvalues are really small, that if you translated those PCA features back to original space, you've reduced your dimensionality. You've only reduced it in eigenspace, which, again, is this magical fairyland space. PCA is super powerful because it will decouple; it will remove all collinearity. That's what I'm saying. But it doesn't mean you've eliminated original features. And if you ever read a blog post that says you don't need to worry about the eigenvectors, they don't really tell us anything: please don't believe that. You need to use those eigenvectors, because your stakeholders will say: oh, that's cool, you found this magic space, but what does that tell us about the original space?
Use the eigenvectors to say: well, this PCA feature is composed of these original-space features, and that PCA feature is composed of those. Oh wow, that gets complicated. Yeah, but it sure has a lot of good modeling benefits, so it just adds to the burden of what you've got to explain when you go to PCA. But it can sure save you in a clutch situation. I hope that helps, and let me stay on to answer any questions you have after that, but that's kind of a fairy-tale intro.

Vu Nguyen: [00:41:46] OK, so focus on the eigenvectors and the eigenspace, instead of just running the PCA randomly.

Thom: [00:41:55] Just understand that when you go into the eigenspace, to which you've applied PCA and transformed your data, it's not the original space. It's a completely new perspective. And it takes explanatory energy to make that clear to any stakeholders. It increases the burden of our data storytelling about the machine learning pipeline development.

Joe: [00:42:30] It might be helpful to step back a little bit as well and just talk more broadly about the linear algebra problem. What tends to happen is: say I have 80 features and I just toss them into my machine learning model. It turns out that a lot of them might be correlated with each other in various ways. If you take a bunch of measurements of various types of objects, you might have correlations between those different measurements, because really the objects all have the same shape, and so if you increase the length, it also increases the width, for example.
So you're looking at cubes and taking measurements of the different sides and then throwing all those in as parameters in your model. And so, fundamentally, what this whole class of techniques is designed to do is to compress and remove that excess data that's correlated. In terms of linear algebra, a nice way to think about this is: suppose you have three-dimensional space and you have a plane inside that three-dimensional space. If you look at the coordinates, there are three coordinates for each point in that plane. So it looks like, oh, I clearly have three coordinates, or three features, that I care about. But really, it's just two-dimensional behind the scenes. And in some sense, what PCA is designed to do is find that plane for you, find the subspace that really counts. It's more statistical than that, because you're not looking for an absolute plane, you're looking at correlations and things, but fundamentally the idea is to reduce out correlated features where possible. There are a couple of other important techniques like this for dimension reduction. If you've heard of manifold learning, manifold learning is a similar idea, except that instead of assuming your plane passes through the origin, you can have any kind of surface. It could be a sphere or something else, but you've still got a surface inside a higher-dimensional space, and you're trying to find the surface itself and kind of project down. Another one that's very popular in test design: when they design the ACT or the SAT, they use something called factor analysis.
Once again, it's another dimension-reduction technique, and I don't totally remember what the differences are. It seems like factor analysis is way less popular in data science, but I don't remember the reasons for that exactly; I'd have to go back and look them up.

Joe: [00:44:37] Because it's not deep learning.

Harpreet: [00:44:40] Well, true.

Joe: [00:44:43] And the other way to think about it: in Matt's example, if you had 80 features and you needed to narrow them down to three, you're trying to find the three that have the most variance, and disregarding the other seventy-six, seventy-seven that are mainly correlated.

Harpreet: [00:45:03] So let's go to Kausthub, then Tom.

Kausthub: [00:45:10] So I guess, Matt and Tom and everyone else have been commenting specifically on the stakeholder side of this. When you're explaining it to someone who doesn't really enjoy getting into the linear algebra of it, I mean, the word "eigen" maybe scares them a little bit, right? Is it reasonable to start talking about your eigenspace features as effectively a proxy for their real-world features? Would it make sense to explain to them: I'm taking all of your 80 to 90 different concerns, all of these different data points that are coming through, and I'm generating these simplified proxies for some of them, and then we do our actual clustering or classification based on these proxies, which inform us about your 80 to 90 features at a resolution that makes more sense for the model?
Harpreet: [00:46:01] This is how, if I had a stakeholder asking me how it would work, I would explain it. Look, man, I'm trying to build your machine learning model to answer your questions, solve the problem. But this is a ton of features. They're all kind of useful, but they're all useful in a lot of different ways. So what I did was, instead of having 80 features to deal with, like 80 columns, that's too much for me to think about, I just kind of compressed it down to this space where I've got three or four. It's much more manageable. Not only that, if we use this smaller space of features, it captures a lot of the information that we would get if we had all of them. So I just boiled it down to this thing and made something from that, which is much easier to visualize and for us to get answers from quickly.

Kausthub: [00:46:50] But they'll naturally want to know about a specific feature and say: OK, so how does that affect this particular lever that I pull, you know, in sales or whatever?

Harpreet: [00:47:00] Man, I'd be hard pressed to find a salesman who would actually care about that.

Joe: [00:47:06] A salesperson asking that question is pretty badass, so...

Harpreet: [00:47:10] I'd be like, damn, bro.

Joe: [00:47:12] You might want to think about whether you're on the wrong team.

Vu Nguyen: [00:47:14] I was going to mention: stakeholders that actually ask specifically about PCA? That's pretty impressive if that happens. I haven't had that experience.

Joe: [00:47:25] We'll have to get Aaron Hunsicker on here. He works for us now, and he has really good stories about working with stakeholders, so we'll have to have him on next time. Yeah, he's cool.
Harpreet: [00:47:37] Yeah, absolutely, bring him on.

Joe: [00:47:40] He also opened for the Wu-Tang Clan back in the day, so he's even cooler than you think.

Thom: [00:47:48] I loved where Matt was going, because he's giving the explanation we would want to hear as fellow data scientists. The explanation I was giving was trying to help a stakeholder understand why we had to resort to PCA, and I probably wouldn't even use the words PCA, necessarily. But I think it's important for all of us to remember: when you first apply PCA, you are not at all reducing the number of dimensions. But once you get into magic space, I'm going to keep calling it that for a bit, you now have strength values called eigenvalues. You can realize: oh, if I order the eigenvalues by magnitude and do a cumulative sum, and I see that these last ten are adding less than one percent to the value of the model, I can just dump those. But what you can't count on is that, just because you dumped those, you've fixed the original space; if you went back to your original space, you'd find you have just as much collinearity, or more. It's like Matt was saying: that unique perspective that going into eigenspace gives you is just looking at the whole hyperspace in a way that gets rid of all the collinearity, which is magical, it's wonderful. But again, if you really want insight into which features are most important, you can't just say: oh, this PCA feature is more important. That won't really mean anything to the stakeholders. But if you say: the most important feature we have for the model is a combination of these, now that's valuable.

Harpreet: [00:49:36] Thank you very much. Excellent discussion, covered from a bunch of different angles.
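Thom's two moves, ordering the eigenvalues by magnitude to see which components can be dumped, then reading the eigenvectors to translate a PCA feature back into original-space terms, can be sketched with NumPy, doing PCA by hand on the covariance matrix. The feature names and data here are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
# three made-up "original space" features; width is strongly tied to length
length = rng.normal(10.0, 2.0, size=500)
width = 0.8 * length + rng.normal(0.0, 0.5, size=500)
weight = rng.normal(50.0, 1.0, size=500)
X = np.column_stack([length, width, weight])
names = ["length", "width", "weight"]

# PCA by hand: center, then eigendecompose the covariance matrix
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]          # order eigenvalues by magnitude
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# cumulative explained variance: how much the first k components keep
cumulative = np.cumsum(eigvals) / eigvals.sum()
print(cumulative.round(3))

# the eigenvector is what you show the stakeholder: the top PCA feature
# is mostly a blend of length and width, with almost no weight in it
for name, loading in zip(names, eigvecs[:, 0]):
    print(f"{name}: {loading:+.2f}")
```

The loadings loop is the stakeholder translation Thom described: "this PCA feature is composed of these original-space features," stated with actual numbers instead of the word "eigen."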
Matt, Matt Blaze, thank you for kicking that off. Any other questions or follow-ups on that PCA discussion? Yeah, I mean, when it comes to stuff like that, for me it's good enough to kind of intuitively understand or know the explanation. But fuck, man, if I was in those office hours myself and somebody asked me that question, I would not know how to answer. So thank you guys for being here and helping with that. Let's go to Mark. Mark has something for us.

Vu Nguyen: [00:50:16] Yeah, I don't really have a question, per se; just a huge thank you to this whole group. I had a big win at work, and it's based on all the feedback you all provided me, essentially hearing from you all, just like: hey, don't ask, just do, for certain things. That's essentially what I did, and also just reframing my hackathon into something positive. So essentially, for my hackathon: I've been asking engineering for the past year, like, hey, give me access to create datasets within our system. To give context, our data is really complex. We have our regular database system and then a homegrown kind of layer on top for additional security, and so it's not really straightforward. So for my hackathon I basically just taught myself how to go in and create my own datasets for my project. And when I presented that, everyone was like: oh my God, data science can do this? That's amazing. So the head of engineering approved my pull request to get it merged, but they also finally gave me read and write access to create datasets, which I've been asking for for like a year. So I can finally create our own datasets, which is really cool.
But more importantly, after my pitch, where I focused on the positives from my hackathon, my manager was like: yo, isn't this a top-ticket item? Spend the rest of the month focusing only on this hackathon project and see it through to the end. So that was just really cool. Shout out to all y'all's feedback; it really redirected me to the right path to have success.

Harpreet: [00:51:53] That's awesome, that's so dope, man. I'm super, super excited for you. And yeah, big shout out to everybody that comes every week, shares their wisdom, shares their knowledge, and just literally helps shape people's careers, people's journeys and trajectories and all that stuff, every one of you guys that shows up. It's been a while; I think everybody started coming around mid-October of last year. So for a year straight, people have been coming onto this thing every Friday and just dropping amazing knowledge. And it's so awesome to hear the positive effects of that. Congratulations, Mark. Looking forward to some more good news from you and everyone else. Any other questions?

Joe: [00:52:46] I have a question for people. I'm curious: what are people using these days to learn data science and machine learning?

Harpreet: [00:52:59] What do you mean, using in terms of software?

Joe: [00:53:03] Tutorials, learning materials, courses, books. What are people finding effective? What are people finding not effective?

Harpreet: [00:53:11] That's a good question. Me personally,
I like to read about it. The way I learn about stuff is: I'll read about it at a high level, find some hands-on example, start working through it, see where I get stuck, where something is happening and I'm like, what the fuck, why is that happening, how is that happening? And then I'll go back to the learning materials, try to dig deeper, try to get the underlying concepts, stuff like that. It's mostly books; I like books. My process is probably like this: I'll watch a few YouTube videos just to prime the pump, I don't know if that's the right phrase, but to get me in the mood for a particular topic. Watch a few videos to see: OK, great, this is what, let's say, a recurrent neural network is. Great, I see it, I got it. Then I'll go to the books and start reading about it, go through the examples in the books, and drill deeper wherever I'm just unconvinced of what's happening. I'd love to hear from other people.

Kausthub: [00:54:21] I guess, for me, I'm a conversational slash audio kind of learner. I learn faster when I'm in a conversation like this one, when I'm talking to someone one-on-one about something they're working on, when I'm listening to something. So for me to generate that level of interest and investment in a book, to go in and learn about this chunk of stuff, I kind of start from a YouTube video or from a Udemy course, because that's cheap and easy to access, right? So you access the Udemy course like that.
And then I watch it, and I'm like: OK, I get the high level. Now let's dig deeper and deeper. By then, I'm convinced that I need to go read papers about that particular topic, or drill into a book on that topic. And it's just going to vary depending on what kind of learner you are. I find that in those conversations I'm having at work, I learn a lot. Once every two weeks we've got a paper-reading kind of session where we share: OK, we've read this paper, here's what we think about it, and we discuss it. I learn so much more in that session than I do in the three to five hours of that week that I spend reading other papers that are out of my area. But I'm actually curious, Monica, as someone who's really invested in the education piece of it: what are your thoughts? How do you identify how you best learn, and how do you map that to the resources available?

Vu Nguyen: [00:55:43] Oh, for sure. This is really funny, because I literally posted something on different types of learning styles yesterday. So depending on whether you're a visual, auditory, or hands-on learner, you can go to different resources, and also depending on whether you have a specific question versus whether you're just generally curious about something. If you have a specific question, I usually Google it; that's the number one resource to go to. Stack Overflow is super helpful as well, or any smaller mini-courses that are more scoped out and can answer that question. Versus, if you're just generally curious and don't know anything about a specific topic, there are courses, like a data science specialty course, which covers nine different topics and all of that.
I also like tutorial-style sandbox sites. W3Schools; Mode is another one where you can just go in there and start doing something. Breaking things, I think, is a really good way to learn as well, to figure out how things are working on the back end.

Harpreet: [00:57:06] Mark, go for it.

Vu Nguyen: [00:57:09] So for me, how I quickly try to learn things, especially for machine learning or just data in general: I try to start really high-level and drill in really fast. For me, it's a matter of repetition. So I get a new concept, and I try to find the intro five-minute, quick, explain-like-I'm-five explanation of the concept. Then I'll go find something more in-depth, maybe a lecture or something like that. And then the next step, I go find articles, like Towards Data Science, something really high-level, or Analytics Vidhya, I'm butchering that name, but essentially that's the next step. And then from there, I'll go find textbooks. So now I have all the key terms, the words, and I go through the textbooks. It sounds like a lot, but I'm watching things at double speed, I'm skimming. The point is not to really comprehend deeply; it's just to get repetitions over and over again. That way, I have a set of resources and know where to look, and then where the real learning happens is when I try to implement a project. Implementing a project is where I really learn the most. But doing all the steps ahead of time gives me a plethora of resources to go back to, and also gives me a good foundation to move forward with.
So I'm not just randomly applying code from Stack Overflow; I can actually think through the problem. And then, to solidify it, and I'll do this every time even though it's very intensive: I like creating tutorials. I have a few tutorials on GitHub. Teaching others and mentoring others is where I really learn a lot, because to share, to explain to someone else, I actually have to learn and understand it and be prepared for questions.

Harpreet: [00:58:48] I think that's where that conversational aspect that Kausthub was mentioning really kicks in, just having to talk about it and, you know, bounce ideas around. Andrew's got some great responses here too. If anybody else wants to answer this question, let me know. Let's hear from Andrew, Andrew Troth; there are multiple Andrews, that's why I refer to him that way.

Andrew: [00:59:09] Yeah, just quickly, to some of the points, and to what Mark just said: whenever I have been asked to do a brown bag on a particular topic, I mean, I'm learning a lot out of interest and casting a fairly wide net, but when I have to put that into a training resource for a brown bag, or explain it to our CEO or some of our engineers, that is when I really go down the rabbit hole, and I come ready. I was doing one on NLP recently, and I had history going back to, like, the fifties. I went down a very deep rabbit hole, but it was enjoyable. And so, along with exploring various GitHub repositories, I've been very pleased with some of the training materials from ODSC.
The Open [01:00:00] Data Science Conference, particularly some of the analytics material; they have some really good professors, actually. I'm based in North Carolina, and at North Carolina State University we have the Institute for Advanced Analytics. A guy by the name of Mark Barr actually taught some of these courses, really interesting programs for fraud analysis; I think it's course 5861, I forget which one. And so those materials have been great. I really like, as some other folks have mentioned, starting on YouTube, getting excited about things to build some momentum; then you can get into reading some of the materials, exploring GitHub repos, and then sharing that with colleagues, getting conversations going, getting other people excited about it, and you can feed off of that. So that's been great on our end. And after this, if I may, I'd like to ask if anyone has implemented any proper graph databases, because I've been tasked with that recently, and I'm curious if anybody uses those in their data analytics. Harpreet: [01:01:13] Andrew, thank you very much. Graph databases... what was his name, David Knickerbocker, who I haven't seen in a while. David, if you happen to be tuned in, come hang out. Let's see, there are a lot of new comments coming in. And Mark says ODSC is how he got his first data science job; that conference holds a special place in his heart.
Harpreet: [01:01:39] Speaking of conferences, don't forget to sign up for the DATAcated Conference happening on October 5th. Be sure to be there; I'll be presenting at DATAcated. I'll also be presenting at the ML conference on October 15th, so hopefully you guys get to tune into that one. We'll get to your graph databases question, Andrew, but first, let's go back to Mark. Mark said [01:02:00] he had something interesting to share with us, and I'd love to see it while we're talking about the technical aspects of something. Go for it, Mark. Vu Nguyen: [01:02:09] I definitely wasn't planning on this question, because I was literally coding this in the middle of the call; I wasn't expecting my output to actually come out so quickly, because it's like a million rows. But essentially, if you all remember, I'm working on an imbalanced classification problem for predicting neonatal death within 28 days, a project I'm doing together with my mentees. And we finally have output. Last week we started with, you know, here's just a plain random forest model; we haven't done anything special to it, it's going to be a bad model. I've now implemented SMOTE, which is an oversampling method to account for the imbalance, and again applied the random forest model, no tuning yet. And so I got my output, and I thought it might be interesting, so you can actually see the graphs and the output, and just to get some feedback, because I have some ideas for next steps. It's still not at the level I want after doing SMOTE, which is expected because I haven't done any tuning. But before I go into tuning and whatnot, I'm just curious if there are any simpler steps that I should consider.
Like, you know, I can go down the rabbit hole and make it more technical, but it's always: how can I simplify the problem, or maybe simplify the data, in a way that might make things better? I've already done some feature importance, and I've already talked to stakeholders as well, who as subject matter experts gave me a good idea of some important values. But I was just curious, one, to show the output, and then also to have a cool conversation. Does that sound good to you all? Harpreet: [01:03:46] That sounds good, yes, yes. Vu Nguyen: [01:03:53] So let me share my screen. Can you all see my screen? Yes, absolutely. Awesome. So, essentially, [01:04:00] going back up: here's the model without doing SMOTE. This is the training data, so of course it's going to look nice. And then this is my validation after I split; I have train, validate, and test sets. So this is my validate set, with area under curve, confusion matrix, and then the various precision, recall, and F1 scores. I'm going to skip past this, because again, it didn't account for imbalance. Then I implemented SMOTE, and as you can see, before SMOTE this was the distribution of events, very imbalanced; after SMOTE, both classes are balanced for the outcome variable. So I retrained, and here is my curve again. For the training set I'm not really expecting much. So here is the random forest model: the area under curve increased. But interestingly enough, my F1 score actually decreased compared to the other model, which is really interesting. And the reason why I'm choosing the F1 score is because, in health care, there is a price for false positives and false negatives. False negatives would be the worst, but false positives would be bad too, because if we say this newborn is at risk of dying,
They'd undergo unnecessary procedures that put them at higher risk, right? So that's why I want the F1 score, to balance that. And so for me, I'm like, wow, I'm really surprised that the F1 score went down after doing SMOTE, and again, I haven't done any tuning. But based on what you're seeing (I can show specific other graphs if you like), what would be your next steps thinking through this problem? I literally just got this output, so I haven't had a chance to think about it, but I thought it would be great to talk through it and figure out next steps. [01:06:00] Harpreet: [01:06:01] Yeah. Tom, let's go to you; I need to step away for a quick second. But Tom, go for it. I'll stop sharing. Thom: [01:06:08] Yeah, just briefly. Great explanation, Marc, and I'm working on something related just for fun. I don't know if you saw my comment yet, but try XGBoost, which isn't always the best but is frequently very good; it has a class-balancing option in it. But regarding the metrics, I think we can over-focus on F1, especially in medical stuff. To me, recall is the bomb. Precision just means how tightly clustered things are, how well clustered they are; but recall, the individual recall for each class, that's your predictive accuracy when you get down to it. Vu Nguyen: [01:06:59] Can you briefly describe recall again? I always mix those up, precision and recall; I have to review them every single time. Thom: [01:07:05] In fact, I want to encourage everyone to just look through my recent post feed. I've been dealing with these things because I got frustrated myself, like, wait, let's put it in just conceptual terms.
So when you think of recall, it's what we've predicted correctly over everything we would have predicted correctly if it was a perfect model: the model we have, for predictive accuracy, over the perfect model's predictions, basically. And the F1 score is really just the harmonic mean, and I'll make that simpler. Vu Nguyen: [01:07:48] I'm actually reading your post on that now. It was really good, thank you. Thom: [01:07:53] Well, this is a little different. When you're talking F1 score, it's just the harmonic mean of [01:08:00] precision and recall. But sometimes we make too big a deal out of precision. So it's kind of good to break them apart: you can use what's called the F-beta score, which helps you weight those against one another. But quite frankly, I think what you really care about is the recall on each of those. And the precision, by the way, is just the number we got right over the total number we predicted as positive. Vu Nguyen: [01:08:37] Well, that's all... Thom: [01:08:39] Yeah, go ahead, Greg. Vu Nguyen: [01:08:41] Sorry about that. I think I empathize with what Mark is saying, because I've been in situations too where both recall and precision are important and you have to resort to F1. For example, in the health care piece, a false positive can hurt you as much as a false negative; of course, one is going to carry stronger risk than the other. But it's kind of hard. One explanation, a theory I have for why F1 went down, is that you gave the model more data in which to commit errors, right? So you're committing more false positives and more false negatives.
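Tom's definitions can be written out in a few lines of arithmetic. The confusion-matrix counts below are made up purely for illustration, not Mark's actual output.

```python
# Tom's definitions on a made-up confusion matrix (illustrative counts only).
TP, FP, FN, TN = 30, 10, 20, 940

precision = TP / (TP + FP)  # of everything we called positive, how much was right
recall    = TP / (TP + FN)  # of what a perfect model would catch, how much we caught
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

# F-beta generalizes F1: beta > 1 leans toward recall, beta < 1 toward precision.
def fbeta(p, r, beta):
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)

print(precision, recall, f1)          # 0.75 0.6 ~0.667
print(fbeta(precision, recall, 2.0))  # 0.625, pulled toward recall
print(fbeta(precision, recall, 0.5))  # ~0.714, pulled toward precision
```

scikit-learn ships the same thing as `sklearn.metrics.fbeta_score` if you would rather not hand-roll it.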
I'm assuming that's why it went down, but it's kind of a difficult thing to say, right? Once you know F1 is your best metric, how do you ensure that's what you need to go by? Or how do you ensure that, no, it's not recall, it should be precision? Thom: [01:09:49] So, that's exactly what you're struggling with. Yeah, for each problem, you really need to take the time to understand what's most important. [01:10:00] And once you do, once you look through the myriad of metrics that are available for confusion matrices, do take the time to really think about what each one really means. For example, when you look at the accuracy equation, it's kind of overwhelming at first, until you realize, oh, it's just the number of correct predictions divided by all cases. But accuracy alone... and accuracy, to me, is more important than the F1 score. I'm not saying the F1 score is not good; you've just got to put it in the context of what the real need of this particular classification problem is. Kausthub: [01:10:42] But I mean, that would vary based on the specific problem, right? Like, at the moment I'm doing a semantic segmentation kind of task where I've got significant class imbalance, and accuracy gets thrown off, because if you've got a multi-class problem with a real imbalance, where only 10 or 20 of your pixels are in one particular class and everything else is another, you might get 99 percent accuracy but completely incorrectly classify those 10 or 20 outlier pixels. Thom: [01:11:12] Yeah.
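Kausthub's accuracy trap can be shown in miniature with a hypothetical "model" that only ever predicts the majority class; the labels below are invented for illustration.

```python
# Accuracy paradox in miniature: always predicting the majority class
# looks accurate but never finds the minority. Labels are made up.
y_true = [0] * 990 + [1] * 10  # 1% positive class
y_pred = [0] * 1000            # "model" that always says negative

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
tp = sum(t == p == 1 for t, p in zip(y_true, y_pred))
recall = tp / sum(y_true)      # recall on the minority class

print(accuracy)  # 0.99
print(recall)    # 0.0
```

Which is exactly why the discussion keeps coming back to recall, precision, and F1 instead of raw accuracy for imbalanced problems.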
What you said, Kausthub, it really depends. For a certain case, getting a false negative could be life threatening, whereas with a false positive, OK, you're going to get the treatment even though you didn't need it. But to not get the treatment or the procedure when you really needed it... so you'd tune that model to reduce those false negatives as much as possible, and that might actually lower your accuracy. But if you really lowered your false negatives? Awesome. Kausthub: [01:11:48] Right, and that's coming from the perspective of a screener being an important tool in that way. Vu Nguyen: [01:11:56] Yeah. A quick question I have, too: the reason you don't want to use [01:12:00] accuracy is because, with imbalance, you could just say everything's false, right, and you'd have high accuracy because it's super imbalanced. Well, after SMOTE, things are balanced now. So would accuracy then be an OK metric for just the training data set? My hunch is no, but I'm just trying to think through it. Thom: [01:12:20] It depends on your problem. It depends on your goals. I was about to say business goals, but yours aren't business goals; yours are medical goals, or something related. You've really got to define those goals with classification. Vu Nguyen: [01:12:37] And then one more thing, just thinking through this, and this kind of goes back to the business use case; I'll share my screen real quick. So again, we have our true negatives, which are good, our true positives, and then our false negatives and false positives, which are really bad. The argument I'm making is, thinking about the use case, the clinical workflow, we could argue that we could actually just throw out false positives completely, because we say, like, if we think that...
Actually, I'll take a step back. I had an idea, but after saying it out loud, it doesn't make sense; never mind. Yeah, because by the time you figure out it's a false positive, you've already administered treatment, you know? Harpreet: [01:13:26] Yeah, someone here might have some insight for us. Go for it, man. Yeah, I was going to say, have you tried upscaling and down... sorry, upsampling and downsampling, rather than... Vu Nguyen: [01:13:40] Can you define that real quick for me? Harpreet: Well, SMOTE is like a more advanced version of that. To balance the data when you don't have a 50-50 split, upsampling basically upsamples the minority class to balance things, and then with downsampling, you downsample the majority class to [01:14:00] the minority level, if that makes sense. Vu Nguyen: [01:14:04] Definitely. I actually wrote a note in here, I think; I've heard that referred to as oversampling. The note said the difference from oversampling is that SMOTE attempts to recreate the variance seen within the data set, whereas plain oversampling can result in overfitting. I remember pulling that from an article, and I'm reusing code from a year ago, so I don't know how true that is. But that was the reasoning: Mark from a year ago said that oversampling led to overfitting. But Mark from a year ago also knew less, so I'm curious if that's something other people have come across. Harpreet: [01:14:42] Yeah, I'm saying that because I was working on this one customer churn problem and I had the same issue, because it's an imbalanced data set. So I did try all three: I started with oversampling, then undersampling, then SMOTE. Surprisingly, for my case, I got better results, in terms of metrics, from undersampling.
So it depends; you have to try them, and then you can figure out what's working for you. Vu Nguyen: [01:15:11] I think that's a really great point, because I was asking what's simpler than going straight to the tuning, and I think a simpler thing for me would be to actually go back to the sampling method and just re-copy the positives to balance things. So I really like that. Harpreet: [01:15:29] If I conceptually think of SMOTE: we have all these features, and there's a feature space, an n-dimensional space, and we draw a hyperplane through this n-dimensional space. Here is one of our samples that we care about, a minority sample, and what we're trying to do with SMOTE is create a synthetic point that is very close and similar to that particular minority sample. [01:16:00] That's conceptually how I like to think of SMOTE; I don't know if that's right or not. But I'm wondering, there must be something that implements this, it must have some name: would you be able to synthetically generate rows of data by looking only at the minority class, and then for each feature of the minority class, each column, say, OK, here is the closest distribution for this particular column? It could be normal, it could be whatever. And then when you populate a new row, just pick a random value from the distribution that you fit on that column, and do that for every single column.
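Harpreet's idea, sketched naively: fit a distribution to each column of the minority class and sample each column independently. Everything below (data, names, the normality assumption) is invented for illustration; note that sampling columns independently throws away correlations between features, which is part of what SMOTE's interpolation between real neighbors tries to preserve.

```python
# Naive per-column synthesis: fit a distribution to each minority-class
# column and sample columns independently to make new rows. Illustrative
# only; it ignores cross-column correlations, unlike SMOTE.
import numpy as np

rng = np.random.default_rng(0)
# Stand-in minority class: 50 rows, 2 features.
minority = rng.normal(loc=[0.0, 5.0], scale=[1.0, 2.0], size=(50, 2))

def synthesize(rows, n_new, rng):
    # Assume each column is roughly normal: fit mean/std per column,
    # then draw each column of a new row independently.
    mu = rows.mean(axis=0)
    sd = rows.std(axis=0)
    return rng.normal(loc=mu, scale=sd, size=(n_new, rows.shape[1]))

new_rows = synthesize(minority, n_new=100, rng=rng)
print(new_rows.shape)  # (100, 2)
```

A worthy experiment, as Tom says below, though the independence assumption is exactly where it can go wrong on correlated features.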
Thom: [01:16:48] That is a worthy experiment you're describing, Harpreet; that's the curiosity, the experimentation, that we have to do sometimes. Don't get completely locked into the current tools. If you own a great table saw, you're going to have to build a jig every once in a while to make a certain cut, and that's what you're talking about: building a new jig for balancing classes. I don't know if anyone... oh yeah, I've already said it, sorry. But again, Mark, I think you're getting it. If you take the time to really figure out what's important to this particular classification problem, then it'll become apparent very quickly, in very simple terms: oh, I need to really reduce false negatives, and if I increase my false positives, that's not as big a deal as decreasing my false negatives. I'm not saying that's what it is for you; I'm just saying in many cases that could be the goal. And I think we can get so lost in the confusion matrix metrics, and they get confusing, and we lose sight of what the heck [01:18:00] we're really trying to do in that classification problem. Vu Nguyen: [01:18:03] Yeah. Wow, this conversation has been so helpful. In literally two hours I'm meeting with my mentees to go over this, so, speaking of teaching to learn, this is super helpful, y'all. You've equipped me with a lot of thoughts I can bring to my mentee sessions to have a really good conversation with them about this. Harpreet: [01:18:26] Great question, Mark. Let's consider Andrew's question the final question, if anybody has any insight into this; thematically it's around graph databases, and I've seen a few people talk about graph databases. Eric, looks like you're still here; I think you might be able to speak to that, I saw you post something about it. Andrew, go for it.
Thom: [01:18:53] Yeah, we've had some interest from some of the folks that we work with on several things, at different stages. The first is basically your enterprise knowledge graph, which is not really a data analytics question at first glance, but is about capturing some of the institutional relationships that a lot of the managers don't have eyes on. But then the second part of that is to feed in some of the institutional data and try to derive and infer certain relationships around some of the projects, essentially to cut back on some of the research time that certain folks in the company have been spending on looking at what we've done in the past, and then rewriting things that have already been written. So we've done some experimentation around adding these documents as entities to the graph database as well, which right now mostly covers our talent and human capital management component, but we're also looking at [01:20:00] some supply chain management and logistics, and we'd also want to be running analytics over that. So I'm just curious if anyone has experience with that: what has the experience been, what kind of product have you used, how successful has it been, what have been the pain points in deployment, maybe? Harpreet: [01:20:22] Opening this up to anyone that's got any insight or wisdom to share with Andrew here.
Vu Nguyen: [01:20:31] So I've never used a graph database, and definitely never built a graph database. But I've attended a couple of things with TigerGraph, and I don't know if you've ever used TigerGraph, or if you've watched or seen any of the resources from Neo4j; I'm sure you've heard of Neo4j if you do anything with graph databases. Both of them have a lot of really helpful and interesting resources that I've found useful. Otherwise, my stuff has mostly just been doing my own little analyses rather than building an actual database. Thom: [01:21:12] Thanks for that, yeah. We've been trying to migrate from Neo4j into the Amazon Neptune system, and it's a completely different animal, so that's also introducing pain points. Harpreet: [01:21:24] So Andrew, do you follow David Knickerbocker, or are you familiar with him? Thom: [01:21:29] The name's come up a number of times. If you asked me what his title is or where he's at, I couldn't tell you, but the name's familiar. Harpreet: [01:21:36] Yeah, he's a member of the community. Let me send you a link to his LinkedIn. Speaking of David, we haven't seen David in forever; David, lately, where you been at, man? I'll go ahead and drop that link here; connect with David Knickerbocker, he's cool. Tell him that the Artists of Data Science sent you during their happy hour, that it was the collective voice that said [01:22:00] he is the go-to guy, and I'm sure he'll be able to help you.
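Nothing Neo4j- or Neptune-specific here, but the kind of "own little analyses" mentioned above, inferring who and what is connected before committing to a database product, can be sketched in memory with networkx. The people, projects, and edges below are entirely made up.

```python
# In-memory sketch (networkx) of the relationship analysis discussed
# above: a tiny made-up graph of people, projects, and documents.
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("alice", "project_x"), ("bob", "project_x"),
    ("bob", "project_y"), ("carol", "project_y"),
    ("carol", "report_1"), ("project_x", "report_1"),
])

# Which node sits on the most connections (a rough "institutional hub")?
dc = nx.degree_centrality(G)
print(max(dc, key=dc.get))  # 'project_x'

# How is alice connected to a report she never wrote?
print(nx.shortest_path(G, "alice", "report_1"))  # ['alice', 'project_x', 'report_1']
```

Prototyping the questions this way, before picking Neo4j, TigerGraph, or Neptune, is one low-cost route into Andrew's use case.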
Eric also wants to share something, or let people know about something. Go for it, Eric. Vu Nguyen: [01:22:11] Yeah. So I dropped something in the chat here, but I also shared it on LinkedIn as well: we're hiring on my team. I'm pretty excited about that, because I didn't know we were going to be hiring until just earlier today. It's basically the same as my job, looking for someone in strategic business analytics. Like I said, I'll drop the job description, but as a real overview, rather than the job description, of what I do on a regular basis: I work really closely with stakeholders, I use a crap ton of SQL, also a lot of Tableau. There's definitely room for R and Python; usually we have the data science team doing a lot of the model stuff, and I'm doing the very business-embedded, business-oriented work. The way it works is you'll be involved with one or two different verticals. I work with small business loans and also investment products, so this person would work with, I don't know, credit cards or deposits or mortgage refinance or something like that, right?
And so it's a great way to get involved and learn about a specific area of the business really well, and be involved with the sales team, or marketing, or product, or some of all of it, with A/B testing and talking about how to grow and improve the business and focus on impact. That's actually one of the big things that really drew me toward LendingTree: everybody is really focused on impact. Anyway, I've worked there for a few months now, I really like it, it's a good place to be. So let me know if you have any questions; feel free to shoot me a message on LinkedIn or whatever. Harpreet: [01:23:47] OK, thank you so much; I appreciate you spreading the word about that. Is there a preferred method? You said LinkedIn; should they message you with a resume, should they have portfolio projects ready? Vu Nguyen: [01:23:58] Yeah. [01:24:00] So I am mostly just the messenger right now. The biggest thing would be, if you have questions or something, like I said, feel free to shoot me a message. You can send your resume if you want to, but it will be more effective if you message me and say hey, and then also apply for the job, because I can't apply for you. Harpreet: [01:24:22] Perfect. If you guys are listening on LinkedIn, or catching this on Sunday when the episode releases as a podcast, shout out to Eric; you probably already know how to get in touch with him. Mark, you were also doing a thing here, what's going on? Vu Nguyen: [01:24:38] Yeah, we just got more headcount, which is exciting.
So we just posted our position a couple of days ago at my company, Humu; essentially, think of it as organizational psychology to drive behavior change through our product and make work better, improving people's habits at work. This is a really cool role: a senior analyst role for marketing and storytelling. Essentially, we want you to dive through our product data, our survey data, and potentially some customer data, bring that all together, find some really interesting key highlights, and tell a really compelling story. So you'll be working with the content team and the marketing team, but you'll be under the data science team. It's a really cool position. I can also share the link to the role; if you are interested in it, please reach out to me on LinkedIn as well. It is U.S.-based, so unfortunately we can't have people outside the country apply. Harpreet: [01:25:45] Mark, thank you so much. Speaking of data storytelling, Brent Dykes sent me a copy of his book; I've yet to take a picture and post it on LinkedIn, but it's a great book on data storytelling, it's dope. We're actually going to be live on LinkedIn [01:26:00] on October 2nd, interviewing him on LinkedIn Live. You'll see a lot of me in the month of October: not only am I doing the office hours every Friday, but you will see me live on LinkedIn doing interviews for the podcast with Brent Dykes, Joe Reis, Brittany, Liana, Andrew Jones; we've got the Data Professor himself, Mr. Chanin, I cannot say your last name; Natalie Nixon; and Danny Ma as well. We've got the one and only Danny Ma finally coming on the podcast; I think I'm officially cool enough for him to come on my show. It's going to be awesome.
So, and then you'll see me presenting on data, so October is going to see a lot of me on LinkedIn; hopefully you guys will all be there, hanging out. Be sure to check out the podcast episode we released with Dennis, tales from a data engineer, dropping a lot of data engineering knowledge. Don't forget that Sunday we've got the happy hour, or rather office hour, session with Comet. Those office hour sessions are going to be moving to during the week; obviously, I work at Comet now, so they don't want to sponsor the podcast anymore. Makes sense, because you're paying me now to work for you guys. So I'll be doing it in my normal work hours instead of, you know, time when I should be kicking it with my family. So sometime during the week we'll be having the Comet ML office hours, still the same link, still broadcast live on LinkedIn through my profile, but it will likely be Wednesday around 10:00 a.m. Once we settle on that, I'll let you know for sure. What other news do I have to share? Well, my course is coming along just fine. You know, I always feel kind of sleazy shouting out my course, but I feel like I'm really [01:28:00] building something nice, and I feel it's going to be beneficial and help a lot of people. Shout out to the people in this chat who put in so much time and effort to review it and provide me with valuable feedback, specifically Mr. Blazer, Mark Freeman, and Tom; thank you so much for reviewing and giving me valuable feedback that I'm actioning. So I'll probably be launching that course ready for the holiday season.
It is going to be awesome in my eyes. Thom: [01:28:31] And not sleazy; other people need to know about it. Harpreet: [01:28:34] Thank you so much, I appreciate that. Yeah, you know, I figure: why create a course? Obviously to teach you guys awesome stuff, and hopefully to spread my philosophy and ethos and the way I work out there. But I've spoken to literally thousands of aspiring data scientists through Data Science Dream Job; I've been doing this mentorship professionally for years, and I see a lot of issues and a lot of problems with these aspiring data scientists who are looking to break into the field. But not only that: my mentees get jobs and start progressing, and they start moving up and leveling up, and they face challenges. So I have taken what I've learned from helping them and bundled it up into a course. Hopefully you guys enjoy it. Any last-minute closing questions or comments? I feel like I've killed enough time; if there were something, it would have come through by now. Does not look like it. Thank you so much, my friends, for hanging out and making this an awesome Friday evening. Be sure to also join the Slack channel if you haven't already; you can see awesome projects like the one Arpit just completed, code and everything out there just for you to look at and see how he did stuff, so it can inspire your own projects. My friends, remember: you've got one life on this planet, so why not try to do something big? Cheers, everyone!